OPC Interface Failover Manual - OSIsoft



OPC DA Interface Failover Manual

for OPC Interface Version 2.3.5.0-2.3.9.0

How to Contact Us

|OSIsoft, Inc. |Worldwide Offices |

|777 Davis St., Suite 250 |OSIsoft Australia |

|San Leandro, CA 94577 USA |Perth, Australia |

| |Auckland, New Zealand |

|Telephone |OSI Software GmbH |

|(01) 510-297-5800 (main phone) |Altenstadt, Germany |

|(01) 510-357-8136 (fax) |OSI Software Asia Pte Ltd. |

|(01) 510-297-5828 (support phone) |Singapore |

| |OSIsoft Canada ULC |

|techsupport@ |Montreal, Canada  |

| |OSIsoft, Inc. Representative Office |

|Houston, TX |Shanghai, People’s Republic of China  |

|Johnson City, TN |OSIsoft Japan KK |

|Mayfield Heights, OH |Tokyo, Japan  |

|Phoenix, AZ |OSIsoft Mexico S. De R.L. De C.V. |

|Savannah, GA |Mexico City, Mexico  |

|Seattle, WA | |

|Yardley, PA | |

|Sales Outlets and Distributors |

|Brazil |South America/Caribbean |

|Middle East/North Africa |Southeast Asia |

|Republic of South Africa |South Korea |

|Russia/Central Asia |Taiwan |

| |

|WWW. |

|OSIsoft, Inc. is the owner of the following trademarks and registered trademarks: PI System, PI ProcessBook, Sequencia, |

|Sigmafine, gRecipe, sRecipe, and RLINK. All terms mentioned in this book that are known to be trademarks or service marks |

|have been appropriately capitalized. Any trademark that appears in this book that is not owned by OSIsoft, Inc. is the |

|property of its owner and use herein in no way indicates an endorsement, recommendation, or warranty of such party’s |

|products or any affiliation with such party of any kind. |

| |

|RESTRICTED RIGHTS LEGEND |

|Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the |

|Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 |

| |

|Unpublished – rights reserved under the copyright laws of the United States. |

| |

|© 2002-2008 OSIsoft, Inc. PI_OPC DA Interface Failover Manual.doc |

Table of Contents

Introduction

Reference Manuals

Diagram of Hardware Connection

Principles of Operation

Server-Level Failover

Server-Level Failover Configurations

Watchdog Tags

Logging the Current Server

Logfile Messages for Server-Level Failover

Interface-Level Failover Using UniInt

Introduction

Failover Installation Checklist

Startup Command File Configuration

Sample Interface Startup Files

Command Line Parameter Considerations

PI ICU Configuration

OPC Server Failover Control Point Configuration

Active ID

Heartbeat

Control Point Data Flow

PI Failover Control Tag Configuration

Active ID

Heartbeat

Interface State Tag

Interface State Tag Configuration

Digital State Configuration

Importing Failover Digital Set to PI via PI SMT 3

Messages

Informational

Errors

Interface-Level Failover Using Microsoft Clustering

Choosing a Cluster Mode

Failover Mode

How It Works

Configuring APIOnline

Checklist for Cluster Configuration

Configuring the Interface for Cluster Failover

Buffering Data on Cluster Nodes

Group and Resource Creation Using Cluster Administrator

Cluster Group Configuration

Installation of the Resources

Logfile Messages for Interface-Level Failover

Using Combination of Server- and Interface-Level Failover

Revision History

Introduction

This is a supplemental document for configuring the OPC Interface to the PI System. It covers configuring and managing the interface for redundancy of the OPC server, the OPC interface, or both. It is intended to be used in conjunction with the OPC Interface Manual.

For server-level failover, no special hardware or software is required. Interface-level failover using Microsoft clustering requires a Microsoft Cluster. Interface-level failover using UniInt does not require any special hardware or software. It only requires a separate interface node for a backup copy of the interface.

In this manual, each type of redundancy is addressed separately, and using them in combination is covered briefly. Note that all of the command-line parameters discussed in this document can be configured using the Interface Configuration Utility (PI ICU). The ICU simplifies configuration and maintenance and is strongly recommended. The PI ICU can only be used for interfaces that collect data for PI Systems version 3.3 and higher.

Reference Manuals

OSIsoft

• OPC Interface to the PI System Manual

• UniInt Interface Users Manual

Diagram of Hardware Connection

Server-level failover configuration

[pic]

Interface-level failover using UniInt configuration

[pic]

Interface-level failover using Microsoft clustering configuration

[pic]

Principles of Operation

The OPC interface is designed to provide redundancy for both the OPC server and the interface itself. For server-level failover, the interface can be configured to change to another OPC Server when the current server no longer serves data, or when an OPC item changes value or quality, or when the OPC Server changes state. This allows the data collection process to be controlled at the lowest possible level, and ensures that data collection will continue even if the connection to the PI System fails.

For interface-level failover, two copies of the interface are running at the same time with only one sending data to the PI System. There are two types of interface-level failover supported by this interface. One uses Microsoft clustering and the other uses UniInt failover mechanism.

When using Microsoft clustering, the cluster controls which copy of the interface is actually collecting data at any given time. Since the OPC Server may not be cluster-aware, there are several modes which can be configured to ensure the least possible data loss in the event of a failover, without putting undue stress on the underlying data collection system. This type of failover is not recommended unless the user has other reasons for using Microsoft clustering.

The UniInt failover is OSIsoft’s preferred method of doing interface-level failover. It is a generic type of failover that is built into UniInt. The current UniInt failover mechanism only allows for configuring a hot failover mode and requires three control points to be created on the OPC Server. When running in hot failover mode, there are two interfaces (primary and backup) running and actively collecting data from the OPC Server, but only one of them (primary) is sending data to the PI System. The UniInt failover ensures that no data loss will occur during the transition from the primary to the backup interface, since it queues some data on both copies. This may result in some data overlap for up to 2 control point scan intervals.

The server-level failover can be combined with either type of interface-level failover to achieve redundancy at both levels of data collection, so that even the loss of both an OPC Server and one OPC Interface will not interrupt data collection. However, the two types of interface-level failover cannot be used at the same time.

Server-Level Failover

The basic idea behind server-level failover is that the interface should always be connected to a server that can provide data. The problem comes in how the interface knows when it should try to connect to another server. There are several ways in which an OPC Server may indicate that it is not able to serve data.

1. It does not accept connections. This is the simplest one to deal with. There is nothing to configure except the name of the alternate server.

2. It changes state when it is not active, usually to OPC_STATUS_SUSPENDED. The interface can be configured to fail over to another server when the current server leaves the RUNNING state.

3. It sends BAD quality for all tags. To use this option, an OPC item must be defined which always has GOOD quality except when the server is not serving data.

4. It has one or more OPC items which have a specific value when the server can serve data and another specific value when it cannot. With this option, it may be necessary to use the Transformation and Scaling ability of the interface, but as long as there is some way to translate the not-active value to zero and the active value to a value greater than zero, these OPC items can be used for Watchdog tags. It is possible to specify multiple tags as watchdogs and to specify a minimum value that defines an active server, so that the loss of some server functionality (for instance, one or two underlying data sources not working) will not cause failover, but falling below the specified minimum will trigger failover to the other server.

5. It has one or more OPC items which have GOOD quality when the server can serve data and BAD quality when it cannot. One watchdog tag or multiple watchdog tags can be specified, in addition to specifying the maximum number of watchdog tags which can have BAD quality on the active server without triggering failover.

6. It has an OPC Item which has a specific, known value when a given server can serve data and a different known value when that server cannot serve data. In these cases, there is always one Item for each server, and two Watchdog tags are used to control which server is active. This configuration is referred to as "server-specific watchdogs", because the watchdog Item refers to a given server's current status, regardless of which server the Item value was read from.

Note: Special handling is also included for Honeywell Plantscape servers, as several customers have had difficulty in getting server-level failover to work properly with these servers. The /HWPS flag tells the interface to failover when it receives an error code of 0xE00483FD or 0xE00483FC on any tag.
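For example, a minimal sketch of the relevant startup parameters for a redundant pair of Plantscape servers (the server and node names below are placeholders, not actual ProgIDs):

/SERVER=Plantscape.Server.1 /BACKUP=othernode::Plantscape.Server.1 /HWPS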

The following table lists the command-line parameters used to control server-level failover. The next sections explain how to configure the interface for each of the cases above, using these parameters, and how to use the timing parameters to get the least data loss with the most reliability.

|Parameter |Description |

|/BACKUP |The name and location of the backup OPC server |

|/CS |The string tag into which the name of the currently active server is written. |

|/FT |The number of seconds to try to connect, before switching to the backup server. |

|/NI |The number of interfaces running on this node. |

|/SW |The number of seconds to wait for RUNNING state, before switching to the backup server. |

|/WD |Watchdog tag specifications. |

|/WQ |Fail over if watchdog tag has bad quality or any error. |

|/WS |Fail over if the server leaves the RUNNING state. |

Server-Level Failover Options using ICU Control

[pic]

• Backup OPC Server Node Name -- The name or IP address of the backup OPC Server Node (/BACKUP);

• List Servers -- Clicking this button gets a list of OPC Server names from the system named in the Backup OPC Server Node Name field and populates the Backup OPC Server Name dropdown list.

• Backup OPC Server Name -- The registered name of the backup OPC Server on the above node (/BACKUP);

• Number of Interfaces on this Node -- The count of how many instances of the OPC interface are running on this node (/NI=#);

• Switch to Backup Delay (sec) -- The number of seconds to try to connect, before switching to the backup server (/FT=#);

• Wait for RUNNING State (sec) -- The number of seconds to wait for RUNNING status, before switching to the backup server (/SW=#);

• Current Active Server Tag -- The string tag into which the name of the currently active server is written (/CS=tag);

• Primary Server Watchdog Tag -- Watchdog tag for the Primary Server (/WD1=tag);

• Backup Server Watchdog Tag -- Watchdog tag for the Backup Server (/WD2=tag);

• Multiple Watchdog Tag Trigger Sum -- When using multiple watchdog tags, failover will be triggered if the sum of the value of these tags drops below the value entered in this box (/WD=#);

• Number of Watchdog Tags that have Bad Quality or Any Error to trigger a Failover -- (/WQ=#) Trigger a failover if more than # watchdog tags have Bad Quality or Any Error. If one watchdog tag is configured, set /WQ=0. If more than one watchdog tag is configured, then # can be set from 0 to the number of watchdog tags configured minus 1.

• Failover if Server Leaves RUNNING State -- (/WS=1).
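Taken together, a server-level failover configuration using server-specific watchdogs might look like the following startup-file fragment. This is only a sketch: the timing values are illustrative, and the OPC.CurrentServer string tag is a hypothetical example, not a required name.

/SERVER=OSI.DA.1 /BACKUP=othernode::OSI.DA.1 /FT=60 /SW=120 ^

/CS=OPC.CurrentServer /WD1=Server1Active /WD2=Server2Active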

Server-Level Failover Configurations

These are the server-level failover options supported by the interface. This section does not deal with timing of failover at all, only with how failover is triggered. Please see the next section for timing considerations.

Inactive Server Does not Allow Connections

This is the easiest to configure, using the /BACKUP parameter to provide the name of the other OPC server. If the interface cannot connect to one server, it will try the other one. The selection of which server is active is completely managed by the servers.

/SERVER=OSI.DA.1 /BACKUP=othernode::OSI.DA.1

Inactive Server Leaves OPC_STATUS_RUNNING State

This is controlled by using the /WS parameter. Once the interface is connected to a server and collecting data, the server's state is checked every 30 seconds. With the /WS flag set, if the server leaves the RUNNING state, the interface will disconnect from the current server and try to connect to the other server.

/WS=1

Inactive Server sets Quality to BAD

Some servers indicate that they are not the active server only by setting the quality of some or all of their items to BAD. This can be used to trigger failover to the other server, but the quality of the tag being used as a watchdog must be BAD only when the interface should fail over.

/WQ=# directs the interface to fail over to the other server if more than # watchdog tags have Bad Quality or Any Error. Note that v1.0a servers do not return error codes for individual items, so for version 1.0a servers this parameter only checks the quality of the value sent from the server.

If one watchdog tag is configured, set /WQ=0. If more than one watchdog tag is configured, then # can be set from 0 to the number of watchdog tags configured minus 1.

/WQ=0 to the number of watchdog tags minus 1

Watchdog Tags

For server-level failover, a specific PI tag can be defined as a watchdog tag. The OPC item which this tag reads must have a specific, known value when the server is able to serve data and another specific, known value when the server is unable to serve data. It is called a watchdog tag because its value changes to announce a change in the server status.

The remaining configuration options use Watchdog tags. Watchdog tags allow the OPC servers to tell the interface which server is the currently active server. The basic idea is that if the value of the watchdog tag representing a server is greater than zero, that server is the active server. There are two different modes for using watchdog tags: isolated mode and server-specific mode. In isolated mode, each server only knows its own state. The items being used for these watchdog tags represent the current state of the server (such as backup state or active state). These items could have different values for the two servers at any given time. In server-specific mode, both servers know the state of the other server. Because of this, the items being used for the watchdog tags should match. In general, server-specific watchdog tags are a more robust failover model.

Note that watchdog tags are read in the same way as normal data tags, and the values are passed along to PI. The PI tags must be configured as integer tags, but Location2 settings can be used to read other datatypes into the integer tags. Also, the same scaling and transformation formulas are applied to the watchdog tags as for ordinary tags, so using an integer PI tag and scaling parameters the interface can recognize values of -3 and 7 as 0 and 10, respectively. Any transformation that results in an integer value of 0 for backup and >0 for active can be used in a watchdog tag.

The watchdog tags should be configured to be Advise tags, if the server can support advise tags, otherwise they should be put into a scan class with a short scan period. Whenever the values are received from the server, whether polled or advised, the values are checked to see if they match the current state of the interface. If the watchdog tags say that the interface should be connected to the other server, the interface will disconnect from the current server and attempt to connect to the other server. If it cannot connect successfully to the other server within the configured failtime given by the /FT parameter on the command-line, it will flip back over to the original server and try it again, in case that server has now become the active server.

Isolated Watchdog Tags

With isolated watchdog tags, each server only knows its own state. There are two ways to use this model. The simple version has one tag, which by itself shows whether the server is ready to serve data. Multiple tags can also be used to reflect the server's connections to its underlying data systems.

One tag

The same Item in each server reflects the state of that server. The interface will read the Item as an ordinary data value, and if the value is not greater than zero, the interface will disconnect from the server and attempt to connect to the other server. At least one of the servers should always return a 1 as the current value for this Item. The watchdog tag is identified to the interface with the /WD1 parameter. With this model, the /WD2 parameter is not used. If /WD2 is specified and not /WD1, it will be ignored by the interface.

[pic]/WD1=ServerActive

PI tag ServerActive has Instrumenttag = Watchdog1

Multiple Watchdog Tags

Multiple tags can be defined as watchdog tags, and the sum of their values used to determine whether the server is active. The general idea behind this model is that the server may have access to flags that show whether it is collecting data from various sources, and that as long as some number of those flags show data collection, the server should continue to be used; but if enough of those flags show a connection loss, the other server should be tried to see if it has access to the underlying devices.

With this model, the watchdog tags are not specified on the command line. Instead, for each of those tags, Location3 is set to be a 3 for polled tags, or 4 for Advise tags. A minimum value is also specified that defines an active server. The interface will assume that the value of each watchdog tag is 1 at startup, and each time it gets a new value from the server, it will subtract the old value from the watchdog sum and then add the new value to it.

[pic]/WD=2

PI tag ServerActive1 has Instrumenttag = PLC1

PI tag ServerActive2 has Instrumenttag = PLC2

PI tag ServerActive3 has Instrumenttag = PLC3

As long as the sum of these three tags is at least 2, the interface will continue to collect data from the connected server. If the sum of the values goes below 2, the interface will fail over to the other server.

Using Quality

The interface can also use the quality of the watchdog tags to decide which server is active. Just as above, one or more tags can be specified as watchdog tags, along with the maximum number of those tags for which the interface can receive either an error or BAD quality. If more than that maximum number of tags have an error or BAD quality, the interface will fail over to the other server. For one tag, use the /WD1=tagname method and set /WQ=0:

[pic]/WD1=ServerActive /WQ=0

PI tag ServerActive has Instrumenttag = Watchdog1

To use the quality of multiple tags, set Location3 for those tags to either 3 (for polled tags) or 4 (for advise tags), and specify the maximum number of tags that can have BAD or error status on the active server.

[pic]/WQ=1

PI tag ServerActive1 has Instrumenttag = PLC1

PI tag ServerActive2 has Instrumenttag = PLC2

PI tag ServerActive3 has Instrumenttag = PLC3

Note that here the maximum error or bad quality count is being specified for an active server. In the above example, both of these servers could be active, but if server 2 loses another PLC, it will no longer be able to be the active server.

Server-specific Watchdog Tags

For the server-specific tag model, two items must be defined in both servers which will give the status of the servers. Two PI tags are defined to read those values, and the tags are defined to the interface by the /WD1 and /WD2 parameters.

[pic]/WD1=Server1Active /WD2=Server2Active

PI tag Server1Active has Instrumenttag = Watchdog1

PI tag Server2Active has Instrumenttag = Watchdog2

It is important that the two OPC Servers agree on which server is active and which is in backup mode. In active mode the OPC Server should serve data to the client, while in backup mode it should wait until the primary server fails. At any given time, only one of the watchdog tags should be zero, and it must be the same tag on both servers, unless only one server is accepting connections and serving data. If both watchdog tags are zero, neither server will be seen as active, and data collection will stop until one watchdog tag becomes nonzero. If both watchdog tags are greater than zero, the interface will remain connected to whichever server it is currently getting data from.

Multiple Interfaces

If more than one instance of the OPC interface is running on the same node, PI tags will need to be created for each instance of the interface. Since each interface scans only those PI tags that belong to the interface’s unique point source, one set of watchdog tags with one point source does not get picked up by another instance of the interface with a different point source. In short, although there need be only 1 or 2 watchdog tag Items in each OPC server, a separate pair of PI tags (with the appropriate point source) needs to be configured in PI for each instance of the interface.

[pic] Interface A, with point source A:

/WD1=Server1Active /WD2=Server2Active

PI tag Server1Active has Instrumenttag = Watchdog1

PI tag Server2Active has Instrumenttag = Watchdog2

Interface B, with point Source B:

/WD1=B1Active /WD2=B2Active

PI tag B1Active has Instrumenttag = Watchdog1

PI tag B2Active has Instrumenttag = Watchdog2

Logging the Current Server

A PI string tag can be configured to receive the name of the server that is currently providing data. This allows tracing which server was active at any given time, and when the interface failed over from one server to another. The /CS parameter is used to specify the tagname of the PI string tag which has been created to hold this information. This tag should use a Manual or Lab point source, or some other point source which is not in use, as the interface does not treat it as a normal interface tag but simply writes to it when the server connection changes from one server to the other. For this reason, edits to this tag will not be seen by the interface until the interface is restarted (and will not matter even then, since no edits to this tag will change the interface's behavior).
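As a sketch, the current-server tag might be specified as follows (the tag name OPC.CurrentServer is an example only, not a required name):

/CS=OPC.CurrentServer

PI tag OPC.CurrentServer is a String tag with an unused point source (for example, Lab)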

Controlling Failover Timing

There are three parameters which can be used to control the timing of server-level failover. The interface should not wait too long before recognizing that the current server is no longer available, but it should not switch over to the other server too quickly, only to find that it is not active and have to switch back to the server it was on before. The default values for the Failover Time and State Wait are reasonable and should be kept, but they can be changed if necessary to either wait longer before failing over to the other server, or give up on connecting to a server more quickly.

The Failover Time setting (/FT=#) defines the number of seconds to keep trying to reconnect to the current server, before giving up and failing over to the other server. This parameter does not affect how the interface determines that the current server is no longer responding, it only affects how long the interface will try to connect to one server before it gives up and tries the other server. The other thing which this setting affects is how often the server state is checked and when to update the clock offset. If Failover Time is set to less than 30 seconds, the interface will check the server state every Failover Time seconds.

To give the local system more time to handle requests when there are a number of interfaces running on one system, use the Number of Interfaces parameter (/NI=#). This is used as a multiplier for the Failover Time. This is most useful when the interface is running on the same system as the OPC server, and it should be set to the number of copies of the interface that are running on the system. The reasoning behind this is that most OPC servers will be slower to respond when there are multiple clients all trying to connect at the same time. For that reason, it is suggested that the RESTART_DELAY parameter be used to stagger the startup of the interfaces, if more than one copy is running on a system, but the /NI parameter will also smooth the startup and give the OPC server more time to respond.

Lastly, the State Wait parameter (/SW=#) is used to control how many seconds the interface will wait for the server to enter the RUNNING state before giving up and trying to connect to another server. Without this switch, once the interface connects, it will wait forever for the server to enter the RUNNING state. This parameter is highly variable depending on the server, with some entering the RUNNING state almost immediately and others requiring minutes to verify connections to all the remote connections before entering the RUNNING state.
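For example, a hedged sketch of the timing parameters for a node running three copies of the interface against a slow-starting server (the values are illustrative only):

/FT=60 /NI=3 /SW=120

With these settings, the interface tries a server for 60 x 3 = 180 seconds before giving up and switching, and waits up to 120 seconds for a newly connected server to reach the RUNNING state.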

Logfile Messages for Server-Level Failover

The interface will log a number of informational messages concerning the configuration and operation of server-level failover. First, at startup, when it recognizes the parameters given for failover, it will log an acknowledgement of them, so that it can be verified that the configuration has been read correctly. Parameters that are not understood are ignored, as though they had not been used at all. Here are the startup informational messages for each parameter.

/CS

Current Server tag is %s

Can't find server tag, ignoring

Server tag is not String tag, ignoring

/NI

/NI, number of interfaces on the node, must be greater than 0. Argument ignored.

/WD1

Can't find Watchdog1 tag %s, ignoring

Can't get Watchdog1 tag type, ignoring

Watchdog1 tag is not Integer tag, ignoring

Watchdog1 tag will be: %s:

(note that the tagname above is delimited by colons)

/WD2

Can't find Watchdog2 tag %s, ignoring watchdogs

Can't get Watchdog2 tag type, ignoring watchdogs

Watchdog2 tag is not Integer tag, ignoring

Watchdog2 tag will be: %s:

(note that the tagname above is delimited by colons)

Further, when the interface fails over, it will log why, if the reason is something other than the current connection failing.

/WS

Server left RUNNING state, failing over

Other messages will only show if debugging is turned on, because the assumption is that the same information is available in a PI tag or from other messages in the logfile. If the interface times out on a call to the server, or gets an error on a call, that will trigger a failover. If the watchdog value changes, this will be reflected in the PI archive. But if debugging is set to 128 (or, of course, 128+something, since the debug value is additive), messages such as these will be seen, as well as many others.

Watchdog flipping over

Watchdog has error, flipping over

Watchdog has bad quality, flipping over

Interface-Level Failover Using UniInt

Introduction

UniInt provides support for a hot failover configuration. When properly configured, the interface will provide a no data loss solution for bi-directional data transfer between the PI Server and the OPC Server given a single point of failure in the system architecture. This failover solution requires that two copies of the interface be installed on different PI Interface nodes collecting data simultaneously from a single OPC Server. Each copy of the interface participating in failover has the ability to monitor and determine liveliness and failover status. Moreover, the failover operation is automatic and operates with no user interaction. To assist in administering system operations, the ability to manually trigger failover to a desired copy of the interface is also supported by this failover scheme. Implementing the UniInt failover solution requires configuration of the startup command file, OPC Server failover control points, and PI failover control tags as described below.

Each copy of the interface participating in the failover solution will queue two intervals worth of data to prevent any data loss. When a failover occurs, there may be a period of overlapping data for up to 2 intervals. The exact amount of overlap is determined by the timing and the cause of the failover and may be different every time. Using the default update interval of 1 second will result in overlapping data between 0 and 2 seconds. The no data loss claim is based on a single point of failure. If both copies of the interface have trouble collecting data for the same period of time, data will be lost during that time.

The failover scheme is described in detail in the UniInt Interface Users Manual, which is a supplement to this manual.

Failover Installation Checklist

The checklist below may be used to configure this Interface for failover. The failover configuration requires the two copies of the interface participating in failover be installed on different nodes. Users should verify non-failover interface operation as discussed in the OPC Interface manual prior to configuring the interface for failover operations. If not familiar with UniInt failover configuration, return to this section after reading the rest of the “UniInt Failover Configuration” section in detail. If a failure occurs at any step below, correct the error and start again at the beginning of the checklist. For the discussion below, the first copy of the interface configured and tested will be considered the primary interface and the second copy of the interface configured will be the backup interface.

1. Verify non-failover interface operation as described in the “Installation Checklist” in the OPC Interface manual.

2. Use the PI ICU to modify the startup command file to include the proper UniInt failover startup command-line parameters: /UFO_ID and /UFO_OtherID. See the “PI ICU Configuration” section below.

3. Create and initialize the three required failover OPC Server points for the Active ID and Heartbeat control points. These points should be tested using the PI_OPCClient tool. Make sure that the points can be written to and read from. See the “OPC Server Failover Control Point Configuration” section below.

4. Create and initialize the six required failover PI points on the PI Server for the Active ID and Heartbeat control tags. See the section “PI Failover Control Tag Configuration” below for instructions. Pay particular attention to the PointSource, Location1, and ExDesc attributes.

5. If using PI APS to synchronize the OPC Server and PI points, special attention must be paid to the failover control points and tags. Check that the failover control points and tags are not included in the PI APS synchronization scheme. Synchronizing the control points will cause the failover tags to be edited by PI APS and may result in interface shutdown.

6. Start the primary interface interactively without buffering.

7. Verify a successful interface start by reviewing the pipc.log file. The log file will contain messages that indicate the failover state of the interface. A successful start with only a single interface copy running will be indicated by an informational message stating “UniInt failover: Interface in the "Primary" state and actively sending data to PI. Backup interface not available.” If the interface has failed to start, an error message will appear in the log file. For details relating to informational and error messages, refer to the “Messages” section below.

8. Verify data on OPC Server using its own tools or the PI_OPCClient tool.

• The Active ID control point on the OPC Server must be set to the value of the running copy of the interface as defined by the /UFO_ID startup command-line parameter.

• The Heartbeat control point on the OPC Server must be changing values at a rate of the scan class for the corresponding failover PI point.

9. Verify data on the PI Server using available PI tools.

• The Active ID control tag on the PI Server must be set to the value of the running copy of the interface as defined by the /UFO_ID startup command-line parameter.

• The Heartbeat control tag on the PI Server must be changing values at a rate of its scan class.

10. Stop the primary interface.

11. Start the backup interface interactively without buffering. Notice that this copy will become the primary because the other copy is stopped.

12. Repeat steps 7, 8, and 9.

13. Stop the backup interface.

14. Start buffering.

15. Start the primary interface interactively.

16. Once the primary interface has successfully started and is collecting data, start the backup interface interactively.

17. Verify that both copies of the interface are running in a failover configuration.

• Review the pipc.log file for the copy of the interface that was started first. The log file will contain messages that indicate the failover state of the interface. The state of this interface must have changed as indicated with an informational message stating “UniInt failover: Interface in the “Primary" state and actively sending data to PI. Backup interface available.” If the interface has not changed to this state, browse the log file for error messages. For details relating to informational and error messages, refer to the “Messages” section below.

• Review the pipc.log file for the copy of the interface that was started last. The log file will contain messages that indicate the failover state of the interface. A successful start of the interface will be indicated by an informational message stating “UniInt failover: Interface in the “Backup” state.” If the interface has failed to start, an error message will appear in the log file. For details relating to informational and error messages, refer to the “Messages” section below.

18. Verify data on the OPC Server using its own tools or the PI_OPCClient tool.

• The Active ID control point on the OPC Server must be set to the value of the running copy of the interface that was started first as defined by the /UFO_ID startup command-line parameter.

• The Heartbeat control points for both copies of the interface on the OPC Server must be changing values at a rate of the scan class for the corresponding failover PI points.

19. Verify data on the PI Server using available PI tools.

• The Active ID control tag on the PI Server must be set to the value of the running copy of the interface that was started first as defined by the /UFO_ID startup command-line parameter.

• The Heartbeat control tags for both copies of the interface on the PI Server must be changing values at a rate of the scan class that the tags belong to.

20. Test Failover by stopping the primary interface.

21. Verify the backup interface has assumed the role of primary by searching the pipc.log file for a message indicating the backup interface has changed to the “UniInt failover: Interface in the "Primary" state and actively sending data to PI. Backup interface not available.” The backup interface is now considered primary and the previous primary interface is now backup.

22. Verify no loss of data in PI. There may be an overlap of data due to the queuing of data. However, there must be no data loss.

23. Start the backup interface. Once the primary interface detects a backup interface, the primary interface will now change state indicating “UniInt failover: Interface in the "Primary" state and actively sending data to PI. Backup interface available.” in the pipc.log file.

24. Verify the backup interface starts and assumes the role of backup. A successful start of the backup interface will be indicated by an informational message stating “UniInt failover: Interface in "Backup” state.” Since this is the initial state of the interface, the informational message will be near the beginning of the start sequence of the pipc.log file.

25. Test failover with different failure scenarios (e.g. loss of PI connection for a single interface copy). UniInt failover guarantees no data loss with a single point of failure. Verify no data loss by checking the data in PI and on the OPC Server.

26. Stop both copies of the interface, start buffering, start each interface as a service.

27. Verify data as stated above.

28. To designate a specific interface as primary, set the Active ID point on the OPC Server to the failover ID of the desired primary interface as defined by its /UFO_ID startup command-line parameter.

29. Verify that command line parameters listed in the Command Line Parameter Considerations section are properly used.

Startup Command File Configuration

There are two interface startup parameters that control UniInt failover: /UFO_ID and /UFO_OtherID. UFO stands for UniInt Failover. The /UFO_ID and /UFO_OtherID parameters are required for the interface to operate in a failover configuration. All parameters specified must be configured correctly at interface startup. If they are not, the interface will not start and an error message will be printed to the interface log file. All existing UniInt startup parameters (e.g., /ps, /id, /q, /sn, etc.) will continue to function as documented and must be identical in both copies of the interface. Each of the failover startup parameters is described below.

|Parameter |Description |

|/UFO_ID=n |The required /UFO_ID startup parameter specifies the failover ID for the current copy|

|Required |of the interface. Each copy of the interface requires a failover ID specified by the |

| |/UFO_ID=n. The value, n, represents the identification number for this copy of the |

| |interface. Each copy of the interface must also know the failover ID for the |

| |redundant instance of the interface specified by the /UFO_OtherID=m. The integer |

| |number, n, used for /UFO_ID must be different than the number, m, used for |

| |/UFO_OtherID. The failover ID for both copies of the interface must be a positive |

| |integer. |

| |The failover ID is written to the Active ID point when the interface attempts to |

| |become the primary interface. The failover ID is also used to identify the Heartbeat |

| |tag for this copy of the interface. For more information on Heartbeat tag |

| |configuration, see the “Heartbeat” section below. |

|/UFO_OtherID=m |The required /UFO_OtherID startup parameter specifies the failover ID for the |

|Required |redundant copy of the interface. Each copy of the interface requires a redundant |

| |failover ID specified by the /UFO_OtherID=m. The value, m, represents the |

| |identification number for the redundant interface instance. Moreover, m must be a |

| |positive integer and must differ from the value, n, provided by the /UFO_ID=n |

| |parameter. |

| |The other failover ID is used in conjunction with the Active ID point to determine |

| |when the redundant interface is primary. The other failover ID is also used to |

| |identify the Heartbeat tag for the redundant interface copy. For more information on |

| |Heartbeat tag configuration, see the section below. |

Sample Interface Startup Files

The following is an example of the OPC interface configured for UniInt failover. In this example, the interface name is OPCInt and the interface executable is OPCInt.exe. The two interface copies are installed on different PI Interface nodes. The interface nodes are referred to as IFNode1 and IFNode2.  Any additional command-line parameters needed for the interface would be identically defined in both startup command-line files. The startup command file for the interface on IFNode1 would be defined as follows:

OPCInt.exe /PS=O /ID=1 /UFO_ID=1 /UFO_OtherID=2 ^

/host=PISrv:5450 /SERVER=OSI.DA.1 /f=00:00:01 /f=00:00:05

The startup command file for the interface on IFNode2 would be defined as follows:

OPCInt.exe /PS=O /ID=1 /UFO_ID=2 /UFO_OtherID=1 ^

/host=PISrv:5450 /SERVER=OSI.DA.1 /f=00:00:01 /f=00:00:05

[pic] CAUTION: The only differences in the startup parameters for the two interface copies are the /UFO_ID and /UFO_OtherID startup parameters. These parameters must be the reverse of one another. A configuration error in these parameters could result in no data being collected from either copy of the interface.

Command Line Parameter Considerations

When using interface-level failover based on UniInt, the overall functionality of the interface does not change. However, there are a number of command-line parameters that either cannot be used or need to be used with caution. The following describes those command-line parameters and if/how they can be used:

1. /CS=tagname – Current OPC Server name. This option is used with the server-level failover. It allows storing an active (currently providing data) OPC Server name in a string tag. When using interface-level failover based on UniInt along with server-level failover, there should be separate tags used for each copy of the interface.

2. /DB – Debugging options for the interface. These options should be used with the primary interface. Using them for the backup interface may not give the expected result. For example, /DB=8 should create three files: opcscan.log, opcrefresh.log, and opcresponse.log. If this option is used with the backup interface, these files will not be created.

3. /DF=tagname – Debug flag tag. Debugging options can be changed while the interface is running for only the primary interface. The backup interface can take advantage of this option only when it becomes primary.

4. /DLL=filespec – Using post-processing DLLs. Currently, post-processing DLLs do not support UniInt failover. If the interface is configured to use one of those DLLs, it should not be set up for UniInt based failover.

5. /DT=tagname – Debug tag. The primary and backup interfaces can use the same or different debug tags.

6. /NT=Y – Writing I/O Timeout. I/O Timeout will NOT be written if the interface-level failover based on UniInt is used. UniInt will suppress all I/O Timeout messages.

7. /OPCSTOPSTAT – Stop status of the OPC Interface. This option does not work if the interface is configured for UniInt based failover.

8. /QT=tagname – This option allows defining a PI tag which will receive the count of how many items the interface has queued up to go to the PI System. For each copy of the interface there should be a separate PI tag defined.

9. /ST=tagname – Status tag. This allows assigning a PI tag that will get the status of the OPC Server whenever it changes. For each copy of the interface there should be a separate tag assigned.
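For instance, a sketch of how the per-copy tags described in items 8 and 9 above (and item 1, if server-level failover is also in use) might appear in the two startup files. The tag names are hypothetical examples only:

Copy 1 (/UFO_ID=1): /QT=OPC.QueueCount.IF1 /ST=OPC.ServerStatus.IF1

Copy 2 (/UFO_ID=2): /QT=OPC.QueueCount.IF2 /ST=OPC.ServerStatus.IF2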

Note: The SCAN OFF point attribute does not work as normal if the interface is configured for interface-level failover based on UniInt. Having a backup interface prevents the interface from writing the SCAN OFF message to the appropriate tags.

PI ICU Configuration

The use of the PI ICU is the recommended and safest method for configuring the Interface for UniInt failover. With the exception of the notes described in this section, the Interface shall be configured with the PI ICU as described in the “Configuring the Interface with the PI ICU” section of this manual.

Note: With the exception of the /UFO_ID and /UFO_OtherID startup command-line parameters, the UniInt failover scheme requires that both copies of the interface have identical startup command files. This requirement causes the PI ICU to produce a message when creating the second copy of the interface stating that the “PS/ID combo already in use by the interface” as shown in Figure 1 below. Ignore this message and click the Add button. This message is not present if using PI ICU 1.4.1.0 (PR1) or higher.

[pic]

Figure 1: PI ICU (version 1.4.0.3 or earlier) configuration screen displaying a message that the “PS/ID combo already in use by the interface.” The user must ignore the yellow boxes, which indicate errors, and click the Add button to configure the interface for failover.

There are two interface startup parameters that control UniInt failover: /UFO_ID and /UFO_OtherID. The UFO stands for UniInt Failover. The /UFO_ID and /UFO_OtherID parameters are required for the interface to operate in a failover configuration. Each of these parameters is described in detail in the Startup Command File Configuration section above. These parameters must be entered into the Additional Parameters text field located under the opcint tab in the PI ICU utility; see Figure 2 below. See Figure 3 below if using PI ICU version 1.4.1.0 (PR1) or higher, which has support for UniInt failover.

PI ICU 1.4.0.3 or lower

[pic]

Figure 2: PI ICU configuration screen showing the UniInt failover startup parameters entered in the Additional Parameters text field. This copy of the interface defines /UFO_ID=2 and /UFO_OtherID=1. The other failover interface copy must define /UFO_ID=1 and /UFO_OtherID=2 in its Additional Parameters field.

PI ICU 1.4.1.0 (PR1) or higher

[pic]

Figure 3: PI ICU 1.4.1.0 (PR1) configuration screen showing the UniInt failover startup parameters. This copy of the interface defines /UFO_ID=2 and /UFO_OtherID=1. The other failover interface copy must define /UFO_ID=1 and /UFO_OtherID=2 in the UniInt Failover section of the ICU.

OPC Server Failover Control Point Configuration

In order to synchronize the two copies of the interface, there must be three interface control points residing on the OPC Server: one Active ID control point and two Heartbeat control points, one for each copy of the interface. Each of these control points must be initialized to a valid value that, when read by the interface, will not produce an error that would write a system digital state value to PI.

This interface supports the use of a PI Auto Point Synchronization (APS) connector. If using APS to synchronize the OPC Server and PI points, special attention should be paid to the failover control points and tags. Check that the failover control points and tags are not included in the APS synchronization scheme. Synchronizing the control points will cause the failover tags to be edited by APS and may result in interface shutdown.

Note: OPC Server control points that cannot be initialized may produce a bad result when read by UniInt failover and cause the interface to fail. If the points on the OPC Server cannot be initialized and return a bad result to the interface, bypass failover operations by removing the failover startup command-line parameters and run the interface in a non-failover configuration. Force an output value from PI to each of the failover control points; Active ID and Heartbeat points. To output a value for the interface specific Heartbeat point, each interface participating in failover will need to be run separately. Once the values on the OPC Server are valid, insert the proper failover startup command-line parameters and restart the interface.

Active ID

The Active ID point is used to identify which copy of the interface will act as the primary interface sending data to PI. The UniInt failover scheme will determine which copy of the interface will act as the primary copy and which will act as the backup copy of the interface. The primary copy of the interface will set the Active ID control point on the OPC Server to the value of its ID as defined by the /UFO_ID=n startup command-line parameter for the primary interface copy. The status of an interface as primary or backup can be changed by simply changing the value of the Active ID control point on the OPC Server to the ID of the desired primary copy of the interface.

During a normal interface shutdown sequence, the interface will write a value of zero to the Active ID control point if the interface is in a primary role as indicated by the Active ID control point. Setting the Active ID control point to zero allows the backup copy of the interface to quickly transition to the primary role.

Heartbeat

The two Heartbeat control points are used to monitor the liveliness of the failover configuration. Each copy of the interface is assigned one Heartbeat control point on the OPC Server to write (output) values. Each copy of the interface also reads (input) the value of the Heartbeat control point of the other interface in the failover configuration. Simply put, the concept of operation for the Heartbeat control point is for each copy of the interface to output a Heartbeat value to its Heartbeat control point and read the Heartbeat value of the other copy of the interface.

During a normal interface shutdown sequence, the interface will write a value of zero to its Heartbeat control point. Setting the Heartbeat control point to zero allows the backup copy of the interface to quickly transition to the primary role.

Note: The Active ID and Heartbeat control points can be created using OPC Server specific configuration tools. For more information on how to do this, consult your OPC Server manual or contact your server vendor.
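For example, the three control points might be created on the OPC Server with the following ItemIDs (these names are hypothetical, created with the server vendor's tools):

Active ID control point has ItemID = Failover.ActiveID

Heartbeat control point for interface copy 1 has ItemID = Failover.Heartbeat1

Heartbeat control point for interface copy 2 has ItemID = Failover.Heartbeat2

These ItemID strings would then presumably be used as the InstrumentTag values of the corresponding PI failover control tags described in the PI Failover Control Tag Configuration section below, just as ordinary OPC items are addressed through the Instrumenttag attribute elsewhere in this manual.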

Control Point Data Flow

The figure below shows the data flow to and from the Heartbeat and Active ID control points within a generic OPC Server. The control points can be located either on the OPC Server or the underlying data source (e.g. DCS, PLC or other).

[pic]

PI Failover Control Tag Configuration

[pic] CAUTION: Users must not delete the failover control tags once the interface has started. Deleting any of the failover control tags after interface startup will cause the interfaces in the failover scheme to shutdown and log an error message to the pipc.log file.

For details on proper configuration of failover control tags, refer to the sections below. It is highly recommended that the PI System Administrator be consulted before any changes to failover tags are made.

Synchronization of the two failover interface copies requires the configuration of six PI tags that are used to send and receive data for each of the OPC Server failover control points. All six PI tags must be configured correctly at interface startup or the interface will not start and an error message will be logged to the interface log file. The six PI tags are used exclusively for configuring the interface control points. Values written to an OPC Server failover control point are also written to the corresponding PI tag as a historical record.

The only PI tag attribute used specifically for OPC Server failover control point configuration is the ExDesc attribute. All other PI tag attributes are configured according to the interface documentation. For example, the PointSource attribute must match the /ps interface startup parameter or the interface will not load the PI tag.

This interface supports the use of a PI Auto Point Synchronization (APS) connector. If using APS to synchronize the OPC Server and PI points, special attention should be paid to the failover control points and tags. Check that the failover control points and tags are not included in the APS synchronization scheme. Synchronizing the control points will cause the failover tags to be edited by APS and may result in interface shutdown.

The interface installation kit includes the sample file UniInt_Failover_Sample_PI_Tags.xls, which can be used with the Tag Configurator add-in for Excel to create UniInt failover control tags. Simply modify the point attributes as described in the sections below and use the Configurator to create the tags on the PI Server.

Note: The PointSource and Location1 attributes must be identical for all the failover control tags in the failover scheme and must match the PointSource and Location1 attributes for PI tags loaded by the interface. Failure to comply with this rule will result in the interface failing to start.
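For example, with both interface copies started as in the sample startup files above (/PS=O /ID=1), all six failover control tags would be configured with PointSource = O and Location1 = 1, the same values used by the ordinary PI tags loaded by this interface instance.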

Active ID

The Active ID tag is used to identify which copy of the interface will act as the primary interface sending data to PI. For a redundant interface installation, one Active ID input tag and one Active ID output tag must be configured. The Active ID input tag must be configured to read from the Active ID control point on the OPC Server, whereas the Active ID output tag must be configured to write to the Active ID control point on the OPC Server. The Active ID tags must be successfully loaded or the interface will log a message to the interface log and fail to start.

To configure the interface Active ID tag, the string [UFO_ActiveID] must be found in the ExDesc attribute of the PI tag. The UFO_ActiveID keyword is not case sensitive. The square brackets must be included. The Interface Active ID Tag should be configured as an integer tag.

During a normal interface shutdown sequence, the interface will write a value of zero to its Active ID control point and PI tag. Setting the Active ID control point to zero allows the backup copy of the interface to quickly transition to the primary role.

Active ID Tag Configuration

|Attributes |ActiveID IN |ActiveID OUT |

|Tag |_Active_IN |_Active_OUT |

|ExDesc |[UFO_ActiveID] |[UFO_ActiveID] |

|Location1 |Match # in /id=# |Match # in /id=# |

|Location3 |1 |2 |

|PointSource |Match x in /ps=x |Match x in /ps=x |

|PointType |Int32 |Int32 |

|InstrumentTag | | |

|Shutdown |0 |0 |

|Excmax |0 |0 |

|Excmin |0 |0 |

|Excdev |0 |0 |

|Excdevpercent |0 |0 |

|Compressing |0 |0 |

Heartbeat

The Heartbeat control points are used to monitor the liveliness of the failover configuration. For interface failover to operate properly, each copy of the interface must have an input Heartbeat PI tag, and an output Heartbeat PI tag. Therefore, a total of four Heartbeat tags are required.

The input and output tag for each interface copy must have the string [UFO_Heartbeat:n] in the ExDesc attribute of the PI tag. The value of n must match the failover ID for the interface as defined by the /UFO_ID or /UFO_OtherID startup parameter (see example below). The UFO_HeartBeat keyword is not case sensitive. The square brackets must be included. All four of the Heartbeat tags must be successfully loaded or the interface will log a message to the pipc.log and fail to start.

For example: An interface copy participating in failover has /UFO_ID=5 and /UFO_OtherID=6 on the startup command line indicating that its interface ID is 5 and the other copy of the interface has an ID of 6. The ExDesc attribute for the input and output Heartbeat tags for the interface with an ID of 5 must have [UFO_Heartbeat:5] defined. Likewise, the ExDesc attribute for the input and output Heartbeat tags for the interface with an ID of 6 must have [UFO_Heartbeat:6] defined.

During a normal interface shutdown sequence, the interface will write a value of zero to its Heartbeat control point. Setting the Heartbeat control point to zero allows the backup copy of the interface to quickly transition to the primary role.

Heartbeat Tag Configuration

|Attribute |Heartbeat 1 IN |Heartbeat 1 OUT |Heartbeat 2 IN |Heartbeat 2 OUT |

|Tag |_IN |_OUT |_IN |_OUT |

|ExDesc |[UFO_Heartbeat:#] |[UFO_Heartbeat:#] |[UFO_Heartbeat:#] |[UFO_Heartbeat:#] |

| |Match # in /UFO_ID=# |Match # in /UFO_ID=# |Match # in /UFO_OtherID=# |Match # in /UFO_OtherID=# |

|Location1 |Match # in /id=# |Match # in /id=# |Match # in /id=# |Match # in /id=# |

|Location3 |1 |2 |1 |2 |

|Point Source |Match x in /ps=x |Match x in /ps=x |Match x in /ps=x |Match x in /ps=x |

|Point Type |int32 |int32 |int32 |int32 |

|InstrumentTag | | | | |

|Shutdown |0 |0 |0 |0 |

|Excmax |0 |0 |0 |0 |

|Excmin |0 |0 |0 |0 |

|Excdev |0 |0 |0 |0 |

|Excdevpercent |0 |0 |0 |0 |

|Compressing |0 |0 |0 |0 |

Interface State Tag

UniInt failover provides the ability to monitor the operational state of the interface using a PI tag. Each copy of the interface participating in failover can have an interface state tag defined to monitor the individual interface. To configure the interface state tag, the string [UFO_State:n] must be found in the ExDesc attribute of the PI tag. The value of n must match the failover ID for the interface as defined by the /UFO_ID startup parameter. The UFO_State keyword is not case sensitive. The square brackets must be included. The interface state tag should be configured as a digital tag.

Note: UniInt limits the number of interface state tags to one per interface copy. If more than one tag is created for a particular copy of the interface, only the last tag sent to the interface during the startup process will be configured to monitor the interface state. All other interface state tags for this copy of the interface will be ignored and will not receive data.

Interface State Tag Configuration

|Point Attribute |Primary |Backup |

|Tag | | |

|DigitalSet |UFO_State |UFO_State |

|ExDesc |[UFO_State:#] |[UFO_State:#] |

| | | |

| |(Match /UFO_ID=# on primary node) |(Match /UFO_ID=# on backup node) |

|Location1 |Match # in /id=# |Same as for Primary node |

|PointSource |Match x in /ps=x |Same as for Primary node |

|PointType |digital |digital |

|Shutdown |0 |0 |

|Step |1 |1 |

Digital State Configuration

OSIsoft recommends configuring a digital state set when using interface state tags to monitor the operational state of the failover configuration. UniInt can provide six different states (values) that indicate the operational condition of the interfaces participating in failover.

|State Number |State Name |Description |

|0 |Off |The interface is not started. |

|1 |Backup_No_DataSource |The interface is connected to the PI Server, but not to the |

| | |OPC Server. No data is being collected by the interface. |

|2 |Backup_No_PI |The interface is connected to the OPC Server, but not to the |

| | |PI Server. The interface is actively collecting and queuing |

| | |data. If the primary interface fails, this copy of the |

| | |interface will continue to collect data and if a connection to|

| | |PI becomes available, the queued data will be sent to the PI |

| | |Server. |

| | |The primary copy of the interface has the ability to monitor |

| | |the backup interface and is able to set the state of the |

| | |backup interface on the PI Server accordingly. |

|3 |Backup |The interface is connected to PI and the OPC Server. Data is |

| | |being collected and queued by the interface. If the primary |

| | |interface fails, this copy of the interface will transition to|

| | |primary and send its queued data to PI and continue in the |

| | |primary role. |

|4 |Transition |The interface is changing roles from Backup to Primary. The |

| | |interface remains in this state for two update intervals. |

|5 |Primary |The interface is connected to both the PI Server and the OPC |

| | |Server. Data is actively being sent to the PI Server. |

Importing Failover Digital Set to PI via PI SMT 3

The interface installation kit includes the digital set file, UniInt_Failover_DigitalSet_UFO_State.csv, which can be imported using the PI System Management Tools (SMT) application (version 3.0.0.7 or above). The procedure below outlines the steps necessary to create the digital set on a PI Server using the “Import from File” function found in the SMT application. The procedure assumes the user has a basic understanding of the SMT application.

1. Open the SMT application.

2. Select the appropriate PI Server from the PI Servers window. If the desired server is not listed, add it using the PI Connection Manager. A view of the SMT application is shown in Figure 4 below.

3. From the System Management Plug-Ins window, select Points then Digital States. A list of available digital state sets will be displayed in the main window for the selected PI Server. Refer to Figure 4 below.

4. In the main window, right click on the desired server and select the “Import from File” option. Refer to Figure 4 below.

[pic]

Figure 4: PI SMT application configured to import a digital state set file. The PI Servers window shows the “localhost” PI Server selected along with the System Management Plug-Ins window showing the Digital States Plug-In as being selected. The digital state set file can now be imported by selecting the Import from File option for the localhost.

5. Navigate to and select the UniInt_Failover_DigitalSet_UFO_State.csv file for import using the Browse icon on the display. Select the desired Overwrite Options. Click on the OK button. Refer to Figure 5 below.

[pic]

Figure 5: PI SMT application Import Digital Set(s) window. This view shows the UniInt_Failover_DigitalSet_UFO_State.csv file as being selected for import. Select the desired Overwrite Options by choosing the appropriate radio button.

6. The UFO_State digital set is created as shown in Figure 6 below.

[pic]

Figure 6: The PI SMT application showing the UFO_State digital set created on the “localhost” PI Server.

Messages

The following are examples of typical error and informational messages that can be found in the pipc.log file.

Informational

16-May-06 10:38:00

OPCpi> 1> UniInt failover: Interface in the "Backup" state.

Meaning: Upon system startup, the initial transition is made to this state. While in this state the interface monitors the status of the other interface participating in failover. Data received from the OPC Server is queued and not sent to the PI Server while in this state. The amount of data queued while in this state is determined by the failover update interval. In any case, there will typically be no more than two update intervals of data in the queue at any given time. Some transition chains may cause the queue to hold up to five failover update intervals' worth of data.
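
As an illustrative calculation only (the numbers are hypothetical): with a failover update interval of 5 seconds and a data rate of roughly 200 events per second, the backup queue would normally hold about 2 × 5 × 200 = 2,000 events, and in the worst transition chains up to 5 × 5 × 200 = 5,000 events.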

16-May-06 10:38:05

OPCpi> 1> UniInt failover: Interface in the "Primary" state and actively

sending data to PI. Backup interface not available.

Meaning: While in this state, the interface is in its primary role and sends data to the PI Server as it is received. This message also states that there is not a backup interface participating in failover.

16-May-06 16:37:21

OPCpi> 1> UniInt failover: Interface in the "Primary" state and actively

sending data to PI. Backup interface available.

Meaning: While in this state, the interface sends data to the PI Server as it is received.

Errors

16-May-06 17:29:06

OPCpi> 1> Loading Failover Synchronization tag failed

Error Number = 0: Description = [FailOver] or [HeartBeat:n] was found in the exdesc for Tag Active_IN

but the tag was not loaded by the interface.

Failover will not be initialized unless another Active ID tag is

successfully loaded by the interface.

Cause: The Active ID or Heartbeat tag is not configured properly.

Resolution: Check validity of point attributes. For example, make sure Location1 attribute is valid for the interface. All failover tags must have the same PointSource and Location1 attributes. Modify point attributes as necessary and restart the interface.

16-May-06 17:29:06

OPCpi> 1> One of the required Failover Synchronization points was not loaded.

Error = 0: The Active ID synchronization point was not loaded.

The input PI tag was not loaded

Cause: The Active ID tag is not configured properly.

Resolution: Check validity of point attributes. For example, make sure Location1 attribute is valid for the interface. All failover tags must have the same PointSource and Location1 attributes. Modify point attributes as necessary and restart the interface.

16-May-06 17:38:06

OPCpi> 1> One of the required Failover Synchronization points was not loaded.

Error = 0: The Heartbeat point for this copy of the interface was not loaded.

The input PI tag was not loaded

Cause: The Heartbeat tag is not configured properly.

Resolution: Check validity of point attributes. For example, make sure Location1 attribute is valid for the interface. All failover tags must have the same PointSource and Location1 attributes. Modify point attributes as necessary and restart the interface.

17-May-06 09:05:39

OPCpi> 1> Error reading Active ID point from Data source

Active_IN (Point 29600) status = -255

Cause: The Active ID point value on the OPC Server produced an error when read by the interface. The value read from the OPC Server must be valid. Upon receiving this error, the interface will enter the "Backup in Error" state.

Resolution: Check validity of the value of the Active ID point on the OPC Server.

17-May-06 09:06:03

OPCpi> 1> Error reading the value for the other copy's Heartbeat point from Data source

HB2_IN (Point 29604) status = -255

Cause: The Heartbeat point value on the OPC Server produced an error when read by the interface. The value read from the OPC Server must be valid. Upon receiving this error, the interface will enter the "Backup in Error" state.

Resolution: Check validity of the value of the Heartbeat point on the OPC Server.

17-May-06 09:06:03

OPCpi> 1> UniInt failover: Interface in an "Error" state. Could not read failover control points.

Cause: The failover control points on the OPC Server are returning a value to the interface that is in error. This error can be caused by creating a non-initialized control point on the OPC Server.

Resolution: Check validity of the value of the control points on the OPC Server.

17-May-06 09:06:03

OPCpi> 1> The Uniint FailOver ID (/UFO_ID) must be a positive integer

Cause: The UFO_ID parameter has not been assigned a positive integer value.

Resolution: Set the parameter to a positive integer, verify it, and restart the interface.

17-May-06 09:06:03

OPCpi> 1> The Failover ID parameter (/UFO_ID) was found but the ID for

the redundant copy was not found

Cause: The UFO_OtherID parameter is not defined or has not been assigned a positive integer value.

Resolution: Define the UFO_OtherID parameter, set it to a positive integer, and restart the interface.

Interface-Level Failover Using Microsoft Clustering

Microsoft Clustering (MSCS) is required for non-UniInt interface-level failover. This involves two copies of the interface running on cluster nodes, referred to as the primary and backup interfaces. The primary interface is the one that actively gets data from the OPC Server and sends data to the PI System. The backup interface can be controlled as to whether it connects to the OPC Server, creates its groups active or inactive, and adds tags to the groups. In any case, only the active interface signs up for exceptions with the OPC Server or requests data, and only the active interface sends data to PI. These features avoid over-burdening the OPC Server or the backend data source with duplicate requests for data from the redundant pair of interfaces.

In MSCS, failover is managed by creating resource groups which contain cluster resources. At any given time, only one of the two cluster nodes has possession of a resource group; "possession of a resource group" means that the resources belonging to that group are running on that node. The primary cluster group contains the resources for the physical devices that are required by the cluster. Other groups can be created and populated with other resources. OSIsoft provides a cluster resource of type Generic Service called apionline, which may be used as the cluster resource that the interface watches.

At startup, each interface checks whether the designated resource is running on its node; if it is, that interface is the active interface. The overriding rule is that whichever node currently owns the resource is the node with the active interface.

The five parameters used for MSCS interface-level failover are listed in the table below; see the following sections for further information on them. A combined command-line example follows the table.

|/PR=# |Specifies that this is the primary (/PR=1) or backup (/PR=2) node. |

|/RN=# |Specifies the number used for the matching apionline and resource. |

|/CM=# |Specifies the Cluster Mode, whether the interface has a bias toward running on the |

| |primary node (/CM=0) or no bias (/CM=1). |

|/CN=tagname |Specifies a PI string tag which receives the name of the node which is currently active. |

| |Optional. |

|/FM=# |Failover Mode. Selects behavior for the backup interface. Chilly (/FM=1), Cool (/FM=2) |

| |or Warm (/FM=3) |
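
For example, the cluster-related portion of a startup command line on a primary node might look like the following (the /CN tag name is hypothetical; the other values match the examples later in this section):

/PR=1 /RN=1 /CM=0 /FM=3 /CN=OPC_ActiveNode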

[pic]

Choosing a Cluster Mode

In Cluster Mode 0 (/CM=0), one interface is designated the Primary interface, and this interface will always be the active interface if at all possible. That means that if the other interface is currently the active interface and the interface on the primary node is started, the primary interface will move the resource onto its node so that it will be the active interface. It is not possible to have the interface on the primary node up and running and connected to an OPC server, and still have the interface on the backup node be the active interface. Cluster Mode 0 has a bias toward keeping the primary node the active node.

With Cluster Mode 1 (/CM=1), the interface does not attempt to control which node is the active node, and whichever node is currently the active node will remain the active node until there is a problem that causes a failover or until human intervention occurs, to either move the resource or shut down the active interface. Cluster Mode 1 has no bias at all.

Failover Mode

The Failover Mode determines what the backup interface does with respect to connecting to an OPC server, creating groups, and adding tags. The more of this that is done already, the faster failover is, but there may be a cost in loading up the OPC server or the underlying data system. So choose a Failover Mode based on how long it takes to start getting data after a failover, and whether one or more Failover Modes puts an unacceptable load on the OPC server or data system. Since this will vary widely from server to server and site to site, the best choice may have to be determined by trial and error, or by checking the OPC server documentation or asking the OPC server vendor.

/FM=1

Chilly failover, connect to the server but do not create groups. This is the slowest failover mode, and is not recommended for use with servers where adding tags to groups takes a long time. However, it is very unlikely for this mode to put any measurable load on the server or the underlying data system.

/FM=2

Cool failover, connect to the server, create groups inactive, and add tags. This will work well for servers which do not use any resources for inactive groups. Since this is outside of the scope of the OPC standard, verify that the server is not loaded down by having inactive groups. Some servers will use resources to keep track of items in inactive groups, due to the requirements of their particular data system. This mode may involve some small delay at failover, as the server is not required to keep the current value for the items, and at failover the server may have to acquire the current value for all of the items at the same time.

/FM=3

Warm failover, connect to the server, create groups active, and add tags, but do not advise the groups. This is the same as the old default behavior. Some servers use minimal resources for handling this mode, particularly servers which have the data already, but other servers which have to acquire the data specifically to comply with the OPC standard may be overloaded by this setting. This is the fastest failover mode, since at failover the interface simply requests that the server start sending data and the server will already have the current values available.

How It Works

Apionline

Apionline is a very simple program that exists only to watch for the interface. The name of the interface service that a given copy of apionline watches is specified on the apionline command line, and the interface command line tells the interface which copy of apionline to look for.

If the interface sees that copy of apionline running, that means that this is the active node. The cluster manager will start apionline on the node that currently owns the resource group. If apionline does not see the designated copy of the interface running, apionline will shut itself down. That causes the cluster manager to start up apionline on the other node of the cluster, which will tell the interface on that other node that it is now the active interface. Apionline and the interface act together; apionline cannot run if the interface is not running, and the interface will only collect data if apionline is running. MSCS is responsible for starting and stopping apionline, either automatically or in response to a human making changes with Cluster Administrator, but the interface itself will shut down apionline if it cannot collect data or if Cluster Mode 0 is being used and the backup node is currently the active node. Both the primary and the backup interface, running in either mode, will shut down their local copy of apionline if the interface is unable to collect data, but only the primary interface, running in Cluster Mode 0, will shut down the copy of apionline running on the backup node, in order to take over as the active interface.

Note that the same name must be used for the interface on both nodes, since apionline will use the same parameters whichever node it runs on, and one of those parameters is the name of the service to watch. So if the interface is installed to a directory called "OPC1" on the first node, it must be installed to a directory called "OPC1" on the second node. The part that matters is the name of the actual directory that the interface is installed into; the path leading up to that directory can differ. For example, one copy of the interface could use

c:\Program Files\PIPC\Interfaces\OPC1

on one node, and on the other could use

d:\OSI\PIPC\Interfaces\OPC1

and the failover configuration will still work. OSIsoft recommends the same naming and placement scheme be used across all the systems, just for clarity.

Configuring APIOnline

The interface installation places two files in the interface directory: apionline.bat and apionline.exe. If the default directory was used for the installation, the interface directory is OPCInt, which is also the name of the interface executable and the name that was used to install the interface as a service. In this case, no files need to be edited.

In the list of services (Control Panel/Services), there will be one called "OPC for PI". The parameter in apionline.bat must match the name of the OPC interface service as seen in the Services applet.
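
For the default single-instance installation, apionline.bat would therefore contain a single line such as the following (this matches the One Interface example later in this section, where the interface service is opcint):

apionline.exe /proc=opcint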

Next, the apionline service must be installed with the command

apionline -install -depend tcpip
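
To confirm that the service was registered, the standard Windows service query can be used (an optional verification step, not required by the interface):

sc query apionline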

Multiple Interfaces

To use multiple instances of the interface, make a copy of apionline.exe and apionline.bat for each interface instance, with a unique integer appended to the name (apionline1, apionline2, etc.); apionline by itself, without a number suffix, is also acceptable. Any non-negative integer can be used, but it should match the number specified to the interface with the /RN=# parameter. Creating multiple copies of the interface executable with the same number appended is also suggested, resulting in the following:

apionline1.exe, apionline2.exe, apionline3.exe

apionline1.bat, apionline2.bat, apionline3.bat

and

opcint1.exe, opcint2.exe, opcint3.exe

This makes life much simpler when tracking problems or load balancing. It also helps to use a separate ID number (/ID=#) for each copy of the interface, as the ID shows up in the pipc.log file and makes reading the logfile much easier.

The apionline.bat file will need to be edited to set the correct names for apionline# and opcint#. For example, with the above 3 instances of the interface, the following would be in apionline1.bat:

apionline1.exe /proc=opcint1

and in opcint1.bat the other parameters would be,

/RN=1 /ID=1

Likewise, apionline2.bat would have

apionline2.exe /proc=opcint2

and in opcint2.bat the other parameters would be,

/RN=2 /ID=2

And apionline3.bat would have

apionline3.exe /proc=opcint3

and in opcint3.bat the other parameters would be,

/RN=3 /ID=3

Note that the ID has to match Location1 of all tags for the interface, and having the ID match the resource number is a suggestion, not a requirement. The interface will work just fine if the same ID is used for all three copies of the interface, but reading the pipc.log file will be considerably harder.

Once the files have been made, these resources should be installed as services on the Interface Node as follows

apionline# -install -depend tcpip

so for the example here all three copies of apionline would be installed as

apionline1 -install -depend tcpip

apionline2 -install -depend tcpip

apionline3 -install -depend tcpip

Finally, running multiple instances of the interface on each node in the cluster requires creating a uniquely named resource for each pair of redundant interfaces. Each resource should be created in its own uniquely named group. Since all instances of interfaces will have a slightly different timing sequence in connecting to the server, they cannot be properly supported for failover if all of them share the same resource. MSCS moves the resources by the group: resources can be configured to not affect the group failover, but once failover takes place, they move together as a group. Having separate resources and groups also allows for specific arrangements for load balancing, for instance having one active interface on each node, so both nodes are lightly loaded as long as both nodes are fully functional, but having the ability for one node to take the full workload if the other node fails.

So, for the above case where there are three copies of the interface running on each node, three resource groups would be created, and each group would have one copy of apionline as its resource. Follow the directions in Group and Resource Creation three times, once for each copy of the interface, to set up the three groups and resources.

OPCStopStat and Failover

No OPCStopStat is written when an interface fails over or fails back. To track which interface is active, use the /CN parameter to have the interface write the Current Node to a PI tag, whenever it changes.
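
For example (the tag name is hypothetical; it must be an existing PI string tag):

/CN=OPC_ActiveNode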

At interface shutdown, if the interface is the active interface, the digital state specified for OPCStopStat will be written to all tags. The writing of OPCStopStat at interface shutdown can be avoided if Cluster Mode 1 is being used, by moving the resource to the other node before shutting down the interface.

Checklist for Cluster Configuration

A lot of potential frustration can be eliminated if things are verified step by step. Here is a simple configuration procedure that will help identify any problems quickly. This procedure is written for just one copy of the interface on each node; if multiple copies are being configured, the first five steps only need to be done for the first copy of the interface that is tested. Where "matching" is used below, it means that when working with opcint3.exe, for example, look for apionline3.exe and the apionline3 service and resource.

1. Configure the interface on each node with a dummy pointsource, one which is not currently used by any tags, or with a pointsource and ID number that do not match the pointsource and Location1 pair of any tags. The idea is to bring up both interfaces with no tags at all. Give them the correct Server and Host, but do not configure any failover-related parameters.

2. Start both interfaces, check the pipc.log to verify that both of them come up completely and sit there with no tags. If there are errors at this point, they are probably permission or DCOM problems, but any errors reported in pipc.log must be corrected before continuing with the next step.

3. Using Cluster Administrator, bring the matching cluster resource online by selecting the matching cluster group, then right-clicking on the resource and selecting Bring Online. Use the Task Manager to see that the matching apionline process is running on the node that Cluster Manager says owns the resource. That node will be called the OriginalOwner.

4. Still using Cluster Administrator, fail over the resource by selecting Initiate Failure in the right-click menu of the resource. The resource state should go to Failed and then Online Pending and then Online, with the other node now the owner. Depending on the system, the intermediate states may not be seen, but the resource should definitely end up Online with the other node as the owner. If not, there is a configuration problem and it must be corrected before continuing the test.

5. Use the Task manager to verify that the matching apionline on the OriginalOwner node is no longer running and that the matching apionline service is now running on the other node (OriginalBackup node). If all this is good so far, move the resource to whichever node will be the primary node.

6. Now use Cluster Manager to take the resource Offline, then shut down both copies of the interface. Configure them for production (do not forget to reset the pointsource and /ID to the correct values). At this point, Cluster Manager should show the resource offline, but owned by the primary node. If not, move the resource group to the primary node, while leaving the resource itself offline. This can be done by right-clicking on the group and selecting Move Group.

7. Bring up the interface on the node that does not currently own the group. The following message should be found in the pipc.log

Cluster resource not online, state 4, waiting

8. Bring the resource online. The resource should failover to the node where the interface is running. Once apionline is running on the same node as the interface, the following message should be found in the pipc.log

Cluster Resource apionline1 on this node

or possibly

Resource now running on this node

9. Now bring up the other interface. If Cluster Mode 0 is being used, the resource will now failover to the primary node. One of the two messages listed in the last step should appear in the pipc.log on the primary node.

The interfaces should now be configured correctly. As a further test, try failing over the resource a time or two, and shutting down one interface at a time, just to make sure that the interfaces do what is expected.

Configuring the Interface for Cluster Failover

The five parameters that are used for clustered failover are:

/PR=# Specifies that this is the primary (/PR=1) or backup (/PR=2) node.

/RN=# Specifies the number used for the matching apionline and resource. If a resource number is not specified, the interface will not run. If running only one copy of the interface, and a number was not appended to apionline, /RN=-1 should be used (any negative number will do).

/CM=# Specifies the Cluster Mode, whether the interface has a bias toward running on the primary node (/CM=0) or no bias (/CM=1).

/CN=tagname Specifies a PI string tag which receives the name of the node which is currently active. (Optional)

/FM=# Failover Mode. Selects behavior for the backup interface. Chilly (/FM=1), Cool (/FM=2) or Warm (/FM=3).

For Cluster Mode 1, it does not matter which node has /PR=1 and which has /PR=2, but one of each is necessary.

Here is a quick example using Cluster Mode 0, with Wallace as the primary node and Grommit as the backup node.

One Interface

If using the default directory for installation, the interface is called OPCInt.

The following files should be found in the opcint directory: apionline.exe, apionline.bat, opcint.exe, opcint.bat

Group: OPC Group

Resource: Apionline, with Service Name "apionline" and Start Parameter "/proc=opcint"

apionline.bat:

apionline.exe /proc=opcint

opcint.bat on Wallace, the primary node:

opcint /ps=o /ec=10 /er=00:00:03 /id=1 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=1 /RN=-1 /FM=3

opcint.bat on Grommit, the backup node:

opcint /ps=o /ec=10 /er=00:00:03 /id=1 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=2 /RN=-1 /FM=3

Note that only the last line of opcint.bat in this example is specifically for the cluster configuration, and the two opcint.bat files are the same except for the /PR=# parameter.

Three Interfaces

For each interface, make sure there is a matching apionline#.exe, apionline#.bat, resource, group, opcint#.exe, and opcint#.bat.

For interface 1, the files are apionline1.exe, apionline1.bat, opcint1.exe, opcint1.bat

Group: OPC Group1

Resource: Apionline1, with Service Name "apionline1" and Start Parameter "/proc=opcint1"

apionline1.bat:

apionline1.exe /proc=opcint1

opcint1.bat on Wallace, the primary node:

opcint1 /ps=o /ec=10 /er=00:00:03 /id=1 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=1 /RN=1 /FM=3

opcint1.bat on Grommit, the backup node:

opcint1 /ps=o /ec=10 /er=00:00:03 /id=1 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=2 /RN=1 /FM=3

Note that only the last line of opcint1.bat in this example is specifically for the cluster configuration, and the two opcint1.bat files are the same except for the /PR=# parameter.

Next, for interface 2,

Group: OPC Group2

Resource: Apionline2, with Service Name "apionline2" and Start Parameter "/proc=opcint2"

apionline2.bat:

apionline2.exe /proc=opcint2

opcint2.bat on Wallace, the primary node:

opcint2 /ps=o /ec=10 /er=00:00:03 /id=2 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=1 /RN=2 /FM=3

opcint2.bat on Grommit, the backup node:

opcint2 /ps=o /ec=10 /er=00:00:03 /id=2 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=2 /RN=2 /FM=3

Note that the /ID was changed to match the resource number /RN.

Finally, for interface 3,

Group: OPC Group3

Resource: Apionline3, with Service Name "apionline3" and Start Parameter "/proc=opcint3"

apionline3.bat:

apionline3.exe /proc=opcint3

opcint3.bat on Wallace, the primary node:

opcint3 /ps=o /ec=10 /er=00:00:03 /id=3 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=1 /RN=3 /FM=3

opcint3.bat on Grommit, the backup node:

opcint3 /ps=o /ec=10 /er=00:00:03 /id=3 /df=OPCDBGF ^

/TF="ccyy/mn/dd hh:mm:ss.000" /SERVER=OSI.HDA.1 /host=widget:5450 ^

/MA=Y /ts=Y /opcstopstat /f=00:00:01 /f=00:00:01 /f=00:00:01 /f=00:00:02 ^

/CM=0 /PR=2 /RN=3 /FM=3

Buffering Data on Cluster Nodes

Buffering is fully supported on cluster nodes. To take advantage of buffering, bufserv.exe should be installed on all participating cluster nodes at the time of PI API installation. No special configuration is required to enable buffering on a cluster node. Note that there is a risk of incurring a substantial amount of out-of-order data if a failover occurs while both interfaces are disconnected from PI (and thus buffering data). Upon reconnection, each cluster node will send its buffered data simultaneously, which will result in out-of-order data. This causes the PI Server, particularly the PI Archive Subsystem, to consume additional resources as it processes these out-of-order events. For a complete discussion of how to configure buffering, see the OPC Interface Manual.
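
As a rough sketch only (the exact buffering settings are outside the scope of this manual; consult the PI API installation documentation for authoritative details), API buffering with bufserv is typically enabled through the [APIBUFFER] section of piclient.ini on each cluster node:

[APIBUFFER]

BUFFERING=1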

Group and Resource Creation Using Cluster Administrator

Before this step, make sure that MSCS is installed and configured. Test and verify that Clustering is functioning correctly prior to creating groups and resources for OPCInt interface failover. At the end of this section are steps for verifying correct cluster configuration. Directions specified here are for using the resources with Cluster Mode 0, with the assumption being that if using Cluster Mode 1, the installer has enough knowledge to decide the proper settings for the configuration. Cluster Mode 1 allows much more control over the cluster resource, but the possibilities and considerations for cluster control are beyond the scope of this document.

Cluster Group Configuration

Note: Interfaces must not be run under the Local System account if using Cluster Failover. The service must be set up to run under an account that has administrator privileges.

Installation of Cluster Group

From the desktop, click on Start/Programs/Administrative Tools (Common)/Cluster Administrator. Click on File/New/Group. Enter the name of the group and a description.

[pic]

Click Next. Do not add anything to the Preferred owners box, since owner preference is built into the interface for Cluster Mode 0. Below, Grommit and Wallace are the cluster nodes.

[pic]

Click Finish.

Right click on the group just created and select Properties. Fill out the name of the cluster and the description. Do not select Preferred owners; these are the nodes on which the group is preferred to run, and preferred ownership is built into the interface when using Cluster Mode 0, so it should not be set from Cluster Administrator.

[pic]

Set the Threshold and Period as follows. Threshold is the maximum number of times which a group is allowed to fail over in the time specified by Period.

[pic]

For the Failback tab, select Prevent failback, since failback mechanism is also built into the interface when using Cluster Mode 0.

[pic]

Click on Apply and then OK.

Installation of the Resources

Right click on the group in Cluster Administrator, select New and then Resource. Type the name of the resource and a description. Select the Resource type Generic Service.

[pic]

It is not necessary to run this resource in a separate Resource Monitor unless the resource seems to be causing a lot of problems and a separate monitor is needed to isolate the problem.

Click on Next and verify that the cluster nodes are in Possible owners list. These are the nodes on which the resource can run, and therefore the nodes onto which the group can fail over.

Click Next and skip Dependencies. Move on to Generic Service Parameters.

[pic]

This resource (in the example above, it is called apionline1) should have been installed as a service prior to cluster resource creation by typing in Windows Command line window,

apionline1 -install -depend tcpip

/proc is a parameter that the apionline resource needs. It identifies the process that must be running before apionline itself can be brought online; this should be the name of the opcint service for which the resource is being defined. Click on Next and skip Registry Replication. Click on Apply and OK.

Right click on the resource and then select Properties. On the Advanced tab, set the entries as shown below. Note that MSCS is being instructed to restart the resource, but to fail it over to the other node every time. This means that when apionline shuts itself down because its interface is not running, or when the primary interface (running in Cluster Mode 0, with a bias toward the primary node) shuts down apionline on the backup node, MSCS will first move ownership of the resource to the other node before restarting it.

Click Apply and then OK.

Repeat the group and resource creation process for each instance of the interface on the node. Next the interface is ready to be configured.

Logfile Messages for Interface-Level Failover

The messages printed to pipc.log will vary somewhat depending on what the cluster does. In general, any time the interface detects that something has changed (ownership of a resource has shifted over to another node), it will print a message.

When the interface first connects to the cluster, it will check for the cluster resource to be online. If it has trouble connecting, or if the resource is offline, some of these messages may be seen:

Failed to open cluster: error %d. Will try again in 5 seconds.

Failed to open cluster resource %s: error %d. Will try again in 5 seconds.

Cluster resource not online, state %d, waiting

It will keep trying until it succeeds. The "Failed to open" messages will repeat, because a failure to open the cluster probably indicates a problem with the cluster, and the interface should not sit there and silently wait forever. If the resource is simply not online, the interface will wait silently, on the assumption that the resource was intentionally taken offline.

Once it is connected, one of these messages should be seen

Cluster Resource %s on this node

Cluster Resource %s NOT on this node

After that, one of these messages might be seen

Failed to get group handle: error %d. Will try again in 5 seconds.

Again, that message will repeat if the condition persists, since it indicates that the configuration is still incorrect.

Finally, the last possibility for failure before going into steady-state mode would be the message:

Error creating cluster port: %lX. Failover disabled.

That one is fatal as far as failover is concerned. If this message is received, contact OSIsoft Technical Support.

Once everything is running properly the following messages will be printed as the cluster resource moves from one node to the other.

Resource now running on this node

Resource no longer running on this node

Group owned by this node

Group NOT owned by this node

If more messages are required, set /DB=128 using a debug flag tag defined with /DF (see the OPC Interface Manual for more on that). This setting causes the interface to print additional messages when something happens, for instance when the resource state changes but the owner does not change.
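
For example, the startup files earlier in this section already define a debug flag tag with /df=OPCDBGF; writing the value 128 to that PI tag would turn on the additional failover-related messages. This is a usage sketch only; see the OPC Interface Manual for the authoritative description of the /DF and /DB parameters.

/df=OPCDBGF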

Using Combination of Server- and Interface-Level Failover

The interface can be configured to use a combination of server-level and interface-level failover. Only one type of interface-level failover may be combined with server-level failover in a given configuration. If UniInt-based interface-level failover is used, the Microsoft clustering based failover is disabled automatically.

The easiest way to configure interfaces for both kinds of failover is to configure them for interface-level failover first and verify that it is working properly. Next, remove those parameters from the opcint.bat files, and configure and test server-level failover for each interface separately. Once server-level failover is working properly, add the interface-level failover parameters back into the opcint.bat files.

Note: If server-level failover is combined with UniInt-based interface-level failover, the control points required for this type of failover should be created on the underlying data source (i.e., DCS, PLC, or other device). It is important that both the primary and backup OPC Servers share the same control points. This prevents faulty behavior of the interfaces when the primary OPC Server becomes unavailable.

Revision History

|Date |Author |Comments |

|4-Dec-2002 |LACraven |Created using skeleton 1.11 |

|28-Apr-2003 |LACraven |Added multiple watchdog tags |

|1-Nov-2005 |Janelle |Version 2.2.2.0, Rev: A: fixed headers and footers, fixed |

| | |copyright, fixed section breaks |

|13-Apr-2006 |Janelle |Version 2.2.3.0, Updated manual to include latest version, |

| | |fixed section breaks, removed tracking, fixed headers and |

| | |footers, and updated Table of Contents; removed Bookmarks. |

|12-Jun-2006 |Amaksumov |Version 2.3.1.0, Updated manual to include UniInt Failover |

| | |section. Changes in the document structure. Some changes to |

| | |the context. Updated hardware connection diagrams. |

|15-Jun-2006 |MKelly |Version 2.3.1.0, Rev A, Removed all first person references. |

| | |Updated the TOC, fixed headers and footers, fixed section |

| | |breaks. |

|26-June-2006 |Amaksumov |Version 2.3.2.0, Added section – Command Line Parameter |

| | |Considerations. |

|28-Jun-2006 |MKelly |Made some minor grammatical changes, fixed headers. |

|19-Jul-2006 |Amaksumov |Version 2.3.2.0, Rev A, Removed all references to |

| | |/UFO_Interval parameter for UniInt based Interface-Level |

| | |Failover |

|28-Jul-2006 |Amaksumov |Added additional tag attributes (excmax, excmin, excdev, |

| | |excdevpercent, compressing) that are required for |

| | |configuration of the UniInt based failover control tags |

|27-Oct-2006 |Amaksumov |Version 2.3.3.0, Changed the version number. |

|28-Oct-2006 |MKelly |Version 2.3.3.0, Rev A; Fixed page setup margins, tab in |

| | |headers, made tables fit within margins. |

|14-Nov-2006 |MGrace |PLI# 9722OSI8 Add section about Buffering in the Cluster |

| | |Failover section. |

|29-Nov-2006 |MGrace |Fix /wq=x descriptions. |

|14-Feb-2007 |MGrace |Version 2.3.4.0 |

|19-Feb-2007 |MKelly |Version 2.3.4.0, Rev B; Added new ICU screenshots showing PI |

| | |ICU 1.4.1.0 (PR1) layouts. |

|17-Apr-2007 |MGrace |Updated the version to 2.3.5.0 |

|20-Jun-2008 |LCraven |Changed name to OPC DA Interface Failover Manual; updated |

| | |version to 2.3.8.0 |

|19-Sep-2008 |LCraven |Updated the version to 2.3.9.0 |

[pic] (diagram labels: opcint asks "Am I active?"; apionline asks "Can I run?")
