Support for clusters and failovers

This section describes how the PATROL Agent supports an application in a clustered environment and the type of failover tolerance it provides.

Cluster and failover concepts

This table defines the common terms and concepts used to describe the PATROL Agent's support for cluster environments. Examples of third-party clustering software are listed at the end of this section.

Terms	Description
Control script	In a UNIX environment, a control script governs the movement of a package from one host to another; that is, it controls the failover process.
Cluster	A cluster is a collection of two or more host computers that can connect to and control common disk storage. Each cluster is controlled by a cluster management system (which can range from a control script on UNIX to a third-party application on Windows) that operates within the cluster.
Cluster node	Each host computer that shares physical disks or other storage devices and is registered with the cluster management software (or control script) is referred to as a cluster node.
Cluster application	A cluster application consists of all the necessary file structures that support the application. Cluster applications can run on any number of hosts that are logically grouped to form a cluster. They are commonly referred to as packages on UNIX systems and groups on Windows systems.
Cluster Management Software (CMS)	Each cluster is controlled by Cluster Management Software (CMS) operating within the cluster. CMS ranges from third-party software applications to internally customized control scripts, and it controls the failover process. When the cluster application on a cluster node stops servicing requests, the CMS shuts down the cluster application and disconnects (Windows) or unmounts (UNIX) the physical disk storage that supports it. The CMS then designates another host to run the cluster application. The receiving host takes control of the physical disk storage and restarts the cluster application contained in the group (Windows) or package (UNIX).
Failover	Failover support is the ability to move a running cluster application from one cluster node to another with minimal data loss. The cause of the failover can range from scheduled maintenance to hardware or software failure.
Group	A group is a logical grouping of application files and resources in a Windows environment. It consists of all necessary file structures required to run a cluster application and make it available to the end user. In the DNS server or other host name/IP address resolution table, you must assign a virtual IP address to the group. The cluster management software identifies the group by its virtual IP address.
Package	A package is a logical grouping of application files and resources in a UNIX environment. It consists of all necessary file structures required to run a cluster application and make it available to the end user. In the DNS server or other host name/IP address resolution table, you must assign a virtual IP address to the package. The control script identifies the package by its virtual IP address.
Virtual IP address	Each cluster application (also referred to as a package on UNIX and a group on Windows) is assigned at least one IP address. The end-user front-end application uses this IP address to locate the application on the network. It is a virtual IP address: it is associated with an application rather than with a physical location. The address is referred to as either "virtual" or "soft" because it can be active on any host currently supporting the cluster application.
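The failover sequence that the CMS performs (stop the application, release the shared disk, let the receiving host take the disk, restart the application behind the same virtual IP address) can be sketched as a small model. This is a minimal, hypothetical Python illustration: Node, ClusterApplication, fail_over, and the host names and IP address are invented for the example and are not part of any CMS product's API. The choice of receiving host is passed in, standing in for the CMS's own designation step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node that can own shared disk storage."""
    name: str
    mounted_disks: set = field(default_factory=set)
    running_apps: set = field(default_factory=set)


@dataclass
class ClusterApplication:
    """Hypothetical package (UNIX) or group (Windows) reached via a virtual IP."""
    name: str
    virtual_ip: str
    disk: str


def fail_over(app: ClusterApplication, source: Node, target: Node) -> list:
    """Move app from source to target in the CMS order; return a step log."""
    log = []
    # 1. Shut down the cluster application on the node that stopped servicing requests.
    source.running_apps.discard(app.name)
    log.append(f"stop {app.name} on {source.name}")
    # 2. Disconnect (Windows) or unmount (UNIX) the shared disk storage.
    source.mounted_disks.discard(app.disk)
    log.append(f"release {app.disk} from {source.name}")
    # 3. The receiving host takes control of the shared disk storage.
    target.mounted_disks.add(app.disk)
    log.append(f"attach {app.disk} to {target.name}")
    # 4. Restart the application; clients keep using the same virtual IP address.
    target.running_apps.add(app.name)
    log.append(f"start {app.name} on {target.name} ({app.virtual_ip})")
    return log


# Illustrative usage with made-up names.
host_a = Node("hostA", {"disk1"}, {"oracle_pkg"})
host_b = Node("hostB")
pkg = ClusterApplication("oracle_pkg", "10.0.0.50", "disk1")
for step in fail_over(pkg, host_a, host_b):
    print(step)
```

The virtual IP address never changes in this model, which is the property that lets front-end applications keep finding the cluster application after it moves.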

Failover tolerance

Failover tolerance is the ability to have an application that is running on one node in a cluster environment stop running on that node, and have another node take over the running of the application. An application may stop running on a node for any of the following reasons:

Application failure
Hardware resource failure
Software resource failure
Load-balancing software moves the application
Administrator moves the application

The PATROL Agent provides failover tolerance by performing the following tasks:

Monitoring all hosts in the cluster using the same configuration information
Recording application history in the same history database for all hosts in the cluster

The PATROL Agent's failover tolerance prevents irregularities and gaps in the agent's history files for the cluster application. Without it, when an application fails on one host and another host takes over, the second host creates a separate set of history files, and you must reconcile the two sets manually. To enable this behavior, you must create the environment variables described below and set their values; the PATROL Agent reads these variables when it creates and writes to history files.
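Why a shared name matters can be shown with a short Python sketch that files history samples under a name. This is an illustration only, not the agent's storage format: record_history, the host names, and the sample values are hypothetical. Filing every sample under one virtual name mirrors the rule that the agent writes history under PATROL_VIRTUALNAME_<port> when it exists; otherwise each physical host gets its own series.

```python
from collections import defaultdict


def record_history(samples, virtual_name=None):
    """Group (host, timestamp, value) samples into per-name history series.

    If virtual_name is set, every sample is filed under it; otherwise each
    physical host name gets its own series (hypothetical illustration).
    """
    history = defaultdict(list)
    for host, timestamp, value in samples:
        key = virtual_name if virtual_name else host
        history[key].append((timestamp, value))
    return dict(history)


# The application runs on hostA, then fails over to hostB at t=3.
samples = [("hostA", 1, 10), ("hostA", 2, 12), ("hostB", 3, 11), ("hostB", 4, 13)]

print(record_history(samples))
# {'hostA': [(1, 10), (2, 12)], 'hostB': [(3, 11), (4, 13)]}
print(record_history(samples, virtual_name="AliasHostName"))
# {'AliasHostName': [(1, 10), (2, 12), (3, 11), (4, 13)]}
```

Keyed by host name, the application's history is split into two series that would have to be reconciled by hand; keyed by the virtual name, it forms one continuous series.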

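The PATROL Agent uses the following environment variables to provide failover support on both UNIX and Windows:

PATROL_VIRTUALNAME*: the virtual host name under which the agent records history and configuration
PATROL_HISTORY*: the directory in which the agent writes the history database
PATROL_CONFIG*: the location from which the agent reads configuration information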

* To manage multiple PATROL Agents running on separate ports, append an underscore and the port number to each variable name (for example, PATROL_HISTORY_8888). This situation occurs when individual PATROL Agents are bound to individual applications, such as Oracle, Exchange, or Sybase, and each agent uses a separate port number.
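The per-port naming rule can be expressed as a tiny lookup helper. This Python sketch is an assumption-laden illustration: port_variable and lookup are hypothetical names, port 9999 and the alias values are invented, and the underscore separator follows names such as PATROL_VIRTUALNAME_8888 used in this section. Treating an empty value as unset follows the note that variables that do not exist or are empty fall back to the defaults.

```python
from typing import Mapping, Optional


def port_variable(base: str, port: int) -> str:
    """Return the port-specific variable name, e.g. PATROL_HISTORY_8888."""
    return f"{base}_{port}"


def lookup(env: Mapping[str, str], base: str, port: int) -> Optional[str]:
    """Return the value of <base>_<port>, or None if it is missing or empty."""
    value = env.get(port_variable(base, port), "")
    return value or None


# Two hypothetical agents, each bound to its own application and port.
env = {
    "PATROL_VIRTUALNAME_8888": "OracleAlias",
    "PATROL_VIRTUALNAME_9999": "ExchangeAlias",
    "PATROL_HISTORY_9999": "",
}
print(lookup(env, "PATROL_VIRTUALNAME", 8888))
print(lookup(env, "PATROL_VIRTUALNAME", 9999))
print(lookup(env, "PATROL_HISTORY", 9999))
```

Passing os.environ instead of the literal dictionary would read the real process environment, since it supports the same get method.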

The following example illustrates how the environment variables are named for an agent using port 8888, and the directory structure and file locations that result.

Environment variables

PATROL_HISTORY_8888=K:\doc\work\histdir
PATROL_VIRTUALNAME_8888=AliasHostName
PATROL_CONFIG_8888=K:\doc\work\config

Directory structure

For the values provided in the "Environment variables" section of this example, the PATROL Agent stores configuration information and records the history data in the following directory structure:

K:\doc\work\histdir\AliasHostName\8888\annotate.dat
K:\doc\work\histdir\AliasHostName\8888\param.hist
K:\doc\work\config\config_AliasHostName-8888

If these variables do not exist or they are empty, the PATROL Agent stores configuration information and records the history data in the following directory structure:

<PATROL_HOME>\log\history\HostName\8888\annotate.dat
<PATROL_HOME>\log\history\HostName\8888\param.hist
<PATROL_HOME>\config\config_HostName-8888

Operation of configuration and history environment variables

When searching for configuration information and creating and writing to the history database, the PATROL Agent uses the logic listed in the following table to check for the existence of PATROL cluster-specific variables.

Operation of configuration and history environment variables

Variable type	Exists?	Description
Virtual Name	Yes	If PATROL_VIRTUALNAME_8888 exists, the agent writes history using the virtual name as the host name. Using the virtual name provides continuous history for an application regardless of which host the application is running on. The agent also uses the virtual name to identify the configuration file and the history database. Configuration file changes are written to config_<virtualName>-<port>.cfg in <PATROL_HOME>\config (or in the location specified by PATROL_CONFIG_<port>). The history database is written to the <virtualName>\<port> subdirectory of the directory pointed to by PATROL_HISTORY_<port> (or of <PATROL_HOME>\log\history if that variable is not set).
Virtual Name	No	The agent writes history using the actual host name. If the application fails over, the agent on the new host writes history under that host's name. Using the actual host name creates gaps in the results of any dump_hist command because the command does not recognize that the same application ran on different hosts.
Configuration File	Yes	If PATROL_CONFIG_8888 exists, then the agent reads configuration information from the location specified by this variable.
Configuration File	No	The agent reads from the default location, <PATROL_HOME>\config\config_<virtualName>-<port> (or config_<hostName>-<port> if no virtual name is set).
History Database	Yes	If PATROL_HISTORY_8888 exists, then the agent writes history to the location specified by this variable.
History Database	No	The agent writes to the default directory, <PATROL_HOME>\log\history\<virtualName>\<port>\ (or <hostName>\<port>\ if no virtual name is set).
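The decision table above can be condensed into one resolution function. This is a minimal Python sketch, not the agent's implementation: resolve_agent_paths and its return keys are hypothetical, port-suffixed names are assumed as in the example, an empty variable is treated as unset, and the file names follow the example directory listings (which show no file extension). The ntpath module applies Windows path rules, so the backslash paths come out the same on any operating system.

```python
import ntpath  # Windows path rules, regardless of the host OS
from typing import Mapping, Optional


def _get(env: Mapping[str, str], base: str, port: int) -> Optional[str]:
    # A variable that is missing or empty is treated as not set.
    return env.get(f"{base}_{port}") or None


def resolve_agent_paths(env: Mapping[str, str], patrol_home: str,
                        host_name: str, port: int) -> dict:
    """Return the history directory and configuration file for one agent."""
    # Virtual Name rows: use the virtual name if set, else the actual host name.
    name = _get(env, "PATROL_VIRTUALNAME", port) or host_name
    # History Database rows: PATROL_HISTORY_<port>, else <PATROL_HOME>\log\history.
    history_root = (_get(env, "PATROL_HISTORY", port)
                    or ntpath.join(patrol_home, "log", "history"))
    # Configuration File rows: PATROL_CONFIG_<port>, else <PATROL_HOME>\config.
    config_dir = (_get(env, "PATROL_CONFIG", port)
                  or ntpath.join(patrol_home, "config"))
    return {
        # annotate.dat and param.hist are written inside this directory.
        "history_dir": ntpath.join(history_root, name, str(port)),
        "config_file": ntpath.join(config_dir, f"config_{name}-{port}"),
    }


# The variables from the example above, for an agent on port 8888.
cluster_env = {
    "PATROL_HISTORY_8888": r"K:\doc\work\histdir",
    "PATROL_VIRTUALNAME_8888": "AliasHostName",
    "PATROL_CONFIG_8888": r"K:\doc\work\config",
}
print(resolve_agent_paths(cluster_env, "<PATROL_HOME>", "HostName", 8888))
print(resolve_agent_paths({}, "<PATROL_HOME>", "HostName", 8888))
```

With the example variables set, the function yields the K:\doc\work\... locations; with none set, it falls back to the <PATROL_HOME> defaults listed earlier in this section.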

Examples of third-party cluster software

The following are examples of third-party clustering software:

MC ServiceGuard by Hewlett Packard
High Availability Cluster Multi-Processing for AIX by IBM
Sun Cluster for E6000 Servers by Sun
Windows Cluster Management Software