Support for clusters and failovers
This section describes how the PATROL Agent supports an application in a clustered environment and the type of failover tolerance it provides.
Cluster and failover concepts
The following table defines the common terms and concepts used to describe the PATROL Agent's support for cluster environments. Examples of third-party clustering software are listed at the end of this section.
|Control script||In a UNIX environment, a control script governs the movement of a package from one host to another. It literally controls the failover process.|
|Cluster||A cluster is a collection of two or more host computers that can connect and control common disk storage. Each cluster is controlled by a cluster management system (which can range from a control script on UNIX to a third-party application on Windows) that operates within the cluster.|
|Cluster node||Each host computer that shares physical disks or other storage devices and is registered with the cluster management software (or control script) is referred to as a cluster node.|
|Cluster application||A cluster application consists of all the necessary file structures that support the application. Cluster applications can run on any number of hosts that are logically grouped to form a cluster. They are commonly referred to as packages on UNIX systems and groups on Windows systems.|
|Cluster Management Software (CMS)||Each cluster is controlled by Cluster Management Software (CMS) operating within the cluster. CMS ranges from third-party software applications to internally customized control scripts, and it controls the failover process. When the cluster application on a cluster node quits servicing requests, the CMS shuts down the cluster application and disconnects (Windows) or unmounts (UNIX) the physical disk storage that supports the cluster application. The CMS then designates another host to run the cluster application. The receiving host takes control of the physical disk storage and restarts the cluster application contained in the failover group (Windows) or package (UNIX).|
|Failover||Failover support is the ability to move a running cluster application from one cluster node to another with minimal data loss. The cause of the failover can range from scheduled maintenance to hardware or software failure.|
|Group||A group is a logical grouping of application files and resources in a Windows environment. It consists of all necessary file structures required to run a cluster application and make it available to the end user. In the DNS server or other host name/IP Address resolution table, you must assign a virtual IP Address to the group. The cluster management software identifies the group by its virtual IP Address.|
|Package||A package is a logical grouping of application files and resources in a UNIX environment. It consists of all necessary file structures required to run a cluster application and make it available to the end user. In the DNS server or other host name/IP Address resolution table, you must assign a virtual IP Address to the package. The control script identifies the package by its virtual IP Address.|
|Virtual IP address||Each cluster application (also referred to as a package on UNIX and a group on Windows) is assigned at least one IP address. This IP Address is used by the end-user front-end application for locating the application on the network. It is a virtual IP Address that is not associated with a physical location, but is associated with an application. The address is referred to as either "virtual" or "soft" because of its ability to be active on any host currently supporting the cluster application.|
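The failover sequence described for the CMS can be sketched as a toy model. This is only an illustration of the order of steps given in the table, not how any real cluster management software is implemented; the `Node` and `ClusterApplication` classes, the `fail_over` function, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node (hypothetical model)."""
    name: str
    mounted_storage: set = field(default_factory=set)
    running: set = field(default_factory=set)


@dataclass
class ClusterApplication:
    """A package (UNIX) or group (Windows), identified by a virtual IP address."""
    name: str
    virtual_ip: str
    storage: str


def fail_over(app: ClusterApplication, source: Node, target: Node) -> str:
    """Move app from source to target in the order the CMS row describes."""
    # 1. Shut down the cluster application on the node that stopped servicing requests.
    source.running.discard(app.name)
    # 2. Disconnect (Windows) or unmount (UNIX) the supporting disk storage.
    source.mounted_storage.discard(app.storage)
    # 3. The receiving host takes control of the disk storage.
    target.mounted_storage.add(app.storage)
    # 4. The receiving host restarts the cluster application.
    target.running.add(app.name)
    # The virtual IP address belongs to the application, so it does not change.
    return target.name


# Hypothetical sample values.
app = ClusterApplication(name="orders_db", virtual_ip="10.0.0.50", storage="shared_disk_1")
node_a = Node("nodeA", mounted_storage={"shared_disk_1"}, running={"orders_db"})
node_b = Node("nodeB")

new_host = fail_over(app, node_a, node_b)
print(f"{app.name} now runs on {new_host} and is still reached at {app.virtual_ip}")
```

The point of the model is that clients keep using the same virtual IP address before and after the move; only the node behind it changes.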
Failover tolerance is the ability of a cluster environment to let an application stop running on one node and have another node take over running it. An application may quit running on a node for any of the following reasons:
- Application failure
- Hardware resource failure
- Software resource failure
- Load balancing software moves application
- Administrator moves application
The PATROL Agent provides failover tolerance by performing the following tasks:
- Monitoring all hosts in the cluster using the same configuration information
- Recording application history in the same history database for all hosts in the cluster
The PATROL Agent's failover tolerance prevents irregularities and gaps from occurring in the agent's history files for the cluster application. It saves you from having to manually reconcile two history files when an application fails on one host and another host takes over, creating a separate set of history files. To enable this support, you must create and set the values of the environment variables described below. The PATROL Agent searches these variables for information when it creates and writes to history files.
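The effect on history can be pictured with a small toy model. The sketch below is not the agent's storage format; it only assumes that history is filed under a name, which is either the physical host name or one shared name for the whole cluster (the virtual name variable covered later in this section). The function `group_history` and the sample data are hypothetical.

```python
from collections import defaultdict


def group_history(samples, virtual_name=None):
    """Group (host, timestamp, value) samples under the name used to record them."""
    history = defaultdict(list)
    for host, timestamp, value in samples:
        # With a shared name, every node files its samples under the same key.
        key = virtual_name if virtual_name else host
        history[key].append((timestamp, value))
    return dict(history)


# Hypothetical samples: the application runs on nodeA, then fails over to nodeB.
samples = [
    ("nodeA", 1, 40.0),
    ("nodeA", 2, 42.0),
    ("nodeB", 3, 41.0),
    ("nodeB", 4, 39.0),
]

per_host = group_history(samples)
continuous = group_history(samples, virtual_name="AcmeServer")

print(sorted(per_host))    # two separate histories that would need reconciling
print(sorted(continuous))  # one continuous history
```

Without a shared name, the failover splits the record into one history per host; with it, the same samples form a single uninterrupted series.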
The PATROL Agent uses the following environment variables to provide failover support. The table applies to both UNIX and Windows.
|Location of the configuration files|
|PATROL_CONFIG_port*||Contains the fully qualified path to the configuration file stored on a drive shared to the cluster. If this variable is empty or doesn't exist, the agent stores the configuration file in (Windows) PATROL_HOME\config or (UNIX) PATROL_HOME/config.|
|Location of history files|
|PATROL_HISTORY_port*||Contains the fully qualified path to the history file stored on a drive shared to the cluster. If this variable is empty or doesn't exist, the agent writes the history files to (Windows) PATROL_HOME\log\history\<host>\<portNumber> or (UNIX) PATROL_HOME.|
|Alias for the host name|
|PATROL_VIRTUALNAME_port*||Contains the virtual server name that the PATROL Agent uses instead of the host name to store the PATROL configuration and historical data. If this variable is empty or doesn't exist, the agent uses the host name to identify history data within the history files.|
Contains the location of the PATROL Agent error log file.
* To manage multiple PATROL Agents running on separate ports, append the port number to the variable name. This situation occurs when individual PATROL Agents are bound to individual applications such as Oracle, Exchange, Sybase, and so on. Each agent uses a separate port number.
The following example illustrates how the environment variables would be named for a host using port 8888. It also depicts the directory structure and file location.
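As a minimal sketch of the naming rule, the snippet below builds the port-specific variable names for port 8888. The three name prefixes are the ones used in this section; the helper `cluster_variable_names`, the shared drive `K:`, and the virtual name `AcmeServer` are hypothetical values chosen only for illustration.

```python
def cluster_variable_names(port: int) -> dict:
    """Return the port-specific names of the cluster variables for one agent."""
    return {
        "config": f"PATROL_CONFIG_{port}",
        "history": f"PATROL_HISTORY_{port}",
        "virtual_name": f"PATROL_VIRTUALNAME_{port}",
    }


# Hypothetical values: the shared drive and the virtual name are illustrative only.
example_environment = {
    "PATROL_CONFIG_8888": r"K:\patrol\config",
    "PATROL_HISTORY_8888": r"K:\patrol\history",
    "PATROL_VIRTUALNAME_8888": "AcmeServer",
}

names = cluster_variable_names(8888)
for role, name in names.items():
    print(f"{role}: {name} = {example_environment[name]}")
```

A second agent bound to another port on the same host would use the same prefixes with its own port number appended, so the two sets of variables never collide.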
For the values provided in the "Environment variables" section of this example, the PATROL Agent stores configuration information and records the history data in the following directory structure:
If these variables do not exist or they are empty, the PATROL Agent stores configuration information and records the history data in the following directory structure:
Operation of configuration and history environment variables
When searching for configuration information and creating and writing to the history database, the PATROL Agent uses the logic listed in the following table to check for the existence of PATROL cluster-specific variables.
|Virtual Name||Yes||If PATROL_VIRTUALNAME_8888 exists, the agent writes history using the virtual name as the host name. Using the virtual name provides continuous history for an application regardless of which host the application is running on. The agent also uses the virtual host name to identify the configuration file changes and the history database. Configuration file changes are written to <PATROL_HOME>\config\config_<virtualName>_<port>.cfg. The history database is written to the sub-directory structure history\<virtualName>\<port>, which is located in the directory pointed to by PATROL_HISTORY_<port>.|
|Virtual Name||No||The agent writes history using the actual host name. If the application fails over, the agent writes history using the new host's name. Using the actual host name creates gaps in the results of any dump_hist commands because the command does not recognize that the same application ran on different hosts.|
|Configuration File||Yes||If PATROL_CONFIG_8888 exists, the agent reads configuration information from the location specified by this variable.|
|Configuration File||No||The agent reads from the default location, <PATROL_HOME>\config\config_<virtualName>-<port> or, when no virtual name is set, <PATROL_HOME>\config\config_<hostName>-<port>.|
|History Database||Yes||If PATROL_HISTORY_8888 exists, the agent writes history to the location specified by this variable.|
|History Database||No||The agent writes to the default directory, <PATROL_HOME>\log\history\<virtualName>\<port>\ or, when no virtual name is set, <PATROL_HOME>\log\history\<hostName>\<port>\.|
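The lookup logic in the table can be sketched roughly as follows. This is a reading of the table, not the agent's actual code: the function `resolve_agent_locations` and all sample values are hypothetical, Windows-style separators are used throughout, and an empty variable is treated the same as a missing one.

```python
from typing import Mapping


def resolve_agent_locations(
    env: Mapping[str, str], host_name: str, port: int, patrol_home: str
) -> dict:
    """Pick the identifying name and the config/history directories for one agent."""
    # An empty variable is treated the same as a missing one.
    virtual_name = env.get(f"PATROL_VIRTUALNAME_{port}", "")
    config_dir = env.get(f"PATROL_CONFIG_{port}", "")
    history_root = env.get(f"PATROL_HISTORY_{port}", "")

    # Virtual Name row: fall back to the actual host name.
    name = virtual_name or host_name
    # Configuration File row: fall back to <PATROL_HOME>\config.
    if not config_dir:
        config_dir = patrol_home + r"\config"
    # History Database row: fall back to <PATROL_HOME>\log.
    if not history_root:
        history_root = patrol_home + r"\log"
    # History lives in the sub-directory structure history\<name>\<port>.
    history_dir = "\\".join([history_root, "history", name, str(port)])

    return {"name": name, "config_dir": config_dir, "history_dir": history_dir}


# Hypothetical example: virtual name and history location set, config location not set.
env = {"PATROL_VIRTUALNAME_8888": "AcmeServer", "PATROL_HISTORY_8888": r"K:\patrol"}
clustered = resolve_agent_locations(env, "node1", 8888, r"C:\PATROL")
default = resolve_agent_locations({}, "node1", 8888, r"C:\PATROL")

print(clustered["history_dir"])
print(default["history_dir"])
```

Because each variable is checked independently, any combination is possible: here the history moves to the shared drive under the virtual name while the configuration stays in the default directory.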
Examples of third-party cluster software
The following are examples of third-party clustering software:
- MC ServiceGuard by Hewlett Packard
- High Availability Cluster Multi-Processing for AIX by IBM
- Sun Cluster for E6000 Servers by Sun
- Windows Cluster Management Software