High-availability deployment and best practices for Infrastructure Management
Consult the following topics for information and recommendations on how to deploy and configure TrueSight Infrastructure Management components to achieve high availability (HA):
The following diagram depicts the application-level HA implementation.
TrueSight Infrastructure Management Server HA
HA for the TrueSight Infrastructure Management Server is supported through application-level HA and also through operating system clustering.
You can configure application-level HA only if you are using an Oracle database. For more information, see Considerations for a high-availability deployment of Infrastructure Management.
Operating system clustering
You can configure operating system clustering if you are using the embedded SAP SQL Anywhere database. The two servers in the cluster must be configured with storage that is shared between them. See Installing the Infrastructure Management server in the high-availability cluster mode using the wizard. BMC recommends using a high-speed SAN for storage.
Data collection Integration Services HA
The Integration Service is stateless, which allows the PATROL Agent to automatically send performance data and events to another Integration Service if the primary instance is not available. There is no concern for maintaining monitoring-related configuration at the Integration Service instances because no such configuration exists. Additionally, there is no association between the Integration Service instances and specific PATROL Agents to be maintained or otherwise managed by administrators at the Integration Service nodes.
TrueSight Infrastructure Management enables you to cluster Integration Service nodes. These Integration Service cluster configurations are simple software settings referenced in policies. The settings for each cluster are stored in Central Monitoring Administration. The cluster configurations contain connectivity information in the form of PATROL Agent variables that instruct the agents how to connect to the first, second, third, and fourth Integration Service nodes that are grouped in the cluster. These cluster configurations provide no built-in load balancing; however, all the Integration Service instances support active/active HA.
You can include up to four Integration Service nodes in a single cluster. BMC recommends referencing clusters in staging policies only.
PATROL Agents attempt to connect to the Integration Services in the cluster in the order in which the Integration Services are listed. When an agent loses its connection to the first Integration Service instance, it automatically connects to the second instance in the list. When the first Integration Service becomes available again, the agent does not automatically connect back to it. The agent remains connected to the instance to which it is currently and successfully connected.
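The connection order and the absence of automatic fail-back described above can be illustrated with a small model. This is a minimal sketch, not product code: the class name AgentConnection, the is_up callback, and the node names are hypothetical. It also assumes that an agent that loses its current node searches the list again from the top, which the text above does not specify.

```python
from typing import Callable, List, Optional


class AgentConnection:
    """Toy model of how an agent picks an Integration Service from a cluster list."""

    def __init__(self, nodes: List[str], is_up: Callable[[str], bool]) -> None:
        # A cluster holds at most four Integration Service nodes.
        if not 1 <= len(nodes) <= 4:
            raise ValueError("a cluster holds one to four Integration Service nodes")
        self.nodes = nodes
        self.is_up = is_up
        self.current: Optional[str] = None

    def ensure_connected(self) -> Optional[str]:
        # Stay on the current node while it is reachable (no automatic fail-back).
        if self.current is not None and self.is_up(self.current):
            return self.current
        # Assumption: on loss, try the nodes again in their listed order.
        self.current = next((n for n in self.nodes if self.is_up(n)), None)
        return self.current


# Hypothetical availability table used only for this demonstration.
availability = {"is-node-1": True, "is-node-2": True}
agent = AgentConnection(["is-node-1", "is-node-2"], lambda n: availability[n])

first = agent.ensure_connected()        # first node in the list
availability["is-node-1"] = False
after_loss = agent.ensure_connected()   # fails over to the second node
availability["is-node-1"] = True
after_recovery = agent.ensure_connected()  # stays on the second node
```

In this model the agent ends up on the second node even after the first node recovers, which mirrors the behavior described above.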
Multiple Integration Service instances can run behind a load balancer. This means that a third-party load balancer can be placed between PATROL Agents and the Integration Services to support full active/active HA fault tolerance and true load balancing of event and performance data across multiple Integration Service processes running on different hosts. In large environments, BMC generally recommends using load balancers as a best practice. This is a recommendation, not a requirement. A load balancer helps ensure that the Integration Service tier is not overloaded during an event storm, or when an interruption in communication between the agents and the Integration Service nodes causes a flood of cached data to be sent to the TrueSight Infrastructure Management Servers through the Integration Service nodes.
Consider high availability (HA) as part of the Integration Service node deployment.
If you plan to deploy the Integration Service on a VMware virtual machine (VM), you can utilize VMware HA. Utilizing VMware HA simplifies administration because it is transparent to the PATROL Agents (the connections for both the performance metrics and events automatically reconnect when the VM is restarted). For further information, see Considerations for deploying Infrastructure Management on a VM.
Staging Integration Service HA
The staging Integration Service in the preceding diagram is not shown in a cluster, and it is not included in the cluster configuration within the product. However, you can configure staging Integration Service nodes for redundancy. You can do this by setting up multiple staging Integration Service nodes and designating their connectivity information in a comma-separated list for the PATROL Agent Integration Service configuration variable.
An agent installation package or a single policy must never contain configuration for multiple staging Integration Service nodes that are associated with different TrueSight Infrastructure Management Servers.
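The comma-separated list and the single-server rule above can be checked before a policy or installation package is built. The following is a minimal sketch under stated assumptions: the function staging_list, the server_of mapping, and the entry strings are hypothetical placeholders, and the exact entry format expected by the PATROL Agent configuration variable is not taken from this topic.

```python
from typing import Dict, List


def staging_list(nodes: List[str], server_of: Dict[str, str]) -> str:
    """Return a comma-separated staging list, refusing nodes tied to different servers."""
    if not nodes:
        raise ValueError("at least one staging Integration Service node is required")
    # Collect the Infrastructure Management Server that each staging node belongs to.
    servers = {server_of[node] for node in nodes}
    if len(servers) > 1:
        raise ValueError(f"staging nodes span multiple servers: {sorted(servers)}")
    return ",".join(nodes)


# Hypothetical inventory: which server each staging node is associated with.
inventory = {
    "staging-a.example.com": "tsim-server-1",
    "staging-b.example.com": "tsim-server-1",
    "staging-c.example.com": "tsim-server-2",
}

value = staging_list(["staging-a.example.com", "staging-b.example.com"], inventory)
```

Passing staging-c.example.com together with either of the other two nodes raises a ValueError, because those nodes are associated with different servers in the sample inventory.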
Event management cell HA
HA for the event management cells is provided through a built-in primary/secondary configuration as an active and hot standby cell pair. Event sources such as Integration Services are configured to send events first to the primary cell. If the primary cell is not available, the event source sends events to the secondary cell. The cells automatically synchronize live event data so that events are kept in sync between the two cells. The secondary cell is configured and operates as a “hot standby” cell.
The primary and secondary cells monitor each other. During a failover, the secondary cell detects that the primary cell is not available and it takes over the event processing functionality. When the secondary cell detects that the primary cell has become available, it synchronizes events with the primary cell and switches back to standby mode. The primary cell then continues the event processing and synchronization with the secondary cell.
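The failover and switch-back sequence above can be pictured as a small state model. This is an illustrative sketch only, not how the cells are implemented: the class CellPair, its methods, and the event names are hypothetical, and synchronization is reduced to copying list entries.

```python
from typing import List


class CellPair:
    """Toy model of a primary cell and its hot standby secondary cell."""

    def __init__(self) -> None:
        self.primary_up = True
        self.active = "primary"
        self.primary_events: List[str] = []
        self.secondary_events: List[str] = []

    def send(self, event: str) -> str:
        """Deliver an event as an event source would; return the receiving cell."""
        if self.primary_up:
            self.primary_events.append(event)
            self.secondary_events.append(event)  # live synchronization to the standby
            return "primary"
        self.secondary_events.append(event)  # primary unavailable: secondary receives it
        return "secondary"

    def primary_fails(self) -> None:
        # The secondary detects the outage and takes over event processing.
        self.primary_up = False
        self.active = "secondary"

    def primary_recovers(self) -> None:
        # The secondary synchronizes the events the primary missed, then returns to standby.
        missed = self.secondary_events[len(self.primary_events):]
        self.primary_events.extend(missed)
        self.primary_up = True
        self.active = "primary"


pair = CellPair()
pair.send("disk-full")        # handled by the primary, mirrored to the secondary
pair.primary_fails()
pair.send("cpu-high")         # handled by the secondary during the outage
pair.primary_recovers()       # primary catches up and resumes processing
```

After recovery, both cells in the model hold the same two events and the primary is active again.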
PATROL Agent HA
PATROL Agents that run on the managed node that they monitor, in general, do not require HA. However, PATROL Agents that monitor large domain sources, such as VMware vSphere, or remote operating system monitoring require HA configurations in most environments. HA for the PATROL Agent is supported with operating system clustering or other third-party solutions such as VMware HA.
Consider whether high availability (HA) is needed for the PATROL Agents used for collection. If the agent is performing local collection on a host that provides some service, the agent might already be part of the host-level HA setup for the service or application on that host. However, if the PATROL Agent is performing remote collection, that agent must be configured for HA.
Tip: If the agent is running on a VMware virtual machine (VM), VMware HA is a recommended option.
SAP SQL Anywhere HA
The SAP SQL Anywhere database is embedded and installed with the TrueSight Infrastructure Management Server. If you use the out-of-the-box embedded SAP SQL Anywhere database, HA for the database is supported as part of the file system replication on a shared storage disk for the TrueSight Infrastructure Management Server. For more information, see Installing the Infrastructure Management server in the high-availability cluster mode using the wizard.
HA for the Oracle database is supported through a third-party database availability management solution. It is best supported using Oracle RAC. For more information, see Installing the Infrastructure Management Server on Microsoft Windows with Oracle and the Oracle database documentation at www.oracle.com.