The following diagram illustrates the overall HA architecture to support fault tolerance for core BMC ProactiveNet 9.5 components.
HA per component is supported and configured as follows.
HA for the BMC ProactiveNet Server is supported through operating system clustering. The two servers in the cluster must be configured with shared storage between the nodes. See Installing BMC ProactiveNet in high availability mode. BMC recommends leveraging a high-speed SAN for storage. In the preceding diagram, although the QA, test, development, and Central Servers are not shown in operating system clusters for HA, you can install them in clusters.
In BMC ProactiveNet 9.5, the Integration Service is stateless. This allows the BMC PATROL Agent to automatically send performance data and events to another Integration Service if the primary instance is not available. There is no concern about maintaining monitoring-related configuration at the Integration Service instances, because no such configuration exists. Additionally, there is no association between Integration Service instances and specific PATROL Agents to be maintained or otherwise managed by administrators at the Integration Service nodes.
BMC ProactiveNet 9.5 enables you to cluster Integration Service nodes. These Integration Service cluster configurations are simple software settings referenced in policies. The configuration settings for a cluster are stored as a cluster in Central Monitoring Administration. The cluster configurations contain connectivity information in the form of PATROL Agent variables that instruct the agents how to connect to the first, second, third, and fourth Integration Service nodes that are grouped in the cluster. There is no in-built load balancing with these cluster configurations; however, all Integration Service instances are active, supporting active/active HA.
You can include up to four Integration Service nodes in a single cluster. BMC recommends referencing clusters in staging policies only.
PATROL Agents attempt to connect to the Integration Services in the cluster in the order in which they are listed. When an agent loses its connection to the first Integration Service instance, it automatically connects to the second instance in the list. When the first Integration Service becomes available again, the agent does not automatically fail back to it. The agent remains connected to the instance to which it is currently and successfully connected.
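The connection ordering described above can be sketched as a short Python model. This is an illustration of the documented behavior only, not product code; the node names and the availability model are hypothetical:

```python
# Sketch of the PATROL Agent connection behavior described above:
# the agent tries Integration Service nodes in configured order, and
# once connected it stays put even if an earlier node comes back online.

def choose_node(cluster, available, current=None):
    """Return the Integration Service node the agent should use.

    cluster   -- ordered list of node names as grouped in the cluster
    available -- set of nodes currently reachable
    current   -- node the agent is connected to, if any
    """
    # No automatic failback: keep the current connection while it works.
    if current in available:
        return current
    # Otherwise walk the list in configured order.
    for node in cluster:
        if node in available:
            return node
    return None  # no node reachable

cluster = ["is1", "is2", "is3", "is4"]

first = choose_node(cluster, {"is1", "is2"})             # initial connect -> "is1"
failover = choose_node(cluster, {"is2"}, current=first)  # is1 down -> "is2"
stays = choose_node(cluster, {"is1", "is2"}, current=failover)  # is1 back -> still "is2"
```

The key point the sketch captures is the last call: even with the first instance reachable again, the agent keeps its existing connection.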
Multiple Integration Service instances can run behind a load balancer. This means that a third-party load balancer can be placed between PATROL Agents and the Integration Services to support full active/active HA fault tolerance and true load balancing of event and performance data across multiple Integration Service processes running on different hosts. In large environments, BMC generally recommends leveraging load balancers as a best practice. This is a recommendation, however, not a requirement. A load balancer ensures that the Integration Service tier is not overloaded when an event storm occurs, or when an interruption in communication between the agents and the Integration Service nodes causes a flood of cached data to be sent to the BMC ProactiveNet Server(s) through the Integration Service nodes.
The staging Integration Service in the diagram is not shown in a cluster, and it is not included in the cluster configuration within the product. However, you can configure staging Integration Service nodes for redundancy. You can do this by setting up multiple staging Integration Service nodes and designating their connectivity information in a comma separated list for the PATROL Agent Integration Service configuration variable.
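For example, a pconfig-style change file could set such a comma-separated list. The variable path, host names, and port below are illustrative assumptions; confirm the exact variable name and default port against your PATROL Agent documentation:

```
PATROL_CONFIG
"/AgentSetup/integration/integrationServices" = {
    REPLACE = "tcp:staging-is1.example.com:3183,tcp:staging-is2.example.com:3183"
}
```

The agent attempts the listed staging Integration Service nodes in order, so the preferred node should appear first in the list.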
An agent installation package and/or a single policy must never contain configuration for multiple staging Integration Service nodes that are associated with different BMC ProactiveNet Servers.
HA for the event management cells is provided through an in-built primary/secondary configuration as an active and hot-standby cell pair. Event sources such as Integration Services are configured to send events first to the primary cell. If the primary cell is not available, the event source sends events to the secondary cell. The cells automatically synchronize live event data so that events are kept in sync between the two cells. The secondary cell is configured and operates as a "hot standby" cell. The primary and secondary cells monitor each other. During a failover, the secondary cell detects that the primary cell is not available and takes over the event processing functionality. When the secondary cell detects that the primary cell has become available again, it synchronizes events with the primary cell and switches back to standby mode. The primary cell then continues event processing and synchronization with the secondary cell.
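As a sketch, an HA cell pair is typically declared as a single logical cell with two host:port entries in the cell directory file (mcell.dir). The cell name, encryption key, host names, and port shown here are illustrative; verify the entry format against your cell documentation:

```
# mcell.dir -- one logical HA cell; the primary host is listed first
cell  pncell_prod  mc  primary-host.example.com:1828 secondary-host.example.com:1828
```

Because event sources resolve the cell by name through this directory entry, they try the primary host first and fall over to the secondary host when the primary is unreachable.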
The following points are best practices regarding event management cell HA:
PATROL Agents that run on the managed node they monitor generally do not require HA. However, PATROL Agents that monitor large domain sources, such as VMware vSphere, or that perform remote operating system monitoring require HA configurations in most environments. HA for the PATROL Agent is supported with operating system clustering or other third-party solutions such as VMware HA.
You can use BMC ProactiveNet in two database environments. You can either leverage the Sybase database that is delivered with the product or use your own Oracle database. The Sybase database is embedded and installed with the BMC ProactiveNet Server. If you use the out-of-the-box embedded Sybase database, HA for the database is supported as part of the file system replication on a shared storage disk for the BMC ProactiveNet Server. For more information, see Installing the BMC ProactiveNet Server in HA mode on Windows.
HA for the Oracle database is supported through a third-party database availability management solution. It is best supported using Oracle RAC. For more information, see Installing the BMC ProactiveNet Server on Microsoft Windows with Oracle as database and the Oracle database documentation at www.oracle.com.