Unsupported content

 

This version of the documentation is no longer supported. However, the documentation is available for your convenience.

The following diagram illustrates the overall HA architecture to support fault tolerance for core BMC ProactiveNet 9.5 components.

HA per component is supported and configured as follows.

BMC ProactiveNet Server HA

HA for the BMC ProactiveNet Server is supported through operating system clustering. The two servers in the cluster must be configured with storage that is shared between the two nodes. See Installing BMC ProactiveNet in high availability mode. BMC recommends using a high-speed SAN for storage. In the preceding diagram, the QA, test, development, and Central Server are not shown in operating system clusters, but you can also install them in a cluster for HA.

Data collection Integration Services HA

In BMC ProactiveNet 9.5, the Integration Service is stateless. This allows the BMC PATROL Agent to automatically send performance data and events to another Integration Service if the primary instance is not available. You do not need to maintain monitoring-related configuration at the Integration Service instances, because no such configuration exists. Likewise, administrators do not need to maintain or manage any association between Integration Service instances and specific PATROL Agents at the Integration Service nodes.

BMC ProactiveNet 9.5 enables you to cluster Integration Service nodes. These Integration Service cluster configurations are simple software settings referenced in policies. The configuration settings for a cluster are stored as a cluster in Central Monitoring Administration. The cluster configurations contain connectivity information in the form of PATROL Agent variables that instruct the agents how to connect to the first, second, third, and fourth Integration Service nodes that are grouped in the cluster. These cluster configurations have no built-in load balancing; however, all the Integration Service instances are active, which supports active/active HA.

You can include up to four Integration Service nodes in a single cluster. BMC recommends referencing clusters in staging policies only. 

PATROL Agents attempt to connect to the Integration Services in the cluster in the order in which the Integration Services are listed. When an agent loses its connection to the first Integration Service instance, it automatically connects to the second instance in the list. When the first Integration Service becomes available again, the agent does not automatically connect back to it. The agent remains connected to the instance to which it is currently and successfully connected.
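The connection behavior described above can be pictured with a small simulation. This is only an illustrative sketch, not agent code: the `AgentConnection` class, the `is_up` callback, and the node names are hypothetical, and the model assumes that an agent restarts from the top of the list whenever its current connection is lost.

```python
from typing import Callable, List, Optional


class AgentConnection:
    """Hypothetical model of how an agent picks an Integration Service."""

    def __init__(self, cluster: List[str], is_up: Callable[[str], bool]) -> None:
        # The text allows up to four Integration Service nodes per cluster.
        if not 1 <= len(cluster) <= 4:
            raise ValueError("a cluster holds one to four Integration Service nodes")
        self.cluster = list(cluster)
        self.is_up = is_up
        self.current: Optional[str] = None

    def ensure_connected(self) -> Optional[str]:
        # Stay on the current node while it is reachable: no automatic failback.
        if self.current is not None and self.is_up(self.current):
            return self.current
        # Otherwise try the nodes in their configured order (assumed behavior).
        for node in self.cluster:
            if self.is_up(node):
                self.current = node
                return node
        self.current = None
        return None
```

In this model, an agent that has moved to the second node keeps returning that node from `ensure_connected()` even after the first node is reachable again, which mirrors the absence of automatic failback.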

Multiple Integration Service instances can run behind a load balancer. A third-party load balancer can be placed between the PATROL Agents and the Integration Services to support full active/active HA fault tolerance and true load balancing of event and performance data across multiple Integration Service processes running on different hosts. In large environments, BMC generally recommends using load balancers as a best practice. This is a recommendation, not a requirement. A load balancer helps keep the Integration Service tier from being overloaded during an event storm, or when an interruption in communication between the agents and the Integration Service nodes causes a flood of cached data to be sent to the BMC ProactiveNet Servers through the Integration Service nodes.

Staging Integration Service HA

The staging Integration Service in the diagram is not shown in a cluster, and it is not included in the cluster configuration within the product. However, you can configure staging Integration Service nodes for redundancy. To do this, set up multiple staging Integration Service nodes and specify their connectivity information as a comma-separated list in the PATROL Agent Integration Service configuration variable.

An agent installation package and/or a single policy must never contain configuration for multiple staging Integration Service nodes that are associated with different BMC ProactiveNet Servers.
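The two rules above (a comma-separated list of staging nodes, all associated with the same BMC ProactiveNet Server) could be checked before a package or policy is published. The following is a minimal sketch under stated assumptions: the function name, the `server_of` mapping, and the entry strings are hypothetical, and the exact connection-string format that the agent variable expects is not specified here.

```python
from typing import Dict, List


def staging_service_list(nodes: List[str], server_of: Dict[str, str]) -> str:
    """Join staging Integration Service entries into one comma-separated value.

    `nodes` holds the connectivity entries in whatever format the agent
    variable expects, and `server_of` maps each entry to the BMC ProactiveNet
    Server it is associated with. Both inputs are hypothetical.
    """
    if not nodes:
        raise ValueError("at least one staging Integration Service is required")
    servers = {server_of[node] for node in nodes}
    # Refuse lists that mix staging nodes from different servers.
    if len(servers) > 1:
        raise ValueError(
            "staging nodes belong to different servers: " + ", ".join(sorted(servers))
        )
    return ",".join(nodes)
```

A check like this is cheap to run in whatever tooling generates packages or policies, and it fails loudly rather than silently producing a mixed list.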

Event management cells HA

HA for the event management cells is provided through a built-in primary/secondary configuration that pairs an active cell with a hot standby cell. Event sources such as Integration Services are configured to send events first to the primary cell. If the primary cell is not available, the event source sends events to the secondary cell. The cells automatically synchronize live event data so that events are kept in sync between the two cells. The secondary cell is configured and operates as a "hot standby" cell. The primary and secondary cells monitor each other. During a failover, the secondary cell detects that the primary cell is not available and takes over the event processing functionality. When the secondary cell detects that the primary cell has become available again, it synchronizes events with the primary cell and switches back to standby mode. The primary cell then resumes event processing and synchronization with the secondary cell.
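The failover and switchback sequence described above can be illustrated with a toy model. This sketch is not cell code: the `CellPair` class and its methods are hypothetical, the secondary cell is assumed to be always available, and synchronization is reduced to copying a list.

```python
class CellPair:
    """Toy model of an active / hot-standby event management cell pair."""

    def __init__(self) -> None:
        self.primary_up = True
        self.active = "primary"
        self.events = {"primary": [], "secondary": []}

    def send(self, event: str) -> str:
        # Event sources try the primary cell first, then the secondary cell.
        target = "primary" if self.primary_up else "secondary"
        self.active = target
        self.events[target].append(event)
        if self.primary_up:
            # Both cells are up: live synchronization keeps the standby current.
            self.events["secondary"] = list(self.events["primary"])
        return target

    def primary_fails(self) -> None:
        # The secondary detects the outage and takes over event processing.
        self.primary_up = False
        self.active = "secondary"

    def primary_recovers(self) -> None:
        # The secondary synchronizes its events back, then returns to standby.
        self.primary_up = True
        self.events["primary"] = list(self.events["secondary"])
        self.active = "primary"
```

The model deliberately ignores synchronization failures and network partitions, which are the cases that make automatic switchback risky in practice.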

Event management cell high availability best practices

The following points are best practices regarding event management cell HA:

  • The primary and secondary cells must be set up with the same knowledge base configuration. This requirement is critical. The synchronization process synchronizes only event data and dynamic data in data classes that is stored in BMC Atrium CMDB. This includes updates and dynamic data in the custom data classes that you create. Events, dynamic data, and any updates or deletions to either are synchronized in both directions. The synchronization process does not synchronize configuration data in the knowledge base flat files.
  • The synchronization of knowledge base configuration flat files must be manually managed or automated with custom scripts or other methods.
  • Never set up event propagation so that events propagate only to the primary cell or only to the secondary cell. Always use a multiple host definition (primary/secondary) for the destination configuration of the HA cell pair in the mcell.dir configuration files.
  • Use the same cell name for the primary and secondary cells.
  • If cell HA is monitored to detect conditions in which the primary and secondary cells are no longer synchronized, automatic switchback is acceptable. If it is not monitored, automatic switchback must be disabled. Synchronization failures can cause the primary and secondary cells to get badly out of sync. You might then fail over to a cell that is out of sync, so that data tables and event information are out of date. Switching back then writes that out-of-sync state back to the primary cell, causing an irretrievable loss of information. In addition, if the network connection between the primary and secondary cells is lost, each cell may decide that the other host is down and act as the primary cell. This can also lead to inconsistent data and events. It is a best practice to avoid automatic switchback except when cell HA is monitored as described in this point.
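Because knowledge base flat files are not synchronized by the cells, a custom script can at least detect drift between the two copies. The following is a minimal sketch of such a check, assuming that both knowledge base directories are reachable as local or mounted paths; the function names and directory locations are hypothetical and are not part of the product.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file path (relative to `root`) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def kb_differences(primary_kb: Path, secondary_kb: Path) -> List[str]:
    """Return relative paths that are missing on one side or differ in content."""
    a = digest_tree(primary_kb)
    b = digest_tree(secondary_kb)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result means the two trees hold identical files. Any listed path needs to be copied or reconciled manually before relying on failover.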

PATROL Agent HA

In general, PATROL Agents that run on the managed node that they monitor do not require HA. However, PATROL Agents that monitor large domain sources such as VMware vSphere, or that perform remote operating system monitoring, require HA configurations in most environments. HA for the PATROL Agent is supported with operating system clustering or with third-party solutions such as VMware HA.

Sybase HA

You can use BMC ProactiveNet in two database environments. You can either use the Sybase database that is delivered with the product or use your own Oracle database. The Sybase database is embedded and installed with the BMC ProactiveNet Server. If you use the out-of-the-box embedded Sybase database, HA for the database is supported as part of the file system replication on a shared storage disk for the BMC ProactiveNet Server. For more information, see Installing the BMC ProactiveNet Server in HA mode on Windows.

Oracle HA

HA for the Oracle database is supported through a third-party database availability management solution. It is best supported using Oracle RAC. For more information, see Installing the BMC ProactiveNet Server on Microsoft Windows with Oracle as database and the Oracle database documentation at www.oracle.com.

Related topics

Installing the BMC ProactiveNet Server in HA mode on Windows

Configuring and using BMC ProactiveNet in high availability mode

Installing the BMC ProactiveNet Server on Microsoft Windows with Sybase as database

Installing the BMC ProactiveNet Server on Microsoft Windows with Oracle as database