Infrastructure Management Server high-availability architecture

A high-availability deployment of Infrastructure Management consists of two servers with identical configuration, one designated as primary and the other as secondary. At any point, one of the nodes is active and the other in standby mode. If the active node shuts down or otherwise becomes unavailable, the standby node takes over the active role. When you restore the primary server, failback occurs, and it becomes the active node.

The Infrastructure Management HA components automatically manage the synchronization between the active and standby nodes and automatically detect a failover situation. You need to configure a third-party load balancer in reverse proxy mode between the TrueSight Presentation Server and the HA-enabled Infrastructure Management server.

An Infrastructure Management HA deployment comprises the following systems:

Primary server
Secondary server
Third-party load balancer

HA deployment for Infrastructure Management

In an HA deployment of Infrastructure Management, the load balancer is installed on a separate server and redirects requests to the active node. The load balancer provides a single point of access to the HA-enabled Infrastructure Management server.
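
The routing decision described above can be sketched as follows. This is a minimal illustration, not the behavior of any particular load balancer: the host names and the `is_active` probe are hypothetical, and how a real load balancer detects the active node depends on the product you choose.

```python
from typing import Callable, Optional, Sequence


def pick_active_node(
    nodes: Sequence[str],
    is_active: Callable[[str], bool],
) -> Optional[str]:
    """Return the first node that the probe reports as active, or None.

    `is_active` is a hypothetical health probe supplied by the caller;
    it stands in for whatever check the load balancer actually runs.
    """
    for node in nodes:
        if is_active(node):
            return node
    return None


# Hypothetical host names for the primary and secondary servers.
NODES = ["im-primary.example.com", "im-secondary.example.com"]

# Simulate a state in which the secondary server holds the active role.
target = pick_active_node(NODES, lambda host: host == "im-secondary.example.com")
print(target)
```

Because requests always go through this single selection step, the Presentation Server does not need to know which of the two nodes is currently active.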

Failover and failback

Failover occurs when the primary server becomes unavailable or when any of the critical server processes becomes unavailable. If a critical process is unavailable, a recovery action (restart or shutdown) is performed. For database validation before failover, see Infrastructure Management database and operating system issues.

There are two types of failback: automatic and manual. In automatic failback, which is the default behavior, the primary server becomes the active node upon server startup. For manual failback, you need to perform the following steps:

(Primary server only) Set the following property to false in the installedDirectory\pw\custom\conf\ha.conf file:
pronet.ha.auto.failback.enable=false
(Windows only) Change the startup type of the BMC TrueSight Infrastructure Management server service from Automatic to Manual.
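
The property change in the first step can also be scripted. The following sketch rewrites a property in the text of a configuration file. It assumes that ha.conf consists of plain key=value lines, as the property shown above suggests, and that lines starting with # are comments. The comment syntax is an assumption, and `set_property` is a hypothetical helper, not part of the product.

```python
def set_property(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key` set to `value`.

    An existing, uncommented `key=...` line is replaced in place.
    If no such line exists, `key=value` is appended at the end.
    Assumption: '#' starts a comment line.
    """
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_assignment = not stripped.startswith("#") and "=" in stripped
        if is_assignment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Example input is invented for illustration; only the property name
# comes from the procedure above.
before = "pronet.ha.auto.failback.enable=true\n"
after = set_property(before, "pronet.ha.auto.failback.enable", "false")
print(after, end="")
```

To apply the change, read the contents of installedDirectory\pw\custom\conf\ha.conf on the primary server, pass them through the function, and write the result back. Keep a backup copy of the original file first.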

During failover and failback, the Infrastructure Management server might be disconnected from the Presentation Server until the standby node becomes active. During this time, you might not be able to perform any operations on the Infrastructure Management server.
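
The role changes described in this section can be modeled as a small state machine. This is a simplified sketch for reasoning about the behavior, not the product's implementation. The class and method names are hypothetical. It also assumes that, with automatic failback disabled, a restored primary server stays in standby until an operator triggers the failback. That assumption is not stated explicitly above.

```python
from dataclasses import dataclass


@dataclass
class HaPair:
    """Toy model of which node holds the active role."""

    auto_failback: bool = True  # mirrors pronet.ha.auto.failback.enable
    primary_up: bool = True
    active: str = "primary"

    def primary_down(self) -> None:
        """Failover: the standby (secondary) node takes the active role."""
        self.primary_up = False
        if self.active == "primary":
            self.active = "secondary"

    def primary_restored(self) -> None:
        """Automatic failback returns the active role on startup."""
        self.primary_up = True
        if self.auto_failback:
            self.active = "primary"

    def manual_failback(self) -> None:
        """Operator-triggered failback (assumed behavior in manual mode)."""
        if self.primary_up:
            self.active = "primary"


auto = HaPair()
auto.primary_down()
auto.primary_restored()
print(auto.active)

manual = HaPair(auto_failback=False)
manual.primary_down()
manual.primary_restored()
print(manual.active)
```

In the automatic case, the model returns the active role to the primary server as soon as it is restored. In the manual case, the secondary server remains active until `manual_failback` is called.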

Failover and failback time

The failover and failback time is the time taken by the server to become fully responsive and for the Infrastructure Management component status to be displayed as connected in the TrueSight console. The times shown are approximate and include the high-availability checking process. The system determines whether a failover is required. For example, a failover is not required when there is a brief network connectivity issue with an external database.

Failover time: 10 - 15 min

Failback time: 20 - 30 min
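
Automation that depends on the Infrastructure Management server can use these figures as upper bounds when waiting for the connection to return. The following sketch polls a caller-supplied check until it succeeds or the bound expires. The `is_connected` callable is hypothetical: it stands in for however you verify that the component status is connected. The 30-second polling interval is an arbitrary choice.

```python
import time
from typing import Callable

# Upper ends of the approximate ranges listed above, in seconds.
FAILOVER_MAX_SECONDS = 15 * 60
FAILBACK_MAX_SECONDS = 30 * 60


def wait_until_connected(
    is_connected: Callable[[], bool],
    timeout_seconds: float,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_connected() until it returns True or the timeout expires.

    Returns True if the check succeeded, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_connected():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

For example, `wait_until_connected(check, FAILOVER_MAX_SECONDS)` waits up to 15 minutes after a failover, where `check` is your own status probe. Because the listed times are approximate, treat a timeout as a signal to investigate, not as proof of failure.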

The following diagrams illustrate the failover and failback (automatic and manual) behavior.

Failover and automatic failback

Failover and manual failback
