BMC TrueSight Operations Management high availability deployment

High availability (HA) is a redundancy mechanism that automatically switches to a standby server if the primary server fails or is temporarily shut down for maintenance. HA enables BMC TrueSight Operations Management to keep running so that monitoring of your mission-critical systems remains available. Operations Management supports out-of-the-box HA, which eliminates the need for third-party software and reduces the manual steps required for deployment. HA uses a load balancer software component as a proxy server to switch operations between the primary and standby servers.

Operations Management in HA mode

Operations Management in HA mode consists of two servers with identical configurations. The first server is referred to as the primary server, and the standby server is referred to as the secondary server. Under normal operation, the primary server takes the active role and runs all of the Operations Management processes. Only one Operations Management server can be active at a time. If the primary server goes down, for example because of an event that triggers a server shutdown, a failover occurs: the secondary server takes over the active role, and all of its processes change from standby mode to operation mode.

The detection and management of a failover is built into the Presentation Server. However, the Presentation Server does not manage the failback to the primary server. You must issue CLI commands to restart the primary server and re-establish its role as the active server.

HA architecture

An Operations Management HA deployment comprises three systems:

Primary server
Secondary server
Load balancer

A load balancer is a software component that routes client requests to the active server. In the context of the Operations Management system, the load balancer works as a proxy server: it resides on a separate computer, accepts client requests, and directs them to the active server.

Why is a load balancer required?

In a successful HA deployment, the secondary server must take over when the primary server stops working, and the primary server must be able to take over again when it is ready. A load balancer is required to direct client requests to whichever server is currently active.

A load balancer:

Provides a single point of contact for accessing the Presentation Server
Automatically detects the active node (primary or secondary server) in an HA deployment

If you choose to use an Nginx server as the load balancer between the primary and secondary servers, you can use the attached nginx.conf file as an example server configuration.

For HA deployment testing, BMC developers used an Nginx server as the load balancer.
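
To illustrate the proxy role described above, the following shell sketch prints a minimal Nginx configuration that passes client connections through to whichever of the two nodes accepts them. It is not the attached nginx.conf file. The function name render_nginx_conf, the upstream name ts_presentation, the example host names, and the port are all hypothetical, and the sketch assumes that a node in standby mode refuses connections, which is not confirmed here. It also assumes an Nginx build that includes the stream module.

```shell
#!/bin/sh
# Print a minimal Nginx TCP pass-through configuration for two HA nodes.
# All names and values are illustrative assumptions, not BMC defaults.
render_nginx_conf() {
    primary_host=$1
    secondary_host=$2
    port=$3
    cat <<EOF
events {}

stream {
    upstream ts_presentation {
        # A node that refuses a connection is skipped for fail_timeout.
        server ${primary_host}:${port} max_fails=1 fail_timeout=10s;
        server ${secondary_host}:${port} max_fails=1 fail_timeout=10s;
    }

    server {
        listen ${port};
        proxy_pass ts_presentation;
    }
}
EOF
}

# Example: write the configuration to a file for review before use.
# render_nginx_conf ps-primary.example.com ps-secondary.example.com 8043 > nginx.conf
```

Because the sketch forwards raw TCP connections, the load balancer needs no certificates of its own. A production configuration would still need tuning of timeouts and failure detection for your environment.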

HA deployment options

There are two ways to deploy Operations Management in HA mode:

Deploying in HA mode during the installation process
Deploying in HA mode post-installation

Deploying in HA mode during installation

You can choose to deploy Operations Management in HA mode during installation by selecting the Enabled option. If you choose to enable HA, you must specify which system is the primary server and which system is the secondary (standby) server. For information about deploying in HA mode during installation, see Performing the Presentation Server installation.

Deploying in HA mode post-installation

You can enable HA mode after installation by configuring the primary and standby servers for HA operation.

Note

On Linux computers, add & at the end of the tssh server start and tssh server stop commands so that the process runs in the background and you can continue to use the shell.

Configuring primary node

Open a CLI command prompt, and from the bin directory where the Presentation Server is installed, run the following commands:

tssh server stop
tssh process start database
tssh ha configure master <Enter the HA primary and secondary server details.>
These details are used to generate the configuration information in the ha-shared.conf file.
tssh process stop database
tssh server start
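
The primary-node sequence above can be sketched as a small wrapper that runs the steps in order and stops at the first failure. This is only an illustration: the TSSH and DRY_RUN variables and the function names are hypothetical, the sketch assumes that tssh ha configure master prompts for the server details interactively, and it runs every command in the foreground, so verify how the start and stop commands behave on your platform before executing it.

```shell
#!/bin/sh
# Sketch: run the primary-node HA configuration steps in order and stop at
# the first failure. TSSH and DRY_RUN are hypothetical conveniences.
TSSH=${TSSH:-./tssh}      # path to tssh in the Presentation Server bin directory
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the commands; 0 = execute them

run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "step failed: $*" >&2; return 1; }
    fi
}

configure_primary() {
    run_step "$TSSH" server stop &&
    run_step "$TSSH" process start database &&
    run_step "$TSSH" ha configure master &&
    run_step "$TSSH" process stop database &&
    run_step "$TSSH" server start
}

# Usage: call configure_primary to print the plan; set DRY_RUN=0 to run it.
# configure_primary
```

Defaulting to a dry run lets you review the exact command order before anything is stopped.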

Configuring standby node

Open a CLI command prompt, and from the bin directory where the Presentation Server is installed, run the following commands:

tssh server stop
tssh ha configure standby <Enter the path to the ha-shared.conf file.>
tssh ha copysnapshot
tssh server start
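
Because the standby configuration step asks for the path to the ha-shared.conf file, presumably the one generated on the primary node, it can help to confirm that the file is present and non-empty on the standby node before starting. The following check is a minimal sketch. The function name check_shared_conf and the example path are hypothetical, and how the file reaches the standby node is left to you.

```shell
#!/bin/sh
# Sketch: confirm that the ha-shared.conf file is usable before running the
# standby configuration. The function name and messages are hypothetical.
check_shared_conf() {
    conf_path=$1
    if [ ! -f "$conf_path" ]; then
        echo "missing: $conf_path" >&2
        return 1
    fi
    if [ ! -r "$conf_path" ] || [ ! -s "$conf_path" ]; then
        echo "unreadable or empty: $conf_path" >&2
        return 1
    fi
    echo "ok: $conf_path"
}

# Example (the path is a placeholder, not a documented location):
# check_shared_conf /tmp/ha-shared.conf && ./tssh ha configure standby
```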

Transferring the service between the secondary server and primary server

In Operations Management HA mode, the secondary server becomes the active server if the primary server stops operating because of an event that triggers a server shutdown. When the primary server is up and running again, it does not become the active server by default; it remains in standby mode. You can either transfer the service back to the primary server or leave the primary server in standby mode.

To transfer service to the primary server

Open a CLI command prompt on the primary server, and from the bin directory where the Presentation Server is installed, perform the following steps to transfer control from the secondary server to the primary server:

tssh server stop
tssh ha copysnapshot
tssh server start
After the primary server is running, stop the secondary server:
tssh server stop
Verify that users can log in to the primary server.
Start the secondary server:
tssh ha copysnapshot
tssh server start
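
The order of the failback steps matters, so the following sketch prints the sequence per host and can optionally run it over ssh. The host names, the install path, the variable names, and the use of ssh are all assumptions made for illustration. The login check remains a manual step.

```shell
#!/bin/sh
# Sketch: print (or run) the failback sequence in the documented order.
# Host names, the install path, and the use of ssh are assumptions.
PRIMARY=${PRIMARY:-ps-primary.example.com}
SECONDARY=${SECONDARY:-ps-secondary.example.com}
TSSH_BIN=${TSSH_BIN:-/path/to/presentation-server/bin}   # placeholder path
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the steps; 0 = execute them

on_host() {
    host=$1; shift
    echo "[$host] tssh $*"
    if [ "$DRY_RUN" = "0" ]; then
        ssh "$host" "cd '$TSSH_BIN' && ./tssh $*" || return 1
    fi
}

failback() {
    # 1. On the primary: stop, copy the snapshot, start.
    on_host "$PRIMARY" server stop &&
    on_host "$PRIMARY" ha copysnapshot &&
    on_host "$PRIMARY" server start &&
    # 2. Stop the secondary once the primary is running.
    on_host "$SECONDARY" server stop &&
    # 3. Manual checkpoint: confirm that a user can log in to the primary.
    echo "[manual] verify login on $PRIMARY" &&
    # 4. Start the secondary again.
    on_host "$SECONDARY" ha copysnapshot &&
    on_host "$SECONDARY" server start
}

# Usage: call failback to print the plan; set DRY_RUN=0 to run it over ssh.
# failback
```

In a real run you would pause at the manual checkpoint rather than continue automatically, because the sketch cannot confirm that the primary server is accepting logins.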

To set the primary server in standby mode

Open a CLI command prompt, and from the bin directory where the Presentation Server is installed, perform the following steps to operate the primary server in standby mode:

tssh server stop
tssh ha copysnapshot
tssh server start

To set the secondary server in standby mode

Open a CLI command prompt, and from the bin directory where the Presentation Server is installed, perform the following steps to operate the secondary server in standby mode:

tssh server stop
tssh ha copysnapshot
tssh server start

Limitations of HA in Operations Management

If the host name of the Presentation Server is mapped to the loopback address in the etc/hosts file, HA will not work. Ensure that the hosts file does not map the Presentation Server host name to the loopback address.
The primary and secondary servers must be accessible (pingable) with a fully qualified domain name (FQDN) that resolves to an IPv4 address.
If the primary and secondary servers are both disconnected from the network, you must restart both servers, and set one server to the active mode and the other server to standby mode.
If a failover occurs during a Tomcat session, the user must log in again.
The primary and secondary server should be located on the same platform and use the same operating system.
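
The first two limitations can be checked before you enable HA. The following sketch shows one way to do so on Linux. The function names maps_to_loopback and resolves_to_ipv4 are hypothetical, and the second check assumes that the getent utility from glibc is available.

```shell
#!/bin/sh
# Sketch: pre-flight checks for the host-name limitations listed above.
# Function names are hypothetical.

# Succeeds (exit 0) if the hosts file maps NAME to a loopback address.
maps_to_loopback() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        {
            sub(/#.*/, "")    # drop comments, including whole-line comments
            if ($1 ~ /^127\./ || $1 == "::1")
                for (i = 2; i <= NF; i++)
                    if (tolower($i) == tolower(name)) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Succeeds if FQDN resolves to at least one IPv4 address.
resolves_to_ipv4() {
    getent ahostsv4 "$1" >/dev/null 2>&1
}

# Example:
# maps_to_loopback "$(hostname -f)" && echo "host name is mapped to loopback"
# resolves_to_ipv4 "$(hostname -f)" || echo "FQDN has no IPv4 address"
```

Run the checks on both the primary and the secondary server, because each node must meet these conditions.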