Standby node in HA stops and does not restart

In a high-availability (HA) environment, an interruption in communication between the active and standby nodes (for example, a dropped network connection) might cause both nodes to become active. When communication is restored, the system stops the node that became active later.

- Setting up an automatic restart of HA nodes in Red Hat Linux using Crontab
- Manually restarting the HA nodes on Windows and Linux

Setting up an automatic restart of HA nodes in Red Hat Linux using Crontab

Issue

In an HA environment for the App Visibility portal or collector running on Red Hat Linux, after an interruption in the network connection, the secondary node shuts down and does not automatically return to standby mode.

Resolution

Perform the following steps to set up an automatic restart of HA nodes:

1. Download the tsav-check.sh script.
2. Copy the tsav-check.sh script to the portal and collector bin directories:
   - <serverInstallationDirectory>/portal/bin
   - <serverInstallationDirectory>/collector/bin
3. Grant execute permission on the script by running the following command:
   chmod +x tsav-check.sh
4. Set up a Crontab job.

Warning
Contact your Linux administrator for help with setting up a Crontab job.

Example

Note
The following examples, when set up, run the script once every minute.

Crontab at portal node:
* * * * * . /etc/bmc.profile && $ADOPSSERVER_HOME/portal/bin/tsav-check.sh
Crontab at collector node:
* * * * * . /etc/bmc.profile && $ADOPSSERVER_HOME/collector/bin/tsav-check.sh
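
The Crontab entries above can also be added from a script. The following is a minimal sketch, not part of the product: add_cron_line is a hypothetical helper that appends an entry to an existing Crontab listing only if the same line is not already present, so running it repeatedly does not create duplicates. The five asterisks in the entry mean "every minute of every hour of every day".

```shell
#!/bin/sh
# Sketch only: add_cron_line is a hypothetical helper, not shipped with the product.
# It reads an existing crontab listing on stdin and prints it back,
# followed by ENTRY unless ENTRY already appears as a whole line.
add_cron_line() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Single quotes keep $ADOPSSERVER_HOME literal, so it is expanded
# when the job runs, after /etc/bmc.profile has been sourced.
PORTAL_ENTRY='* * * * * . /etc/bmc.profile && $ADOPSSERVER_HOME/portal/bin/tsav-check.sh'

# To apply the result to the current user's crontab (run this manually):
#   crontab -l 2>/dev/null | add_cron_line "$PORTAL_ENTRY" | crontab -
```

On the collector node, the same approach applies with the collector entry shown above. Review the resulting listing with crontab -l before relying on it.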

Manually restarting the HA nodes on Windows and Linux

Issue

In an HA environment for the App Visibility portal or collector, the standby node becomes unavailable, and when you restart the node, it stops immediately.

Probable cause

An interruption in communication (such as an interruption in the network connection) between the active and standby nodes might cause both nodes to behave as if each is the active node. When communication is reestablished, the system stops the later node (that is, the node that became active at a later time), and you receive an event about the standby node being unavailable.

If you restart the node, sometimes the system still recognizes it as an active node, and the system stops the service.

You can examine the component log file:

Windows
- <serverInstallationDirectory>\collector\logs\collector.log
- <serverInstallationDirectory>\portal\logs\portal.log
Linux
- <serverInstallationDirectory>/collector/logs/collector.log
- <serverInstallationDirectory>/portal/logs/portal.log

Look for the following message:

This component started as an active node instead of the standby node. Shutting down the serverType serverName service.
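
On Linux, the log search can be done with grep. The following is a minimal sketch: find_active_start is a hypothetical helper name, and it matches only the fixed first sentence of the message, because the serverType and serverName parts vary by component.

```shell
#!/bin/sh
# Fixed part of the log message; the text after it varies per component.
MARKER='This component started as an active node instead of the standby node.'

# Hypothetical helper: prints matching lines of the given log file,
# prefixed with their line numbers. Exit status is 0 if the message
# was found and 1 if it was not.
find_active_start() {
    grep -nF -- "$MARKER" "$1"
}

# Example call (replace the placeholder with your installation directory):
#   find_active_start "<serverInstallationDirectory>/collector/logs/collector.log"
```

If the helper prints one or more lines, the node started as an active node and was shut down, which matches the cause described in this section.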

Resolution

To resolve this issue, restart the active node, and then start the standby node.

TrueSight Application Management 11.3