Cells troubleshooting


The following issues can occur when you are using cells.

Note

If you are experiencing issues with a cell, you can turn on cell tracing to help diagnose the problem. For instructions, see Configuring cell tracing.

1

The cell will not start

Explanation:
It is possible that a statbld failure has occurred.

User Response:
If the log and trace files contain nothing that helps you diagnose the issue, try running the cell in the foreground. Doing so frequently provides the information needed to correct the issue, or enough information for BMC Support to diagnose the problem. To run the cell in the foreground, enter the following command: mcell -n cellName -d

If a statbld failure has occurred, perform the following steps to correct the problem:

  1. Look for the following files in the installationDirectory/pw/server/var/cellName directory:
    • mcdb.0
    • mcdb.lock
  2. If either or both of these files are present, delete them.
  3. Restart the cell.
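
The cleanup in steps 1 and 2 can be sketched as a small script. This is a minimal illustration, not a BMC-provided tool: the function name remove_stale_statbld_files is hypothetical, and the sketch assumes that the cell is stopped and that you pass the full path of the installationDirectory/pw/server/var/cellName directory.

```python
from pathlib import Path

# File names taken from step 1 of the procedure above.
STALE_FILES = ("mcdb.0", "mcdb.lock")


def remove_stale_statbld_files(cell_var_dir):
    """Delete mcdb.0 and mcdb.lock from cell_var_dir if they are present.

    Hypothetical helper. Returns the names of the files that were deleted,
    in the order listed in STALE_FILES.
    """
    removed = []
    for name in STALE_FILES:
        candidate = Path(cell_var_dir) / name
        if candidate.is_file():
            candidate.unlink()
            removed.append(name)
    return removed
```

After calling the function with the directory of the affected cell, restart the cell as described in step 3. Other files in the directory, including mcdb itself, are left untouched.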

2

BMC-IMC890104W: The cell is not responding to a client request

Explanation:
The client has made a request to the cell, but the cell is not responding.

User Response:
Check the connection status of the cell.

3

The primary and secondary servers for the high availability cell are in active mode simultaneously or are unsynchronized.

Explanation:
This can occur for one of the following reasons:

  • The primary and secondary servers are running on a network that has an unreliable connection.
  • The high availability cell was started with one of the mcell -i initialization options (for example, -ia, -id, or other variants).
  • The primary server was started first and terminated before the secondary server was started.

User Response:
Synchronize the mcdb and xact files of the primary and secondary servers by performing the following steps:

  1. If the issue was caused by an unreliable network, resolve the network issue.
  2. Shut down both cell servers.
  3. Copy the mcdb and xact files of the preferred server to the other server. (The preferred server can be either primary or secondary.)
  4. Start the secondary cell server.
  5. Start the primary cell server.
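
Step 3 can be sketched as follows. This is an illustration under stated assumptions, not a BMC-provided procedure: the function name copy_cell_state is hypothetical, both cell directories are assumed to be reachable from the machine that runs the script (for example, through a mounted share), and the files are assumed to be named exactly mcdb and xact. Run it only after both cell servers are shut down.

```python
import shutil
from pathlib import Path

# File names taken from step 3 of the procedure above.
SYNC_FILES = ("mcdb", "xact")


def copy_cell_state(preferred_dir, other_dir):
    """Copy mcdb and xact from the preferred server's cell directory
    to the other server's cell directory.

    Hypothetical helper. Checks that every file exists before copying
    anything, so a failure does not leave a partial copy behind.
    Returns the names of the files that were copied.
    """
    sources = [Path(preferred_dir) / name for name in SYNC_FILES]
    missing = [str(path) for path in sources if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing on preferred server: " + ", ".join(missing))
    for source in sources:
        # copy2 also preserves file timestamps and permission bits.
        shutil.copy2(source, Path(other_dir) / source.name)
    return [source.name for source in sources]
```

The sketch treats a missing file as an error; adjust that choice if it does not match your deployment. After the copy, start the secondary cell server and then the primary cell server, as described in steps 4 and 5.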

4

In a high availability deployment, events are not synchronized after failback to the primary cell.

Explanation:

This can occur if the primary cell is started for failback before the secondary cell is in Full Activity mode. To ensure proper synchronization of events, start the primary cell for failback only when the secondary cell is in Full Activity mode.

Workaround:

  1. Stop the primary and secondary cells.
  2. Manually copy the mcdb file of the secondary cell to the primary cell so that the events and data of both cells are in sync.
  3. Start the primary and secondary cells.

5

TrueSight Infrastructure Management cells experience a connection timeout if a firewall is configured between them and the event flow is low.

Probable cause:

If the event flow between the TrueSight Infrastructure Management cells is very low and stringent firewall rules are configured, the cells experience a connection timeout, and new events might not be propagated until you restart the cells.

Resolution:

Do the following to keep the connection alive between the cells despite the low event flow:

  1. Go to the <Installation Directory>\pw\server\etc\cellName directory.
  2. Using a text editor, open the mcell.conf file.
  3. Set the TCPKeepAlive parameter to Yes.
  4. Set the TCPKeepIdle parameter to the required idle time, in seconds.
    • Ideally, set the TCPKeepIdle parameter to a value less than the idle timeout in the firewall configuration settings.
    • The TCPKeepAlive and TCPKeepIdle parameters are supported only in version 11.3.03 and later. For details, see Cell configuration parameters.
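
The relationship between TCPKeepIdle and the firewall idle timeout can be sketched as a small check. This is an illustration only: the function name keepalive_settings is hypothetical, and the Parameter=Value line syntax is an assumption about the mcell.conf format, so confirm it against the entries that already exist in your file.

```python
def keepalive_settings(keep_idle_seconds, firewall_idle_seconds):
    """Return the two mcell.conf lines for TCP keepalive.

    Hypothetical helper. Rejects a TCPKeepIdle value that is not
    strictly less than the firewall idle timeout, because the firewall
    would then drop the connection before a keepalive probe is sent.
    The Parameter=Value syntax is an assumption.
    """
    if keep_idle_seconds <= 0:
        raise ValueError("TCPKeepIdle must be a positive number of seconds")
    if keep_idle_seconds >= firewall_idle_seconds:
        raise ValueError("TCPKeepIdle must be less than the firewall idle timeout")
    return ["TCPKeepAlive=Yes", "TCPKeepIdle={}".format(keep_idle_seconds)]
```

For example, with a firewall idle timeout of 3600 seconds (a made-up value), keepalive_settings(1800, 3600) produces the lines TCPKeepAlive=Yes and TCPKeepIdle=1800.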

6

The imDbBackupDaily job, which backs up the cell mcdb, fails with the following error message:

Self-Monitoring: Impact Manager daily database backup failed

Probable cause:

The imDbBackupDaily job could not launch the statbld process because another statbld process was already running.

Resolution:

A retry mechanism has been introduced so that the daily job retries the backup. Do the following to enable the retry mechanism and complete the cell database backup:

  1. Go to the (Linux) <Installation Directory>/pw/custom/conf or (Windows) <Installation Directory>\pw\custom\conf directory.
  2. Using a text editor, open the pronet.conf file, add the following parameters, and then save the file:
    pronet.ngpagent.cell.imdbbackupdaily.retryCount=12
    pronet.ngpagent.cell.imdbbackupdaily.retryTime=10
  3. Restart the Infrastructure Management server to apply the preceding properties.

 
