PATROL Agent disconnects intermittently or stops responding

PATROL Agent frequently stops responding due to insufficient ulimit settings

The PATROL Agent frequently stops responding.

Probable cause:

The PATROL Agent stops responding because the system's ulimit settings are exceeded. This can happen when the PATROL Agent is started by rc or inittab startup scripts. In that case, it may inherit the ulimit settings of the root account, which are most likely lower than the PATROL Agent needs.

Resolution:

Run the following command to check the current ulimit values, and increase them if required.

ulimit -a
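
As an illustration, a startup script could check a limit before launching the agent. The following is a minimal sketch only: the limit_ok helper is hypothetical, and both the choice of the open-files limit and the 8192 threshold are assumptions for illustration, not BMC recommendations. Use the limits and values that your environment requires.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a limit value meets the required minimum.
# A value of "unlimited" always passes.
limit_ok() {
    current=$1
    required=$2
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$required" ]
}

REQUIRED_NOFILE=8192   # assumed threshold, for illustration only
current_nofile=$(ulimit -n)

if limit_ok "$current_nofile" "$REQUIRED_NOFILE"; then
    echo "open files limit OK: $current_nofile"
else
    echo "open files limit too low: $current_nofile (want $REQUIRED_NOFILE)"
    # Raise the soft limit for this shell and its child processes. This fails
    # if the hard limit is lower than the requested value.
    ulimit -n "$REQUIRED_NOFILE" || echo "could not raise the limit"
fi
```

Because child processes inherit the limits of the shell that starts them, a check like this would need to run in the same script that launches the PATROL Agent.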

PATROL Agent consumes high CPU and stops responding

The PATROL Agent abruptly stops responding due to high CPU consumption.

Probable cause: A specific KM or a PATROL Agent process might be causing the high CPU utilization.

Resolution: As a workaround, do the following: 

  1. Stop the PATROL Agent.

  2. Do the following to ensure that ACTIVEPROCESS.km is not running:

    1. Add ACTIVEPROCESS.km to the agent's configuration database.

    2. Update the /AgentSetup/disabledKMs variable to list the KMs that you want to disable. 

  3. Restart the PATROL Agent.

  4. Do the following to verify that the agent does not have a corrupted history database:

    1. Go to the $PATROL_HOME/log directory. 

    2. Using a text editor, open the PatrolAgent_hostname-portnumber.errs file.

    3. Search for the word "inconsistencies" in the PatrolAgent_hostname-portnumber.errs file. 

      • If you find this phrase in the file, it indicates that the history database is corrupted.

      • If you don't find this phrase in the file, the history database is probably not corrupted. 

    4. Do one of the following to fix the corrupted database:

      • Run the fix_hist utility.

      • Go to the $PATROL_HOME/log/history/<hostname>/<portnumber> directory, and delete the param.hist, annotate.dat, and dir files. 

        Note: If history files are deleted, then the agent creates new files after restarting.

  5. Verify that the runqschedpolicy variable is set to 1 in the agent configuration database. 

    If this variable is set to any one of the following values, use the Agent Configuration utility to manually set the variable to 1.

    • 2 : Reschedule the process from time it started

    • 4 : Try to adjust for the minimum load every time

    • 8 : Insert a time delay of DELTA seconds between the PATROL Agent process executions

  6. Do the following to check whether old configurations are causing the problem:
    1. Stop the PATROL Agent.
    2. Back up the config and log folders.
    3. Purge the agent.
    4. Start the PATROL Agent.
    5. Check the PATROL Agent CPU consumption.
  7. Run the following command to measure the resources that each PSL script consumes. The output helps you identify the scripts with the highest usage cost and shows how that cost is distributed within each PSL script:

    PatrolAgent -p <port> -profiling file_3181.ppv
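
The log check in step 4 can be scripted. The following is a minimal sketch, not a BMC-provided tool: the history_looks_corrupted helper, the host name myhost, port 3181, and the /opt/patrol fallback path are all assumptions for illustration. The search is case-insensitive as a precaution.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given agent error log contains
# the word "inconsistencies" (matched case-insensitively).
history_looks_corrupted() {
    grep -qi "inconsistencies" "$1"
}

# Assumed host name, port, and fallback install path; substitute your own.
ERRS_FILE="${PATROL_HOME:-/opt/patrol}/log/PatrolAgent_myhost-3181.errs"

if [ -f "$ERRS_FILE" ]; then
    if history_looks_corrupted "$ERRS_FILE"; then
        echo "history database may be corrupted: run fix_hist or delete the history files"
    else
        echo "no 'inconsistencies' entry found in $ERRS_FILE"
    fi
else
    echo "log file not found: $ERRS_FILE"
fi
```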

PATROL Agent stops responding due to corrupt history files

The PATROL Agent abruptly stops responding or disconnects intermittently due to corrupt history files, and displays the following error message in the log files:

History was not closed with a proper agent termination after the above date which indicates corrupted history files.

Probable causes: The PATROL Agent history files can become corrupted if they are open for writing when the PATROL Agent stops for one of the following reasons:

  • PATROL Agent abruptly shuts down
  • PATROL Agent gracefully shuts down, but the PATROL Agent process is still running
  • PATROL Agent process was killed using the kill -9 command

Resolution: As a workaround, do the following: 

  1. Stop the PATROL Agent.
  2. Try to fix the history files using the fix_hist utility present in the $PATROL_HOME/bin directory.
    Note: The utility scans the history files and synchronizes the database and its indexes.
  3. Delete the history files (annotate.dat, dir, and param.hist) located in the following directory:
    $PATROL_HOME/log/history/<hostname>/<portnumber>
    Note: Alternatively, you can rename the history files so that they can be accessed later if required using the dump_hist utility.
  4. Delete the PEM files (PEM_{host}_{agent port #}.log and .archive) located in the following directory:
    $PATROL_HOME/log
  5. Restart the PATROL Agent. 

    • If the PATROL Agent doesn't start successfully, delete the config_{host}-{port} file located in the $PATROL_HOME/config directory.
    • If you delete the configuration file, ensure that you reconfigure the KM files, and reload the configuration files by running the following command:

      pconfig +Reload {cfg file}
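
If you prefer to rename the history files in step 3 instead of deleting them, the rename can be scripted. The following is a minimal sketch under stated assumptions: the backup_history_files helper, the host name myhost, port 3181, the /opt/patrol fallback path, and the timestamp suffix are all hypothetical. Run it only while the PATROL Agent is stopped.

```shell
#!/bin/sh
# Hypothetical helper: renames the three history files in the given directory
# by appending a suffix, so that they are kept for later inspection.
# Files that do not exist are skipped.
backup_history_files() {
    hist_dir=$1
    suffix=$2
    for f in annotate.dat dir param.hist; do
        if [ -f "$hist_dir/$f" ]; then
            mv "$hist_dir/$f" "$hist_dir/$f.$suffix"
        fi
    done
}

# Assumed host name, port, and fallback install path; substitute your own.
HIST_DIR="${PATROL_HOME:-/opt/patrol}/log/history/myhost/3181"

if [ -d "$HIST_DIR" ]; then
    backup_history_files "$HIST_DIR" "bak.$(date +%Y%m%d%H%M%S)"
fi
```

After the rename, the agent finds no history files at startup and creates new ones, as it does when the files are deleted.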