PATROL Agent disconnects intermittently or stops responding
PATROL Agent frequently stops responding due to insufficient ulimit settings
The PATROL Agent frequently stops responding.
Probable cause:
The PATROL Agent stops responding because it exceeds the system's ulimit settings. This can happen when the PATROL Agent is started by rc or inittab startup scripts: the agent may then inherit the ulimit settings of the root account, which are likely lower than the values that the PATROL Agent needs.
Resolution:
Run the following command to display the current ulimit values, and increase them if required:
ulimit -a
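Resource limits are inherited by child processes, so the limits an agent receives from an rc or inittab script can be checked from that same context. The following Python sketch uses the standard resource module to print the soft and hard values of a few limits. Which limits matter most for the PATROL Agent is an assumption here, so adjust the LIMITS table to your environment.

```python
import resource

# Limits that commonly matter for a long-running daemon. Which of them the
# PATROL Agent actually hits is an assumption; extend the table as needed.
LIMITS = {
    "open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "processes (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "stack size (RLIMIT_STACK)": resource.RLIMIT_STACK,
    "data segment (RLIMIT_DATA)": resource.RLIMIT_DATA,
}


def describe(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {name: (soft, hard)} for the calling process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:28} soft={describe(soft):>12}  hard={describe(hard):>12}")
```

Run it from the same startup script or account that launches the agent; the values it prints are the ones a process started there would inherit.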
PATROL Agent consumes high CPU and stops responding
The PATROL Agent abruptly stops responding due to high CPU consumption.
Probable cause: A specific KM or a PATROL Agent process might be causing the high CPU utilization.
Resolution: As a workaround, do the following:
- Stop the PATROL Agent.
- Do the following to ensure that ACTIVEPROCESS.km is not running:
- Add ACTIVEPROCESS.km to the /AgentSetup/disableKMs variable in the agent's configuration database. This variable lists the KMs that you want to disable.
- Restart the PATROL Agent.
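The following Python sketch shows one way to append ACTIVEPROCESS.km to an existing disableKMs value without dropping KMs that are already disabled. It assumes that the variable holds a comma-separated list of KM file names and that the change is applied through a pconfig-style file with a REPLACE operation. Both are assumptions, and add_disabled_km and pconfig_replace are hypothetical helpers, so check the syntax against your agent's documentation before applying the output.

```python
def add_disabled_km(current: str, km: str) -> str:
    """Return a disableKMs value with `km` appended, without duplicates.

    Assumes a comma-separated list of KM file names (an assumption).
    Surrounding whitespace is stripped and empty entries are dropped.
    """
    kms = [k.strip() for k in current.split(",") if k.strip()]
    if km not in kms:
        kms.append(km)
    return ",".join(kms)


def pconfig_replace(variable: str, value: str) -> str:
    """Render a pconfig-style file that replaces one variable's value.

    The PATROL_CONFIG header and REPLACE syntax are assumed here.
    """
    return f'PATROL_CONFIG\n"{variable}" = {{ REPLACE = "{value}" }}\n'
```

For example, pconfig_replace("/AgentSetup/disableKMs", add_disabled_km("", "ACTIVEPROCESS.km")) produces a two-line file whose second line is "/AgentSetup/disableKMs" = { REPLACE = "ACTIVEPROCESS.km" }. Merging into the current value, rather than overwriting it, keeps any KMs that are already disabled.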
- Do the following to verify that the agent does not have a corrupted history database:
- Go to the $PATROL_HOME/log directory.
- Using a text editor, open the PatrolAgent_hostname-portnumber.errs file.
- Search the file for the phrase inconsistencies.
- If you find this phrase, the history database is corrupted.
- If you do not find this phrase, the history database is probably not corrupted.
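The search in the steps above can also be scripted. This Python sketch builds the .errs file name from the PatrolAgent_hostname-portnumber.errs pattern in this article and reports every line that contains inconsistencies, matched case-insensitively. The errs_path, find_phrase, and check names are hypothetical, and the exact host name format used in the file name is an assumption to verify on your system.

```python
from pathlib import Path

# Phrase that, per this article, indicates a corrupted history database.
# Lines are lowercased before matching, so keep the phrase in lowercase.
PHRASE = "inconsistencies"


def errs_path(patrol_home, hostname, port) -> Path:
    """Build $PATROL_HOME/log/PatrolAgent_<hostname>-<port>.errs."""
    return Path(patrol_home) / "log" / f"PatrolAgent_{hostname}-{port}.errs"


def find_phrase(lines, phrase=PHRASE):
    """Return (line_number, text) for every line that contains `phrase`."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if phrase in line.lower()
    ]


def check(patrol_home, hostname, port):
    """Scan the agent's .errs file; an empty result means no match."""
    path = errs_path(patrol_home, hostname, port)
    with path.open(encoding="utf-8", errors="replace") as fh:
        return find_phrase(fh)
```

For example, check(os.environ["PATROL_HOME"], "myhost", 3181) returns the matching lines with their line numbers; an empty list means the phrase was not found.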
- Do one of the following to fix the corrupted database:
- Run the fix_hist utility.
- Go to the $PATROL_HOME/log/history/<hostname>/<portnumber> directory, and delete the param.hist, annotate.dat, and dir files.
Note: If you delete the history files, the agent creates new ones when it restarts.
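A minimal Python sketch of the deletion step, assuming the $PATROL_HOME/log/history/<hostname>/<portnumber> layout given above and the file names param.hist, annotate.dat, and dir (as listed in the section on corrupt history files). history_dir and remove_history_files are hypothetical helpers. By default the function only reports which files it would delete; run it with dry_run=False only while the agent is stopped.

```python
from pathlib import Path

# History file names as listed in this article; verify them on your system.
HISTORY_FILES = ("param.hist", "annotate.dat", "dir")


def history_dir(patrol_home, hostname, port) -> Path:
    """Build $PATROL_HOME/log/history/<hostname>/<port>."""
    return Path(patrol_home) / "log" / "history" / hostname / str(port)


def remove_history_files(directory: Path, dry_run: bool = True) -> list:
    """Return the history files found in `directory`.

    With dry_run=True (the default) nothing is deleted, so the list can be
    reviewed first; with dry_run=False the files are removed.
    """
    found = [directory / name for name in HISTORY_FILES if (directory / name).is_file()]
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Defaulting to a dry run makes the destructive step explicit, which matters because the deleted history cannot be recovered afterwards.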
- Verify that the runqschedpolicy variable is set to 1 in the agent configuration database.
- Do the following to rule out problems caused by existing old configurations:
- Stop the PATROL Agent.
- Back up the config and log folders.
- Purge the agent.
- Start the PATROL Agent.
- Check the CPU consumption of the PATROL Agent.
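The backup step can be sketched in Python with shutil.make_archive, which writes one compressed tar archive per folder. The sketch assumes that the config and log folders are subdirectories of $PATROL_HOME and, by default, writes the archives to the parent directory of $PATROL_HOME. backup_dirs is a hypothetical helper, so adjust the folder names and destination to match your installation.

```python
import shutil
import time
from pathlib import Path


def backup_dirs(patrol_home, names=("config", "log"), dest=None) -> list:
    """Archive each named subdirectory of patrol_home into a .tar.gz file.

    Missing folders are skipped. Returns the paths of the archives created.
    """
    home = Path(patrol_home)
    dest_dir = Path(dest) if dest else home.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archives = []
    for name in names:
        if not (home / name).is_dir():
            continue
        base = dest_dir / f"patrol_{name}_backup_{stamp}"
        # root_dir/base_dir keep paths inside the archive relative, e.g. "config/..."
        created = shutil.make_archive(str(base), "gztar", root_dir=str(home), base_dir=name)
        archives.append(Path(created))
    return archives
```

Timestamped archive names let you keep several backups side by side if the purge has to be repeated.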
- Run the following command to measure the resources that each PSL script consumes. The profiling results show which scripts have the highest cost and how that cost is distributed within each script:
PatrolAgent -p <port> -profiling file_3181.ppv
PATROL Agent stops responding due to corrupt history files
The PATROL Agent abruptly stops responding or disconnects intermittently due to corrupt history files, and displays the following error message in the log files:
Probable causes: The PATROL Agent history files can become corrupted if they are open for writing when the PATROL Agent stops for one of the following reasons:
- PATROL Agent abruptly shuts down
- PATROL Agent gracefully shuts down, but the PATROL Agent process is still running
- PATROL Agent process was killed using the kill -9 command
Resolution: As a workaround, do the following:
- Stop the PATROL Agent.
- Try to fix the history files using the fix_hist utility present in the $PATROL_HOME/bin directory.
Note: The utility scans the history files and synchronizes the database and its indexes.
- Delete the history files (annotate.dat, dir, and param.hist) located in the following directory:
$PATROL_HOME/log/history/<hostname>/<portnumber>
Note: Alternatively, you can rename the history files so that you can access them later, if required, by using the dump_hist utility.
- Delete the PEM files (PEM_{host}_{agent port #}.log and .archive) located in the following directory:
$PATROL_HOME/log
- Restart the PATROL Agent.
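The rename alternative described in the note above can be sketched as follows. This Python helper (hypothetical, not part of PATROL) appends a suffix, a timestamp by default, to each history file in the given directory instead of deleting it, so the original files remain available for dump_hist. It assumes the file names annotate.dat, dir, and param.hist listed above and should be run only while the agent is stopped.

```python
import time
from pathlib import Path

# History file names as listed in this article; verify them on your system.
HISTORY_FILES = ("annotate.dat", "dir", "param.hist")


def set_aside_history(directory, suffix=None) -> list:
    """Rename each history file to <name>.<suffix>; return (old, new) pairs.

    Files that do not exist are skipped. The agent no longer finds files
    under the original names, so it starts with fresh history files.
    """
    directory = Path(directory)
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in HISTORY_FILES:
        old = directory / name
        if old.is_file():
            new = directory / f"{name}.{suffix}"
            old.rename(new)
            moved.append((old, new))
    return moved
```

Renaming rather than deleting costs only disk space and keeps the corrupted files available if you later need to inspect them.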