Troubleshooting PATROL for Linux KM


This section provides information to help you troubleshoot the KM.

Error messages

This section describes some common error messages that you might encounter while running PATROL for Linux, and provides suggested resolutions for the errors. The messages in the KM use a prefix to identify which application or collector sent the message. Find the prefix you are interested in to see the list of messages associated with that application or collector.


Filesystems-related issues

Issue

Solution

Filesystem parameters still collect data even if the monitored filesystem has been unmounted from the server

Any of the following solutions might resolve your issue:

  1. Check the statvfs permissions, because the statvfs system API is used for filesystem data collection. The statvfs binary must be owned by the root user and have the setuid bit set. Verify this by running the following commands; the output should resemble the last line:
    cd $PATROL_HOME/../unix/Linux-2-6-x86-64-nptl/bin
    ls -l statvfs
    -rwsr-xr-x 1 root patrol 9568 May 9 2011 statvfs
  2. This API cannot identify whether the provided path is a normal directory or a path on which a filesystem is actually mounted. If a filesystem is unmounted, the API returns the details for the directory on which the filesystem was mounted. For example, if /sftp was earlier mounted on /home/sftp, after unmounting the filesystem, the API returns data for /home/sftp.
  3. PATROL Agent uses the "mount -v" command to get information about the mounted filesystems and update the moniList ruleset. Delete the "moniList" configuration variable from the PATROL Agent configuration as follows:
    1. Go to the PATROL Console and open the System Output Window (SOW).
    2. At the prompt that opens, type the following command:
      OS> %PSL pconfig ("DELETE","/NUK/NUK_FileSystem_Container/moniList");
    3. Restart the PATROL Agent.
      This PSL command deletes the existing filesystem list and allows PATROL Agent to rediscover the mounted filesystems. It also helps to stop events that are generated on the FSMountStatus attribute. If you do not have PATROL Console, apply the following configuration variable and restart the PATROL Agent: "/NUK/NUK_FileSystem_Container/moniList" = { DELETE = "" }
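The ownership and setuid check from step 1 can be scripted. The following is a minimal sketch, assuming "ls -l" output in the same layout as the sample line in step 1. The function name is_setuid_root is hypothetical and is not part of PATROL.

```shell
# Hypothetical helper (not part of PATROL): reads one "ls -l" line and
# succeeds only if the owner is root and the owner-execute slot shows "s".
is_setuid_root() {
    # $1: a single line of "ls -l" output
    printf '%s\n' "$1" | awk '{
        setuid = (substr($1, 4, 1) == "s")   # 4th character of the mode string
        owner_is_root = ($3 == "root")       # 3rd field is the owning user
        if (setuid && owner_is_root) exit 0
        exit 1
    }'
}

# Usage against the binary from step 1:
#   is_setuid_root "$(ls -l "$PATROL_HOME/../unix/Linux-2-6-x86-64-nptl/bin/statvfs")" \
#       && echo "statvfs ownership and setuid bit look correct"
```

A capital "S" in the mode string (setuid without execute permission) is treated as a failure in this sketch.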

If the issue is not resolved, contact BMC Support with the following logs:

  • PATROL Agent configuration
  • Mount command output

Filesystem instance label name is incorrect 

If the FILESYSTEM instance label consists of more than 20 characters, the display name of the instance label is truncated by default.

The /NUK/NUK_FileSystem_Container/dispFullName pconfig variable enables you to display the complete FILESYSTEM instance label. You can assign the following values to this variable:

  • 0: Displays the truncated FILESYSTEM instance label if the label is longer than 20 characters.
  • 1: Displays the complete FILESYSTEM instance label.

Example

"/NUK/NUK_FileSystem_Container/dispFullName" = { REPLACE = "1" }

If the issue is not resolved, contact BMC Support with the following logs:

  • PATROL Agent configuration
  • Mount command output

Filesystem exclusion does not work or how to disable alerts for unmounted filesystems

Check the filesystem type.

For example, the /run/user/* filesystems have the type tmpfs.

To determine the filesystem type, run the following command: mount | grep <filesystemName> (for example, mount | grep /run/user/)

When these filesystems are unmounted, the mount command no longer shows entries for them. Therefore, the "type" of such instances is not available.
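The type lookup described above can be sketched as a small filter over the mount output. This is an illustration only, assuming the usual Linux "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" line layout and mount points without spaces. The function name fs_type_of is hypothetical.

```shell
# Hypothetical helper: prints the filesystem type recorded for a mount point
# in "mount"-style text read from stdin, or nothing if it is not listed.
fs_type_of() {
    # $1: mount point to look up
    # stdin: lines like "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)"
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $5; exit }'
}

# Usage on a live system (the mount point is an example):
#   mount | fs_type_of /run/user/1000
```

An empty result for a path that was monitored earlier indicates that the filesystem is no longer mounted, which is the situation in which the type is unavailable.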

Check if the below rules are present. If these rules are not present, the exclusion is not successful:

"/ConfigData/NUK_FileSystem_Container/customFilterEnabled" = { REPLACE = "1" }, 

"/ConfigData/NUK_FileSystem_Container/customFsType" = { REPLACE= "tmpfs" },

To fix the issue, configure these rules as described in Configuring-FileSystems, and then restart PATROL Agent.

If the issue is not resolved, contact BMC Support with the following logs:

  • PATROL Agent configuration
  • Mount command output

How to enable default monitoring of all filesystems using Linux KM?

How to include specific file system monitoring from TrueSight Policy?

By default, only the following filesystems are monitored:

  • ^/$ (root)
  • ^/tmp$ (tmp)
  • ^/usr$ (usr)
  • ^/home$ (home)

To monitor all filesystems, add the regular expression “.*” to the Include field while configuring the Filesystems monitor type. For more information, see Configuring-FileSystems.

Including all filesystems for monitoring puts unnecessary load on PATROL Agent. Instead, you can add regular expressions that include only the filesystems that you want to monitor.

You can include or exclude a filesystem from monitoring by adding comma-separated regular expressions in the Include and Exclude fields. For example, ^/scripts, ^/mnt, ^/local/utils.
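To preview which mount points a comma-separated pattern list would select, you can test the patterns locally before adding them to the policy. The following sketch assumes that each entry is treated as an extended regular expression and that whitespace around commas is ignored; how the KM itself evaluates the patterns is not confirmed here. The function name matches_any is hypothetical.

```shell
# Hypothetical helper: succeeds if the mount point matches at least one
# regular expression in a comma-separated list.
matches_any() {
    # $1: comma-separated extended regular expressions
    # $2: mount point to test
    # Values are passed through the environment so that backslashes in
    # patterns are not altered by awk's -v escape processing.
    PATTERN_LIST="$1" MOUNT_POINT="$2" awk 'BEGIN {
        n = split(ENVIRON["PATTERN_LIST"], patterns, ",")
        for (i = 1; i <= n; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", patterns[i])   # trim surrounding blanks
            if (patterns[i] != "" && ENVIRON["MOUNT_POINT"] ~ patterns[i]) exit 0
        }
        exit 1
    }'
}

# Usage with the default patterns listed above:
#   matches_any '^/$, ^/tmp$, ^/usr$, ^/home$' /var || echo "/var is not included by default"
```

Note the effect of anchoring: ^/tmp$ selects only /tmp, whereas an unanchored pattern such as ^/mnt also selects every mount point below /mnt.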

 If the issue is not resolved, contact BMC Support with the following logs:

  • PATROL Agent configuration
  • Mount command output

Filesystem is present in df -h but is not monitored

Perform the following actions:

  1. Delete the content of the following pconfig variable: /NUK/NUK_FileSystem_Container/moniList
  2. Add the following to the pconfig variable: "/NUK/NUK_FileSystem_Container/moniList" = { REPLACE = " " }
  3. Reload the variable.
  4. Restart PATROL Agent.

How to exclude a specific filesystem monitoring from TrueSight policy?

If you are trying to exclude the /run/user/* filesystems and are still getting events, note that the /run/user/* filesystems have the type "tmpfs". When these filesystems are unmounted, the mount command no longer shows entries for them, so the "type" of such instances is not available on the server. Because the "type" is no longer available, the following rules do not work:

 "/ConfigData/NUK_FileSystem_Container/customFilterEnabled" = { REPLACE = "1" }, 

"/ConfigData/NUK_FileSystem_Container/customFsType" = { REPLACE= "tmpfs" },

To fix this scenario, perform the following actions while configuring the FileSystems monitor type. For more information, see Configuring-FileSystems.

  1. In the Exclude by type section, select Custom and add tmpfs to the Custom type list field.
  2. In the Unmounted filesystems handling > Remove by type field, select Custom types and add tmpfs to the Custom types list text box.
  3. Restart the PATROL Agent. 

If the issue is not resolved, contact BMC Support with the following log: File system Debug (to check if the filesystem is discovered by the agent and KM).

Process-related issues

Issue

Solution

Process monitor unable to start a process (Process Monitoring for Linux) 

This issue has been fixed in PATROL for Linux version 1.2.00.04. Upgrade to the latest version of the KM to resolve the issue.

For more information, see Upgrading.

Debug

Issue

Solution

How to enable debug?

The Linux Debug monitor profile enables you to configure KM debugging. You can enable debugging for various monitor types on a particular host. For more information, see Configuring-Linux-Debug-monitor-profile.

You can also perform the following actions and enable debug:

  1. In TrueSight, open an Agent Query window for the PATROL Agent server for which you want to enable logs.
  2. Run the %PSLPS command.
    The PSL process list is displayed along with the PIDs.
  3. In the output, search for a line like the following:
    67 NUK_FileSystem HALTED DISCOV NUK_FileSystem -
    Note: Ensure that you are looking at the DISCOV line.
  4. Note the PID for that collector.
    In the example, the PID is 67. In your environment, it will be different.
  5. Run the following command in the Agent Query window:
    %PSL trace_psl_process("PID", <PID noted in step 4>,-1);
    Example: %PSL trace_psl_process("PID", 67,-1);
    Tracing for the filesystem starts.
  6. Restart the PATROL Agent.
  7. Wait for 5 minutes.
  8. Stop tracing by running the following command in the Agent Query window:
    %PSL trace_psl_process("PID", <PID noted in step 4>,0);
    Example: %PSL trace_psl_process("PID", 67,0);
    To debug the issue, BMC Support might need the files from /opt/bmc/Patrol3/log/trace/hostname/3181/, the PATROL Agent configuration, and the mount command output.
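If you save the %PSLPS output to a text file, the PID lookup in steps 3 and 4 can be scripted. This is a sketch only: the column layout (PID first, process type in the fourth column) is inferred from the single sample line above and might differ in your environment. The function name discov_pid is hypothetical.

```shell
# Hypothetical helper: prints the PID of the DISCOV process for a given
# application class from %PSLPS-style text read on stdin.
discov_pid() {
    # $1: application class, for example NUK_FileSystem
    # Assumed layout, based on the sample line: PID in column 1,
    # class name in column 2, process type (DISCOV) in column 4.
    awk -v cls="$1" '$2 == cls && $4 == "DISCOV" { print $1; exit }'
}

# Usage (pslps_output.txt is a hypothetical file holding the saved output):
#   pid=$(discov_pid NUK_FileSystem < pslps_output.txt)
#   printf '%%PSL trace_psl_process("PID", %s,-1);\n' "$pid"
```

The usage lines only print the trace command for you to paste into the Agent Query window; they do not contact the PATROL Agent.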

Remote monitoring-related issues

Issue

Solution

Remote monitoring prerequisites

Error found in the remote monitoring logs: "nukremotexec.xpc: XPC error 32 -- Broken pipe"

Verify the following:

  • Credentials are valid and you are able to log in from the PATROL Agent server to the remote monitoring servers and remain connected.
  • SSH2 server is installed and running at the remote host.
  • You are able to resolve the remote host by using nslookup.
  • The permissions on the Patrol3 folder are correct (use a recursive listing of the folder to view them).
  • The UsePAM parameter in the sshd_config file:
    • UsePAM no
    • UsePAM yes
  • The MaxSessions parameter in the sshd_config file (by default, it is set to 10).
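The sshd_config checks in the list above can be scripted on the remote host. The following is a minimal sketch, assuming the common whitespace-separated "Keyword value" form. It ignores Match blocks, Include directives, and the "Keyword=value" form. The function name sshd_setting is hypothetical.

```shell
# Hypothetical helper: prints the first uncommented value of a keyword in an
# sshd_config-style file. Keywords are compared case-insensitively, and the
# first occurrence is reported because sshd uses the first value it obtains.
sshd_setting() {
    # $1: keyword, for example UsePAM or MaxSessions
    # $2: path to the configuration file
    awk -v key="$1" '
        /^[ \t]*#/ { next }                            # skip comment lines
        tolower($1) == tolower(key) { print $2; exit }
    ' "$2"
}

# Usage:
#   sshd_setting UsePAM /etc/ssh/sshd_config
#   sshd_setting MaxSessions /etc/ssh/sshd_config
```

An empty result means that the keyword is not set explicitly, so the sshd default applies. Where you have root access, "sshd -T" prints the effective configuration and is the more reliable check.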

To resolve the "nukremotexec.xpc: XPC error 32 -- Broken pipe" error, perform the following actions:

Set the SUID bit (set owner user ID on execution), change the permissions, and restart PATROL Agent. Run the following commands:

  • chown root:root nukserver.xpc (perform the change of ownership like this to all the NUK binaries)
  • chmod u+s nukserver.xpc
  • chmod u+s nukserver.xpc_64*
  • chmod 755 nukserver.xpc
  • chmod 755 nukremotexec.xpc
  • chmod 755 nukvm.xpc
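The order in which you run these commands matters. With GNU coreutils on Linux, a numeric mode with fewer than four digits, such as 755, clears the setuid bit on a regular file, so running chmod 755 after chmod u+s removes the bit again. The following sketch demonstrates this on a scratch file, not on the NUK binaries. The helper name show_mode is hypothetical.

```shell
# Demonstration on a scratch file; it does not touch any PATROL binary.
show_mode() {
    # Prints the 10-character mode string of a file, as shown by "ls -l"
    ls -ld "$1" | cut -c1-10
}

scratch=$(mktemp)

# Sequence 1: set the setuid bit, then apply a three-digit numeric mode.
chmod u+s "$scratch"
chmod 755 "$scratch"
after_symbolic_then_numeric=$(show_mode "$scratch")

# Sequence 2: set the setuid bit and the 755 permissions in one step.
chmod 4755 "$scratch"
after_four_digit=$(show_mode "$scratch")

echo "u+s followed by 755: $after_symbolic_then_numeric"
echo "4755:                $after_four_digit"
rm -f "$scratch"
```

If the first sequence shows no "s" in the mode string on your system, apply chmod u+s last (or use chmod 4755) and confirm the result with ls -l before you restart PATROL Agent.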

 If the issue is still unresolved, try the following workarounds:

Whenever a disconnect happens, check the /var/log/secure log for an entry like the following:

Jul 6 18:47:13 vl-pun-bpm-qa49 sshd[11873]: fatal: mm_request_receive_expect: read: rtype 125 != type 115
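To locate entries of this kind quickly, you can filter the log for sshd fatal messages. This is a small sketch that assumes the syslog line layout shown in the sample entry. The function name sshd_fatal_lines is hypothetical.

```shell
# Hypothetical helper: prints sshd "fatal" entries from syslog-style text
# read on stdin.
sshd_fatal_lines() {
    grep -E 'sshd\[[0-9]+\]: fatal:'
}

# Usage (reading /var/log/secure usually requires root privileges):
#   sshd_fatal_lines < /var/log/secure | tail -n 20
```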

 

You can take any of the following approaches:

  • Set the UsePrivilegeSeparation variable to no in "/etc/ssh/sshd_config" (by default, it is set to yes).
    • Summary of the variable - UsePrivilegeSeparation
    • Specifies whether sshd(8) separates privileges by creating an unprivileged child process to deal with incoming network traffic. After successful authentication, another process is created that has the privilege of the authenticated user. The goal of privilege separation is to prevent privilege escalation by containing any corruption within the unprivileged processes. The default is "yes".
  • Set the PasswordAuthentication variable to yes in "/etc/ssh/sshd_config".

If the issue is not resolved, contact BMC Support with the following logs:

  • PATROL Agent configuration
  • Complete log folder


Data collection-related issues

Issue

Solution

Data is not collected for CPU and Memory

Higher CPU or Memory utilization by PATROL for Linux KM 

CPU Utilization

  • Check the OS support
  • Start the PATROL Agent on a different port.
  • Check the process consuming high CPU using Agent profiling
    $ ./PatrolAgent -profiling /tmp/agentprof -p <port>
  • The Agent writes all profiling data only after it terminates; therefore, stop PATROL Agent with pconfig +KILL or kill -15 (SIGTERM). Using kill -9 (SIGKILL) is never recommended.
  • Profiling output is a binary file; use the following command to get text output of Agent profiling.
    $ ppv /tmp/agentprof > /tmp/agent_profiling.txt
  • If profiling does not give a legitimate output, run PATROL Agent without any KMs and then load KMs one-by-one to check when CPU utilization rises to determine which KM is causing the issue. 

Memory Utilization

  • Check OS support.
  • Check the process consuming high memory.
  • Start the PATROL Agent on a different port.
  • Verify whether any old configurations exist, and perform the following actions:
    1. Stop the PATROL Agent.
    2. Back up the config and log folders.
    3. Purge the Agent.
    4. Start the PATROL Agent.
    5. Check the PATROL Agent CPU consumption.
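The backup in step 2 can be taken as a single timestamped archive before you purge the Agent. The following is a minimal sketch; the function name backup_dirs and the archive name are hypothetical, and the folder locations in the usage lines are assumptions that you should adjust to your installation.

```shell
# Hypothetical helper: archives the given directories into a timestamped
# tar.gz file under a destination directory and prints the archive path.
backup_dirs() {
    # $1: destination directory; remaining arguments: directories to back up
    dest=$1
    shift
    archive="$dest/patrol_backup_$(date +%Y%m%d_%H%M%S).tar.gz"
    tar -czf "$archive" "$@" && printf '%s\n' "$archive"
}

# Usage (the config and log locations under $PATROL_HOME are assumptions):
#   backup_dirs /var/tmp "$PATROL_HOME/config" "$PATROL_HOME/log"
```

Run the backup only after the Agent is stopped so that the files are not modified while they are archived.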

 

If the issue is not resolved, contact BMC Support with the PATROL Agent debug logs. To obtain the debug logs, perform the following steps:

  1. Stop the PATROL Agent.
  2. Start the PATROL Agent in debug mode by running the following command:
    • Linux: ./PatrolAgent -debug ALL,file="/some_filesystem/PAdebug.txt",count="10000000"
    • Windows: Double-click the PatrolAgent service and stop the service. When the service is stopped, enter the following line (with appropriate modifications to the path, filename, and count, as needed) in the 'Start Parameters' field: -debug ALL,file=C:\\patrol_agent_debug_output.txt,count=250000000

 


BMC PATROL for Linux 25.3