Troubleshooting RSCD Agent connectivity issues


This topic provides information about how to troubleshoot RSCD Agent connectivity issues. Troubleshooting these issues involves analyzing the error message types and their causes and remediating them.


Overview of the troubleshooting process

The following process flow diagram shows how to troubleshoot an RSCD Agent issue. Troubleshooting involves resolving issues in two areas:

  • Connectivity — Establishing communication from a server with NSH (or the Application Server) to a server with an RSCD Agent
  • Access control lists (ACLs) — Ensuring that the exports, users, and users.local files are correct on the target

To troubleshoot issues, start by running the agentinfo command and analyzing the rscd.log file on the targets.

[Figure: RSCD Agent Troubleshooting process flow diagram]

Log locations and tools useful during troubleshooting

Task

Procedure

Notes

Locate log file location and configuration

The RSCD logs are typically located inside the installation directory. The live log file is named rscd.log. Rolled log files are named rscd.logN, where N is a number. On Windows, an additional file, rscdsvc.log, contains logs for the Windows service startup. The default log file locations are:

  • UNIX

/opt/bmc/bladelogic/RSCD/log

  • Windows

C:\Program Files\BMC Software\BladeLogic\RSCD

If the log files are not present at this location, inspect the log4crc.txt file that is present in the C:\Windows\rsc or the /etc/rsc directory, and search for the following line:

<category name="rscd" priority="info1" appender="/opt/bmc/bladelogic/NSH/log/rscd.log" debugappender="stderr"/>
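As a minimal sketch of that lookup, the following Python snippet pulls the appender path out of the rscd category line in the contents of log4crc.txt. The function name rscd_log_path is illustrative only, and the sketch assumes the category element sits on a single line with the attributes shown above.

```python
import re

# Finds the <category ... name="rscd" ...> element in the file text.
_CATEGORY_RE = re.compile(r'<category\s+[^>]*\bname="rscd"[^>]*>')
# Captures the appender attribute; \b keeps "debugappender" from matching.
_APPENDER_RE = re.compile(r'\bappender="([^"]*)"')


def rscd_log_path(config_text):
    """Return the rscd appender path found in log4crc.txt text, or None."""
    category = _CATEGORY_RE.search(config_text)
    if category is None:
        return None
    appender = _APPENDER_RE.search(category.group(0))
    return appender.group(1) if appender else None
```

On a UNIX target, passing the text of /etc/rsc/log4crc.txt to rscd_log_path would return the configured log path, provided the file is readable by the calling user.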


Enable debug logging during RSCD Agent startup

Do the following:

  1. Edit the log4crc.txt file that is present in the C:\Windows\rsc or /etc/rsc directory and locate this line:
    <category name="rscd" priority="info1" appender="/opt/bmc/bladelogic/NSH/log/rscd.log" debugappender="stderr"/>
  2. Change the priority to debug.
  3. (For Windows RSCD Agents) Set rollmaxfiles to 60 so that more rolled log files are retained. During startup, the service attempts to start the RSCD process 50 times, so the default number of rolled files is too small to keep the logs from the first restart. Locate the following line:
    <appender name="C:/Program Files/BMC Software/BladeLogic/RSCD/rscd.log" type="digisign" rollsize="10000000" rolltimeinsec="2419200" rollmaxfiles="10" layout="dated" certfile="C:/Windows/rsc/certificate.pem" privatekeyfile="C:/Windows/rsc/certificate.pem"/>

    The line may look as follows when secure agent logging is not used:

    <appender name="C:/Program Files/BMC Software/BladeLogic/RSCD/rscd.log" type="rollfile" rollsize="10000000" rolltimeinsec="2419200" rollmaxfiles="10" layout="dated"/>

  4. Save the changes.
  5. Restart the RSCD Agent.
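The edits in steps 1 to 3 can be sketched as a text transformation. The Python function below is a hypothetical helper (enable_debug is not part of the product) that sets the priority of the rscd category to debug and, optionally, changes rollmaxfiles on the appender whose name ends in rscd.log. It assumes each element is on a single line, as in the examples above, and it only returns the modified text. Writing the file back and restarting the Agent remain manual steps.

```python
import re


def enable_debug(config_text, roll_max_files=None):
    """Return log4crc.txt text with the rscd category priority set to debug.

    If roll_max_files is given, the rollmaxfiles attribute of the appender
    whose name ends in rscd.log is set to that value as well.
    """
    def set_priority(match):
        # Rewrite only the priority attribute inside the matched element.
        return re.sub(r'\bpriority="[^"]*"', 'priority="debug"', match.group(0))

    def set_roll_max(match):
        # Rewrite only the rollmaxfiles attribute inside the matched element.
        return re.sub(r'\brollmaxfiles="\d+"',
                      f'rollmaxfiles="{roll_max_files}"', match.group(0))

    updated = re.sub(r'<category\s+[^>]*\bname="rscd"[^>]*>',
                     set_priority, config_text)
    if roll_max_files is not None:
        updated = re.sub(r'<appender\s+[^>]*\bname="[^"]*rscd\.log"[^>]*>',
                         set_roll_max, updated)
    return updated
```

For a Windows Agent, calling enable_debug(text, roll_max_files=60) mirrors the change described in step 3. Keep a copy of the original file before replacing it.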

Enable debug logging during RSCD Agent operation

You can use the agentctl command to change the logging level of the running Agent to debug. Restarting the RSCD Agent reverts the logging level to the one specified in the log4crc.txt file.

You can run this command remotely or locally.

To run this command remotely:

nexec <targetName> agentctl toggle

To run this command locally:

  • UNIX
    /opt/bmc/bladelogic/RSCD/sbin/agentctl toggle
  • Windows
    "C:\Program Files\BMC Software\BladeLogic\RSCD\agentctl" toggle

Running the command again reverts to the previous logging level.


Determine whether you are using User Principal Mapping (UPM) or an Automation Principal (AP) to connect to the RSCD Agent

Run agentinfo <targetName> from an NSH client that is configured to use an NSH Proxy, and inspect the output:

UPM:

Look for PrivilegeMapped in the User Permissions line:

% agentinfo win19-2002.example.com
win19-2002:
 Agent Release : 20.02.00.31
 Hostname : WIN19-2002
 Operating System: WindowsNT 10.0 (x86_64)
User Permissions: BladeLogicRSCD@WIN19-2002->Administrator@WIN19-2002:PrivilegeMapped (Identity via trust)
 Security : Protocol=5, Encryption=TLSv1.2
 Host ID : 22CCB65F
 # of Processors : 1
 License Status : Licensed for NSH/CM

AP:

Look for PasswordLogon in the User Permissions line:

% agentinfo win19-2002.example.com
win19-2002.example.com:
 Agent Release : 20.02.00.31
 Hostname : WIN19-2002
 Operating System: WindowsNT 10.0 (x86_64)
User Permissions: TSSA@EXAMPLE:PasswordLogon (Identity via authentication)
 Security : Protocol=5, Encryption=TLSv1.2
 Host ID : 22CCB65F
 # of Processors : 1
 License Status : Licensed for NSH/CM
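The check can be automated once the agentinfo output has been captured as text. The following Python sketch looks at the User Permissions line. It treats PrivilegeMapped as the UPM marker and PasswordLogon as the Automation Principal marker, which is how the two sample outputs above are read here. The function name connection_method is illustrative.

```python
import re


def connection_method(agentinfo_output):
    """Classify captured agentinfo output as 'UPM', 'AP', or 'unknown'."""
    match = re.search(r'^\s*User Permissions\s*:\s*(.*)$',
                      agentinfo_output, re.MULTILINE)
    if match is None:
        return "unknown"
    permissions = match.group(1)
    if "PrivilegeMapped" in permissions:
        return "UPM"
    if "PasswordLogon" in permissions:
        return "AP"
    # For example, a UNIX mapping such as "1001/1001 (dba/dba)".
    return "unknown"
```

Output that contains neither marker, or no User Permissions line at all, is reported as unknown instead of being guessed.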


Troubleshooting Checklist

Use these steps to troubleshoot connectivity issues that you might be experiencing with RSCD Agents in your environment.

Task

Action

Steps

1

Define the scope of the problem.

  • Identify what is not able to connect to the RSCD Agent. For instance, all Application Servers, one Application Server, one client, or all clients.
  • Verify whether the issue is common for all RSCD Agents or specific Agents. For example, RSCD Agents in a certain location or the Agents running on targets with a specific operating system.
  • Verify whether the issue is intermittent.
  • Narrow down the problem as much as possible. For example, if the problem is intermittent and occurs on multiple Application Servers and Agents, focus on a single Application Server and a small number of Agents first.

2

Run agentinfo.

From the system with the connectivity problem, start nsh and run agentinfo <targetName>.

Note whether you get a response or an error message.

3

Compare the returned message to the messages in the next section for the possible resolution.

The table in the "Resolutions for common RSCD Agent error messages" section lists common messages returned by agentinfo and explains how to resolve the associated connectivity or ACL issues.

4

If you do not receive one of the specified messages, start investigating the state of the RSCD on the target.

Connect to the target system via other means such as ssh or RDP as an administrative user.

5

Verify that the RSCD Agent is running.

After you have connected to the target system, check the agent processes:

UNIX:
Log in to the target server, list the running processes with the ps command, and look for the rscd process.

ps -ef | grep 'rscd'

Windows:
Start the Windows Task Manager or run

tasklist | findstr RSCD

and check the RSCDsvc.exe and RSCD.exe processes.

If the specified processes are not present, start the RSCD service.

  • On UNIX: Use the startup scripts (usually found in /etc/init.d).
  • On Windows: Use the service control manager and look for TrueSight Server Automation RSCD Agent. After starting the service, confirm that you see the processes.

If the Agent does not start, enable debug logging, start the Agent again, collect the RSCD log files, and open a case with BMC Support that includes the log files.

6

Verify that the Agent is running on the expected port (typically 4750).

After confirming the agent has started, confirm that it is running on the expected port. The default port is 4750 for TCP. You can configure the port by updating the secure file (/etc/rsc/secure or C:\Windows\rsc\secure).

If the agent is listening correctly, the following output is expected:

  • UNIX:
    netstat -anp
    tcp        0      0 0.0.0.0:4750            0.0.0.0:*               LISTEN      1249/bin/rscd  
  • Windows:
    netstat -abno 
    TCP 0.0.0.0:4750 0.0.0.0:0 LISTENING 5592
    [RSCD.exe]

If the process is not bound to the port defined in the secure file, enable debug logging, start the agent, collect the RSCD log files, and open a case with BMC Support that includes the gathered log files.
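When many targets need checking, the netstat output can be captured and scanned with a script. The Python sketch below reports whether a TCP listener is bound to a given port. It assumes the two output layouts shown in the samples above (the local address in the fourth column on UNIX and in the second column on Windows). The function name is_listening is illustrative.

```python
def is_listening(netstat_output, port=4750):
    """Return True if any line shows a TCP listener on the given port."""
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # Windows: "TCP <local> <remote> LISTENING <pid>"
        # UNIX:    "tcp <recv-q> <send-q> <local> <remote> LISTEN <pid/name>"
        local = fields[1] if ":" in fields[1] else fields[3]
        state_is_listen = "LISTEN" in fields or "LISTENING" in fields
        if state_is_listen and local.endswith(suffix):
            return True
    return False
```

If the port in the secure file differs from the default, pass it as the port argument.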

7

Verify whether you can connect to the RSCD Agent port on the target system from the system with the connectivity issue.

Do the following to initiate a connection to the target:

  1. telnet <targetSystem> <rscd port>
  2. Inspect the rscd.log file on the target system for an entry such as:
    211dd2c0d8ac9182fe04 0000000093 06/20/20 16:00:04.003 WARN rscd - ::ffff:<source system> 1033 -1/-1 (Not_available): (Not_available): TLS setup failed for agent: Protocol mismatch.
    Check that client and server "secure" files match. Exiting and terminating connection.

If you do not see such a message, start investigating the network path between the systems.
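Where telnet is not installed, the same reachability test can be done with a short script. The Python sketch below attempts a TCP connection to the agent port and reports success or failure. It checks only that the port is reachable, not that the TLS handshake succeeds. Whether the Agent logs the same rscd.log entry for a connection that is closed immediately is an assumption that is not confirmed here. The function name can_connect is illustrative.

```python
import socket


def can_connect(host, port=4750, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result does not distinguish between a refused connection, a timeout, and a name resolution failure. If the cause matters, let the exception propagate or log it.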

8

Verify that the firewall is not blocking access.

Ensure that the RSCD Agent port is not blocked by a firewall, either one installed locally on the target or one on the network between the Application Server and the target.

If a firewall is blocking access, configure it to allow connectivity from the Application Server to the target on port 4750 for TCP (default port).

9

Verify whether the AntiVirus or Host Intrusion Protection installed on the target is blocking the Agent.

Review the logs of the security agent and check for indications that it is blocking access to the RSCD process or port.


Resolutions for common RSCD Agent error messages

The following table lists common errors you may see in NSH, job run logs, or in the GUI while connecting to a remote host running an RSCD Agent, along with the possible cause and solution to the problem. Some error messages have multiple possible causes that are listed in the table.

Symptom

Action

Reference

No authorization to access host

Check that a mapping exists for the incoming user in the users.local or users file and that the nouser entry is present in the users file. If there is no mapping entry for the incoming user, the nouser entry blocks access for any unmatched request. In this case, a message such as the following is displayed in the rscd.log:

e2d2854dd45bc9ce7c2f 0000000050 06/24/20 16:43:44.182 WARN rscd - ::ffff:192.168.12.20 2058 0/0 (BLAdmins:BLAdmin): agentinfo: Failed to map user to local user

To correct this problem, grant RBAC permissions on the server object to the user and role and then push ACLs to the target system, or add an entry to users.local to grant the role and user access.


No authorization to access host

Check that the exports file on the RSCD host grants the connecting system access. If access is not granted, the following message is displayed in the rscd.log:

be695c0f233edde4635b 0000000049 06/24/20 16:39:41.945 WARN rscd - ::ffff:192.168.12.20 2046 -1/-1 (Not_available): (Not_available): Host not granted access

In this case, update the exports file to grant access to the connecting system.

No authorization to access host

If the following message is displayed:

3f0f1204dbf7ebfc8d0a 0000000058 06/24/20 17:17:56.293 WARN rscd - ::ffff:192.168.12.20 2117 1001/1001 (BLAdmins:BLAdmin): agentinfo: command: "agentinfo" not authorized

Check the mapping entry for the user and verify whether any command restrictions are applied to it. For example:

BLAdmins:BLAdmin rw,map=root,commands=mv,cp

Alter the command authorizations the role has on the server object.

No authorization to access host

If the target system is Linux or Unix, check that the mapped user exists on the target system. For example, if you see the following entry, ensure the dba user exists on the target system: BLAdmins:BLAdmin rw,map=dba


Login not allowed for user

  • Check that the mapped user exists on the target system.
  • If the problem target is a domain controller, check that you do not have duplicate BladeLogicRSCDDC accounts in your domain.
    • If present, stop the RSCD service on all domain controllers and delete the BladeLogicRSCD account in the domain.
    • Force replication from the PDC emulator, start the RSCD service on the PDC emulator, and then force the replication again.
    • Start the RSCD service on the other domain controllers.
  • Check that the BladeLogicRSCDDC account is not locked out.



Login not allowed for user (UPM)

In the rscd log (Windows), the following entry is displayed:

163b5e006ad8e218dfc2 0000000555 06/28/20 09:52:53.518 INFO rscd - 192.168.8.37 6040 BladeLogicRSCD@WIN19-2002->Administrator@WIN19-2002:PrivilegeMapped (bladelogic): agentinfo: agentinfo win19-2002
8d159e3546888e13df29 0000000556 06/28/20 09:53:19.862 ERROR rscd - WIN19-2002 2748 SYSTEM (Not_available): (Not_available): User Impersonation Failed for mapped user bladelogic; Error Location: RSCD_WinUser::initFromUsernameDomainW:LookupAccountNameW ; Error Message: No mapping between account names and security IDs was done. ; Auxiliary Error Message: Account: WIN19-2002\localAdmin
7ffba316a836f314a944 0000000557 06/28/20 09:53:19.862 WARN rscd - 192.168.8.37 2748 SYSTEM (bladelogic): agentinfo: Impersonation failed

Check whether the localAdmin account exists on the target system.


Login not allowed for user (UPM)

In the rscd log (Windows), the following entry is displayed:

ea3241c2333f94eed0a4 0000000558 06/28/20 09:55:18.830 INFO rscd - WIN19-2002 4808 SYSTEM (Not_available): (Not_available): User Privilege Mapping enabled.
cc9332f2785089f7df7c 0000000559 06/28/20 09:55:18.830 INFO rscd - WIN19-2002 4808 SYSTEM (Not_available): (Not_available): The following local user will be used by the agent for user privilege mapping: BladeLogicRSCD
a1ed53505fb85f557a1f 0000000560 06/28/20 09:55:18.908 ERROR rscd - WIN19-2002 4808 SYSTEM (Not_available): (Not_available): User Impersonation Failed for mapped user bladelogic; Error Location: RSCD_WinUser::logonPassword:LsaLogonUser() ; Error Message: Logon failure: the user has not been granted the requested logon type at this computer. ; Auxiliary Error Message: BladeLogicRSCD@WIN19-2002
263f2318d12ac80ef52b 0000000561 06/28/20 09:55:18.908 WARN rscd - 192.168.8.37 4808 SYSTEM (bladelogic): agentinfo: Impersonation failed

Confirm that LocalSystem, BladeLogicRSCD, and the mapped user have been granted the Logon as Batch Job right, and that none of them is listed in the Deny Logon as Batch Job policy or belongs to a group listed in that policy.

Login not allowed for user (UPM)

In the rscd log (Windows), the following entry is displayed:

dd2de17df08bea2e758c 0000000027 06/28/20 12:43:07.005 INFO rscd - WIN16-2002 3132 SYSTEM (Not_available): (Not_available): The following local user will be used by the agent for user privilege mapping: BladeLogicRSCD
dd4f111b919808b9fa3a 0000000028 06/28/20 12:43:07.005 ERROR rscd - WIN16-2002 3132 SYSTEM (Not_available): (Not_available): User Impersonation Failed for mapped user oozy; Error Location: RSCD_WinUser::logonPassword:LsaLogonUser() ; Error Message: Account restrictions are preventing this user from signing in. For example: blank passwords aren't allowed, sign-in times are limited, or a policy restriction has been enforced. This user can't sign in because this account is currently disabled. ; Auxiliary Error Message: BladeLogicRSCD@WIN16-2002
1e7ea50fbfe06733da08 0000000029 06/28/20 12:43:07.005 WARN rscd - 192.168.12.20 3132 SYSTEM (oozy): agentinfo: Impersonation failed

Confirm that the BladeLogicRSCD account is not locked out or disabled.

No authorization to access host (Automation Principal)

In the rscd log (Windows), the following entry is displayed:

4bdecf2c3025473b8b0b 0000000580 06/28/20 10:05:26.690 ERROR rscd - WIN19-2002 2660 SYSTEM (Not_available): (Not_available): authenticate_user failed ; Error Location: RSCD_WinUser::logonPassword:LsaLogonUser() ; Error Message: Account restrictions are preventing this user from signing in. For example: blank passwords aren't allowed, sign-in times are limited, or a policy restriction has been enforced. This user can't sign in because this account is currently disabled. ; Auxiliary Error Message: bladelogic@WIN19-2002
52fbf076eaf63297bf11 0000000581 06/28/20 10:05:26.690 WARN rscd - 192.168.8.70 2660 SYSTEM (BLAdmins:BLAdmin): agentinfo: Failed to change to alternate user

Confirm that the user specified in the Automation Principal has the Logon as Batch Job right and is not listed in Deny Logon as Batch Job or a member of any group listed in that policy.


No authorization to access host (Automation Principal)

In the rscd log (Windows), the following entry is displayed:

5901ea67c47d9f5cdd36 0000000585 06/28/20 10:07:43.393 ERROR rscd - WIN19-2002 3892 SYSTEM (Not_available): (Not_available): authenticate_user failed ; Error Location: RSCD_WinUser::logonPassword:LsaLogonUser() ; Error Message: Account restrictions are preventing this user from signing in. For example: blank passwords aren't allowed, sign-in times are limited, or a policy restriction has been enforced. This user can't sign in because this account is currently disabled. ; Auxiliary Error Message: bladelogic@WIN19-2002
fb6fd86519a88235837f 0000000586 06/28/20 10:07:43.408 WARN rscd - 192.168.8.70 3892 SYSTEM (BLAdmins:BLAdmin): agentinfo: Failed to change to alternate user

This indicates the user account specified in the Automation Principal is locked out. Unlock the account to restore access.


No authorization to access host (Automation Principal)

In the rscd log (Windows), the following entry is displayed:

3b40deea304ec0b2cd5b 0000000601 06/28/20 10:11:01.737 ERROR rscd - WIN19-2002 3952 SYSTEM (Not_available): (Not_available): authenticate_user failed ; Error Location: RSCD_WinUser::logonPassword:LsaLogonUser() ; Error Message: The user name or password is incorrect. ; Auxiliary Error Message: bladelogic@WIN19-2002
ae01baf10f304a65ee15 0000000602 06/28/20 10:11:01.737 WARN rscd - 192.168.8.70 3952 SYSTEM (BLAdmins:BLAdmin): CM: Failed to change to alternate user
4befc309f9c0f05c6e35 0000000603 06/28/20 10:11:01.893 ERROR rscd - WIN19-2002 4380 SYSTEM (Not_available): (Not_available): authenticate_user failed ; Error Location: RSCD_WinUser::logonPassword:LsaLogonUser() ; Error Message: The user name or password is incorrect. ; Auxiliary Error Message: bladelogic@WIN19-2002
74d537475344bc47f372 0000000604 06/28/20 10:11:01.893 WARN rscd - 192.168.8.70 4380 SYSTEM (BLAdmins:BLAdmin): CM: Failed to change to alternate user

This indicates that the password specified in the Automation Principal is incorrect.


No authorization to access host (Automation Principal)

In the rscd log (Windows), the following entry is displayed:

a28dbe27d6476c296a5a 0000000605 06/28/20 10:11:52.393 ERROR rscd - WIN19-2002 4876 SYSTEM (Not_available): (Not_available): authenticate_user failed ; Error Location: RSCD_WinUser::initFromUsernameDomainW:LookupAccountNameW ; Error Message: No mapping between account names and security IDs was done. ; Auxiliary Error Message: Account: WIN19-2002\bladelogic
f7c997526c9e26ef550b 0000000606 06/28/20 10:11:52.393 WARN rscd - 192.168.8.70 4876 SYSTEM (BLAdmins:BLAdmin): CM: Failed to change to alternate user
cb3cd4f8b809b14740f0 0000000607 06/28/20 10:11:52.424 ERROR rscd - WIN19-2002 3480 SYSTEM (Not_available): (Not_available): authenticate_user failed ; Error Location: RSCD_WinUser::initFromUsernameDomainW:LookupAccountNameW ; Error Message: No mapping between account names and security IDs was done. ; Auxiliary Error Message: Account: WIN19-2002\bladelogic
241e91954466492cb5f8 0000000608 06/28/20 10:11:52.424 WARN rscd - 192.168.8.70 3480 SYSTEM (BLAdmins:BLAdmin): CM: Failed to change to alternate user

This indicates the account specified in the Automation Principal does not exist or does not have access to the target system.


Permission denied

Check which account you are mapped to and whether that account has the required access permissions. For example, suppose you try to read the system log of a target and get the following Permission denied message:

% cat //red8-2002/var/log/messages
cat: //red8-2002/var/log/messages: Permission denied

Check what user you are mapped to by running agentinfo:

% agentinfo red8-2002
[...]

User Permissions: 1001/1001 (dba/dba)

Ensure the mapped account can read the /var/log/messages file.


No route to host

If the target agent is running, this may indicate that a host-based firewall is blocking access.



Error in TLS protocol / encryption configuration error

This issue occurs when the secure file is different on the two hosts that are interacting. Ensure that both hosts are communicating using the same protocol and encryption settings.

Confirm that the RSCD Agent is listening on port 4750.

This error indicates connectivity issues with the target. Verify the network connectivity to the target.


I/O error

This error is sometimes shown in place of a No authorization to access host error. Use the same methods to resolve it. It also appears when the secure files on the two hosts differ.


Remote host is unknown

This error occurs when either the Application Server or your client cannot resolve the host name of the target.

Ensure the client system can resolve the target hostname that is registered in the console.


Connection timed out

You might see this error in the following situations:

  • The target is offline
  • A firewall is blocking the agent port
  • The agent is not running


Connection refused

This error occurs when the remote host is down or the Agent is not running.

It can also occur when the port the Agent listens on (configured in the secure file on the target) does not match the port that the originating system uses for the connection.


Connection Reset or Broken Pipe

The rscd logs show an initial connection from the client and then no other connections. A simple command such as agentinfo may work, but successive calls to the agent fail. One possible cause of this error is a network firewall that only allows traffic that matches defined profiles. This feature may have different names depending on the firewall vendor.
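Because several client-side symptoms share the same message, the rscd.log text is usually what distinguishes the causes. The Python sketch below maps substrings taken from the log samples in this table to the corresponding action. The LOG_HINTS list and the triage function are illustrative, not part of the product, and the list covers only the messages shown above.

```python
# (substring in rscd.log, suggested action), summarized from the table above.
LOG_HINTS = [
    ("Failed to map user to local user",
     "Add a mapping for the user in users or users.local, or push ACLs."),
    ("Host not granted access",
     "Update the exports file to grant access to the connecting system."),
    ("not authorized",
     "Review command restrictions on the mapping entry for the user."),
    ("No mapping between account names and security IDs",
     "Check that the mapped or Automation Principal account exists."),
    ("has not been granted the requested logon type",
     "Check the Logon as Batch Job rights and deny policies."),
    ("Account restrictions are preventing this user",
     "Check that the account is not locked out or disabled and has the "
     "Logon as Batch Job right."),
    ("The user name or password is incorrect",
     "Check the password specified in the Automation Principal."),
    ("Protocol mismatch",
     "Compare the secure files on the client and the target."),
]


def triage(log_text):
    """Return the suggested actions for known messages found in log_text."""
    hints = []
    for needle, hint in LOG_HINTS:
        if needle in log_text and hint not in hints:
            hints.append(hint)
    return hints
```

An empty result means that none of the known messages was found. In that case, continue with the network checks in the troubleshooting checklist.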


 
