RSCD Agent Error Messages
The following is a list of common errors you may see in NSH, job run logs, or in the GUI when trying to connect to a remote host running an RSCD Agent, along with the possible cause and solution to the problem.
- No authorization to access host
- Login not allowed for user
- Permission Denied
- No Route to Host
- ERROR IN TLS PROTOCOL / Encryption configuration error
- I/O Error
- Remote host is unknown
- Connection timed out
- Connection refused
- "SSL error" with "SSL Protocol Mismatch Error" in agent logs
- Error code (2): Invalid XML source
- App Server and client on the same server with App Server certification plus targets on a separate machine
- Windows group policy for the agent
- Licensing Issues
No authorization to access host
Probably the most common error, this is caused by a mismatch between the ACLs on the remote host and the credentials you are using to connect to it. The files to verify are the users and users.local files on the target, although the exports file can also cause the error. Commenting out the nouser entry generally works around the problem temporarily, but this is not recommended. Refer to the rscd.log file in the agent installation directory on the remote host to compare the user trying to establish the connection with the entries listed in the ACLs on that server.
Login not allowed for user
There are several reasons why this error may occur.
- The most common problem can happen when the ACLs on the remote host are mapping to a user that does not exist on the remote host. This often happens when the administrator account has been renamed on the remote host or is named differently from a standard defined in your environment.
- In some cases this error may arise when you have incorrectly installed an agent onto a domain controller in your environment. Check your domain to see if you have a duplicate BladeLogicRSCD account.
- Another cause of this issue is when your domain policy contains incompatible entries for "Log on as batch job" and/or "Don't expire password". If these two entries do not have a value for BladeLogicRSCD and are getting propagated across your environment, they will interfere with the agent, causing the Login not allowed for user message. See Installing-RSCD-agents-in-a-replicated-domain-controller-environment.
- If login fails for an agent on a remote host, and another error appears in the rscd.log file to notify you that the BladeLogicRSCD account has been locked out, this is probably due to a problem in the manner that Microsoft Windows handles remote communications. To resolve this issue, create an alternate user for the agent.
Permission Denied
There are two possible causes of this issue, and one or both may apply. First, check the permissions your role has on that server. In the Server view, right-click the server you are trying to deploy to and select Properties. Click the Permissions tab and ensure that you have write access on that server (for example, "BLAdmins Server.*").
Second, verify the ACLs on the remote host are granting you write access. View the agent ACLs (either the exports, users, or users.local files) and make sure your role and login have read/write (rw) access.
No Route to Host
SOCKS Proxy
This happens when SOCKS proxies have been configured and routing rules have been enabled, but the SOCKS proxy is misconfigured or down, or the Application Server cannot reach the SOCKS proxy server.
Check the routing rules. In the SERVER properties, verify whether the particular server is configured to go through a SOCKS proxy based on those rules, and verify that the SOCKS proxy server is up and running. Also check the SOCKS proxy log files.
Non-SOCKS Proxy
The Application Server maintains a DNS cache. Try this:
- Edit the NSH/br/java/lib/security/java.security file and change the DNS cache settings as described at http://www.rgagnon.com/javadetails/java-0445.html.
- Restart the Application Server.
Because the underlying Java settings cause resolved addresses to be cached indefinitely, the only way to pick up a new IP address is to restart the Application Server. The cache still honors the TTL on the DNS record, so if the record expires, the address should be requested again and the new value picked up.
Also, see if you have a firewall blocking the port, or if your Security Level is enabled or disabled:
- Type setup.
- Go down to Firewall Configuration and press Enter.
- Check to see if Security Level is Enabled/Disabled.
ERROR IN TLS PROTOCOL / Encryption configuration error
Generally this is caused when the secure file is different on the two hosts making contact. Ensure that both hosts are communicating using the same protocol and encryption levels. Always use the secadmin utility for making any changes to the secure file.
This could also be a problem with the agent itself or an interaction between the OS and the agent. Try different commands such as ls, cd, ndf, nps, nstats, and nexec (refer to their man pages for the syntax) and verify whether the issue is limited to particular commands. If it is, this is a bug in the agent.
This error can also occur when Shavlik (which is known to use port 4750) or any other program on the remote host is already using port 4750, the same port that the RSCD agent uses. Try the following: Restart the RSCD agent, stop the program listening on port 4750, use netstat to make sure the port is not in LISTEN mode, then restart the agent.
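As a complement to netstat, the port check can be sketched in Python. This is a minimal sketch, not part of the product: the function name `port_is_free` is hypothetical, and it assumes the script is run on the remote host itself while the agent is stopped. It simply tries to bind the port, which fails if another program already holds it.

```python
import socket

AGENT_PORT = 4750  # agent port named in the text above


def port_is_free(port: int = AGENT_PORT, host: str = "0.0.0.0") -> bool:
    """Return True if no other program is currently bound to host:port.

    Hypothetical helper for illustration; run it on the host in question
    while the RSCD agent is stopped.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Bind failed: the port is in use (or binding is not permitted).
            return False
    return True
```

Calling `port_is_free()` with the agent stopped should return True once the conflicting program has been shut down. A False result means something still holds the port and netstat can be used to identify it.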
This error has also been observed during a mass agent rollout using a silent install on UNIX systems. The silent install selected the option to use a random number generating device on the system when no such device existed. Reinstalling the agents with the PRNG option resolved the issue.
I/O Error
This error is sometimes shown in place of the No authorization to access host error. Use the same methods to resolve it. It is also seen when the secure files on the two hosts are different. You may also want to try restarting the agent.
Remote host is unknown
This error occurs when either the Application Server or your client cannot resolve the host. Sometimes this is the case when you run a custom command and receive blank or no output. Ensure that you can ping the remote host and that the server is correctly configured in DNS.
Connection timed out
You might see this error in the following situations:
- The server is listed in DNS but is down
- A firewall is blocking the agent port
- The agent is not running
Connection refused
Generally this error will show up when the remote host is down and/or the agent is not running. It can also happen when there is a mismatch between the port the agent communicates over (configured in the secure file) and the port configured on the agent from the originating connection.
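The unknown host, timeout, refused, and no route cases can be told apart with a plain TCP probe. The following is a minimal sketch under stated assumptions: the function names are hypothetical, port 4750 is the agent port mentioned elsewhere on this page, and the labels are only an approximate mapping from operating system socket errors to the messages listed here, not the logic NSH itself uses.

```python
import errno
import socket


def classify_connect_error(exc: OSError) -> str:
    """Map an OS-level socket error to the closest message on this page."""
    if isinstance(exc, socket.gaierror):
        return "Remote host is unknown"   # name resolution failed
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Connection timed out"     # host down or firewall dropping packets
    if isinstance(exc, ConnectionRefusedError):
        return "Connection refused"       # host is up, nothing listening on the port
    if exc.errno == errno.EHOSTUNREACH:
        return "No route to host"
    return f"Other I/O error: {exc}"


def probe_agent(host: str, port: int = 4750, timeout: float = 5.0) -> str:
    """Try a TCP connection to the agent port and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "TCP connection succeeded"
    except OSError as exc:
        return classify_connect_error(exc)
```

A successful TCP connection only shows that something is listening on the port. ACL, TLS, and licensing problems described elsewhere on this page can still occur afterwards.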
"SSL error" with "SSL Protocol Mismatch Error" in agent logs
If telnet to the target on the agent port (4750) works, the target server could be part of a cluster whose local loopback NICs talk to each other on a full 10.x.x.x network. The Application Server sends a request to the target server, but the acknowledgment is redirected out to the cluster servers' 10.x.x.x network. Because the Application Server IP address is 10.52.x.x and the route table says that anything bound for the 10 network goes that way, the reply goes to the internal loopback NICs and never gets back to the Application Server. You need to reconfigure the local loopback network to use something other than the 10 network.
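The routing problem described above can be illustrated with a small longest-prefix-match sketch. Everything specific here is an assumption made for illustration: the interface names, the address 10.52.0.10, and the replacement network 192.168.100.0/24 are hypothetical, and real route selection also depends on metrics and operating system details.

```python
import ipaddress


def select_route(destination, routes):
    """Return the interface of the most specific route containing destination.

    routes is a list of (network_in_CIDR, interface_name) pairs.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, interface in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, interface))
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    return max(matches)[1]


# Hypothetical target route tables before and after the fix.
before = [("0.0.0.0/0", "public"), ("10.0.0.0/8", "interconnect")]
after = [("0.0.0.0/0", "public"), ("192.168.100.0/24", "interconnect")]

app_server = "10.52.0.10"  # hypothetical Application Server address
print(select_route(app_server, before))  # interconnect: the reply never returns
print(select_route(app_server, after))   # public: the reply reaches the Application Server
```

With the interconnect on the whole 10 network, replies to a 10.52.x.x Application Server match the more specific interconnect route. Moving the interconnect to a network that does not contain the Application Server address leaves only the default route.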
Error code (2): Invalid XML source
When you attempt to license an RSCD agent with autolic, the following message is received: Error code (2): Invalid XML source. This error message indicates a problem in the agent ACLs. In the observed case, the rscd.log file showed that the user account being mapped to did not have sufficient permissions to access the necessary directories, so the agent was unable to generate a HostID. Once the ACLs are corrected, autolic works correctly.
If you see this error message, do the following:
- Review the rscd.log file for any relevant error messages
- Compare the exports, users, and users.local from a known good host to those of the machine where this error message is being received.
- Make a backup copy of exports, users, and users.local, and then copy these files from the known good host.
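The comparison step above can be sketched with Python's difflib. This is a minimal sketch that assumes the three ACL files from both hosts have already been copied into two local directories. The directory arguments and the function name `diff_acl_files` are hypothetical.

```python
import difflib
from pathlib import Path

ACL_FILES = ("exports", "users", "users.local")  # file names from the text above


def _read_lines(path: Path):
    """Return the file's lines, or an empty list if the file is missing."""
    return path.read_text().splitlines() if path.is_file() else []


def diff_acl_files(good_dir, suspect_dir):
    """Return {file name: unified diff lines} for each ACL file that differs."""
    differences = {}
    for name in ACL_FILES:
        good = _read_lines(Path(good_dir) / name)
        suspect = _read_lines(Path(suspect_dir) / name)
        diff = list(difflib.unified_diff(
            good, suspect,
            fromfile=f"good/{name}", tofile=f"suspect/{name}",
            lineterm="",
        ))
        if diff:
            differences[name] = diff
    return differences
```

An empty result means the three files are identical, in which case the cause is more likely to be found in the rscd.log file than in the ACLs.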
App Server and client on the same server with App Server certification plus targets on a separate machine
This case applies when the Application Server and the client are installed on the same machine and the id.pem certificate has already been pushed to the target with putcert, but agentinfo against the target still fails with an error.
The rscd.log file on the target contains a single line reporting the corresponding error.
Solution A (Application Server certificate incompletely or incorrectly installed)
After you implement App Server to Agent security, the agents are configured to accept only certified connections, meaning that from that point only the Application Server can communicate with the agents through the console. Stand-alone NSH no longer works, because it does not use the Application Server's certificate. For that reason, you also need to implement NSH to agent security if you want to access agents using NSH.
Follow these steps:
1. Delete id.pem from $HOMEDIR/Application Data/Bladelogic on the machine running NSH.
2. Delete <BL_installdir>\certs\<user_name> on your test target.
3. Revert the secure file to tls_mode=encryption_only on your test target.
4. Run bl_gen_ssl on your machine running NSH to generate a new id.pem.
5. Cd to the folder containing id.pem.
6. Run putcert <user_name> id.pem <testagent> where <testagent> is the agent you'd like to configure for certificates.
7. Set the secure file to tls_mode=encryption_and_auth on your test target.
After restarting NSH, it will prompt you for the passphrase entered in step 4. This will certify the NSH session and thus enable it to communicate with the secured agents.
Solution B (Application Server certificate installed already)
Follow the steps described in TLS-with-client-side-certs-Securing-a-Network-Shell-client.
The client-side certificate you create must then be pushed to the Application Server / client.
Solution C (Nothing installed yet)
Follow these procedures step by step:
- Creating-a-self-signed-client-side-certificate-on-the-Application-Server-UNIX or Creating-a-self-signed-client-side-certificate-on-the-Application-Server-Windows
- Provisioning-agents-and-repeaters-with-a-SHA1-fingerprint-of-the-Application-Server-self-signed-certificate-UNIX or Provisioning-agents-and-repeaters-with-a-SHA1-fingerprint-of-the-Application-Server-self-signed-certificate-Windows.
- TLS-with-client-side-certs-Securing-a-Network-Shell-client
After that, you should be able to run an agentinfo command against the target you want (and push the certificates, too).
Windows group policy for the agent
The RSCD agent runs under the "Local System" account. For impersonation to occur, the RSCD agent logs on as the BladeLogicRSCD user. Windows API calls are then made that apply the permissions associated with the user you are mapping to. This allows commands to be executed in the context of the "mapped to" user.
However, the underlying running user is still the "Local System" account, which does not have access to network resources. The "Local System" user cannot connect to remote Windows shares.
Licensing Issues
Software Not Licensed
This error appears if the license on the host is not valid. The host may never have been licensed, or it may have been licensed for a limited number of CPUs and the number of CPUs has since increased (this mostly happens on virtual machines).
You will see a similar line in the rscd logs at INFO2 level:
The values for "license status" are:
1 - Not licensed
2 - License expired
3 - Trial license
4 - Permanent valid license
The agent takes the following steps to verify the validity of a license:
- Takes the following details from the agent whose license needs to be verified: license.dat file, host ID, number of processors
- Parses the license.dat file and gets the license key. For example, in the following license.dat entry, the second parameter is the license key:
  11141 35554a8268651862c44276a618dd9b39c51b39e2 1173037642
- Generates the license key from the host ID and the number of processors.
- Compares the newly generated key with the key from the license.dat file. If both are the same, the license is valid.
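The parse and compare steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the only field meaning taken from the text is that the second parameter is the license key, and the agent's key generation algorithm is not reproduced, so the regenerated key is supplied by the caller.

```python
def extract_license_key(entry: str) -> str:
    """Return the second whitespace-separated field of a license.dat entry."""
    fields = entry.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected license.dat entry: {entry!r}")
    return fields[1]


def license_matches(entry: str, regenerated_key: str) -> bool:
    """Compare the stored key with a key regenerated elsewhere.

    How the agent derives the key from the host ID and processor count
    is not shown here; regenerated_key stands in for that result.
    """
    return extract_license_key(entry) == regenerated_key


sample = "11141 35554a8268651862c44276a618dd9b39c51b39e2 1173037642"
print(extract_license_key(sample))
```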
Agent is intermittently licensed/Not Licensed
This usually happens when an agent is running on a VM or any system with dynamic CPU allocation.
A license is generated for a block of 4 CPUs rather than for an exact CPU count.
If you licensed the server agent when it was allocated 1 or 2 CPUs, the license is generated for 3 CPUs. The next time you try to access that server, if it is under load, it might have been allocated 5 CPUs, in which case it becomes unlicensed. A second later, it could be back to 3 CPUs and become licensed again.
The actual validity of a license is based on these ranges: 1-3 CPUs, 4-7 CPUs, 8-11 CPUs, and so on. A license valid for a machine with 1 CPU remains valid regardless of whether that machine has 1, 2, or 3 CPUs, but it stops working once the machine is upgraded to 4 CPUs. Likewise, a 4 CPU license is good for 4, 5, 6, or 7 CPUs but not 8. It is different in the reverse direction: if you are licensed for 8 CPUs and you take out 7 of them, that license still works on a 1 CPU machine. To determine the number of CPUs, the agent uses a rather complicated algorithm that goes deep inside the hardware, so even if the OS does not provide this information, the agent is able to get it (that is, do not rely on what the OS in the VM shows you).
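The range rule above can be written as a short worked example. This is a minimal sketch of the rule as described on this page, not the agent's actual implementation. The function names are hypothetical.

```python
from typing import Tuple


def cpu_band(cpu_count: int) -> Tuple[int, int]:
    """Return the (lowest, highest) CPU count of the band containing cpu_count.

    Bands follow the ranges in the text: 1-3, 4-7, 8-11, and so on.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    index = cpu_count // 4
    return max(1, index * 4), index * 4 + 3


def license_still_valid(licensed_cpus: int, current_cpus: int) -> bool:
    """A license stays valid in its own band and in any lower band."""
    return current_cpus // 4 <= licensed_cpus // 4


print(cpu_band(5))                 # (4, 7)
print(license_still_valid(1, 3))   # True: still in the 1-3 band
print(license_still_valid(1, 4))   # False: moved up to the 4-7 band
print(license_still_valid(8, 1))   # True: removing CPUs keeps the license valid
```

Under this rule, a VM licensed at 3 CPUs that is briefly allocated 5 CPUs falls into the 4-7 band and appears unlicensed until the allocation drops again.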
If you have dynamic CPU allocation, you should generate a license for the maximum number of CPUs the system can be allocated.
Use getlic to generate a license.raw.
# cat license.raw
localhost 1 48897F18
The number in the middle is the number of CPUs. Change that number to the maximum number of CPUs the system can be allocated (for example, 32) and license this host on the BladeLogic licensing portal. Alternatively, you can use autolic with the -c option to force a CPU count.
Use putlic to put the license.dat on the server.
Also, if you are licensing with autolic, newer versions (7.4.3 and later) have a flag (-c) to set a CPU count different from the one the agent retrieves.
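Editing the CPU count in license.raw can also be scripted. This is a minimal sketch that assumes each license.raw line has exactly three whitespace-separated fields, as in the sample above. The function name `set_cpu_count` is hypothetical, and the first and third fields are passed through unchanged.

```python
def set_cpu_count(raw_line: str, max_cpus: int) -> str:
    """Replace the middle field (the CPU count) of a license.raw line."""
    fields = raw_line.split()
    if len(fields) != 3:
        raise ValueError(f"expected three fields, got: {raw_line!r}")
    if max_cpus < 1:
        raise ValueError("max_cpus must be at least 1")
    host, _old_count, trailing = fields
    return f"{host} {max_cpus} {trailing}"


print(set_cpu_count("localhost 1 48897F18", 32))  # localhost 32 48897F18
```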
Agent shows up as licensed with agentinfo but not from the application server point of view
The agent is licensed according to agentinfo, but not according to the application server (job logs).
bl1d% agentinfo usmkenasav02
usmkenasav02:
[...]
License Status : Licensed for NSH/CM
bl1d% autolic autolicuser pass USMKENASAV02
USMKENASAV02: Licensed for NSH/CM
bl1d%
Most likely, the SERVER property AGENT_STATUS has not been updated. You can update it either with the update icon in the SERVER properties or with an Update Server Properties job. You can also set AGENT_STATUS for many servers with the update-server-agent-status.nsh NSH script in the .../samples/blcli directory.