Troubleshooting BMC BSA Error Codes
BMC Server Automation exit and error codes
This topic lists selected exit and return codes associated with deployment issues, together with possible interpretations of and solutions for the errors. In computer programming, the exit status (or return code) of a process is a small number that a child process passes to its parent process when it finishes executing a specific procedure or delegated task.
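As a generic illustration that is independent of BMC Server Automation, the following Python sketch starts a child process that exits with status 3 and reads that status in the parent process. The value 3 is arbitrary.

```python
import subprocess
import sys

# Start a child Python interpreter that exits immediately with status 3.
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# The parent receives the child's exit status once the child has finished.
print(f"child exit status: {child.returncode}")  # child exit status: 3
```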
The information in this topic was initially collected to provide a central location to track these error codes. It was created by searching the ticketing system for tickets that include error code information with enough cause or solution data to be useful.
Troubleshooting selected BladeLogic deploy job exit and return error codes
Return Code | Exit Code | Product | Possible Meaning | Possible Solution |
---|---|---|---|---|
6 | 3 | Configuration Manager | Permission issue with a package being deployed. | Check the ACLs on the File Server Agent and permissions on the files in the File Server. |
11 | 1 | Configuration Manager | Packages that Patch Analysis generates automatically have names that are too long. | |
128 | NA | Configuration Manager | There is something wrong with the Agent installation on the target server. | Re-install the Agent on the target server. |
129 | NA | Configuration Manager | A job run is hitting a timeout that has been defined for that job. | Remove any timeouts for the job. |
139 | NA | Configuration Manager | An RPM deployment is failing for a reason outside of BMC BladeLogic control. | Troubleshoot the issue outside of BSA manually on the server. |
-4001 | 1 | Configuration Manager | Typically, a patch deploy job is throwing a warning message that is not being suppressed by the Console. | |
-4002 | 2 | Configuration Manager | A deployment has succeeded but requires a manual reboot. | |
-4003 | 3 | Configuration Manager | Can be seen in batch jobs that run during reboots in the post-OS-install phase of provisioning. It means that a deployment has failed and requires a manual reboot. | |
5003 | NA | Provisioning Manager | Can be seen when running a post-OS batch job as part of provisioning. | |
5005 | NA | Configuration Manager | May occur during the simulate phase of a BLPackage deploy job. | |
BLDeploy return and exit codes
The bldeploy process makes changes based on the instructions in the bldeploy.xml file and the contents of the staging directory. This process is locally invoked on the target that requires the changes. For certain targets, such as Agentless Managed Objects (AMO), the process runs locally on a proxy system and expects Custom Objects to invoke the changes on the remote system.
On successful deployment of the full package, the bldeploy process records the package as installed and removes the contents of the staging directory for this package. The bldeploy job returns one of the following return and exit code pairs:
Return code | Exit code | Meaning |
---|---|---|
0 | 0 | Successfully completed package deployment. |
1 | -4001 | Error occurred when processing the package. This is a generic error; the reason is assumed to have been logged while the item was processed. |
2 | -4002 | Package completed successfully, but a manual reboot is required to complete installation and deployment. |
3 | -4003 | Error similar to -4001, but indicates that a manual reboot is needed to fully complete the process. |
These codes generally offer little practical help to end users. They matter only to the job result on the AppServer, where they determine whether the job as a whole succeeded or failed. The bldeploy process logs each step in enough detail that failures and their reasons should already appear in the log.
Given the complexity of the deploy process, no code on its own can provide enough detail to indicate the exact reason for a failure. For more information, see the BL Deploy topic in the BladeLogic Home Space.
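The reduction of these codes to an overall success or failure result, plus a reboot flag, can be sketched as a simple lookup. This Python sketch is illustrative only: `DeployOutcome`, `classify`, and the field names are hypothetical and are not part of any BMC API; the values come from the table above.

```python
from typing import NamedTuple


class DeployOutcome(NamedTuple):
    return_code: int
    succeeded: bool
    reboot_required: bool


# Hypothetical lookup: exit code -> outcome, using the pairs in the table above.
BLDEPLOY_OUTCOMES = {
    0: DeployOutcome(return_code=0, succeeded=True, reboot_required=False),
    -4001: DeployOutcome(return_code=1, succeeded=False, reboot_required=False),
    -4002: DeployOutcome(return_code=2, succeeded=True, reboot_required=True),
    -4003: DeployOutcome(return_code=3, succeeded=False, reboot_required=True),
}


def classify(exit_code: int) -> DeployOutcome:
    """Reduce a bldeploy exit code to overall success plus a reboot flag."""
    if exit_code not in BLDEPLOY_OUTCOMES:
        raise ValueError(f"unrecognized bldeploy exit code: {exit_code}")
    return BLDEPLOY_OUTCOMES[exit_code]


outcome = classify(-4002)
print(outcome.succeeded, outcome.reboot_required)  # True True
```

Because the codes carry no failure detail, a script like this still depends on the bldeploy log to explain why a job failed.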
Deploy job exit codes (-4001, -4002, -4003)
Deploy job return codes (1 through 10)
Sub return messages
Return Code | String | Exit Code | Meaning |
---|---|---|---|
0 | INCOMPLETE | 1 | Deployment incomplete |
1 | APPLY_SUCCESSFUL | 0 | Apply successful |
1 | APPLY_SUCCESSFUL | 2 | Apply successful; Reboot required to complete |
2 | UNDO_SUCCESSFUL | 0 | Undo successful |
2 | UNDO_SUCCESSFUL | 2 | Undo successful; Reboot required to complete |
3 | DRYRUN_SUCCESSFUL | 0 | DryRun successful |
4 | APPLY_FAILED_NO_ROLLBACK | 1 | Apply failed; no rollback was created |
5 | APPLY_FAILED_NO_AUTO_ROLLBACK | 1 | Apply failed; no auto-rollback occurred |
6 | APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL | 1 | Apply failed; auto-rollback successful |
6 | APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL | 3 | Apply failed; auto-rollback successful. Reboot required to complete |
7 | APPLY_FAILED_AUTO_ROLLBACK_FAILED | 1 | Apply failed; auto-rollback failed |
8 | UNDO_FAILED | 1 | Undo failed |
9 | DRYRUN_FAILED | 1 | DryRun failed |
10 | DEPLOY_FAILED | 1 | Deployment failed to process |
11 | UNDO_PARTIALLY_SUCCESSFUL | 1 | Undo partially successful with failed items, overall job phase failure |
12 | APPLY_PARTIALLY_SUCCESSFUL | 1 | Apply partially successful with failed items, overall job phase failure |
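In the rows above, exit code 0 accompanies a successful phase, 1 a failed phase, 2 a successful phase that still needs a reboot, and 3 a failed phase that needs a reboot. The following Python sketch, inferred from the table rather than taken from the product, decodes a return code and exit code pair into a readable status line. `SubReturn`, `EXIT_MEANING`, and `describe` are hypothetical names.

```python
from enum import IntEnum


class SubReturn(IntEnum):
    """Sub return message codes, as listed in the table above."""
    INCOMPLETE = 0
    APPLY_SUCCESSFUL = 1
    UNDO_SUCCESSFUL = 2
    DRYRUN_SUCCESSFUL = 3
    APPLY_FAILED_NO_ROLLBACK = 4
    APPLY_FAILED_NO_AUTO_ROLLBACK = 5
    APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL = 6
    APPLY_FAILED_AUTO_ROLLBACK_FAILED = 7
    UNDO_FAILED = 8
    DRYRUN_FAILED = 9
    DEPLOY_FAILED = 10
    UNDO_PARTIALLY_SUCCESSFUL = 11
    APPLY_PARTIALLY_SUCCESSFUL = 12


# Meaning of the exit code that accompanies each sub return message
# (an interpretation of the table, not a product constant).
EXIT_MEANING = {
    0: "success",
    1: "failure",
    2: "success, reboot required",
    3: "failure, reboot required",
}


def describe(return_code: int, exit_code: int) -> str:
    """Render a (return code, exit code) pair as a readable status line."""
    name = SubReturn(return_code).name  # raises ValueError for unknown codes
    return f"{name} ({EXIT_MEANING[exit_code]})"


print(describe(6, 3))  # APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL (failure, reboot required)
```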
ActionOnFailure
When bldeploy creates the in-memory actions from the XML, it checks that each action applies to the operating system on which the bldeploy process runs.
For example, an RPM action is an invalid action type for the Windows operating system. Such an action causes an immediate failure based on its type alone, and no other information about it is read from the XML.
Ramifications: An action whose type does not match the operating system cannot be skipped or ignored, even if its ActionOnFailure value is not set to Abort or the item is commented out. That additional information in the XML is not processed until after the action type has been checked.
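The ordering described above (type check first, everything else afterward) can be sketched in Python as follows. The element and attribute names (`action`, `type`, `actionOnFailure`), the action-type sets, and the helper names are hypothetical and do not reflect the actual bldeploy.xml schema. The sketch only shows that a mismatched type fails before the failure-handling setting is ever read.

```python
import xml.etree.ElementTree as ET

# Hypothetical action types per operating system, for illustration only.
ACTION_TYPES_BY_OS = {
    "windows": {"file", "registry"},
    "linux": {"file", "rpm"},
}


class ActionTypeError(Exception):
    """An action type does not apply to the operating system bldeploy runs on."""


def build_actions(xml_text: str, local_os: str) -> list[dict]:
    """Create in-memory actions, checking each action type against the OS first."""
    actions = []
    for item in ET.fromstring(xml_text).iter("action"):
        action_type = item.get("type")
        # The type check happens first: a mismatch fails immediately,
        # before actionOnFailure or any other attribute is read.
        if action_type not in ACTION_TYPES_BY_OS[local_os]:
            raise ActionTypeError(f"{action_type!r} is not valid on {local_os}")
        actions.append({"type": action_type,
                        "on_failure": item.get("actionOnFailure")})
    return actions


package = '<package><action type="rpm" actionOnFailure="Ignore"/></package>'
try:
    build_actions(package, "windows")
except ActionTypeError as err:
    print(err)  # 'rpm' is not valid on windows
```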
The ActionOnFailure setting accepts the following values, which determine how a failure affects the overall job state:
Code | Meaning |
---|---|
Abort | The bldeploy process stops at this failure point and then either exits or performs an auto-rollback, depending on the job configuration. |
Ignore | The bldeploy process ignores the error and continues processing the next item in the package. At the end of the job, the job as a whole is marked as successful. |
Continue | The bldeploy process ignores the error and continues processing the next item in the package. At the end of the job, the job as a whole is marked as failed. |
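The following Python sketch models the three ActionOnFailure values, assuming that each setting governs only the failure of the item it is attached to. `run_package` is a hypothetical helper, and the auto-rollback that Abort can trigger is not modeled.

```python
from typing import Callable

VALID_SETTINGS = {"Abort", "Ignore", "Continue"}


def run_package(items: list[tuple[Callable[[], None], str]]) -> bool:
    """Run (action, ActionOnFailure) pairs in order; True means the job succeeded."""
    job_failed = False
    for action, on_failure in items:
        if on_failure not in VALID_SETTINGS:
            raise ValueError(f"unknown ActionOnFailure value: {on_failure}")
        try:
            action()
        except Exception:
            if on_failure == "Abort":
                # Stop at this failure point. A real deploy would then exit or
                # auto-rollback depending on the job configuration (not modeled).
                return False
            if on_failure == "Continue":
                job_failed = True  # keep processing, but fail the job at the end
            # "Ignore": keep processing; this failure does not affect the result
    return not job_failed


def ok() -> None:
    pass


def fails() -> None:
    raise RuntimeError("item failed")


print(run_package([(fails, "Ignore"), (ok, "Abort")]))    # True
print(run_package([(fails, "Continue"), (ok, "Abort")]))  # False
print(run_package([(fails, "Abort"), (ok, "Abort")]))     # False
```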
BLTJM (BlTargetJobManager) codes
The bltjm process is a program that the AppServer invokes to both start and monitor the bldeploy execution. Each bltjm process starts and monitors exactly one bldeploy process. The reverse is not always true, however: in cases such as single-user mode, a bldeploy process can run without a bltjm. On UNIX systems, a bldeploy process can also run without a bltjm because of a network failure.
The TargetJobManager (TJM) sends the AppServer state information about the bldeploy process in the form of event messages (for example, when the process starts or ends), as well as a heartbeat every 90 seconds. The AppServer uses this state information to determine what to do if communication is lost. The heartbeat lets the AppServer know that the process is still running and that the connection is merely silent, not lost. If the AppServer does not receive the heartbeat in time, it assumes that the connection is lost and attempts to restart the connection.
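A minimal Python sketch of the heartbeat check follows. Only the 90-second interval comes from this topic; the `HeartbeatMonitor` class and the 30-second grace period added for network delay are assumptions, because the actual AppServer timeout is not documented here.

```python
import time
from typing import Callable

HEARTBEAT_INTERVAL_S = 90  # heartbeat interval stated in this topic


class HeartbeatMonitor:
    """Decides whether a silent TJM connection should be treated as lost."""

    def __init__(self, grace_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # grace_s is an assumed allowance for network delay, not a BSA value.
        self.timeout_s = HEARTBEAT_INTERVAL_S + grace_s
        self.clock = clock
        self.last_heartbeat = clock()

    def record_heartbeat(self) -> None:
        """Call whenever a heartbeat arrives."""
        self.last_heartbeat = self.clock()

    def connection_lost(self) -> bool:
        """True if no heartbeat arrived within the timeout."""
        return self.clock() - self.last_heartbeat > self.timeout_s


# Usage with a fake clock, so the example does not actually wait.
now = [0.0]
monitor = HeartbeatMonitor(clock=lambda: now[0])
now[0] = 100.0
print(monitor.connection_lost())  # False: silent, but within 120 seconds
now[0] = 130.0
print(monitor.connection_lost())  # True: would trigger a reconnect attempt
```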
When the connection is restarted, the last known process ID stored in the AppServer is checked to determine whether the bldeploy process is still running. If the process is not running, the TJM attempts to restart it from the last position stored in the .cfg file that bldeploy creates. The bldeploy process skips over the item recorded at that position and starts on the next item. If there is nothing more to process, the bldeploy process ends the package.
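The restart flow can be sketched in Python under clearly stated assumptions: the .cfg file is modeled as a text file holding a single integer position, and `is_running` and `remaining_items` are hypothetical helpers. The process check uses `os.kill(pid, 0)`, which is a safe existence probe only on POSIX systems (on Windows, `os.kill` terminates the target process), so the sketch refuses to run elsewhere.

```python
import os
import tempfile
from pathlib import Path


def is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if os.name != "posix":
        raise NotImplementedError("signal 0 probing is POSIX-only")
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remaining_items(items: list[str], cfg_path: Path) -> list[str]:
    """Items left to process after a restart (hypothetical .cfg format)."""
    if not cfg_path.exists():
        return list(items)  # no stored position: start from the first item
    last_position = int(cfg_path.read_text().strip())
    # Skip the item at the stored position and start on the next one.
    # An empty result means there is nothing left, so the package ends.
    return items[last_position + 1:]


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "deploy.cfg"  # hypothetical file name
    cfg.write_text("1")
    print(remaining_items(["a", "b", "c", "d"], cfg))  # ['c', 'd']
```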
Example: The bldeploy process and the TJM send start messages to notify the AppServer that the bldeploy process has started and is waiting to process the actions. Until the AppServer processes the TJM start message, a loss of communication between the AppServer and the Agent is treated as a job failure. If the TJM start message is not processed in time, the AppServer assumes that the connection has been lost, attempts to re-establish the connection, and potentially restarts the bldeploy process (as is required after a reboot).
The following BLTJM (BlTargetJobManager) error codes are sent if there is no specific return code from the bldeploy process (bldeploy codes always take precedence):
Code | Meaning |
---|---|
0 | Success. This code is only returned when the bldeploy process succeeds and actually returns the code. |
-5000 | Failure to start application. |
-5001 | Failure processing the event messages. |
-5002 | Failure in the monitoring of the PID because it was either not set or incorrectly set. |
-5003 | Failure to initialize the TJM process correctly. |
-5004 | Failure to stop the bldeploy application. The TJM sometimes attempts to kill the bldeploy process, for example during a job cancellation. This code can be returned if that attempt fails. |
-5005 | Application terminated unexpectedly. The bldeploy process stopped running for any reason other than completing the package process (with success or failure). |
-5006 | Unused. The constant variable is declared, but not used anywhere. |
-5007 | TJM was killed unexpectedly. |
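The precedence rule stated above can be expressed as a small Python sketch: when bldeploy reports its own code, that code is used, and the BLTJM code applies only otherwise. `final_code` and `BLTJM_MEANINGS` are hypothetical names, and the meanings are abbreviated from the table.

```python
from typing import Optional

# BLTJM codes, abbreviated from the table above.
BLTJM_MEANINGS = {
    0: "Success",
    -5000: "Failure to start application",
    -5001: "Failure processing the event messages",
    -5002: "Failure monitoring the PID (not set or incorrectly set)",
    -5003: "Failure to initialize the TJM process",
    -5004: "Failure to stop the bldeploy application",
    -5005: "Application terminated unexpectedly",
    -5006: "Unused",
    -5007: "TJM was killed unexpectedly",
}


def final_code(bldeploy_code: Optional[int], bltjm_code: int) -> int:
    """bldeploy codes always take precedence over BLTJM codes."""
    return bldeploy_code if bldeploy_code is not None else bltjm_code


code = final_code(None, -5005)
print(code, BLTJM_MEANINGS[code])  # -5005 Application terminated unexpectedly
print(final_code(-4001, -5005))    # -4001
```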
Related information