Deploy Job exit and error codes
This topic presents selected exit and return error codes associated with deployment issues, as well as possible causes and solutions for the errors. The exit status or return code is a number passed from a child process to a parent process when it has finished executing a specific procedure or delegated task.
General Deploy Job exit and return error codes
The following table lists several general return codes and error codes that are issued by Deploy Jobs when problems occur, along with the possible cause and (where applicable) a possible solution.
Permission issue with a package being deployed.
Check the ACLs on the File Server Agent and permissions on the files in the File Server.
Packages that are getting generated automatically from Patch Analysis have names that are too long.
There is something wrong with the Agent installation on the target server.
Re-install the Agent on the target server.
A job run is hitting a timeout that has been defined for that job.
Remove any timeouts for the job.
An RPM deployment is failing for a reason outside of BMC BladeLogic control.
Troubleshoot the issue outside of BMC Server Automation manually on the server.
Typically, a patch deploy job is throwing a warning message that is not being suppressed by the Console.
A deployment has succeeded but requires manual reboot.
Can be seen in batch jobs that run during the post-OS install phase of provisioning, around reboots. It means that a deploy has failed and requires a manual reboot.
Can be seen when running a post-OS batch job as part of provisioning.
May occur during the simulate phase of a BLPackage deploy job.
BLDeploy return and exit codes
The bldeploy process makes changes based on the instructions in the bldeploy.xml file and the contents of the staging directory. This process is locally invoked on the target that requires the changes. For certain targets, such as Agentless Managed Objects (AMO), the process runs locally on a proxy system and expects Custom Objects to invoke the changes on the remote system.
On successful deployment of the full package, the bldeploy process records the package as being installed and removes the contents of the staging directory for this package. The bldeploy job returns one of the following exit codes:
|Return code||Exit code||Meaning|
|0||0||Successfully completed package deployment.|
|1||-4001||Error occurred when processing the package. This is a generic error; the reason is assumed to have been logged while the item was being processed.|
|2||-4002||Successful completion of package, but requires manual reboot to complete installation and deployment.|
|3||-4003||Error similar to -4001, but indicates that a manual reboot is needed to fully complete the process.|
These codes provide information only about the job result (the overall success or failure) on the Application Server. The bldeploy process includes sufficient logging at each step, such that failures and reasons for failure should already be included in the log.
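The return-to-exit code mapping in the table above can be captured in a small lookup, which may be useful when post-processing job results in a script. The following is a minimal sketch, not part of the product: the names `DeployResult`, `BLDEPLOY_RESULTS`, and `describe_return_code` are hypothetical, and the `succeeded` and `reboot_required` flags are an interpretation of the "Meaning" column.

```python
from typing import NamedTuple


class DeployResult(NamedTuple):
    """Hypothetical record describing one row of the bldeploy code table."""
    exit_code: int
    succeeded: bool          # interpretation of the "Meaning" column
    reboot_required: bool    # interpretation of the "Meaning" column
    meaning: str


# Keys are the bldeploy return codes; values carry the matching exit code.
BLDEPLOY_RESULTS = {
    0: DeployResult(0, True, False,
                    "Successfully completed package deployment."),
    1: DeployResult(-4001, False, False,
                    "Error occurred when processing the package."),
    2: DeployResult(-4002, True, True,
                    "Successful completion of package, but requires manual reboot."),
    3: DeployResult(-4003, False, True,
                    "Error similar to -4001, but a manual reboot is needed."),
}


def describe_return_code(return_code: int) -> DeployResult:
    """Look up a bldeploy return code; raise ValueError for codes not in the table."""
    try:
        return BLDEPLOY_RESULTS[return_code]
    except KeyError:
        raise ValueError(f"unknown bldeploy return code: {return_code}") from None
```

For example, `describe_return_code(2).exit_code` evaluates to `-4002`, and the result's `reboot_required` flag is set. As noted above, the log remains the place to find the actual reason for a failure.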
The following table maps more specific failures during deployment to their eventual exit code, as reported in the logs.
Apply successful; Reboot required to complete
Undo successful; Reboot required to complete
Apply failed; no rollback was created
Apply failed; no auto-rollback occurred
Apply failed; auto-rollback successful
Apply failed; auto-rollback successful. Reboot required to complete
Apply failed; auto-rollback failed
Deployment failed to process
Undo partially successful with failed items, overall job phase failure
Apply partially successful with failed items, overall job phase failure
BLDeploy ActionOnFailure settings
The following table summarizes the options for the ActionOnFailure setting and their impact on the overall job state when failures occur during deployment:
bldeploy terminates at this failure point and will either exit or auto-rollback based on job configuration.
|Ignore||bldeploy ignores the immediate error and continues to process the next item in the package. At the end of the job, the job as a whole is marked successful.|
bldeploy ignores the immediate error and continues to process the next item in the package. At the end of the job, the job as a whole is marked as failed.
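The three behaviors in the table above differ in two ways: whether processing continues after a failed item, and how the job as a whole is marked. The following is a minimal simulation of that logic, not the bldeploy implementation. Only `Ignore` is a setting name taken from the table; `TERMINATE` and `CONTINUE_AND_FAIL` are hypothetical labels for the other two rows, and applying one setting to every item is a simplifying assumption. Auto-rollback is not modeled.

```python
from enum import Enum
from typing import List, Tuple


class ActionOnFailure(Enum):
    TERMINATE = "terminate"                  # hypothetical label: stop at the failure point
    IGNORE = "Ignore"                        # continue; job is marked successful
    CONTINUE_AND_FAIL = "continue-and-fail"  # hypothetical label: continue; job is marked failed


def run_package(items: List[Tuple[str, bool]],
                action: ActionOnFailure) -> Tuple[List[str], bool]:
    """Simulate processing a package.

    items  -- (name, succeeded) pairs, in package order
    action -- the failure setting applied to every item (simplifying assumption)
    Returns the names of the items that were processed and the overall job state.
    """
    processed = []
    job_succeeded = True
    for name, succeeded in items:
        processed.append(name)
        if succeeded:
            continue
        if action is ActionOnFailure.TERMINATE:
            # Stop here; exit or auto-rollback would follow, per job configuration.
            return processed, False
        if action is ActionOnFailure.CONTINUE_AND_FAIL:
            job_succeeded = False
        # IGNORE: the error is skipped and does not affect the job state.
    return processed, job_succeeded
```

With the items `[("a", True), ("b", False), ("c", True)]`, the terminate behavior processes only `a` and `b` and fails the job, while the other two settings process all three items and differ only in the final job state.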
The bldeploy process automatically validates that actions are relevant for the specific operating system that the bldeploy process runs on. For example, an RPM action is an invalid action type for the Windows operating system. Such an action causes a failure that cannot be skipped or ignored.
BLTJM (BlTargetJobManager) codes
The bltjm process is a program that is invoked by the Application Server to both start and monitor the bldeploy execution. There is a one-to-one relationship between bltjm and bldeploy processes, such that there is always one bltjm running per bldeploy. However, for cases such as single-user mode, the reverse is not true: There could be a bldeploy running without a bltjm. For UNIX systems it can also be true that there is a bldeploy running without a bltjm because of a network failure.
The TargetJobManager (TJM) sends specific state information in the form of event messages (for example, when the process starts or ends) for the bldeploy process to the Application Server, as well as a heartbeat every 90 seconds. The state information is processed by the Application Server to determine what it should do if there is a loss of communication. The heartbeat mechanism lets the Application Server know that the process is still running and the connection is just silent and not lost. If the Application Server does not receive the heartbeat in time, it assumes that the connection is lost and attempts to restart the connection.
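The heartbeat check described above amounts to comparing the time since the last heartbeat against a deadline. The following is a minimal sketch of that idea under stated assumptions: the 90-second interval comes from the text, but the grace period, the class name `HeartbeatMonitor`, and its methods are hypothetical, because the actual threshold used by the Application Server is not given here.

```python
HEARTBEAT_INTERVAL_SECONDS = 90.0  # interval stated in the text
GRACE_SECONDS = 30.0               # assumption: extra allowance before declaring loss


class HeartbeatMonitor:
    """Hypothetical tracker for the last heartbeat seen from one TJM."""

    def __init__(self, started_at: float,
                 interval: float = HEARTBEAT_INTERVAL_SECONDS,
                 grace: float = GRACE_SECONDS) -> None:
        self.last_seen = started_at
        self.deadline = interval + grace

    def record_heartbeat(self, now: float) -> None:
        """Note that a heartbeat (or any event message) arrived at time `now`."""
        self.last_seen = now

    def connection_presumed_lost(self, now: float) -> bool:
        """True if no heartbeat has arrived within the deadline."""
        return now - self.last_seen > self.deadline
```

Timestamps are passed in explicitly rather than read from the system clock, so the logic can be exercised without waiting. A caller that finds `connection_presumed_lost` to be true would then attempt to restart the connection, as the paragraph above describes.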
A restarted connection checks the last known processID in the Application Server, to determine whether the bldeploy process is running. If the process is not running, the TJM attempts to restart the process, based on the last position stored in the .cfg file created by the bldeploy. The bldeploy process skips over the first item that is stored in the .cfg file and starts on the next item. If there is nothing more to process, the bldeploy process ends the package.
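The restart rule above (skip the item recorded in the .cfg file, continue with the next one) can be illustrated as a simple list operation. This is a sketch only: the real format of the .cfg file is not described in this topic, so the function below assumes the last position is available as an item name, and `remaining_items` is a hypothetical helper.

```python
from typing import List, Optional


def remaining_items(package_items: List[str],
                    last_recorded: Optional[str]) -> List[str]:
    """Return the items still to be processed after a restart.

    package_items -- all item names in package order
    last_recorded -- the item stored as the last position (assumed representation),
                     or None if nothing was recorded
    Raises ValueError if last_recorded is not one of package_items.
    """
    if last_recorded is None:
        return list(package_items)
    position = package_items.index(last_recorded)
    # The recorded item is skipped; processing resumes with the next one.
    return package_items[position + 1:]
```

An empty result corresponds to the case where there is nothing more to process and the bldeploy process ends the package.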
Example: The bldeploy process and the TJM send start messages to notify the Application Server that the bldeploy process has started and is waiting on processing the actions. Until the TJM start message is processed by the Application Server, the loss of communication between the Application Server and the Agent is seen as a job failure. If the TJM start message is not processed in time, the Application Server assumes that the connection has been lost, attempts to re-establish the connection, and potentially restarts bldeploy (as is required on rebooting).
The following BLTJM (BlTargetJobManager) error codes are sent if there is no specific return code for the bldeploy process (bldeploy codes always take precedence):
|0||Success. This code is only returned when the bldeploy process succeeds and actually returns the code.|
|-5000||Failure to start application.|
|-5001||Failure processing the event messages.|
|-5002||Failure in the monitoring of the PID because it was either not set or incorrectly set.|
|-5003||Failure to initialize the TJM process correctly.|
|-5004||Failure to stop the bldeploy application. There are times when the TJM attempts to kill the bldeploy process, such as during a job cancellation. If there is a failure to stop that process, this is a potential error code.|
|-5005||Application terminated unexpectedly. This is when the bldeploy stops running for any reason other than a completion of the package process (success or failure).|
|-5006||Unused. The constant variable is declared, but not used anywhere.|
|-5007||TJM was killed unexpectedly.|
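The precedence rule for these codes (a bldeploy code, when present, wins over a BLTJM code) can be expressed in a few lines. The following is a minimal sketch for use in scripts that summarize job results. The names `BLTJM_CODES`, `effective_code`, and `describe_bltjm_code` are hypothetical, and representing "no bldeploy code" as `None` is an assumption.

```python
from typing import Optional

# Short descriptions taken from the table above.
BLTJM_CODES = {
    0: "Success.",
    -5000: "Failure to start application.",
    -5001: "Failure processing the event messages.",
    -5002: "Failure in the monitoring of the PID.",
    -5003: "Failure to initialize the TJM process correctly.",
    -5004: "Failure to stop the bldeploy application.",
    -5005: "Application terminated unexpectedly.",
    -5006: "Unused.",
    -5007: "TJM was killed unexpectedly.",
}


def effective_code(bldeploy_exit_code: Optional[int], bltjm_code: int) -> int:
    """Pick the code to report: the bldeploy code if one exists, else the BLTJM code."""
    if bldeploy_exit_code is not None:
        return bldeploy_exit_code
    return bltjm_code


def describe_bltjm_code(code: int) -> str:
    """Return the table description for a BLTJM code, or a placeholder if unknown."""
    return BLTJM_CODES.get(code, "Unknown BLTJM code")
```

For instance, `effective_code(-4001, -5005)` evaluates to `-4001`, while `effective_code(None, -5005)` falls back to `-5005`.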