
Troubleshooting BMC BSA Error Codes


 

BMC Server Automation exit and error codes

This topic presents selected exit or return error codes associated with deployment issues, along with possible interpretations and solutions. The exit status or return code of a process is a small number passed from a child process to its parent process when the child has finished executing a specific procedure or delegated task.

The information in this topic was initially collected to provide a central location to track these error codes. It was created by searching the ticketing system for tickets that include error code information with enough cause or solution data to be useful.

Troubleshooting selected BladeLogic deploy job exit and return error codes

| Return Code | Exit Code | Product | Possible Meaning | Possible Solution |
|---|---|---|---|---|
| 6 | 3 | Configuration Manager | Permission issue with a package being deployed. | Check the ACLs on the File Server Agent and permissions on the files in the File Server. |
| 11 | 1 | Configuration Manager | Packages that are getting generated automatically from Patch Analysis have names that are too long. | |
| 128 | NA | Configuration Manager | There is something wrong with the Agent installation on the target server. | Re-install the Agent on the target server. |
| 129 | NA | Configuration Manager | A job run is hitting a timeout that has been defined for that job. | Remove any timeouts for the job. |
| 139 | NA | Configuration Manager | An RPM deployment is failing for a reason outside of BMC BladeLogic control. | Troubleshoot the issue outside of BSA manually on the server. |
| -4001 | 1 | Configuration Manager | Typically a patch deploy job is throwing a warning message that is not being suppressed by the Console. | |
| -4002 | 2 | Configuration Manager | A deployment has succeeded but requires manual reboot. | |
| -4003 | 3 | Configuration Manager | Can be seen during batch jobs during the post-OS install of provisioning during reboots. It means that a deploy has failed and requires manual reboot. | |
| 5003 | NA | Provisioning Manager | Can be seen when running a post-OS batch job as part of provisioning. | |
| 5005 | NA | Configuration Manager | May occur during the simulate phase of a BLPackage deploy job. | |

BLDeploy return and exit codes

The bldeploy process makes changes based on the instructions in the bldeploy.xml file and the contents of the staging directory. This process is locally invoked on the target that requires the changes. For certain targets, such as Agentless Managed Objects (AMO), the process runs locally on a proxy system and expects Custom Objects to invoke the changes on the remote system.

On successful deployment of the full package, the bldeploy process records the package as being installed and removes the contents of the staging directory for this package. The bldeploy job returns one of the following exit codes:

| Return code | Exit code | Meaning |
|---|---|---|
| 0 | 0 | Successfully completed package deployment. |
| 1 | -4001 | Error occurred when processing the package. This is a generic error and it is assumed that we logged the reason while processing the item. |
| 2 | -4002 | Successful completion of package, but requires manual reboot to complete installation and deployment. |
| 3 | -4003 | Error similar to -4001, but indicates that a manual reboot is needed to fully complete the process. |

These codes generally do not provide practical help to end users. They matter only to the job result on the AppServer, where they determine the overall success or failure setting. The bldeploy process logs each step, so failures and the reasons for them should already appear in the log.

Given the complexity of the deploy process, no existing codes can provide enough detail to indicate the exact reasons for failure. For more information, see the BL Deploy topic, in the BladeLogic Home Space.

Deploy job exit codes (-4001, -4002, -4003)

Define exit code

From: bldeploy\XMlParser\main.cpp:

#define EXIT_SUCCEEDED 0x0
#define EXIT_FAILED 0x1
#define EXIT_REBOOT_REQUIRED 0x2

Exit code 3 results from combining the two flags: EXIT_FAILED + EXIT_REBOOT_REQUIRED (0x1 + 0x2).

Translate exit code

static const int translateExitCode(const int exitCode)
{
    switch (exitCode) {
    case 1: return -4001;
    case 2: return -4002;
    case 3: return -4003;
    case 0:
    default:
        return 0;
    }
}

static char* translateExitCodeToString(const int exitCode)
{
    switch (exitCode) {
    case 1: return "Deployment failed";
    case 2: return "Deployment succeeded, but requires manual reboot";
    case 3: return "Deployment failed and requires manual reboot";
    case 0:
    default:
        return "Deployment succeeded";
    }
}
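As a rough illustration of how the flag bits relate to the translated -400x codes, the following standalone sketch mirrors the mapping in translateExitCode and adds a reverse lookup. The names kExitFailed, kExitRebootRequired, DeployOutcome, toReportedCode, and decodeReportedCode are hypothetical and introduced only for this example; they are not part of bldeploy.

```cpp
// Flag values as quoted above from main.cpp (renamed here to avoid macros).
constexpr int kExitFailed         = 0x1;
constexpr int kExitRebootRequired = 0x2;

// Hypothetical helper type, not part of bldeploy.
struct DeployOutcome {
    bool failed;
    bool rebootRequired;
};

// Same mapping as translateExitCode: 1..3 become -4001..-4003, anything else 0.
int toReportedCode(int exitCode) {
    return (exitCode >= 1 && exitCode <= 3) ? -(4000 + exitCode) : 0;
}

// Reverses the mapping: -4001..-4003 back to the flag bits.
// Assumption: any other value is treated as "succeeded, no reboot",
// matching the default case of translateExitCode.
DeployOutcome decodeReportedCode(int reportedCode) {
    int flags = 0;
    if (reportedCode <= -4001 && reportedCode >= -4003) {
        flags = -reportedCode - 4000;
    }
    return DeployOutcome{(flags & kExitFailed) != 0,
                         (flags & kExitRebootRequired) != 0};
}
```

With this sketch, decodeReportedCode(-4003) reports both a failure and a pending manual reboot, which matches the combined EXIT_FAILED + EXIT_REBOOT_REQUIRED case.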

Deploy job return codes (1 through 10)

Define and translate return code

//Possible return codes from BLdeploy.exe

#define APPLY_SUCCESSFUL                       1
#define UNDO_SUCCESSFUL                        2
#define DRYRUN_SUCCESSFUL                      3
#define APPLY_FAILED_NO_ROLLBACK               4
#define APPLY_FAILED_NO_AUTO_ROLLBACK          5
#define APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL  6
#define APPLY_FAILED_AUTO_ROLLBACK_FAILED      7
#define UNDO_FAILED                            8
#define DRYRUN_FAILED                          9
#define DEPLOY_FAILED                          10

static char* translateReturnCodeToString(const int nRet, const int exitCode)
{
    switch (nRet) {
    case INCOMPLETE: return "Deployment incomplete";
    case APPLY_SUCCESSFUL: {
        if (exitCode & EXIT_REBOOT_REQUIRED) {
            return "Apply successful; Reboot required to complete";
        }
        return "Apply successful";
    }
    case UNDO_SUCCESSFUL: {
        if (exitCode & EXIT_REBOOT_REQUIRED) {
            return "Undo successful; Reboot required to complete";
        }
        return "Undo successful";
    }
    case DRYRUN_SUCCESSFUL: return "DryRun successful";
    case APPLY_FAILED_NO_ROLLBACK: return "Apply failed no rollback was created";
    case APPLY_FAILED_NO_AUTO_ROLLBACK: return "Apply failed no auto-rollback occurred";
    case APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL: {
        if (exitCode & EXIT_REBOOT_REQUIRED) {
            return "Apply failed; auto-rollback successful. Reboot required to complete";
        }
        return "Apply failed; auto-rollback successful";
    }
    case UNDO_PARTIALLY_SUCCESSFUL:
        return "Undo partially successful with failed items, overall job phase failure";
    case APPLY_PARTIALLY_SUCCESSFUL:
        return "Apply partially successful with failed items, overall job phase failure";
    case APPLY_FAILED_AUTO_ROLLBACK_FAILED: return "Apply failed; auto-rollback failed";
    case UNDO_FAILED: return "Undo failed";
    case DRYRUN_FAILED: return "DryRun failed";

    case DEPLOY_FAILED:
    default:
        return "Deployment failed to process";
    }
}

Sub return messages

| Return Code | String | Exit Code | Meaning |
|---|---|---|---|
| 0 | INCOMPLETE | 1 | Deployment incomplete |
| 1 | APPLY_SUCCESSFUL | 0 | Apply successful |
| 1 | APPLY_SUCCESSFUL | 2 | Apply successful; Reboot required to complete |
| 2 | UNDO_SUCCESSFUL | 0 | Undo successful |
| 2 | UNDO_SUCCESSFUL | 2 | Undo successful; Reboot required to complete |
| 3 | DRYRUN_SUCCESSFUL | 0 | DryRun successful |
| 4 | APPLY_FAILED_NO_ROLLBACK | 1 | Apply failed no rollback was created |
| 5 | APPLY_FAILED_NO_AUTO_ROLLBACK | 1 | Apply failed no auto-rollback occurred |
| 6 | APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL | 1 | Apply failed; auto-rollback successful |
| 6 | APPLY_FAILED_AUTO_ROLLBACK_SUCCESSFUL | 3 | Apply failed; auto-rollback successful. Reboot required to complete |
| 7 | APPLY_FAILED_AUTO_ROLLBACK_FAILED | 1 | Apply failed; auto-rollback failed |
| 8 | UNDO_FAILED | 1 | Undo failed |
| 9 | DRYRUN_FAILED | 1 | DryRun failed |
| 10 | DEPLOY_FAILED | 1 | Deployment failed to process |
| 11 | UNDO_PARTIALLY_SUCCESSFUL | 1 | Undo partially successful with failed items, overall job phase failure |
| 12 | APPLY_PARTIALLY_SUCCESSFUL | 1 | Apply partially successful with failed items, overall job phase failure |
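The combinations in this table can be condensed into two questions: did the phase succeed, and is a manual reboot pending? The sketch below is one possible way to answer both from a return code and exit code pair. PhaseResult and summarize are hypothetical names used only here, and the sketch reports the reboot bit for every return code, which is a simplification compared with translateReturnCodeToString.

```cpp
// Hypothetical summary type, introduced for illustration only.
struct PhaseResult {
    bool succeeded;
    bool rebootRequired;
};

// Return codes 1 (APPLY_SUCCESSFUL), 2 (UNDO_SUCCESSFUL) and 3 (DRYRUN_SUCCESSFUL)
// are the only successful outcomes in the table; every other code is a failure.
PhaseResult summarize(int returnCode, int exitCode) {
    const bool succeeded = returnCode >= 1 && returnCode <= 3;
    const bool rebootRequired = (exitCode & 0x2) != 0;  // EXIT_REBOOT_REQUIRED bit
    return PhaseResult{succeeded, rebootRequired};
}
```

For example, summarize(6, 3) describes a failed apply whose auto-rollback still needs a reboot to complete.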

ActionOnFailure

When creating the in-memory actions from the XML, there are checks to ensure that an action relates to the operating system that the bldeploy process runs on.

For example, an RPM action is an invalid action type for the Windows operating system. This action causes an immediate failure based on type alone, and no other information from the XML is read.

Ramifications: A mismatch between the operating system and the action type cannot be skipped or ignored, even if the ActionOnFailure value is not set to Abort or that particular item is commented out. That additional information in the XML is not processed until after the action type has been checked.

The following values for the ActionOnFailure settings summarize the overall job state for failures:

| Value | Meaning |
|---|---|
| Abort | The bldeploy process terminates at this failure point and either exits or auto-rolls back, based on the job configuration. |
| Ignore | The bldeploy process ignores the immediate error and continues to process the next item in the package. At the end of the job, the job as a whole is marked as successful. |
| Continue | The bldeploy process ignores the immediate error and continues to process the next item in the package. At the end of the job, the job as a whole is marked as failed. |
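To make the difference between the three settings concrete, here is a minimal sketch of a loop that applies them to a list of items. The Item and JobState types and the runItems function are hypothetical and do not reflect the real bldeploy data model. The sketch also leaves out the operating system and action type check described above, which always fails immediately.

```cpp
#include <cstddef>
#include <vector>

enum class ActionOnFailure { Abort, Ignore, Continue };

// Hypothetical item model: whether the item's action failed, and its setting.
struct Item {
    bool failed;
    ActionOnFailure onFailure;
};

struct JobState {
    bool success;           // overall job result
    std::size_t processed;  // number of items attempted before stopping
};

JobState runItems(const std::vector<Item>& items) {
    JobState state{true, 0};
    for (const Item& item : items) {
        ++state.processed;
        if (!item.failed) {
            continue;
        }
        switch (item.onFailure) {
        case ActionOnFailure::Abort:
            // Stop here; exit versus auto-rollback depends on job configuration.
            state.success = false;
            return state;
        case ActionOnFailure::Ignore:
            // Error is ignored and does not affect the overall result.
            break;
        case ActionOnFailure::Continue:
            // Keep processing, but the job as a whole is marked as failed.
            state.success = false;
            break;
        }
    }
    return state;
}
```

In this model, a failing item set to Ignore leaves the job successful, a failing item set to Continue lets later items run but marks the job as failed, and a failing item set to Abort stops processing at that item.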

 

BLTJM (BlTargetJobManager) codes

The bltjm process is a program that is invoked by the AppServer to both start and monitor the bldeploy execution. There is a one-to-one relationship between bltjm and bldeploy processes, such that there is always one bltjm running per bldeploy. However, for cases such as single-user mode, the reverse is not true: There could be a bldeploy running without a bltjm. For UNIX systems it can also be true that there is a bldeploy running without a bltjm because of a network failure.

The TargetJobManager (TJM) sends specific state information in the form of event messages (for example, when the process starts or ends) for the bldeploy process to the AppServer, as well as a heartbeat every 90 seconds. The state information is processed by the AppServer to determine what it should do if there is a loss of communication. The heartbeat mechanism lets the AppServer know that the process is still running and the connection is just silent and not lost. If the AppServer does not receive the heartbeat in time, it assumes the connection is lost and attempts to restart the connection. 
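The heartbeat check described above can be sketched as a simple elapsed-time comparison. The 90-second interval comes from the text; the grace period is an assumption made for this example, because the text only says the heartbeat must arrive "in time". The names kHeartbeatInterval, kAssumedGrace, and connectionLooksLost are hypothetical.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// 90 seconds is the heartbeat interval stated in the text.
constexpr std::chrono::seconds kHeartbeatInterval{90};

// Assumption: extra allowance before declaring the connection lost.
// The actual tolerance used by the AppServer is not documented here.
constexpr std::chrono::seconds kAssumedGrace{30};

// Returns true if a monitor should treat the connection as lost.
bool connectionLooksLost(Clock::time_point lastHeartbeat, Clock::time_point now) {
    return (now - lastHeartbeat) > (kHeartbeatInterval + kAssumedGrace);
}
```

A silent but healthy connection keeps resetting lastHeartbeat, so the function stays false; only a gap longer than the interval plus the assumed grace period triggers a reconnect attempt.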

A restarted connection checks the last known process ID in the AppServer to determine whether the bldeploy process is running. If the process is not running, the TJM attempts to restart the process, based on the last position stored in the .cfg file created by the bldeploy process. The bldeploy process skips over the first item that is stored in the .cfg file and starts on the next item. If there is nothing more to process, the bldeploy process ends the package.
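The restart rule (skip the stored item, start on the next one, end the package if nothing is left) might be modeled as follows. This is only a sketch: the format of the .cfg file is not described here, so the zero-based index and the resumeIndex function are assumptions made for illustration.

```cpp
#include <cstddef>

// Assumption: lastStoredIndex is the zero-based position recorded in the .cfg file,
// and itemCount is the number of items in the package.
// Returns the index to resume at, or itemCount when nothing is left to process
// and the package should be ended.
std::size_t resumeIndex(std::size_t lastStoredIndex, std::size_t itemCount) {
    if (lastStoredIndex >= itemCount) {
        return itemCount;  // stored position is already past the end
    }
    const std::size_t next = lastStoredIndex + 1;  // skip the stored item itself
    return next < itemCount ? next : itemCount;
}
```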

Example: The bldeploy process and the TJM send start messages to notify the AppServer that the bldeploy process has started and is waiting to process the actions. Until the TJM start message is processed by the AppServer, a loss of communication between the AppServer and the Agent is seen as a job failure. If the TJM start message is not processed in time, the AppServer assumes the connection has been lost, attempts to re-establish the connection, and potentially restarts the bldeploy process (as is required on rebooting).

The following BLTJM (BlTargetJobManager) error codes are sent if there is no specific return code for the bldeploy process (bldeploy codes always take precedence):

| Code | Meaning |
|---|---|
| 0 | Success. This code is only returned when the bldeploy process succeeds and actually returns the code. |
| -5000 | Failure to start application. |
| -5001 | Failure processing the event messages. |
| -5002 | Failure in the monitoring of the PID because it was either not set or incorrectly set. |
| -5003 | Failure to initialize the TJM process correctly. |
| -5004 | Failure to stop the bldeploy application. There are times the TJM attempts to kill the bldeploy process, such as during a job cancellation. If there is a failure to stop that process, this is a potential error code. |
| -5005 | Application terminated unexpectedly. This is when the bldeploy process stops running for any reason other than a completion of the package process (success or failure). |
| -5006 | Unused. The constant variable is declared, but not used anywhere. |
| -5007 | TJM was killed unexpectedly. |
