Walkthrough: Identifying potential patching and deployment problems
This topic walks you through a process that can identify servers with access and writeability issues before those problems are exposed during critical patching, scripting, or deployment operations. BMC calls this process a patching dry run. Another term for it is synthetic patching.
Although many automated platforms enjoy greater than 99.5% uptime, automated environments can occasionally experience problems. Users often discover these issues when attempting to patch, run scripts on, or deploy software to servers. When those users attempt to diagnose problems, log entries are often unclear, especially to less experienced administrators.
TrueSight Server Automation provides two utilities that verify agent availability: the Update Server Properties Job for multiple servers and the Verify Server button for individual servers. But even when servers appear available, they sometimes have configuration problems that prevent effective patching or deployment. In most automated environments, these types of issues are managed effectively, but it is not uncommon for 1-3% of all servers to experience problems with writeability or administrative access.
Here are some types of issues that can prevent successful deployment to servers:
- RBAC misconfiguration makes a role unable to access or write to a server.
- Permissions on a server are mapped to an account that is not Administrator/root or the equivalent.
- An agent is configured to be read-only.
- ACLs have not been updated (pushed) to the agent.
- ACLs on an agent are out of date.
What is a patching dry run?
The following procedure explains how to create and deploy a BLPackage that consists exclusively of a very small file. If the Deploy Job fails to deploy the file to a server, you can identify the root causes, some of which may be systemic to an environment, configuration, or a particular area of the server estate. After you obtain a list of hosts with issues, resolving the root causes tends to be straightforward.
Many organizations export the job run log, filter it, and then pass it on to UNIX or Windows engineering teams for resolution. Those groups tend to have local administrative access to hosts. They have better knowledge of the local environment, and they often understand how to resolve issues better than the core automation platform team. The UNIX and Windows engineering teams are key partners in any automation effort.
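The filtering step described above can be sketched in Python. This is only an illustration under stated assumptions: it assumes the job run log has been exported as CSV text with columns named server, status, and message, and that successful rows carry the status value "success". Those column names and values are hypothetical and are not taken from the product, so adjust them to match the actual export.

```python
import csv
import io


def failed_hosts(csv_text, host_col="server", status_col="status",
                 message_col="message", ok_value="success"):
    """Return {host: [messages]} for every row whose status is not ok_value.

    The column names and the ok_value default are assumptions made for this
    sketch; they are not the documented export format of any product.
    """
    failures = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = (row.get(status_col) or "").strip().lower()
        if status != ok_value:
            host = row.get(host_col) or "<unknown host>"
            # Collect every message per host so repeated errors stay visible.
            failures.setdefault(host, []).append(row.get(message_col) or "")
    return failures


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = (
        "server,status,message\n"
        "host-a,success,\n"
        "host-b,failed,login not allowed\n"
    )
    for host, messages in failed_hosts(sample).items():
        print(host, "; ".join(messages))
```

The resulting host list is the kind of artifact that can be handed to the UNIX or Windows engineering teams for follow-up.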
BMC recommends that organizations performing large-scale, highly automated patching or software deployments consider running patching dry runs on a weekly or monthly basis and reconciling any agents that fail this check. With this preventive activity in place, your users are less likely to discover a host with an agent or availability issue in the middle of an operation and can instead work in a relatively healthy environment.
Patching dry runs are intended to help identify RSCD agents that are not able to accept a deployment. This procedure cannot anticipate other types of problems you might encounter during patching, such as missing dependencies.
To perform a patching dry run
Create a very small file, such as a blank text file. In UNIX/Linux, you can use the touch command to create a zero-length file. Assign a recognizable file name, such as synthetic_patch_check.txt. Save the file in a temporary location, such as C:\temp (for Windows), /var/tmp (for UNIX or Linux), or the directory used for staging during Deploy Jobs (identified in TrueSight Server Automation as ??TARGET.STAGING_DIR??).
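If you prefer a method that works the same way on Windows and UNIX/Linux, the file can also be created with a short Python script. The following is a minimal sketch, not part of the product: the function name create_dry_run_file is hypothetical, and the system temporary directory is used only as a stand-in for whichever temporary location you choose.

```python
import tempfile
from pathlib import Path


def create_dry_run_file(directory, name="synthetic_patch_check.txt"):
    """Create a zero-length file in the given directory and return its path.

    Hypothetical helper for illustration; the default file name is the
    example name used in this topic.
    """
    target = Path(directory) / name
    # Opening in "w" mode creates the file if it does not exist and
    # truncates it to 0 bytes if it does.
    with open(target, "w"):
        pass
    return target


if __name__ == "__main__":
    # The system temp directory is an assumption; substitute your own location.
    path = create_dry_run_file(tempfile.gettempdir())
    print(path, path.stat().st_size)
```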
Use TrueSight Server Automation to add the file to the Depot.
Navigate to a location in the Depot folder, right-click, and select New > File. Use the Add File wizard to select the blank file that you created. Then click Finish.
Create a BLPackage based on the small file.
Create a smart server group to use as a target for the patching dry run. The group should contain approximately the same number of servers as you ultimately plan to patch.
BMC recommends that you always use a server smart group as a target. When defining the smart group, include only servers where the AGENT_STATUS property equals "agent is alive". Agents with some other status may be experiencing problems.
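The smart group itself is defined in the TrueSight Server Automation console, not in code. Purely to illustrate the selection rule, the following Python sketch applies the same condition to an in-memory list of server records. The record layout (a dictionary with a name key) and the host names are hypothetical; only the AGENT_STATUS property name and the "agent is alive" value come from this topic.

```python
def select_dry_run_targets(servers, status_property="AGENT_STATUS",
                           required="agent is alive"):
    """Return the names of servers whose status property equals `required`.

    Illustration of the smart group condition only; this does not call any
    TrueSight Server Automation API.
    """
    return [s["name"] for s in servers if s.get(status_property) == required]


if __name__ == "__main__":
    # Hypothetical inventory. "other" is a placeholder, not a real status value.
    inventory = [
        {"name": "host-a", "AGENT_STATUS": "agent is alive"},
        {"name": "host-b", "AGENT_STATUS": "other"},
        {"name": "host-c"},  # status never collected
    ]
    print(select_dry_run_targets(inventory))
```

Servers with any other status, or with no recorded status at all, are left out of the target list, which mirrors the intent of excluding agents that may already be experiencing problems.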
Create a Deploy Job using the BLPackage. When defining the Deploy Job:
Execute the Deploy Job. The job should execute very quickly, even across many servers, as each individual task should take only a few seconds to execute. When the job completes, examine the job results and check the logs for the job. In the example shown at right, the log shows that one server is not allowing login, suggesting there may be a problem with how permissions are mapped on that server.
Any hosts that fail are probably in a state that would prevent successful patching or other software deployments. The following situations are common causes of failure. See RSCD Agent Troubleshooting for help understanding the conditions described in the log files.
In many environments, this job should not require a change control given its minimal impact, but in highly regulated environments, you may need to open a change or work order and run the job during a specific maintenance window.
Wrapping it up
Congratulations, you have learned how to anticipate and correct many common problems experienced during patching and other types of software deployments.
Where to go from here
For more information about diagnosing problems with target servers, see RSCD Agent Troubleshooting.