This topic provides guidelines and best practices for appropriately sizing a BMC Database Automation implementation, as well as considerations for performance and end-user experience.
One of the most common concerns when implementing a new technology such as BMC Database Automation is how to ensure that the application provides acceptable levels of performance for both the initial workload and the expected growth over time. While the definition of acceptable may vary based on the organization, the user, the network, and the relative patience of all those involved, this document aims to give guidelines and best practices on performance and scalability from a generic perspective.
At a high level, a well-performing implementation takes into account:
Each of these subjects is covered in this topic.
BMC Database Automation has a common architecture model for deployments: a central server provides command and control responsibilities for a set of Agents under management. In this model, one Agent is deployed per managed server (note that a server can be a VM, Solaris Zone, AIX LPAR, or physical server). Each Agent communicates securely using SSL with the central server, typically referred to as the Manager.
The Manager itself is a Linux server (see Hardware requirements for specifics) that acts as the repository for most of the content, the central intelligence for creating and deploying jobs, and the web server where the user interface is hosted. Note that currently, it is not possible to segment the different components of the Manager onto different servers or different environments.
In addition to the required components described previously, there is an optional data warehouse that can be added to the infrastructure to provide a datastore with historical configuration information. This information is then consumed for various reporting situations. It also provides a mechanism for integrating inventory and asset data into other systems. This data warehouse is an Oracle database, and BMC recommends running it on a separate machine from the Manager for performance reasons. In sandbox/test environments it is acceptable to run a small data warehouse on the same machine as the Manager.
BMC certifies three general sizes of Manager, each with a targeted number of Agents:
The hardware specifications and assumptions for these configurations are described in the subsequent sections. For configurations larger than 1200 nodes, please contact BMC Database Automation Support.
For the BMC Database Automation Manager, the server must be installed with Red Hat Enterprise Linux 4, 5 or 6; any update within those releases is supported, but if the node is running Red Hat Enterprise Linux 5 or 6, it must be running the 64-bit version. See System requirements for a more detailed set of specifications for individual packages and options required for the installation of the Manager.
BMC recommends that the Manager be run on a physical server. BMC supports running the Manager on a virtual machine; however, it is the organization's responsibility to ensure that the time on the Manager remains synchronized with the overall network, and that the virtual machine does not become starved for resources. It is not uncommon to see performance problems on a busy BMC Database Automation virtual machine due to the oversubscription of resources on the physical host.
For the certified sizes above, BMC recommends the following minimum hardware specifications:
For pilot or test environments, it is possible to run a Manager on as little as 2GB of RAM, assuming that only a handful of Agents are connected to the Manager. However, the above sizes should be used for any production implementation.
All of these assume a single-server, non-HA configuration. For HA considerations, refer to High availability.
While the previous sections provide a general set of recommendations, different environments have different requirements and potential bottlenecks. One organization may have a small number of managed hosts with a very large number of database instances on each host; another may have thousands of jobs that are executed daily against a small number of databases. This document answers questions about some of this variability, but any concerns or considerations can be directed to BMC Support for further discussion.
Agents become a factor in two areas:
The number of simultaneous Agents connected to a Manager has a small impact on RAM utilization on the Manager, as the system needs to store additional information about each server and the properties of the databases on those servers. The larger implication for Manager sizing is CPU utilization - as more hosts are connected to BMC Database Automation, more objects must be scanned and evaluated as potential candidates for jobs, among other things. The system does this in a multi-threaded fashion, attempting to parallelize as much as possible. Consequently, increasing the quantity of cores available to the system as the number of Agents increases often has a net positive effect on GUI performance. There are also tuning options in the BMC Database Automation configuration that are available to increase performance in large Agent environments. Contact BMC Database Automation Support for specifics.
The number of targets involved in a job is also a consideration. While in small to medium environments it is rare to have jobs that execute across more than 20-40 nodes, very large environments can create scenarios where jobs are executed across 100 or more objects. BMC Database Automation has certain internal limits on the number of targets that are processed simultaneously. First, any targets on a single server (for example, three databases on the same machine) are executed serially to prevent the various activities from conflicting with each other. Second, the software attempts to parallelize across multiple servers up to a limit.
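The two rules above (serial within a server, parallel across servers up to a limit) can be illustrated with a minimal Python sketch. This is not the product's implementation: the function names, the `(server, database)` target format, and the value of `MAX_PARALLEL_SERVERS` are all assumptions made for illustration, since this topic does not state the actual internal limit.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap; the real internal limit is not stated in this topic.
MAX_PARALLEL_SERVERS = 4


def group_by_server(targets):
    """Group (server, database) pairs so each server's targets stay together."""
    groups = defaultdict(list)
    for server, database in targets:
        groups[server].append(database)
    return dict(groups)


def run_on_server(server, databases, action):
    """Run the action against each database on one server, one at a time."""
    return [(server, db, action(server, db)) for db in databases]


def run_job(targets, action, max_parallel=MAX_PARALLEL_SERVERS):
    """Serial within a server, parallel across servers up to max_parallel."""
    groups = group_by_server(targets)
    results = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [
            pool.submit(run_on_server, server, dbs, action)
            for server, dbs in groups.items()
        ]
        # Collect results in submission order so output is deterministic.
        for future in futures:
            results.extend(future.result())
    return results
```

Under these assumptions, a job with three databases on one host and one database on each of ten other hosts would occupy at most `max_parallel` workers at a time, and the three co-located databases would always be processed one after another.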
The quantity of disk space required and consumed by the Manager and Agents is going to vary based on the environment and the types of activities being executed. As a minimum metric, in production environments the Manager should have at least 40GB of disk space available for the BMC Database Automation software, and each Agent should have at least 1GB of disk space free on the filesystem/drive where the Agent is installed.
On the Manager, the primary factors that influence disk utilization are the type and quantity of jobs being executed, and the type and quantity of Patch Packages and Actions being stored on the management server.
Jobs are tasks that are launched from the Manager and executed on one or more target Agents. Typical examples of jobs are installing a new SQL Server instance or executing a user-created script against a database. As a job is executed, the Agent collects all of the output from commands and scripts that are executed as part of the job. Each set of output is stored as an individual file in a temporary directory under the Agent install directory until the job is complete, at which time the files are concatenated into a single file, called the "Job Log", and in aggregate zipped into a zip file referred to as the "Log Package". Both are uploaded to the Manager and then deleted from the Agent.
The sizes of the Job Log and the Log Package depend on the amount of output returned, but a typical Log Package is several MB in size (note that a large patching Log Package can be on the order of 50MB). These logs are stored on the Manager until deleted, and so can account for a significant percentage of the growth in disk usage over long-term operation.
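The Agent-side packaging step described above (concatenate the per-command output files into one Job Log, then zip the individual files into a Log Package) can be sketched in Python as follows. This is only an illustration under stated assumptions: the file names `job.log` and `log_package.zip`, the directory layout, and the sort order of the parts are hypothetical and are not taken from the product.

```python
import zipfile
from pathlib import Path


def build_log_package(output_dir, job_log_name="job.log",
                      package_name="log_package.zip"):
    """Concatenate per-command output files into one job log and zip them.

    File names and layout are hypothetical, chosen only for this sketch.
    """
    output_dir = Path(output_dir)
    # Snapshot the output files first so the new job log and zip are excluded.
    parts = sorted(p for p in output_dir.iterdir() if p.is_file())

    job_log = output_dir / job_log_name
    with job_log.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())

    package = output_dir / package_name
    with zipfile.ZipFile(package, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in parts:
            zf.write(part, arcname=part.name)

    return job_log, package
```

Because both artifacts are built from the same output, the disk footprint of a finished job on the Manager is roughly the Job Log plus the compressed Log Package.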
Patch packages are database vendor patches, associated metadata, and custom automation logic, and consequently, the size of the patch package is going to vary based on its contents. Oracle patch packages tend to be in the 10-50MB size range, SQL Server patch packages can range from 5MB to 300+MB, and Sybase patches sometimes are greater than 1GB in size. Given this variability, it's best to err on the side of caution when allocating space for patch storage, or plan based on the size of the specific patches that are targeted for implementation in the environment.
Actions are custom automation packages designed to automate operational tasks. Actions consist of the code to be executed as well as any associated files. A typical Action that contains only automation logic is often less than 30KB in size, but the size varies with any other content that is included in the Action. For example, a common use case is an Action that loads a database schema into the targeted database. While the automation logic that loads the data could be 5KB in size, the schema itself might be 5MB or greater in size, which increases the total space used by that Action.
Given all of this variability, the starting point of 40GB may well be sufficient for a normal implementation, but storage of many large patch packages or long-term retention of large log packages may require considerably more storage, and for large implementations 100GB of space or more may be necessary.
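A back-of-the-envelope estimate can help decide whether the 40GB starting point is likely to be enough. The Python sketch below simply multiplies out the factors discussed in this section. The function names, the parameters, and the linear model itself are assumptions for illustration, not a BMC sizing formula; it ignores the software's own footprint and any compression or cleanup.

```python
MIN_MANAGER_DISK_GB = 40  # starting point given in this topic


def estimate_content_gb(jobs_per_day, retention_days, avg_log_package_mb,
                        patch_package_sizes_mb=(), action_sizes_mb=()):
    """Rough stored-content estimate in GB (hypothetical linear model)."""
    log_mb = jobs_per_day * retention_days * avg_log_package_mb
    content_mb = sum(patch_package_sizes_mb) + sum(action_sizes_mb)
    return (log_mb + content_mb) / 1024


def exceeds_starting_point(estimated_gb, budget_gb=MIN_MANAGER_DISK_GB):
    """True when the estimate alone is larger than the disk budget."""
    return estimated_gb > budget_gb
```

For example, retaining a year of logs at 50 jobs per day and 5MB per log package gives 50 × 365 × 5 = 91,250MB, or roughly 89GB, which is consistent with the guidance that large implementations may need 100GB or more.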
BMC Database Automation is often implemented in large, disparate environments where WAN links, VPN connections, and geographic distance can all conspire to impact network performance. Network issues are typically expressed in two areas - latency and bandwidth. Bandwidth is primarily a performance concern, as delays in uploading job metadata and content can delay the start or execution of a job. However, latency issues can directly affect the usability of the system. This is because each Agent sends a heartbeat once a second to the Manager, notifying the Manager of the current up/down state of the various objects under management by the Agent.
If the Manager does not receive a heartbeat from an Agent at least every 6 seconds, it concludes that the Agent is down. It then does not allow new jobs to be executed on that Agent, and may time out jobs that are currently executing on that node.
The Agent sends a heartbeat, waits up to one second to get an acknowledgment from the Manager, and then sleeps for one second before sending the next one. If the Agent fails to get an answer five times in a row, it considers itself down and stops sending heartbeats until it completes the initialization process again. In high latency environments where the heartbeat round-trip time is greater than one second, Agents may experience issues.
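The timing rules above can be expressed as a small Python model, which is useful for reasoning about how a given network latency profile would behave. It is a sketch only: the constants come from this section, but the function names and the idea of feeding in a list of round-trip times (with `None` for a lost acknowledgment) are assumptions made for illustration and do not reflect the product's code.

```python
ACK_TIMEOUT_S = 1.0      # Agent waits up to one second for an acknowledgment
MAX_MISSED_ACKS = 5      # five misses in a row: Agent considers itself down
MANAGER_TIMEOUT_S = 6.0  # Manager expects a heartbeat at least every 6 seconds


def agent_goes_down_at(round_trip_times):
    """Return the 0-based heartbeat index at which the Agent gives up.

    round_trip_times holds one value in seconds per heartbeat; None means
    the acknowledgment never arrived. Returns None if the Agent stays up.
    """
    missed = 0
    for index, rtt in enumerate(round_trip_times):
        if rtt is not None and rtt <= ACK_TIMEOUT_S:
            missed = 0  # a timely acknowledgment resets the counter
        else:
            missed += 1
            if missed >= MAX_MISSED_ACKS:
                return index
    return None


def manager_sees_agent_down(seconds_since_last_heartbeat):
    """True once the gap since the last heartbeat exceeds the Manager limit."""
    return seconds_since_last_heartbeat > MANAGER_TIMEOUT_S
```

In this model, a link with a steady 1.5-second round trip causes the Agent to stop on its fifth heartbeat, while a link that is slow only occasionally never reaches five consecutive misses.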