Testing the BMC Server Automation infrastructure
The purpose of this page is to provide a BMC Server Automation administrator with a series of tests that can be performed in a BMC Server Automation environment to determine the cause of any latency issues that might be occurring.
Overview
This topic covers methods for testing the BMC Server Automation infrastructure to determine the root cause of any latency issues that might arise in a BMC Server Automation environment. These tests cover the three tiers of the BMC Server Automation architecture: the Client Tier, Middle Tier, and Server Tier. There are many nodes of communication in a BMC Server Automation environment, so it is important to understand how a BMC Server Automation infrastructure is typically configured.
Before You Begin
Before you begin, ensure that you have root or Administrator-level access to all of the BMC Server Automation infrastructure servers. This includes all application servers, the database server, the file server, and any repeaters configured in your environment.
In addition, ensure that you have the necessary tools for monitoring server and network performance and utilization.
Introduction
When preparing to test your BladeLogic infrastructure, it helps to know where to start. Your own experience, your knowledge of the BMC Server Automation infrastructure, and any feedback gathered from end users should indicate which areas to test first. Also have ready the BMC Server Automation jobs that have been causing the most delay, along with a set of target servers against which to test these jobs.
Job Execution Performance
Job execution can suffer for several reasons. You may find that it takes a long time for a job to start. You may also find that it takes a long time for a job to complete. In general you should see a fairly linear increase in the amount of time it takes to execute a job for each target you add. That being the case, there are several ways to determine the cause of issues in job execution performance in your environment.
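One way to check the "fairly linear" expectation is to fit a straight line to a few measured runs. The following is a minimal sketch, not a BMC tool: the target counts and durations are hypothetical sample values, and `fit_line` is an illustrative helper name. A slope that stays stable and an R^2 close to 1 suggest linear scaling, while a poor fit suggests that something other than target count is driving job duration.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("target counts must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of a simple linear fit equals the squared Pearson correlation.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical measurements: number of targets and job duration in seconds.
targets = [10, 25, 50, 100]
durations = [14.0, 29.0, 54.0, 104.0]

slope, intercept, r2 = fit_line(targets, durations)
print(f"seconds per additional target: {slope:.2f}")
print(f"fixed overhead: {intercept:.2f} s, R^2: {r2:.3f}")
```

Replace the sample lists with your own measurements, taken with the same job against progressively larger target sets.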
The diagram below shows the servers involved when executing a job, and illustrates the following sequence of events:
- A Job is executed from within the Console.
- The Application Server adds an entry to the table of jobs to be executed in the database.
- An Application Server picks up the job for execution.
- If the type of job requires objects from the File Server to be copied to the selected targets, the files are copied from the File Server across the network to those targets.
- All necessary job requests are sent from the Application Server to all selected targets.
- Job status and results received by the Application Server from each target are sent from the Application Server to the Database.
What to Monitor
During the following tests, you will need to monitor the following:
- Application Server utilization (memory, CPU)
- Database utilization (memory, CPU)
- SQL Execution Times
- Network throughput between the Application Server & DB
- Application Server logs with DB debug enabled
Preparation
Change Application Server Logging to Debug
- Locate and open the appserver.cf file in your BladeLogic installation directory.
- Change the logging level to DEBUG so that you are able to view the SQL execution times.
Note
You may wish to clear out appserver.log so that the new log entries are written to a clean file.
- Restart your application server for the new logging level to take effect.
Tests to Perform
Test One: Time to Execute a Job
- Execute a job, such as an Update Server Properties job against a small number of servers (perhaps 25-50). Repeat this test five times.
- For each run, record the time that elapses between launching the job in the Console and the moment the job begins to execute.
Note
If you have remote application servers in your environment, run each of these tests against your central application server and your remote application servers.
- For each run, record the SQL execution time by looking at the application server log file.
If it takes a long time for a job to start once it has been executed from the Console, you may be experiencing latency issues in your network between the Application Server and Database Server.
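Because the test is repeated five times, it helps to summarize the recorded start delays rather than compare individual runs. The sketch below assumes you noted the delays by hand; the numbers and the `summarize_runs` helper are hypothetical and only illustrate the bookkeeping.

```python
import statistics


def summarize_runs(start_delays_s):
    """Return min, max, mean and sample standard deviation of delays in seconds."""
    if len(start_delays_s) < 2:
        raise ValueError("record at least two runs")
    return {
        "min": min(start_delays_s),
        "max": max(start_delays_s),
        "mean": statistics.mean(start_delays_s),
        "stdev": statistics.stdev(start_delays_s),
    }


# Hypothetical start delays (seconds) from five runs on one application server.
central = [4.0, 5.0, 4.0, 6.0, 6.0]

summary = summarize_runs(central)
for name, value in summary.items():
    print(f"{name}: {value:.2f} s")
```

Running the same summary for each application server (central and remote) makes it easier to see whether a long start delay is specific to one site.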
Test Two: Network Throughput and Infrastructure Utilization
For this test, choose a job that typically generates a large number of writes to the database. Jobs in this category include Patch Analysis Jobs, Audit Jobs, Snapshot Jobs, and Compliance Jobs. More specifically, choose one of the following:
- An Audit Job that evaluates a large number of server objects
- A Patch Analysis job that analyzes for all patches on a server
- A CIS or PCI Compliance Job
Once you have selected one or more jobs, run through the following steps:
- Execute the job against a small number of targets (perhaps 5-10).
- While the job is executing, capture the network latency and throughput between the Application Server being tested and your BladeLogic Database Server.
- While the job is executing, analyze the memory and CPU utilization of your Application Server and your Database Server.
Note
If you have remote application servers in your environment, please run each of these tests against your central application server and your remote application servers.
- For each run, record the SQL execution time by looking at the application server log file.
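If no dedicated network monitoring tool is at hand, a rough latency figure can be obtained by timing TCP handshakes from the Application Server to the database listener. This is only a sketch under stated assumptions: it measures connection setup time (roughly one network round trip), not throughput, and `tcp_connect_times_ms` is an illustrative helper, not part of BMC Server Automation. To stay runnable anywhere, the demonstration connects to a throwaway listener on localhost; in a real test, substitute your database host and listener port.

```python
import socket
import statistics
import time


def tcp_connect_times_ms(host, port, samples=5, timeout=3.0):
    """Time `samples` TCP handshakes to host:port; return durations in milliseconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


# Stand-in target so the sketch runs anywhere: a throwaway local listener.
# Assumption: replace host and port with your database server and its port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
host, port = listener.getsockname()

times_ms = tcp_connect_times_ms(host, port, samples=5)
listener.close()

print(f"median connect time: {statistics.median(times_ms):.3f} ms")
print(f"slowest connect time: {max(times_ms):.3f} ms")
```

Take the samples while the job is executing, so that the figures reflect the network under load.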
Use the following guidelines to analyze test results:
- If memory and CPU utilization are low on both your Application Server and Database Server, the data is probably taking a long time to cross the network between the two servers. Consider adding bandwidth between them.
- If memory and CPU utilization are high on your Application Server, you could probably benefit from adding an application server to help execute jobs.
- If memory and CPU utilization are high on your Database Server, you could probably benefit from moving to a dedicated or higher-performance Database Server.
- If SQL execution is slow, this may correlate directly with the ping latency from the application server to the database server. Note the following results from an existing BMC Server Automation implementation:
| Data Center | SQL Execution Time | Ping Latency to DB Server |
| --- | --- | --- |
| San Jose | ~150 ms | ~80 ms |
| Minneapolis | ~3 ms | ~0.1 ms |
| Andover | ~3 ms | ~0.1 ms |
| Vienna | ~30 ms | ~15 ms |
| UK | ~150 ms | ~80 ms |
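The relationship between ping latency and SQL execution time in the results listed above can be checked with a Pearson correlation coefficient. The sketch below uses the approximate values from those results as they are; with only five rounded data points it is an illustration of the method rather than a statistical proof. The `pearson` helper is written out by hand so that the example depends only on the standard library.

```python
import math

# Approximate values taken from the results listed above:
# (data center, SQL execution time in ms, ping latency to DB server in ms).
observations = [
    ("San Jose", 150.0, 80.0),
    ("Minneapolis", 3.0, 0.1),
    ("Andover", 3.0, 0.1),
    ("Vienna", 30.0, 15.0),
    ("UK", 150.0, 80.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ping_ms = [ping for _, _, ping in observations]
sql_ms = [sql for _, sql, _ in observations]

r = pearson(ping_ms, sql_ms)
print(f"correlation between ping latency and SQL execution time: {r:.4f}")
```

For these values the coefficient comes out very close to 1, which is consistent with the guideline that slow SQL execution tracks network latency to the database. Adding rows for your own data centers shows whether the same pattern holds in your environment.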
Server Automation Console Performance
When using the Console, all actions performed within the GUI are sent to the BMC Server Automation application server. Actions such as browsing the various workspaces, opening objects, or viewing job results all require a query to be sent from the Application Server to the BladeLogic Core Database. Several factors can cause slowness in each leg of a particular request.
Note
Only one Application Server is listed, as the Console will only connect to one Application Server at a time.
Tests to Perform
Launch the Console and run through the following actions. Use the sections below to help determine the root cause of any slowness issues you may experience. Record the amount of time each step takes.
- Browse the "All Servers" group.
- Browse the "All Components" group.
- Open a Component Template containing a large number of rules (such as the BladeLogic CIS or PCI Component Templates).
Client to Application Server
If the network connection between the Client and the Application Server is slow, there will be delays in the Console when performing any of these actions. When performing each of these tests, what is the network throughput between the Console and the Application Server it is connecting to?
Application Server to Database
When the Console sends a request to the Application Server for each of the above steps, the Application Server will send the request to the Database. When performing each of these tests, what is the network throughput between the Application Server and BMC Server Automation database?
Database Performance
If the Database takes a long time to respond to a request from the Application Server, end users will experience slowness when using the Console. When performing each of these tests, what is the response time of the database?
You may also want to try running a series of SQL queries against your database to help determine how quickly your database server can return results.
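Timing a query several times gives a simple baseline for database response time. The following sketch works with any Python DB-API 2.0 connection; `time_query_ms` is an illustrative helper. To keep the example self-contained it uses an in-memory SQLite database with a hypothetical `demo_servers` table, which is not part of the BMC Server Automation schema. In a real test you would pass a connection from the driver for your own database and a query of your choosing.

```python
import sqlite3
import statistics
import time


def time_query_ms(conn, sql, repeats=5):
    """Run `sql` `repeats` times on a DB-API 2.0 connection.

    Returns the per-run durations in milliseconds.
    """
    durations = []
    for _ in range(repeats):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()  # include the time needed to fetch all rows
        durations.append((time.perf_counter() - start) * 1000.0)
        cur.close()
    return durations


# Stand-in database (assumption): in-memory SQLite with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_servers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO demo_servers (name) VALUES (?)",
    [(f"host{i}",) for i in range(1000)],
)

timings = time_query_ms(conn, "SELECT COUNT(*) FROM demo_servers")
conn.close()

print(f"median: {statistics.median(timings):.3f} ms, max: {max(timings):.3f} ms")
```

Running the same query from the Application Server host and from the Database Server itself helps separate database processing time from network time.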
Reports Performance
Reports performance can suffer for several reasons. This can be due to slowness in your network, the reports server, or the reports database.
The diagram below shows the servers involved when browsing reports.
- The Web Browser makes the request to the Reports Server
- The Reports Server queries the Database Server for the required information
- The Database Server responds with the requested information
- The data is presented back to the Web Browser
Note
The Application Server is not shown as it is only used for authenticating the initial connection to the Reports Server.
Tests to Perform
Log in to BladeLogic Reports and run through the following actions. First run Populate Reports so that you are working with the most up-to-date data. Use the sections below to help determine the root cause of any slowness issues you may experience. Record the amount of time each step takes.
- Run the "Top Compliance Exposures" report, or any Patch Analysis results report
- Run the "Detailed Server Configuration" report
Client to Application Server
If the network connection between the Client and the Application Server is slow, there will be delays in the Console when performing any of these actions. When performing each of these tests,
Conclusion
In general, most clients find that the biggest bottleneck in the BMC Server Automation infrastructure is the Database. Without a powerful, dedicated Database Server, you will have slow client response times and slow job execution times. However, bottlenecks may also be found in the application server, especially if you frequently execute jobs against a large number of targets. For the most part, by analyzing hardware utilization on each of your BMC Server Automation infrastructure servers and the throughput between them, you should be able to determine where improvements can be made to keep the environment running smoothly.