Identifying a problem before it impacts service is critical to outage avoidance. BMC has extensive monitoring capabilities in place to proactively monitor performance for the production instances through use of its own tools. These tools include:
Synthetic transactions, run through our TMART deployment, provide proactive performance management and frequently allow an issue to be resolved before the customer is aware of it. Synthetic transaction monitoring runs approximately 2500 scripts every two minutes, collecting over two million data points per hour. This monitoring includes baseline measurement comparisons for login and logoff activities, a health check of the URL, and basic search and navigation checks.
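The baseline comparison mentioned above can be illustrated with a minimal sketch. The names (`SyntheticResult`, `exceeds_baseline`), the 1.5x tolerance, and the sample timings are assumptions made for illustration; they are not details of the TMART deployment.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SyntheticResult:
    """One timing sample from a scripted transaction (hypothetical shape)."""
    step: str        # e.g. "login", "search", "logoff"
    seconds: float   # measured response time


def exceeds_baseline(history, latest, tolerance=1.5):
    """Return True when the latest sample is slower than tolerance x the median of history."""
    baseline = median(sample.seconds for sample in history)
    return latest.seconds > baseline * tolerance


# Illustrative data only: three earlier login timings and two new samples.
history = [SyntheticResult("login", s) for s in (0.8, 1.0, 0.9)]
slow = SyntheticResult("login", 2.1)
ok = SyntheticResult("login", 1.1)

print(exceeds_baseline(history, slow))  # True: 2.1 s is above 1.5 x 0.9 s
print(exceeds_baseline(history, ok))    # False: 1.1 s is within tolerance
```

Using the median rather than the mean keeps a single slow run from shifting the baseline, which is one reasonable choice among several.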
BMC makes extensive use of its Atrium Orchestrator tool to restore service by automating common recovery tasks. This automation allows us to gather critical logging information for root cause analysis quickly while restoring service. Key benefits of this automation include future outage avoidance and a reduction in resolution times. Key features of the tool that assist in performance monitoring include:
BMC uses EUEM in our data centers to trap certain traffic and analyze it for various types of latency and redirects. It reports Host, Network, SSL, end-to-end (e2e), Think and Idle latency, as well as Number of Requests, Redirects, Transmission Failures and many other useful statistics. BMC uses this data to predict performance degradation as early as possible.
This tool is used on a case-by-case basis to trap traffic based on the following parameters:
EUEM gives BMC the ability to drill down by customer, by user or even by a particular session ID to detect and resolve issues. These views are also available to our customers through the i.onbmc.com support portal, so that they can visualize traffic in real time along with us. We know where the traffic is coming from and how many requests are streaming, and we have an overall indicator of the user experience.
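The drill-down idea can be sketched as a grouping of captured requests at successively finer levels. The record layout, the sample values, and the `drill_down` helper are hypothetical and do not reflect how EUEM stores or exposes its data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical captured requests: (customer, user, session_id, latency_ms).
requests = [
    ("acme", "alice", "s1", 120.0),
    ("acme", "alice", "s1", 180.0),
    ("acme", "bob", "s2", 900.0),
    ("globex", "carol", "s3", 150.0),
]


def drill_down(records, level):
    """Average latency grouped by the first `level` key fields.

    level=1 groups by customer, level=2 by (customer, user),
    level=3 by (customer, user, session_id).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[:level]].append(record[3])
    return {key: mean(values) for key, values in groups.items()}


by_customer = drill_down(requests, 1)
by_user = drill_down(requests, 2)

print(by_customer)  # acme averages 400.0 ms, globex 150.0 ms
print(by_user)      # the acme average is driven by bob's 900.0 ms session
```

Moving from the customer view to the user view shows that one user accounts for the high average, which is the kind of narrowing the drill-down is meant to support.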
BMC’s TrueSight Operations Management suite enables us to intelligently monitor and manage performance across the entire BMC Helix platform. It also enables real-time visibility into the health of underlying systems that provide services.
We currently monitor the following components in the BMC Helix platform:
| System-level monitoring for OS | System-level monitoring for databases | Custom monitors |
Metrics for each database instance
Email processing health
Automatic remediation of issues via BMC Atrium Orchestrator
Alerting when issues cannot be automatically resolved
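The last two items, automatic remediation with alerting as a fallback, follow a simple pattern that can be sketched as below. The issue names, the `handle_issue` function, and the stand-in remediation callables are assumptions for illustration; real remediation would be carried out by BMC Atrium Orchestrator workflows, which this sketch does not model.

```python
def handle_issue(issue, remediations, alert):
    """Try a known automated fix; alert a human if none exists or the fix fails."""
    fix = remediations.get(issue)
    if fix is not None and fix():
        return "remediated"
    alert(issue)
    return "alerted"


alerts = []  # stands in for a paging or ticketing system

# Hypothetical remediation table: each callable returns True on success.
remediations = {
    "email_queue_stalled": lambda: True,   # pretend the automated fix worked
    "db_tablespace_full": lambda: False,   # pretend the automated fix failed
}

outcomes = [
    handle_issue(name, remediations, alerts.append)
    for name in ("email_queue_stalled", "db_tablespace_full", "unknown_issue")
]

print(outcomes)  # ['remediated', 'alerted', 'alerted']
print(alerts)    # ['db_tablespace_full', 'unknown_issue']
```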
BMC leverages this tool to collect about 100 GB of data per hour, providing access to millions of lines of logs that are searchable in seconds. BMC executes hundreds of automation workflows a day in our operations. Some of them actively manage configuration drift that could otherwise lead to performance problems. We call these “closed-loop-compliance” jobs: they run automatically whenever drift is detected.
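A closed-loop compliance job can be thought of as two steps: compare the actual configuration against the desired one, then correct whatever has drifted. The following is a minimal sketch under that assumption; the setting names, values, and both function names are hypothetical.

```python
def detect_drift(desired, actual):
    """Return {setting: (desired_value, actual_value)} for every setting that differs."""
    return {
        key: (value, actual.get(key))
        for key, value in desired.items()
        if actual.get(key) != value
    }


def close_loop(desired, actual):
    """Reset drifted settings in `actual` to their desired values and report the changes."""
    drift = detect_drift(desired, actual)
    for key, (wanted, _found) in drift.items():
        actual[key] = wanted
    return drift


# Illustrative configuration only.
desired = {"max_connections": 500, "tls_min_version": "1.2"}
actual = {"max_connections": 200, "tls_min_version": "1.2"}

changed = close_loop(desired, actual)

print(changed)                        # {'max_connections': (500, 200)}
print(detect_drift(desired, actual))  # {} once the loop has closed
```

Returning the drift report from `close_loop` keeps a record of what was corrected, which is useful when tracing a later performance problem back to a configuration change.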
Additionally, the BMC Helix platform's performance correlates directly with the capacity and resiliency of its underlying databases. BMC uses latest-generation, scalable, high-performance hardware for its database servers. Databases are monitored using the TrueSight suite of tools.
To deliver the BMC Helix services in a repeatable, accurate and secure fashion, BMC uses the Server Automation tool, which offers intelligent, policy-based compliance measurement for the BMC Helix platform. This tool is used for:
BMC also monitors application performance by executing a low-touch in-application workflow in the production environments at certain intervals. Executing the workflow provides BMC with performance metrics for the prescribed use cases and allows us to validate cross-functional data flow between BMC Helix services, giving insight into the overall health of the customer systems. The workflow is designed and optimized so that it will:
BMC may add further monitoring use cases from time to time as needed.