Identifying a problem before it affects service is critical to avoiding outages. BMC uses its own tools to proactively monitor the performance of production instances. These tools include:
Synthetic transactions run throughout our monitoring deployment to manage performance proactively, so that an issue can often be resolved before the customer is aware of it. Synthetic transaction monitoring, using TrueSight App Visibility Manager, runs approximately 3500 scripts every two minutes and collects over two million data points per hour. This monitoring includes baseline measurement comparisons for login and logoff activities, a health check of the URL, and basic search and navigation checks.
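As a rough illustration of what a baseline measurement comparison can look like, the sketch below flags a login timing that sits well above previously recorded timings. It is not how TrueSight App Visibility Manager works internally; the function name `is_degraded`, the three-standard-deviation tolerance, and the sample timings are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_degraded(baseline_ms, sample_ms, tolerance=3.0):
    """Return True when sample_ms exceeds the baseline mean by more than
    `tolerance` sample standard deviations (an assumed rule, for illustration)."""
    threshold = mean(baseline_ms) + tolerance * stdev(baseline_ms)
    return sample_ms > threshold


# Hypothetical login timings (milliseconds) from earlier synthetic runs.
login_baseline = [410, 395, 420, 405, 400, 415]

print(is_degraded(login_baseline, 412))  # a timing close to the baseline
print(is_degraded(login_baseline, 900))  # a timing far above the baseline
```

A fixed multiple of the standard deviation is only one possible rule. A percentile or a rolling window would serve equally well, depending on how noisy the transaction is.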
BMC makes extensive use of its Atrium Orchestrator tool to restore service by automating common recovery tasks. This automation allows us to gather critical logging information for root cause analysis quickly while restoring service. Key benefits of this automation include future outage avoidance and a reduction in resolution times. Key features of the tool that assist in performance monitoring include:
BMC’s TrueSight Operations Management suite enables us to intelligently monitor and manage performance across the entire BMC Helix platform. It also enables real-time visibility into the health of underlying systems that provide services.
We currently monitor the following components in the BMC Helix platform:
| System-level monitoring for OS | System-level monitoring for databases | Custom monitors |
Metrics for each database instance
Email processing health
Automatic remediation of issues via BMC Atrium Orchestrator
Alerting when issues cannot be automatically resolved
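The last two items describe a remediate-first, alert-second pattern. The sketch below shows one way such a flow could be wired together, purely for illustration. It does not use the BMC Atrium Orchestrator API, and `handle_issue`, the issue types, and the remediation table are hypothetical names.

```python
def handle_issue(issue, remediations, alert):
    """Try a registered remediation for the issue; alert when none exists,
    when it reports failure, or when it raises."""
    action = remediations.get(issue["type"])
    if action is not None:
        try:
            if action(issue):
                return "remediated"
        except Exception as exc:
            # Keep the failure reason so the alert carries useful context.
            issue = {**issue, "error": str(exc)}
    alert(issue)
    return "alerted"


# Hypothetical setup: one issue type has an automated fix, the other does not.
alerts = []
remediations = {"mail_queue_stalled": lambda issue: True}

print(handle_issue({"type": "mail_queue_stalled"}, remediations, alerts.append))
print(handle_issue({"type": "disk_full"}, remediations, alerts.append))
print(alerts)
```

Returning a status string keeps the example small. A real workflow would more likely record the outcome in an incident or event record.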
BMC leverages this tool to collect about 100 GB of data per hour, providing access to millions of lines of logs that are searchable in seconds. BMC executes hundreds of automation workflows a day in our operations. Some of them actively manage configuration drift that could otherwise lead to performance problems. We call these “closed-loop compliance” jobs: they run automatically whenever drift is detected.
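The idea of a job that runs on drift detection can be sketched as a comparison between a desired configuration and the observed one, followed by a correction. This is a minimal illustration under assumed names (`detect_drift`, `enforce`, and the configuration keys are hypothetical), not a description of BMC's actual jobs.

```python
def detect_drift(desired, actual):
    """Return {key: (desired_value, actual_value)} for every desired key
    whose observed value differs or is missing."""
    return {
        key: (value, actual.get(key))
        for key, value in desired.items()
        if actual.get(key) != value
    }


def enforce(desired, actual):
    """Close the loop: detect drift, then reset each drifted key in place."""
    drift = detect_drift(desired, actual)
    for key, (wanted, _observed) in drift.items():
        actual[key] = wanted
    return drift


# Hypothetical desired state and an observed state that has drifted.
desired = {"max_connections": 500, "tls_min_version": "1.2"}
actual = {"max_connections": 200, "tls_min_version": "1.2"}

print(enforce(desired, actual))       # the drift that was corrected
print(detect_drift(desired, actual))  # empty once the loop has closed
```

Running the detection again after enforcement gives a simple check that the correction took effect.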
Additionally, the performance of the BMC Helix platform correlates directly with the capacity and resiliency of its underlying databases. BMC uses latest-generation, scalable, high-performance hardware for its database servers. Databases are monitored using the TrueSight suite of tools.
To deliver the BMC Helix services in a repeatable, accurate, and secure fashion, BMC uses the Server Automation tool, which offers intelligent, policy-based compliance measurement for the BMC Helix platform. This tool is used for:
BMC also monitors application performance by executing a low-touch in-application workflow in the production environments at certain intervals. Executing the workflow provides BMC with performance metrics for the prescribed use cases and allows us to validate cross-functional data flow between BMC Helix services, giving insight into the overall health of customer systems. The workflow is designed and optimized so that it will:
BMC may add monitoring use cases from time to time as needed.