Remediating events automatically by using automation policies
Customer success: BMC IT improves MTTR by automating remediation actions for events
BMC IT uses BMC Helix AIOps to monitor its IT infrastructure. When a service or process goes down, an operator or a site reliability engineer (SRE) typically spends hours investigating the event, creating an incident, and, if needed, restarting the service or process. When a business-critical process goes down, the resulting service outage can last a significant amount of time until the problem is investigated and remediated.
BMC IT uses the advanced Intelligent Automations feature in BMC Helix AIOps to automatically remediate process-down events by restarting the affected processes. Automation engineers create automation policies in BMC Helix Intelligent Automation, and these policies appear as automation actions on the corresponding events in BMC Helix AIOps.
For example, a situation in BMC Helix AIOps notifies the operator or SRE that an important process is down and shows the automation actions available for each event in the situation.
BMC IT uses automation to restart the process without any manual intervention. After the automation runs, the event displays the automation status and the incident ID.
An operator or SRE with the appropriate permissions can view the details of the automation in BMC Helix Intelligent Automation by using the cross-launch link.
Workflow
Perform the following tasks to make remediation actions available as automations for events in BMC Helix AIOps:
Task | Product | Role | Action | Reference |
---|---|---|---|---|
1. | BMC Helix AIOps | Tenant Administrator | Enable the Intelligent Automations feature from the Configurations menu. | |
2. (Optional) | BMC Helix AIOps | Operator or SRE | Request an automation for an event from the Services or Situations menu. | |
3. | BMC Helix Intelligent Automation | Automation Engineer | Based on an incoming request, or for frequently occurring issues, create an automation policy that contains remediation actions. To trigger the remediation actions automatically, set the execution mode to Automatic. | |
4. | BMC Helix AIOps | Operator or SRE | View events and run the automation actions available for each event. | |
Results
By implementing the remediation workflow, BMC IT achieved the following results:
- Automated remediation of frequently occurring issues, which eliminated the need to manually investigate events, create incidents, and restart processes that stopped running.
- The ability to request automations when no automation action is available yet.
- Increased system reliability and MTTR reduced from 30-40 minutes to less than five minutes.
- Reports for analyzing the results of the automated remediation actions.