Remediating events automatically by using automation policies


This use case describes how BMC IT performs intelligent recovery of business applications and improves MTTR by using BMC Helix Intelligent Automation to automate remediation actions for frequently occurring events in BMC Helix AIOps.

Customer success: BMC IT improves MTTR by automating remediation actions for events

BMC IT uses BMC Helix AIOps to monitor the IT infrastructure. When a service or process is down, an operator or a site reliability engineer (SRE) typically spends hours investigating the event, creating an incident, and, if needed, restarting the service or process. When a business-critical process is down, the resulting service outage can last a long time, until the problem is investigated and remediated.

BMC IT uses the Intelligent Automations feature in BMC Helix AIOps to automatically remediate process-down events by restarting the affected processes. Automation engineers create automation policies in BMC Helix Intelligent Automation, and these policies appear as automation actions against the events in BMC Helix AIOps.

In the following example, a situation in BMC Helix AIOps notifies the operator or SRE that an important process is down and shows the automation actions available against each event included in the situation. 

Automations.png

BMC IT uses automation to restart a process without any manual intervention. After the automation runs, its status and the incident ID are displayed for the event.

Automation Successful.png

An operator or SRE who has the appropriate permissions can view the details of the automation in BMC Helix Intelligent Automation by using the cross-launch link.

Automation History_App Restart.png


Workflow

Process Graphic.png

Perform the following tasks to make remediation actions available as automations for events in BMC Helix AIOps:

| Task | Product | Role | Action |
|---|---|---|---|
| 1 | BMC Helix AIOps | Tenant Administrator | Enable the Intelligent Automations feature from the Configurations menu. |
| 2 (Optional) | BMC Helix AIOps | Operator or SRE | Request automation for an event under the Services or Situations menu. |
| 3 | BMC Helix Intelligent Automation | Automation Engineer | Based on an incoming request or for frequently occurring issues, create an automation policy that contains remediation actions. Automation engineers can set the execution mode to Automatic to trigger remediation actions automatically. |
| 4 | BMC Helix AIOps | Operator or SRE | View events and run the automation actions available against the event. |
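To make the relationship between policies, execution mode, and event actions concrete, the following Python sketch models it in a simplified form. It is an illustration only and does not use any BMC Helix API: the `Event`, `AutomationPolicy`, and `handle_event` names, the `ON_DEMAND` mode name, and the status strings are all hypothetical. Only the Automatic execution mode is named in this topic.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class ExecutionMode(Enum):
    AUTOMATIC = "automatic"  # remediation runs as soon as a matching event arrives
    ON_DEMAND = "on_demand"  # hypothetical name: action is offered for the operator to run


@dataclass
class Event:
    host: str
    message: str
    actions_offered: List[str] = field(default_factory=list)
    automation_status: str = "none"


@dataclass
class AutomationPolicy:
    name: str
    matches: Callable[[Event], bool]    # selection criteria for events
    remediate: Callable[[Event], bool]  # remediation action; returns True on success
    mode: ExecutionMode


def handle_event(event: Event, policies: List[AutomationPolicy]) -> Event:
    """Apply every matching policy to the event according to its execution mode."""
    for policy in policies:
        if not policy.matches(event):
            continue
        if policy.mode is ExecutionMode.AUTOMATIC:
            succeeded = policy.remediate(event)
            event.automation_status = "succeeded" if succeeded else "failed"
        else:
            # Not automatic: list the action against the event instead of running it.
            event.actions_offered.append(policy.name)
    return event


# Hypothetical policy: restart a process when a process-down event arrives.
restart_policy = AutomationPolicy(
    name="Restart process",
    matches=lambda e: "process down" in e.message.lower(),
    remediate=lambda e: True,  # placeholder standing in for a real restart
    mode=ExecutionMode.AUTOMATIC,
)

event = handle_event(Event("app01", "Process down: payments"), [restart_policy])
print(event.automation_status)
```

In this sketch, a policy in Automatic mode corresponds to task 3 with automatic triggering, and a policy in the other mode corresponds to an action that the operator or SRE runs in task 4.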


Results

By implementing the remediation workflow, BMC IT achieved the following results:

  • Automated remediation of frequently occurring issues, which eliminated the need to manually investigate the event, create incidents, and restart the processes that stopped running.
  • Capability to request automations when automation actions are not yet available.
  • Increased system reliability and improved MTTR from 30 to 40 minutes to less than five minutes.
  • Reports for analyzing results driven by the automated remediation actions.
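As a rough check on the MTTR figures in the results above, the following sketch computes the relative reduction at both ends of the reported range. The `mttr_reduction` helper is hypothetical, and because the reported MTTR after automation is less than five minutes, the values it prints are lower bounds.

```python
def mttr_reduction(before_minutes: float, after_minutes: float) -> float:
    """Return the percentage reduction in MTTR."""
    return (before_minutes - after_minutes) / before_minutes * 100


# Reported range before automation: 30 to 40 minutes; after: under 5 minutes.
for before in (30, 40):
    print(f"{before} min -> 5 min: {mttr_reduction(before, 5):.1f}% reduction")
# prints:
# 30 min -> 5 min: 83.3% reduction
# 40 min -> 5 min: 87.5% reduction
```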