Remediating automatically by using automation policies


By automating remediation actions for frequently occurring events in BMC Helix AIOps, you can perform intelligent recovery of business applications and improve the mean time to resolution (MTTR). 

When a service or process is down, typically, an operator or a site reliability engineer (SRE) spends hours investigating the event, creating an incident, and, if needed, restarting the service or process. When a business-critical process is down, it causes a service outage that can last for a significant amount of time until the problem is investigated and remediated. 

By connecting with BMC Helix Intelligent Automation, automation teams can create remediation policies whose actions appear against events that match the policy's trigger conditions. Operators can trigger these remediation actions manually, or the policies can be configured to run them automatically, which saves the NOC teams significant time and manual effort. 

By automating remediation, IT infrastructure management teams can achieve the following benefits:

Benefits of automating remediation.png


Customer success: An enterprise software and IT consulting company automates remediation for impacted services

The IT infrastructure team at an enterprise software and IT consulting company implemented the automated remediation workflow and achieved the following results:

  • Automated remediation of frequently occurring issues, which eliminated the need to manually investigate events, create incidents, and restart stopped processes.
  • Capability to request automation when no automation action is available yet for an event.
  • Increased system reliability and reduced MTTR from 30-40 minutes to less than five minutes.
  • Reports for analyzing results driven by the automated remediation actions.
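
To put the MTTR figures above in relative terms, the following minimal sketch computes the percentage reduction for both ends of the reported 30-40 minute range, using five minutes as the upper bound on the new MTTR. The function name `mttr_reduction_pct` is made up for this illustration and is not part of any BMC product.

```python
def mttr_reduction_pct(before_min: float, after_min: float) -> float:
    """Percentage reduction in MTTR when it drops from before_min to after_min."""
    return (before_min - after_min) / before_min * 100


# The source reports "less than five minutes", so 5 is an upper bound on the
# new MTTR and the results below are lower bounds on the actual reduction.
low = mttr_reduction_pct(30, 5)
high = mttr_reduction_pct(40, 5)

print(f"Reduction of at least {low:.1f}% to {high:.1f}%")
```

Because the new MTTR is stated only as an upper bound, the actual reduction is larger than the values this calculation gives.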


Workflow

The following diagram illustrates the high-level workflow of automated remediation for events: 

Use_case_remediating_workflow_244.png

| Task | Product | Role | Action | Reference |
|---|---|---|---|---|
| 1. | BMC Helix AIOps | Tenant Administrator | Enable the Intelligent Automations feature from the Configurations menu. | |
| 2. | BMC Helix AIOps | Operator or SRE | (Optional) Request automation for an event from the Services or Situations menu. | |
| 3. | BMC Helix Intelligent Automation | Automation Engineer | Create an automation policy that contains remediation actions based on an incoming request or for frequently occurring issues. Automation engineers can set the execution mode to Automatic to trigger remediation actions automatically. | |
| 4. | BMC Helix AIOps | Operator or SRE | View events and run the automation actions available against the event. | |
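
Tasks 3 and 4 can be pictured with a small conceptual model: a policy carries trigger conditions, a remediation action, and an execution mode, and an incoming event either runs the action immediately (Automatic) or has it offered to the operator. The sketch below is only an illustration under stated assumptions. The names `Event`, `AutomationPolicy`, `ExecutionMode`, and `plan_actions`, the event fields, and the `PROCESS_DOWN` and `DISK_FULL` values are all hypothetical; none of this is the BMC Helix Intelligent Automation API or its policy format.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    MANUAL = "manual"        # operator triggers the action from the event
    AUTOMATIC = "automatic"  # action runs as soon as an event matches


@dataclass(frozen=True)
class Event:
    # Hypothetical event fields, chosen only for this illustration.
    host: str
    event_class: str
    message: str


@dataclass(frozen=True)
class AutomationPolicy:
    name: str
    trigger: dict   # hypothetical trigger: event field name -> required value
    action: str     # label of the remediation action
    mode: ExecutionMode

    def matches(self, event: Event) -> bool:
        """Return True when every trigger condition equals the event's field."""
        return all(getattr(event, k, None) == v for k, v in self.trigger.items())


def plan_actions(event, policies):
    """Split matching policies into actions to run now and actions to offer."""
    run_now, offer = [], []
    for policy in policies:
        if not policy.matches(event):
            continue
        if policy.mode is ExecutionMode.AUTOMATIC:
            run_now.append(policy.action)
        else:
            offer.append(policy.action)
    return run_now, offer


policies = [
    AutomationPolicy("restart-on-process-down", {"event_class": "PROCESS_DOWN"},
                     "Restart process", ExecutionMode.AUTOMATIC),
    AutomationPolicy("collect-diagnostics", {"event_class": "PROCESS_DOWN"},
                     "Collect diagnostics", ExecutionMode.MANUAL),
    AutomationPolicy("clean-temp-files", {"event_class": "DISK_FULL"},
                     "Clean temporary files", ExecutionMode.MANUAL),
]

event = Event(host="app01", event_class="PROCESS_DOWN", message="payment service stopped")
run_now, offer = plan_actions(event, policies)
print("Run automatically:", run_now)
print("Offer to operator:", offer)
```

In this model, the process-down event runs the restart action without operator involvement, while the diagnostics action is only listed for manual use and the disk policy does not match at all.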


How does the IT team automate remediation?

The IT team uses the Intelligent Automations feature in BMC Helix AIOps to automatically remediate process-down events by restarting the affected processes. Automation engineers create automation policies in BMC Helix Intelligent Automation, and these policies appear as automation actions against the matching events in BMC Helix AIOps. 

In the following example, a situation in BMC Helix AIOps indicates to the operator or SRE that an important process is down and shows the automations available against each event included in the situation. 

Automations_243.png

The IT team uses automation to restart a process without any manual intervention. After the automation runs, the status and the incident ID are displayed for the event. An operator or SRE with the appropriate permissions can view the details of the automation in BMC Helix Intelligent Automation by using the cross-launch link. 

Automation Successful_243.png


 
