Remediating recurring problems by using push button automation
Push button automation
As an operator with the required permissions, you can manually trigger an automation policy to initiate remediation on demand. This approach suits issues that would take several hours to resolve manually or that are prone to human error.
Examples
The following use cases illustrate the automation workflow and its benefits:
- The system performance is impacted due to long-running queries in your environment. Do a quick analysis of the long-running queries and take appropriate remediation measures.
- The Mid Tier pod performance is impacted due to a cache issue. Implement a remediation sequence to run a hard cache flush that optimizes the system's reliability and performance.
Remediation workflow
Typical tasks in the workflow
Before you begin
Depending on your organization and user roles, a tenant administrator, automation engineer, or operator (including a network operations center (NOC) operator, major incident management (MIM) operator, or site reliability engineer (SRE)) performs the following operations:
- The operator identifies the issue and initiates a service request in BMC Helix Digital Workplace Catalog.
- (Optional) The operator creates an incident ticket in BMC Helix ITSM with the requested details.
- An automation engineer creates an automation policy in BMC Helix Intelligent Automation that contains remediation actions for the identified problem, with the execution mode set to Manual so that the remediation actions can be triggered on demand. For more information, see Creating automation policies.
- A tenant administrator or editor creates a dashboard to track and monitor the progress of all automations. For more information, see Setting up dashboards.
- The operator triggers the remediation action in BMC Helix Intelligent Automation and tracks the remediation status in BMC Helix Dashboards.
Workflow sequence
- An identified problem occurs.
- A service request is initiated in the BMC Helix Digital Workplace Catalog to resolve the problem.
- (Optional) An incident is created in BMC Helix ITSM and the incident status is sent to BMC Helix Dashboards.
- The operator manually triggers the automation policy to initiate the remediation action in BMC Helix Intelligent Automation.
- The problem is remediated, the incident in BMC Helix ITSM is closed, and the status is sent to BMC Helix Dashboards.
The following diagram elaborates on the workflow sequence:
Examples
The following table describes the typical example use cases, remediation workflow, and benefits:
No. | Problem cause | Workflow sequence | Benefit |
---|---|---|---|
1 | System performance is impacted due to long-running queries in your environment. | | Ensures reliable system operation and optimal performance, eliminating the risk of human error associated with manual resolution. |
2 | The Mid Tier pod performance is impacted due to a cache issue that requires a hard cache flush. | | Ensures reliable system operation and optimal performance, eliminating the risk of human error associated with manual resolution. |
Best practices
Use these best practice guidelines as they fit the implementation in your environment. Not all of them apply to every type of automation workflow.
- Identify repetitive manual tasks in the environment and automate them for reliable and timely responses.
- Build dashboard visualizations in BMC Helix Dashboards to monitor automation performance metrics. This process helps you track how often automation is triggered and measure success and failure rates.
- Whenever automation is triggered or completed, include a work log in the incident. This information helps different support teams (NOC, MIM, or SRE) in your organization to comprehend the problem and resolution history.
- Make sure the automation pipeline fails immediately if a critical step fails.
- If an automation process fails, notify stakeholders promptly to help speed up issue assessment and correction.
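The last three guidelines (work logs, fail-fast behavior, and prompt failure notification), along with the success-rate metric mentioned earlier, can be sketched in a few lines of Python. This is a minimal, product-neutral illustration and not the BMC Helix Intelligent Automation API: the names `Step`, `RunResult`, `run_remediation`, `success_rate`, and the `notify` callback are all hypothetical, and the work log is modeled as a plain list of strings rather than an incident record.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One remediation step (hypothetical model, not a product API)."""
    name: str
    action: Callable[[], None]  # expected to raise an exception on failure
    critical: bool = True       # a failed critical step stops the run


@dataclass
class RunResult:
    succeeded: bool
    work_log: List[str] = field(default_factory=list)


def run_remediation(steps: List[Step], notify: Callable[[str], None]) -> RunResult:
    """Run steps in order, log each outcome, and fail fast on critical errors."""
    result = RunResult(succeeded=True)
    result.work_log.append("Automation triggered")
    for step in steps:
        try:
            step.action()
            result.work_log.append(f"Step '{step.name}' succeeded")
        except Exception as exc:
            result.work_log.append(f"Step '{step.name}' failed: {exc}")
            if step.critical:
                # Stop immediately and tell stakeholders what happened.
                result.succeeded = False
                notify(f"Remediation stopped at critical step '{step.name}': {exc}")
                break
    result.work_log.append(
        "Automation completed" if result.succeeded else "Automation failed"
    )
    return result


def success_rate(results: List[RunResult]) -> float:
    """Fraction of runs that succeeded; 0.0 when there are no runs."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.succeeded) / len(results)
```

In this sketch, a non-critical step that fails is recorded in the work log but does not stop the run, whereas a critical failure skips all remaining steps. Whether a step counts as critical is a design decision for each workflow; in a real deployment, the work log entries would be written to the incident and the notification would go through whatever channel your organization uses.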
Result
Implementing remediation through automation offers several benefits, including:
- Faster incident resolution, which helps you meet customer SLAs.
- Consistent and reliable performance of critical business services, applications, and infrastructure entities.
- 24/7 availability of applications and services without reliance on human intervention or error-prone manual processes.
- Early identification and prevention of recurring problems.
- Proactive automation of routine tasks, which keeps the team available for high-value work.