Reducing incident noise in BMC Helix ITSM by using Proactive Service Resolution


By using the Proactive Service Resolution solution, you can create consolidated incidents for correlated events (ML-based situations) and reduce the incident noise significantly.

The Proactive Service Resolution solution offers the following benefits:

Reduced manual effort

For qualifying situations, incidents are created automatically in BMC Helix ITSM, which reduces manual effort for service desk agents.

Increased focus on other high-impact issues

Instead of creating one incident for each qualifying event, a single actionable incident is created for the ML-based situation that is formed for the events. The same incident ID is displayed against the events in the situation. The single incident helps service desk agents focus on high-impact issues rather than the noise created by multiple incidents for all events. 

Improved mean-time-to-resolve (MTTR)

The incident contains details about the causal configuration item (CI), names of the impacted services, qualifying event details, and a direct URL to view the situation in BMC Helix AIOps. The details help the service desk agents perform a faster root cause analysis of the problem and resolve it to restore the health of your infrastructure, leading to an improved MTTR. 

Scenario

Jim, a NOC operator at APEX Global, uses BMC Helix AIOps to monitor an e-commerce application that is based on microservices. Carl is a service desk agent and Susan is an automation engineer at the same organization.

At 2:00 AM, a core microservice in the application's payment gateway fails because of a database timeout. In BMC Helix AIOps, Jim observes that the failure impacts multiple services. Simultaneously, BMC Helix Operations Management receives hundreds of events (for example, CPU spikes, service unreachable, and error rates). These events trigger BMC Helix Intelligent Automation to create hundreds of incidents in BMC Helix Service Management (ITSM). Carl is overwhelmed by redundant incidents that relate to the same underlying issue across related services. The multiple incidents lead to fatigue and slower response times.

Carl collaborates with Susan to find a way to avoid the same flood of incidents if a similar failure recurs. They identify the incident noise reduction feature as a way to prevent it.

After Susan configures the incident noise reduction feature, BMC Helix Intelligent Automation no longer creates an incident for every event. Instead, it waits for a few seconds and verifies whether a situation is created. If a situation is created, it creates a single incident for the situation. When the situation is remediated, all relevant events are closed, the situation is closed, and the incident is resolved automatically. The entire process runs automatically, and no manual action is required. The consolidated incident significantly reduces the noise and allows Carl to focus on resolving the core issue instead of triaging repetitive incidents.
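
The consolidation idea in this scenario can be illustrated with a minimal Python sketch. It is not the BMC Helix Intelligent Automation implementation and uses no BMC API; the Event class, the consolidate function, and the ID formats are hypothetical names chosen for illustration. The sketch assumes that each event already carries the ID of the situation it was correlated into, and it leaves uncorrelated events untouched because this topic does not describe how they are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical event record; not a BMC Helix data model."""
    event_id: str
    situation_id: Optional[str] = None  # set once the event joins a situation
    incident_id: Optional[str] = None   # filled in by consolidate()


def consolidate(events: list[Event]) -> dict[str, str]:
    """Assign one incident per situation and return situation ID -> incident ID."""
    incident_for_situation: dict[str, str] = {}
    for event in events:
        if event.situation_id is None:
            # Uncorrelated event: handling is not described in the source,
            # so this sketch leaves it without an incident.
            continue
        if event.situation_id not in incident_for_situation:
            # First event seen for this situation: open a single incident.
            number = len(incident_for_situation) + 1
            incident_for_situation[event.situation_id] = f"INC-{number:04d}"
        # Every event in the situation shows the same incident ID.
        event.incident_id = incident_for_situation[event.situation_id]
    return incident_for_situation


# 300 events correlated into one situation produce one incident, not 300.
events = [Event(f"evt-{i}", situation_id="SIT-1") for i in range(300)]
incidents = consolidate(events)
print(len(incidents))  # 1
```

In this sketch, every event in the situation ends up with the same incident ID, which mirrors how the same incident ID is displayed against all events in a situation.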

The following figure shows a sample incident containing the situation URL and configuration items:

IncNoiseReduction_Situation.png

 

Workflow

You must perform the following tasks to implement Proactive Service Resolution for ML-based situations:

INR_WorkflowSteps.png

Task 1
  Product: BMC Helix Intelligent Automation
  Role: Automation engineer
  Action:
    1. Enable the Incident Noise Reduction option.
    2. Provide the connection details to connect with BMC Helix ITSM.
    3. Define a trigger condition.
    4. Configure the field mappings.
  Reference: Configuring Proactive Service Resolution for incidents in BMC Helix ITSM

Task 2
  Product: BMC Helix AIOps
  Role: Operator or site reliability engineer (SRE)
  Action: View any open ML-based situation and look for the incident ID.
  Reference: Investigating ML-based situations

Task 3
  Product: BMC Helix Intelligent Automation
  Role: Operator or SRE
  Action: View the automation policy run history and check whether the policy run has been successful.
  Reference: Viewing automation policy runs history

Task 4
  Product: BMC Helix ITSM
  Role: Service desk agent
  Action: View the single incident. The incident contains the situation URL, impacted configuration item (CI) and the impacted service details.
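
The first task in this workflow includes defining a trigger condition and configuring field mappings. As a rough illustration of how these two settings relate to the incident that the service desk agent sees, the following Python sketch applies a trigger condition to a situation and then maps situation attributes to incident fields. Everything in it is an assumption made for illustration: the attribute names, the field names, the severity value, and the example URL are hypothetical and do not reflect the actual BMC Helix configuration format or API.

```python
from typing import Any, Callable, Optional

Situation = dict[str, Any]

# Hypothetical field mapping: incident field name -> situation attribute name.
FIELD_MAP = {
    "Summary": "name",
    "Causal CI": "causal_ci",
    "Impacted services": "impacted_services",
    "Situation URL": "url",
}


def is_critical(situation: Situation) -> bool:
    """Hypothetical trigger condition: only critical situations qualify."""
    return situation.get("severity") == "CRITICAL"


def build_incident(
    situation: Situation,
    trigger: Callable[[Situation], bool],
    field_map: dict[str, str],
) -> Optional[dict[str, Any]]:
    """Return the incident fields for a qualifying situation, else None."""
    if not trigger(situation):
        return None  # the situation does not meet the trigger condition
    # Copy each mapped situation attribute into its incident field.
    return {field: situation.get(attr) for field, attr in field_map.items()}


# Illustrative situation data; all values are made up.
sample = {
    "name": "Payment gateway database timeout",
    "severity": "CRITICAL",
    "causal_ci": "payments-db",
    "impacted_services": ["checkout", "payments"],
    "url": "https://aiops.example.com/situations/42",
}
incident = build_incident(sample, is_critical, FIELD_MAP)
```

Keeping the trigger condition separate from the field mappings means that either one can change without affecting the other, which matches how the two appear as separate configuration steps in the workflow.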

 

Results

The incident noise reduction feature helps you achieve the following results:

  • Improved incident prioritization and response time
  • Lower operational workload for service desk and NOC teams
  • Faster root cause identification
  • Enhanced service availability and reliability

 
