Root Cause Analyzer


Root Cause Analyzer in BMC Helix AIOps uses autonomous agents to analyze event data, correlate events with change requests and logs in the context of a service, and suggest probable root causes. Root Cause Analyzer helps reduce mean time to resolution (MTTR), lower operational costs, and increase service reliability by minimizing downtime in your IT infrastructure.

A fleet of agents collaborates to analyze all available data, identify likely causes, and revise their hypotheses as they iterate through the debugging process. When the analysis is complete, the results, including detailed step-by-step reasoning, are available on the situation. Operators and site reliability engineers (SREs) can use this reasoning to understand the situation and take steps to resolve it.

To use Root Cause Analyzer, make sure that the required product licenses are available. For more information, see BMC Helix AIOps service.

Root Cause Analyzer capabilities

Root Cause Analyzer provides the following capabilities:

  • Analyzes events, change requests, and logs in the context of the service
  • Suggests probable root causes of a situation, along with a hypothesis and the reasoning behind each suggested root cause
  • Updates the root cause of a situation based on the analysis

To learn more about configuring Root Cause Analyzer, see To set up Root Cause Analyzer.

Scenario


Scenario: Identify the root cause of a situation by assessing the impact of a change request

Susan, an SRE at Apex Global, notices critical events on the Order Processing Service on a Friday morning, with rising latency and error rates. When she opens the Situations page, BMC Helix AIOps has already grouped the alerts into a single situation showing a checkout flow degradation. Customers are experiencing slow checkouts and intermittent order failures.

As Susan investigates the situation, the AI-driven Root Cause Analyzer shows that a recent change request, a security patch applied on Wednesday night, caused the situation. Root Cause Analyzer correlates the start of the situation with this change and shows that the patch increased processing time in the Order Processing Service, which in turn overloaded the Inventory Service, causing downstream retries and saturation.

To address this, she temporarily scales up Inventory Service pods, reducing latency and clearing errors, which resolves the situation. By leveraging insights from Root Cause Analyzer, Susan quickly restores service and achieves a much lower MTTR.

Agent type, skills, and models

  • Agent types:
    • AIOps Change Agent: Correlates change events with situations to help teams quickly determine whether a recent change is the likely cause of an issue in the environment.
    • AIOps Log Analysis Agent: Analyzes relevant logs for impacted configuration items (CIs) during a situation and provides a natural‑language summary of log behavior that helps validate or explain the suspected root cause.
    • AIOps BAP Agent: Analyzes a situation and recommends the most effective next actions to resolve it, based on how similar situations were handled in the past and on current context such as changes or logs.
  • Out-of-the-box skill: No
  • Out-of-the-box prompts: No
  • Supported models: Models in BMC HelixGPT
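The internal correlation logic of the AIOps Change Agent is not described on this page. As a rough illustration only, the following Python sketch shows one generic way to shortlist change requests that could explain a situation: keep changes on impacted CIs that completed within a lookback window before the situation started, and rank the most recent first. The ChangeRequest type, the candidate_changes function, the 72-hour lookback window, and the change IDs are all hypothetical and are not part of BMC Helix AIOps or BMC HelixGPT.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record type; field names are illustrative, not a BMC Helix schema.
@dataclass
class ChangeRequest:
    change_id: str
    ci: str                 # configuration item (CI) the change was applied to
    completed_at: datetime  # when the change finished


def candidate_changes(changes, impacted_cis, situation_start,
                      lookback=timedelta(hours=72)):
    """Return changes on impacted CIs that completed within `lookback`
    before the situation started, most recent first (hypothetical heuristic)."""
    window_start = situation_start - lookback
    hits = [
        c for c in changes
        if c.ci in impacted_cis and window_start <= c.completed_at <= situation_start
    ]
    # Changes that completed closer to the situation start rank higher.
    return sorted(hits, key=lambda c: situation_start - c.completed_at)


# Usage loosely modeled on the scenario: situation on Friday morning,
# security patch on Wednesday night. All values are made up.
situation_start = datetime(2025, 1, 10, 9, 0)  # Friday 09:00
changes = [
    ChangeRequest("CRQ-1", "Order Processing Service", datetime(2025, 1, 8, 22, 0)),
    ChangeRequest("CRQ-2", "Billing Service", datetime(2025, 1, 9, 12, 0)),
    ChangeRequest("CRQ-3", "Order Processing Service", datetime(2025, 1, 1, 10, 0)),
]
impacted = {"Order Processing Service", "Inventory Service"}
print([c.change_id for c in candidate_changes(changes, impacted, situation_start)])
# prints ['CRQ-1']: CRQ-2 is on a non-impacted CI, CRQ-3 is outside the window
```

A pure time-and-CI filter like this cannot confirm causation; in the product, the agents combine such signals with logs and service context before suggesting a root cause.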

User roles and permissions

Make sure that the following minimum roles are assigned to set up and use Root Cause Analyzer:
Product | Role | Description | Reference
BMC HelixGPT | HelixGPT Admin | Configure the model, update agent settings, and add data sources in BMC Helix Agent Studio. | Roles and permissions
BMC Helix AIOps | Tenant administrator | Obtain the fine-tuned model for BMC Helix AIOps, deploy it in the cloud, and configure other settings in BMC HelixGPT. | Roles and permissions
BMC Helix AIOps | Operator or site reliability engineer (SRE) | Monitor and investigate situations in BMC Helix AIOps. | Roles and permissions

Process overview

The following diagram explains the tasks required to configure and use Root Cause Analyzer:

[Diagram: Process overview of configuring AI agents for BMC Helix AIOps]

Process to set up Root Cause Analyzer

Perform the following tasks to set up Root Cause Analyzer:

Step | Product | Task | Reference
1 | BMC Helix AIOps | Obtain the fine-tuned BMC Helix AIOps model and deploy it in the cloud. | Configuring settings to use the AI-powered capabilities in BMC Helix AIOps
2 | BMC HelixGPT | Update the model settings and verify that the same model ID is available for the agent. | Configuring settings to use the AI-powered capabilities in BMC Helix AIOps

Root Cause Analyzer use case

The following table lists the task that you can perform by using Root Cause Analyzer:

Task | Description | Reference
View automated root cause analyses for a situation | View root cause analyses that are automatically derived by analyzing logs and change requests in the context of the service. | Identifying the root cause of the situation by using Deep RCA


BMC HelixGPT 26.1