Identifying the root cause of the situation by using Deep RCA


Important

The features and enhancements in this topic are under controlled availability to select customers.

As an operator or SRE, you need to know the root cause of an ML-based situation created in BMC Helix AIOps so that you can resolve it and restore the health of the impacted service.

Deep Root Cause Analysis (RCA) is an AI-based capability that uses autonomous agents to analyze logs and changes in the service context and detect the most probable cause of a situation. It orchestrates these diagnostic agents in parallel and applies LLM-based reasoning to generate an explainable causal graph of how the issue originated and propagated. This improves MTTR and reduces operational costs by minimizing lengthy triage bridges and escalations.
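The parallel orchestration described above can be pictured with a minimal Python sketch. This is an illustration only, not the product's implementation: the functions `run_log_tool`, `run_change_tool`, and `run_diagnostics`, the result fields, and the situation ID are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_log_tool(situation_id: str) -> dict:
    # Hypothetical stand-in for a log analysis agent.
    return {"tool": "log", "situation": situation_id, "status": "Completed"}


def run_change_tool(situation_id: str) -> dict:
    # Hypothetical stand-in for a change analysis agent.
    return {"tool": "change", "situation": situation_id, "status": "Completed"}


def run_diagnostics(situation_id: str) -> list[dict]:
    """Run both diagnostic tools concurrently and collect their results."""
    tools = [run_log_tool, run_change_tool]
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = [pool.submit(tool, situation_id) for tool in tools]
        # Results are returned in submission order, not completion order.
        return [future.result() for future in futures]


results = run_diagnostics("SIT-001")  # hypothetical situation ID
```

In this sketch neither tool waits for the other, which mirrors the idea that the log and change analyses proceed at the same time before any reasoning step combines their outputs.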

Scenario

Information

Susan is an SRE at Apex Global. On a Friday morning, she starts seeing critical events on the Order Processing Service, with rising latency and error rates. When she opens the Situations page, BMC Helix AIOps has already grouped the alerts into a single situation showing a checkout flow degradation. Customers are experiencing slow checkouts and intermittent order failures.

As Susan investigates the situation, she notices a recent change request—a security patch applied on Wednesday night. The Deep RCA analysis correlates the start of the situation with this change and shows that the patch increased processing time in Order Processing, which in turn overloaded the Inventory Service, causing downstream retries and saturation.

While evaluating whether to roll back the patch, Susan temporarily scales up Inventory Service pods to absorb the load. Latency drops, errors clear, and the situation resolves. By acting on Deep RCA insights instead of chasing symptoms, Susan restores service quickly and achieves a significantly lower MTTR.

Before you begin

As a tenant administrator, make sure that you have performed the following tasks:

To identify the root cause by running AI-based change and log tools manually

If Deep RCA is not enabled for your tenant, you can run the agentic AI-based Change Tool and Log Tool manually to obtain similar results. If the two tools identify different root causes, refresh the page; the displayed root cause changes based on the tools' results.

  1. On the Situations page, click a situation. 
    A natural language summary of the situation is displayed.
  2. Click Show recommendation planner.
    Based on the situation, either the log tool or the change tool is displayed first and the action that the tool can perform is displayed. 
  3. Click Execute.
    After the tool analyzes the logs or the change requests associated with the impacted host, the status is displayed as Completed. The results of the tool are displayed in the Output section. 
  4. To view details, click Show Details.
  5. Click Next to execute the next available tool. 
    If BMC Helix AIOps is unable to find any associated logs or change requests, or if the log or change integration is not configured, the status is displayed as Not applicable.

To view the results from Deep RCA and update the situation

Deep RCA does not require any manual execution. If Deep RCA is enabled for your tenant, perform the following steps to view the root cause identified by the agentic AI-based tools:

  1. On the Situations page, click a situation. 
    A natural language summary of the situation is displayed. On the right side, the Deep RCA panel runs the Log Tool and Change Tool in parallel. 
  2. On the situation details page, you can view the following information:
    1. Situation summary
    2. Details such as the severity, priority, type of situation, associated Incident ID, and impacted services
    3. Causal graph visualizing the root cause, a list of events, changes, and predictions, and a graphical view of the impacted CI topology.
      The Deep RCA analysis runs in parallel while you analyze other situation details.
  3. After the message Deep RCA is completed appears, you can view the following results:
    1. Log Tool: Analyzes the logs related to the associated CIs around the time the situation was created and provides the output. The Log Tool analyzes log files from 10 minutes before the situation occurs to 1 minute after it occurs.
      1. Click Output to view the log tool output. 
      2. Click Detailed output to view additional details.
      3. Click View Reasoning to view the logic applied by the large language model to triage the situation in the context of additional data provided by the log files. 

        Deep RCA uses structured reasoning, usually in the form of how one event led to another, to detect the true root cause by analyzing events, topology, and tool outputs, rather than just correlating symptoms.

      4. Click Close.   

    2. Change Tool: Analyzes any recent change requests associated with the CI to identify whether the change is the root cause of the situation. 
      1. Click Output.
        If a change implemented in your environment caused the situation, the details are provided in the output. 
      2. Click Detailed Output to view how the change implemented in your infrastructure led to a series of events that caused the situation.
      3. Click Hypothesis to view the assumptions used by the tool to identify the change as the root cause. 
      4. Click View Reasoning to view the step-by-step analysis done by the tool to identify the change as the root cause. 
      5. Click Close.
  4. After evaluating the results, click Update Situation.
    The situation summary is updated.
    In the Root Cause View, within the Situation Explanation section, the causal graph changes and shows Change or Logs depending on the root cause identified by Deep RCA. 
  5. Click the root cause to view the summary of the tool and the change request or log details. 
  6. (Optional) Click the hide icon to hide the Deep RCA panel.
    To open the panel again, click the Deep RCA option on the situation details page.  
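The log analysis window mentioned in the Log Tool step (10 minutes before the situation to 1 minute after it) can be worked out as in the following sketch. It assumes that the window is anchored on the situation creation time; the function name `log_window` and the sample timestamp are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta

# Window sizes taken from the Log Tool description in this topic.
LOOKBACK = timedelta(minutes=10)
LOOKAHEAD = timedelta(minutes=1)


def log_window(situation_created_at: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) of the log window around a situation.

    Assumption: the window is measured from the situation creation time.
    """
    return situation_created_at - LOOKBACK, situation_created_at + LOOKAHEAD


created = datetime(2026, 1, 9, 9, 30)  # hypothetical creation time
start, end = log_window(created)
print(start.time(), end.time())  # 09:20:00 09:31:00
```

Under this assumption, a situation created at 09:30 would have its logs analyzed from 09:20 to 09:31, an 11-minute span in total.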

FAQ

Can both changes and logs be the root cause of a situation?

Deep RCA evaluates both agents in parallel and, when both are applicable, determines which is the actual root cause based on the causal evidence.

What if neither changes nor logs are the root cause of a situation?

If neither the Change Tool nor the Log Tool identifies a valid root cause, the events that led to the situation are its root cause.
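The selection behavior described in these two answers can be summarized in a short sketch. It is a simplified illustration, not the product's logic: the `ToolFinding` class, the `pick_root_cause` function, and in particular the numeric `evidence_score` are hypothetical stand-ins for the causal evidence that Deep RCA weighs.

```python
from dataclasses import dataclass


@dataclass
class ToolFinding:
    source: str                  # "Change" or "Logs"
    identified: bool             # True if the tool found a valid root cause
    evidence_score: float = 0.0  # hypothetical measure of causal evidence


def pick_root_cause(findings: list[ToolFinding]) -> str:
    """Pick one root cause source, falling back to the situation's events."""
    candidates = [f for f in findings if f.identified]
    if not candidates:
        # Neither tool identified a valid root cause.
        return "Events"
    # When both tools apply, keep the one with the stronger evidence.
    return max(candidates, key=lambda f: f.evidence_score).source


both = [ToolFinding("Change", True, 0.9), ToolFinding("Logs", True, 0.4)]
neither = [ToolFinding("Change", False), ToolFinding("Logs", False)]
print(pick_root_cause(both))     # Change
print(pick_root_cause(neither))  # Events
```

The sketch returns a single source in every case, reflecting that one root cause is shown for the situation even when both tools produce applicable results.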

Where to go from here

To further investigate the situation, see:

 

BMC Helix AIOps 26.1