This documentation supports an earlier version of BMC Helix Operations Management. To view the documentation for the latest version, select 23.2 from the Product version picker.

Probable cause analysis (PCA)


Probable cause analysis (PCA) is the ability to determine the most likely causes of any issue in an infrastructure environment by correlating millions of monitoring data points and analyzing the relationship between infrastructure nodes and services. The goal is to reduce the mean time to identify or determine (MTTI) and mean time to resolve (MTTR) for issues. To achieve this goal, the product does the following:

  • Delivers a ranked list of the most likely causes of impact.
  • Determines the top suspected causes and provides evidence in the form of events and changes.
  • Computes the impact (PCA) score by using built-in functions and pre-assigned weightages for the nodes, events, changes, and metrics.
  • Builds and displays a service health timeline for a selected time range to indicate the health score degradation.
  • Displays the service topology with the impact path flowing from various nodes to the service.
  • Displays the metrics collected for the causal events.


Probable Cause Analysis

To view the PCA information and perform the analysis in the BMC Helix AIOps console, the service models from BMC Discovery must already be defined. For more information, see Service-Modeling.

pca_options_aiops_2102.png


  1. Tabs to view Probable Cause, Topology, and Metric. You can view the following details: 

    • Probable Cause view displays the ranked list of contributing nodes (limited to the top 3) and the events and changes from those nodes. By default, only the top events and changes are displayed.
    • Topology view shows the relationship between a service and all its nodes.
    • Metric view displays the time-series data collected from key attributes of the causal events.

  2. Causal Nodes (% Probability) displays up to the top 3 causal nodes that contribute to the probable cause, based on the score calculation.

    Top 3 causal nodes

    This is not customizable. If there are more than 3 causal nodes, only the top 3 nodes with the highest impact are displayed.

    • The (% Probability) value is drawn from the score of each event or change. The score of the most impactful event or change is taken as the node score. For example, if Event#1 scores 68% and Event#2 scores 65%, the node score is 68%.
    • The causal nodes are ranked by the scores of their events or changes. A node whose topmost event scores 72% is ranked higher than a node whose topmost event scores 71%.

  3. Click Show all events to view all events or click Show top causal events to return to the top causal events list.
    pca_causal_events_2102.png
    pca_all_events_2102.png

  4. Score calculation: Hover over the score number to view the details. The score is the sum of the weightages assigned to the following factors:
    Event Severity, KPI Metric, Multiple Services, Node Depth, Node Kind, and Time Proximity.

    Pre-defined weightage values

    The weightage assigned to each factor is pre-defined. Users do not have an option to modify these values. For some of the factors, the weightage can vary within a range of values and for other factors, it is fixed.

    The following table shows the pre-defined weightage details:

    Factor             Type             Weightage
    -----------------  ---------------  ---------
    Event Severity     Event            Warning (4), Minor (6), Major (8), and Critical (10)
    KPI Metric         Event            0 or 10 - An event with a KPI metric gets 10 and without any metric gets 0
    Multiple Services  Event            0 or 10 - An event associated with multiple services gets 10 and only one service gets 0
    Node Depth         Event or Change  1 to 20 - 1 for the nearest node to the service and it can go up to 20 for the farthest node to the service.
    Node Kind          Event or Change  20 (Fixed value)
    Time Proximity     Event or Change  Up to 40 for an event and up to 60 for a change.

  5. Click Events to view the top causal events or click Changes to view the top causal changes.
    By default, the Events button is selected. Select the Changes button to view the change requests that impacted the service.

    pca_serv_with_changes.png

  6. You can view the event details and change details.

    • Click an event to view the event details, or click a change to view the change details.

      pca_event_details.png

    • You can view additional details about the top causal events:
      1. Related Events tab displays correlation event details, such as event message, the impacted host, occurrence, severity, priority, and status.

        Note

        You can only view this tab if the event is a primary event.

        Event Details_Related Events tab.png

      2. Performance View tab displays the time-series data collected from key attributes of the causal events.

        Note

        You can only view this tab if the event slot value for Class is Alarm.

        Event Details_Performance View tab.png
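
The score calculation in step 4 and the node ranking in step 2 can be illustrated with a short sketch. It is not the product's implementation: the names Cause, raw_score, and top_causal_nodes are hypothetical, the sample values are made up, and the time proximity weightage is passed in as a plain number because this page does not say how it is derived. The sketch returns the raw sum of weightages; how the console turns that sum into the displayed percentage is not documented here.

```python
from dataclasses import dataclass
from typing import Optional

# Weightages taken from the table on this page.
SEVERITY_WEIGHT = {"WARNING": 4, "MINOR": 6, "MAJOR": 8, "CRITICAL": 10}
NODE_KIND_WEIGHT = 20                      # fixed value
TIME_PROXIMITY_CAP = {"event": 40, "change": 60}


@dataclass
class Cause:
    """Hypothetical record for one causal event or change."""
    node: str
    kind: str                     # "event" or "change"
    node_depth: int               # 1 (nearest to the service) .. 20 (farthest)
    time_proximity: int           # 0 .. 40 for an event, 0 .. 60 for a change
    severity: Optional[str] = None
    has_kpi_metric: bool = False
    multiple_services: bool = False


def raw_score(cause: Cause) -> int:
    """Sum the weightages of all factors that apply to this cause."""
    if not 1 <= cause.node_depth <= 20:
        raise ValueError("node_depth must be between 1 and 20")
    if not 0 <= cause.time_proximity <= TIME_PROXIMITY_CAP[cause.kind]:
        raise ValueError("time_proximity is out of range for this kind")

    # Factors that apply to both events and changes.
    total = cause.node_depth + NODE_KIND_WEIGHT + cause.time_proximity

    # Factors that apply to events only.
    if cause.kind == "event":
        total += SEVERITY_WEIGHT[cause.severity]
        total += 10 if cause.has_kpi_metric else 0
        total += 10 if cause.multiple_services else 0
    return total


def top_causal_nodes(causes, limit=3):
    """Rank nodes by their highest-scoring event or change; keep the top `limit`."""
    best = {}
    for cause in causes:
        score = raw_score(cause)
        if score > best.get(cause.node, -1):
            best[cause.node] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up sample data for illustration only.
sample = [
    Cause("db01", "event", node_depth=5, time_proximity=30,
          severity="CRITICAL", has_kpi_metric=True),
    Cause("db01", "event", node_depth=3, time_proximity=10, severity="WARNING"),
    Cause("app02", "change", node_depth=2, time_proximity=50),
    Cause("web03", "event", node_depth=1, time_proximity=20,
          severity="MAJOR", multiple_services=True),
    Cause("lb04", "event", node_depth=1, time_proximity=5, severity="MINOR"),
]

print(top_causal_nodes(sample))
# [('db01', 75), ('app02', 72), ('web03', 59)]
```

In this sample, db01 has two events and takes the score of the more impactful one (75), and lb04 is dropped because only the top 3 nodes are kept.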


On-demand PCA score recalibration

The health timeline and its time slots adjust to the selected time range. The product can recalibrate the PCA score on demand for any time slot within the selected time range. Click any time slot in the health timeline to identify the impacted nodes and the events and changes for those nodes. The time slot that you click becomes the current time slot for the recalibration, and all the rules of the PCA scoring method are applied to re-rank the impacted nodes.

Example

Default PCA score computation

pca_example_1_2102.png

  1. The selected time range is last 24 hours.
  2. The range is between 12:00 hours (previous day) and 12:00 hours (current time).
  3. The topmost causal entity displayed is ulx3od with a severity score of 45%.

    On-demand PCA score computation

    pca_example_2_2102.png

  4. The health score for the selected time slot (16:30 of the previous day) is displayed and an on-demand PCA computation is triggered.
  5. The topmost causal entity displayed is ulx3pq with a severity score of 73%.
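
The effect of selecting a time slot can be sketched as follows. This is an illustration under stated assumptions, not the product's algorithm: the page only gives the upper bounds for Time Proximity (40 for an event, 60 for a change), so the linear decay over a 24-hour window used here is an assumption, as are the function names, the base weightages, and the timestamps. Only the node names ulx3od and ulx3pq come from the example above.

```python
from datetime import datetime, timedelta

TIME_PROXIMITY_CAP = {"event": 40, "change": 60}   # upper bounds from this page


def time_proximity(occurred, slot, cap, window=timedelta(hours=24)):
    """ASSUMPTION: weight decays linearly from `cap` at the slot to 0 at `window` age."""
    age = slot - occurred
    if age < timedelta(0) or age >= window:
        return 0.0
    return cap * (1 - age / window)


def recalibrate(causes, slot):
    """Re-rank nodes as if `slot` were the current time.

    `causes` holds (node, kind, occurred, base) tuples, where `base` is a
    hypothetical sum of all weightages other than Time Proximity.
    """
    scores = {}
    for node, kind, occurred, base in causes:
        if occurred > slot:
            continue  # had not happened yet at the selected slot
        score = base + time_proximity(occurred, slot, TIME_PROXIMITY_CAP[kind])
        scores[node] = max(scores.get(node, 0.0), score)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data: a 24-hour range ending at 12:00, and a selected slot at 16:30
# on the previous day.
now = datetime(2021, 3, 2, 12, 0)
slot = datetime(2021, 3, 1, 16, 30)
causes = [
    ("ulx3od", "event", datetime(2021, 3, 2, 11, 0), 40),
    ("ulx3pq", "event", datetime(2021, 3, 1, 16, 0), 45),
]

print(recalibrate(causes, now)[0][0])    # ulx3od
print(recalibrate(causes, slot)[0][0])   # ulx3pq
```

With these made-up values, the event on ulx3od is the most recent one at 12:00, so it ranks first by default. At the 16:30 slot that event has not occurred yet, so ulx3pq becomes the topmost causal entity.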


 
