This documentation supports the releases of BMC Helix Service Monitoring till September 2021 (21.3.03). Documentation for later versions is available in the BMC Helix AIOps documentation space. To view the documentation, select a version from the Product version menu.

Probable cause analysis (PCA)

Probable cause analysis (PCA) is the ability to determine the most likely causes of any issue in an infrastructure environment by correlating millions of monitoring data points and analyzing the relationship between infrastructure nodes and services. The goal is to reduce the mean time to identify or determine (MTTI) and mean time to resolve (MTTR) for issues. To achieve this goal, BMC Helix Service Monitoring does the following:

  • Delivers a ranked list of the most likely causes of impact.
  • Determines the top suspected causes and provides evidence in the form of events and changes.
  • Computes the impact or PCA score by applying pre-assigned weightages for nodes, events, changes, and metrics, together with inbuilt functions.
  • Builds and displays a service health timeline for a selected time range to indicate the health score degradation.
  • Displays the service topology with the impact path flowing from various nodes to the service.
  • Displays the metrics collected for the causal events.


PCA in BMC Helix Service Monitoring

To view the PCA information and perform the analysis in BMC Helix Service Monitoring, you must have already defined the Service Models from BMC Helix Discovery. For more information, see Modeling business services.


  1. Tabs to view Probable Cause, Topology, and Metric. You can view the following details: 

    • The Probable Cause view displays the ranked list of contributing nodes (limited to the top 3), along with the events and changes from those nodes. By default, only the top events and changes are displayed.
    • The Topology view shows the relationship between a service and all its nodes.
    • The Metric view displays the time-series data collected from key attributes of the causal events.

  2. Causal Nodes (% Probability) displays up to the top 3 causal nodes that contribute to the probable cause, based on the score calculation.

    Top 3 causal nodes

    This is not customizable. If there are more than 3 causal nodes, only the top 3 nodes with the highest impact are displayed.

    • The (% Probability) value is drawn from the score of each event or change. The score of the most impactful event or change on a node is taken as the node score. For example, if Event#1 scores 68% and Event#2 scores 65%, Event#1 is the highest-ranked event and the node score is 68%.
    • The causal nodes are ranked based on the score calculation from the events or changes. A node whose topmost event scores 72% is ranked higher than a node whose topmost event scores 71%.

  3. Click Show all events to view all events or click Show top causal events to return to the top causal events list.


  4. Score calculation: Hover over the score number to view the details. The score is the sum of the weightages assigned to the following factors:
    Event Severity, KPI Metric, Multiple Services, Node Depth, Node Kind, and Time Proximity.

    Pre-defined weightage values

    The weightage assigned to each factor is pre-defined. Users do not have an option to modify these values. For some of the factors, the weightage can vary within a range of values and for other factors, it is fixed.

    The following table shows the pre-defined weightage details:

    Factor            | Type            | Weightage
    Event Severity    | Event           | Warning (4), Minor (6), Major (8), and Critical (10)
    KPI Metric        | Event           | 0 or 10. An event with a KPI metric gets 10; an event without any metric gets 0.
    Multiple Services | Event           | 0 or 10. An event associated with multiple services gets 10; an event associated with only one service gets 0.
    Node Depth        | Event or Change | 1 to 20. The node nearest to the service gets 1, and the value can go up to 20 for the node farthest from the service.
    Node Kind         | Event or Change | 20 (fixed value)
    Time Proximity    | Event or Change | Up to 40 for an event and up to 60 for a change.
  5. Click Events to view the top causal events or click Changes to view the top causal changes.
    By default, the Events button is selected. Select the Changes button to view the change requests that impacted the service.



  6. You can view the event details and change details.

    • Click an event to view the event details, or click a change to view the change details.



    • You can view additional details about the top causal events:
      1. Related Events tab displays correlation event details, such as event message, the impacted host, occurrence, severity, priority, and status.

        Note

        You can only view this tab if the event is a primary event.


      2. Performance View tab displays the time-series data collected from key attributes of the causal events.

        Note

        You can only view this tab if the event slot value for Class is Alarm.
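
The score calculation and node ranking described above can be sketched in a few lines of Python. This is only an illustration built from the weightage table, not the product's implementation: the Evidence record, the score and top_causal_nodes functions, and the node names are all hypothetical. The sketch returns the raw sum of the weightages; how the product turns that sum into the displayed percentage is not described on this page, so no conversion is attempted.

```python
from dataclasses import dataclass

# Weightages copied from the pre-defined weightage table.
SEVERITY_WEIGHT = {"WARNING": 4, "MINOR": 6, "MAJOR": 8, "CRITICAL": 10}
NODE_KIND_WEIGHT = 20                               # fixed value
MAX_TIME_PROXIMITY = {"event": 40, "change": 60}    # upper bounds


@dataclass
class Evidence:
    """Hypothetical record for one event or change on a node."""
    node: str
    kind: str                        # "event" or "change"
    node_depth: int                  # 1 (nearest to the service) to 20 (farthest)
    time_proximity: int              # 0..40 for an event, 0..60 for a change
    severity: str = ""               # events only
    has_kpi_metric: bool = False     # events only
    multiple_services: bool = False  # events only


def score(e: Evidence) -> int:
    """Sum the factor weightages that apply to this event or change."""
    if not 1 <= e.node_depth <= 20:
        raise ValueError("node_depth must be between 1 and 20")
    if not 0 <= e.time_proximity <= MAX_TIME_PROXIMITY[e.kind]:
        raise ValueError("time_proximity out of range for " + e.kind)
    total = e.node_depth + NODE_KIND_WEIGHT + e.time_proximity
    if e.kind == "event":
        # Event Severity, KPI Metric, and Multiple Services apply to events only.
        total += SEVERITY_WEIGHT[e.severity]
        total += 10 if e.has_kpi_metric else 0
        total += 10 if e.multiple_services else 0
    return total


def top_causal_nodes(evidence, limit=3):
    """Rank nodes by their highest-scoring event or change; keep the top `limit`."""
    best = {}
    for e in evidence:
        best[e.node] = max(best.get(e.node, 0), score(e))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Made-up sample data for illustration only.
evidence = [
    Evidence("node-a", "event", node_depth=3, time_proximity=30,
             severity="CRITICAL", has_kpi_metric=True),
    Evidence("node-a", "event", node_depth=3, time_proximity=10, severity="MINOR"),
    Evidence("node-b", "change", node_depth=5, time_proximity=45),
    Evidence("node-c", "event", node_depth=1, time_proximity=5, severity="WARNING"),
    Evidence("node-d", "event", node_depth=2, time_proximity=0, severity="WARNING"),
]
print(top_causal_nodes(evidence))
# [('node-a', 73), ('node-b', 70), ('node-c', 30)]
```

In this sample, node-a has two events (73 and 39) and is ranked by the higher one, and node-d is dropped because only the top 3 nodes are kept.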



On-demand PCA score recalibration

The health timeline and the time slots adjust based on the selected time range. BMC Helix Service Monitoring can recalibrate the PCA score on demand for any time slot within the selected time range. Click any time slot in the health timeline to identify the impacted nodes and the events and changes for those nodes. The time slot that you click becomes the current time slot for the recalibration, and all the rules of the PCA scoring method are applied to re-rank the impacted nodes.

Example

Default PCA score computation

  1. The selected time range is the last 24 hours.
  2. The range is between 12:00 hours (previous day) and 12:00 hours (current time).
  3. The topmost causal entity displayed is ulx3od with a severity score of 45%.

On-demand PCA score computation


  4. The health score for the selected time slot (16:30 of the previous day) is displayed and an on-demand PCA computation is triggered.
  5. The topmost causal entity displayed is ulx3pq with a severity score of 73%.
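
The re-ranking in this example can be illustrated with a small Python sketch. This page states only that the selected time slot becomes the current time slot and that all PCA scoring rules are re-applied; it does not describe how the Time Proximity weightage is computed. The sketch therefore makes loud assumptions: Time Proximity decays linearly over a 24-hour window, evidence that occurred after the selected slot gets no Time Proximity weightage, and a single `base` number stands in for all other factors. The function names, node names, timestamps, and scores are hypothetical.

```python
from datetime import datetime, timedelta

MAX_TIME_PROXIMITY = {"event": 40, "change": 60}  # upper bounds from the weightage table
WINDOW = timedelta(hours=24)                      # assumption: lookback window


def time_proximity(kind, occurred_at, slot_time):
    """Assumed linear decay: full weightage at the slot time, 0 at WINDOW or older.

    Evidence that occurred after the slot time also gets 0 (assumption).
    """
    gap = slot_time - occurred_at
    if gap < timedelta(0) or gap >= WINDOW:
        return 0
    return round(MAX_TIME_PROXIMITY[kind] * (1 - gap / WINDOW))


def rank_for_slot(evidence, slot_time):
    """Re-rank nodes with `slot_time` treated as the current time slot.

    Each evidence item is (node, kind, occurred_at, base), where `base`
    stands in for the sum of all factors other than Time Proximity.
    """
    best = {}
    for node, kind, occurred_at, base in evidence:
        total = base + time_proximity(kind, occurred_at, slot_time)
        best[node] = max(best.get(node, 0), total)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data for illustration only.
evidence = [
    ("node-a", "event", datetime(2021, 9, 2, 11, 0), 40),
    ("node-b", "event", datetime(2021, 9, 1, 16, 0), 45),
]

# Default computation: the current time is the reference point.
print(rank_for_slot(evidence, datetime(2021, 9, 2, 12, 0)))
# [('node-a', 78), ('node-b', 52)]

# On-demand computation: the 16:30 slot of the previous day is selected.
print(rank_for_slot(evidence, datetime(2021, 9, 1, 16, 30)))
# [('node-b', 84), ('node-a', 40)]
```

Under these assumptions, moving the reference point from the current time to an earlier slot changes which node is ranked first, which mirrors the change of topmost causal entity in the example.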


 

Where to go from here

Performing probable cause analysis.
