Performing ML-based root cause isolation of an impacted service


From the Services page, you can drill down to the service details page to perform root cause isolation of an impacted service.

As an operator, you can perform the following tasks:

  • View service details
  • Perform ML-based root cause isolation
  • Perform event actions for an impacted service
  • View CI topology
  • View service hierarchy
  • View health indicators
  • View the top metrics from causal nodes
  • View service insights

If you have the appropriate permissions, you can also edit or delete a service. 


To view the details of a service

Do one of the following actions to view the service details page:

  • Click the Overview tab, and from the Services widget, click any of the impacted services whose details you want to view.
  • Click the Services tab and click an individual service heat map or tile.
    The following MP4 shows how to view the service details page, the impacted event details, and the event actions available for any event, and how to cross-launch the event details page in BMC Helix Operations Management.


    The following image provides more information about the details displayed for a service. 
    Annotated Service Details page.png

    No.  Description

    1    Displays the service name, severity, incident ID associated with the service (if available), service impact score as a percentage, service health score, and the date and time when the service was last updated.

         Click the incident ID link to launch the incident details page in BMC Helix IT Service Management – SmartIT. You must have permissions to view incidents in BMC Helix IT Service Management.

    2    Displays the top 3 impacted entities (business services) that are associated with the service.

    3    Displays a pie chart with the count of open events impacting the service. The events are categorized by event status. The pie chart excludes INFO and OK events from the event count. You can click the pie chart to view the list of all impacting events, situations, changes, and incidents for the service. Currently, a maximum of 10,000 events are displayed for any service.

         From the Event Details page, click More Details to cross-launch into BMC Helix Operations Management and view all the associated event details.

    4    Displays a timeline of the service health over a selected time range, along with the health score for that time range. You can hover over a time slot to view its health score. The health timeline does not display INFO and OK events.

         For more information, see Service-health-score-impact-score-and-metrics.

         Legends that indicate incidents, events, and change requests are displayed on the health timeline. Hover over an event, incident, or change request to view its details.

         For more information, see Total-incident-count-and-mean-time-to-resolve-MTTR-indicators-for-a-reliable-incidence-response-process.
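The counting rule described for the pie chart (open events grouped by status, with INFO and OK events excluded) can be sketched as follows. This is only an illustration: the dictionary keys `status` and `severity`, the function names, and the way the 10,000-event display limit is applied to the list are assumptions, not part of any BMC Helix API.

```python
from collections import Counter

# Severities that the pie chart ignores, per the description above.
EXCLUDED_SEVERITIES = {"INFO", "OK"}
# Display limit mentioned in the description; how the product applies it is assumed.
MAX_LISTED_EVENTS = 10000


def pie_chart_counts(events):
    """Count events per status, skipping INFO and OK events.

    `events` is a list of dicts with hypothetical 'status' and 'severity' keys.
    """
    return Counter(
        event["status"]
        for event in events
        if event["severity"] not in EXCLUDED_SEVERITIES
    )


def listed_events(events):
    """Return at most MAX_LISTED_EVENTS events (assumed truncation behavior)."""
    return events[:MAX_LISTED_EVENTS]


sample = [
    {"status": "Open", "severity": "CRITICAL"},
    {"status": "Open", "severity": "INFO"},
    {"status": "Acknowledged", "severity": "MAJOR"},
    {"status": "Assigned", "severity": "OK"},
]
counts = pie_chart_counts(sample)
print(dict(counts))
```

In this sample, the INFO and OK events are dropped, so only one Open and one Acknowledged event are counted.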


To view the ML-based root causes of an impacted service

  1. In BMC Helix AIOps, do one of the following to open the service details page for an impacted service:
    • Click the Overview tab, and from the Services widget, click the impacted service whose details you want to view.
    • Click the Services tab, and then click an individual service tile.
      The service details page appears.
  2. (Optional) To customize the columns that appear on the service details page, click the column selector and clear the columns that you do not want to see. Only the selected columns are displayed.
    You can also drag the columns to rearrange them.
    ServiceDetails_Colum Selector.png
  3. To view the class of an event, hover over the icon in the Class column.
    The event class name is displayed as a tooltip.
    Class tooltip.png
  4. To view the causal event details by causal nodes or situations, on the Root Cause Isolation tab, click View By and select one of the following options:
    • Causal Nodes: Displays the top 3 causal nodes impacting the service. Click a causal node and perform the following actions to view the event and change request details:
      RCI_causal.png

      • To view the event details:

        • Click Events to view the top causal events.
        • Hover over the score to view the score calculation details for the event.
        • Click an event to view the event details.

          Performance View tab for event details

          The Performance View tab is displayed only for an alarm class event. It contains the time-series data collected from key attributes of the causal events.

          The graph shows the display name of the impacted metric or attribute along with the unit of measurement for that metric.

        • Click More Details to launch the event details page in BMC Helix Operations Management.
        • Click the image2022-7-26_10-3-44.png icon to perform any of the supported event actions.
          All logs and notes for an event are displayed.
        • Enter a note in the text box and click Add Note to add a note related to the event.
          Any note added for the event is also reflected for the event in BMC Helix Operations Management.
          Event Details with Actions Menu.png
      • To view the change details:
        • Click Changes to view the top three change requests.
        • Hover over the score to view the score calculation.
        • Click a change to view the change details.
      View all events or all changes
      • Click the Show all events or Show all changes link to view all events or all changes for a particular causal node.
      • To switch back to viewing only the top events or top changes, click the Show top causal events or Show top causal changes link.
    • Situations: Displays the top 3 situations impacting the service. Click a situation to view the associated events. Click an event to view its details.
      RCI_situations.png

      Launch the situation details page on the Situations tab

      Optionally, you can click the launch_situations.png icon to launch the situation details page on the Situations tab. For more information, see Investigating-ML-based-situations.

  5. In the Incident ID column, if an incident has been created, click the link to view the incident details in BMC Helix IT Service Management – SmartIT.
    To launch the incident details page, you must have permissions to view incidents in BMC Helix IT Service Management.
  6. In the Automations column, view the automations that match the event.
    To run automations, see Remediating-events-for-services-and-situations.
  7. Click Action and perform any of the available actions for the open events.
    For details about the actions, see To perform event actions for an impacted service.


To perform event actions for an impacted service

The capabilities available for your organization and your user role determine the event actions that you can perform against the open events. The following table describes the basic event actions.

Action                Description

Create Automation     Launches the BMC Helix Intelligent Automation > Create Automation Policy page so that tenant administrators can create an automation policy.

                      Requires the Intelligent Automations feature to be enabled on the Configurations > Manage Product Features page.

                      For more information, see Creating-automation-policies.

Request Automation    Displays the Request Automation dialog box.

                      Requires the Intelligent Automations feature to be enabled on the Configurations > Manage Product Features page.

                      For instructions on how to raise a request, see Requesting-for-a-new-automation.

Trigger Automation    Displays the Run Automation dialog box that you can use to run automations for remediating the event.

                      Requires the Intelligent Automations feature to be enabled on the Configurations > Manage Product Features page.

Acknowledge Event     Recognizes the existence of an open event. This operation changes the event status from Open to Acknowledged.

Assign Event          Assigns ownership of an open, acknowledged, or assigned event to yourself or another person in the same account. This operation changes the event status from Open or Acknowledged to Assigned, and the event owner is updated to the selected user. If the event status is already Assigned, only the ownership changes to the selected user.

Close Event           Disables any further event operations on the event. Closed events are not considered for calculating the status of a device.

                      You can close only events with the status Open, Assigned, or Acknowledged.

Decline Ownership     Removes ownership of an event in the Assigned state. This operation changes the event status to Acknowledged.

Set Event Priority    Assigns a priority level to the event.

Take Ownership        Assigns ownership of an Open or Acknowledged event to yourself.

Unacknowledge Event   Changes a previously Acknowledged event back to the Open state.

Add Notes             Displays the Add Notes dialog box.

Create Incident       Creates an incident in BMC Helix IT Service Management – SmartIT. The incident ID appears against the impacted nodes. You can click the link to view the incident details in BMC Helix IT Service Management – SmartIT. You must have permissions to view incidents in BMC Helix IT Service Management.

For more information about the impact of the actions on the event, see Performing event operations in the BMC Helix Operations Management online documentation. 
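The status changes described in the table can be summarized as a small lookup, sketched below. This is an illustrative model only: the dictionary, the function name, and the "Closed" status label are assumptions and are not part of any BMC Helix API. Take Ownership is omitted because the table does not state the resulting status.

```python
# Allowed starting statuses and resulting status for each action,
# as described in the event actions table. Names are illustrative.
TRANSITIONS = {
    "Acknowledge Event": ({"Open"}, "Acknowledged"),
    "Assign Event": ({"Open", "Acknowledged", "Assigned"}, "Assigned"),
    "Decline Ownership": ({"Assigned"}, "Acknowledged"),
    "Unacknowledge Event": ({"Acknowledged"}, "Open"),
    # "Closed" as a status label is an assumption based on "Closed events".
    "Close Event": ({"Open", "Assigned", "Acknowledged"}, "Closed"),
}


def apply_action(status, action):
    """Return the status after `action`, or raise if the action is not allowed."""
    allowed_from, new_status = TRANSITIONS[action]
    if status not in allowed_from:
        raise ValueError(f"{action} is not available for an event in status {status}")
    return new_status


status = "Open"
status = apply_action(status, "Acknowledge Event")  # Open -> Acknowledged
status = apply_action(status, "Assign Event")       # Acknowledged -> Assigned
status = apply_action(status, "Decline Ownership")  # Assigned -> Acknowledged
print(status)
```

Calling `apply_action("Closed", "Close Event")` raises `ValueError` in this sketch, which mirrors the statement that no further operations are possible on a closed event.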


To view the topological map of the service CIs

Click CI Topology to view the topological map of the service CIs and view the node details.   

CI node and impact link display color

The CI topology nodes are colored according to their impact severity status. The CI impact path between impacted nodes is marked with dotted red lines, and the links between non-impacted nodes are marked with grey lines, as shown in the following image.

CI_Topology_causal_node_impact_22401.png


    • (Optional) Use the display options to maximize or minimize, drag or position, zoom in or out, and fit the topology map to the center.
    • From the map, select any node to view the node details.
    • (Optional) Change the topology hierarchy, or enable or disable aggregation by CI Kind.
    • (Optional) Modify the advanced filter to control the view of the topology map.
      Based on the length of the selected criteria and the available display space, the filters are automatically tagged and grouped as +1 active, +2 active, and so on. You can click the tagged number to view the additional filters.
    • (Optional) In the topological map, 10 or more CIs of the same kind are automatically grouped together. You need to expand the groups one after another to view the CIs. For example, consider a set of 15 CIs of the same kind that are grouped together. After expanding the group, you can view 9 CIs and another group, which you need to expand again to view the remaining 6 CIs.
      ciTopology_group.png
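The grouping example in the last item can be modeled with a short sketch. The threshold of 10 comes from the text, and the page size of 9 is inferred from the 15-CI example. The behavior for other counts, the function name, and the CI names are assumptions and may not match the product exactly.

```python
GROUP_THRESHOLD = 10    # from the text: 10 or more CIs of the same kind are grouped
VISIBLE_PER_EXPAND = 9  # inferred from the example: expanding shows 9 CIs


def expand_group(ci_names):
    """Return (visible_cis, remaining_group) after expanding one group.

    Assumed rule: a group below the threshold expands fully; otherwise the
    first 9 CIs are shown and the rest stay in a nested group.
    """
    if len(ci_names) < GROUP_THRESHOLD:
        return list(ci_names), []
    return list(ci_names[:VISIBLE_PER_EXPAND]), list(ci_names[VISIBLE_PER_EXPAND:])


cis = [f"ci-{i}" for i in range(1, 16)]     # 15 hypothetical CIs of the same kind
first_visible, nested = expand_group(cis)   # first expansion
second_visible, leftover = expand_group(nested)  # expand the nested group
print(len(first_visible), len(nested), len(second_visible), len(leftover))
```

With 15 CIs, the first expansion shows 9 CIs and leaves a nested group of 6, and expanding that group shows the remaining 6, which matches the example in the text.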


To view service hierarchy

  1. Click Service Hierarchy to view the service node details of parent and child services.
  2. Click Upstream Hierarchy, Downstream Hierarchy, or both to view the upstream (parent nodes) or downstream (child nodes) service hierarchy of the current service.


To view health indicators

Click Health Indicators to view the health indicators configured for the service. By default, the charts are displayed for the last 24 hours. You can choose to view the health indicators for the last 3, 6, 12, or 24 hours. For more information, see Health Indicators and Adding or editing health indicators for a service.


To view metrics for an impacted service

Click Metrics to view the metrics chart for the top attributes of the causal node. If there are more than three metrics, only the top three trending metrics are displayed.
Based on the metric data and its trend, you can take action to resolve the issue. For more information, see Service-health-score-impact-score-and-metrics.
e2e_metrics.png

To view insights for service health, events, and incidents

Click Insights to discover the service behavior and its severity pattern over a predefined period of 15 days. The insights are presented as text summaries with their corresponding graphs. These insights help operators take corrective measures to ensure service continuity. For more information, see Service-health-score-impact-score-and-metrics.

Points to remember

  • Service insights are available on demand. Whenever you navigate to the Insights tab, you can see the insights, provided that the service has been available for at least two days.
  • Insights are calculated based on GMT date and time. However, you can see the insights based on your local date and time.


  • Health Score: You can see the trend of service health over a period, with the latest percentage degradation in health score between two consecutive dates.

Insights_HealthScore.png

  • Severity pattern: You can see whether any severity (Critical or Major) is affecting a service every day during a specific time.

Insights_Pattern.png


  • Major and Critical Events: Event occurrences are inversely correlated with service health. An increase in Critical or Major events impacts a service by reducing its health score. Insights are available if there is an increasing trend of Major or Critical events over the period. You can see the trend of Critical or Major events over a period, with the average event occurrences and the latest increase in the number of events between two consecutive dates.
    Insights_CriticalEvents.png

  • Insights for Incidents: Incidents are raised against the events associated with a service and are sent to BMC Helix Operations Management as events of the Incident Info (INCIDENT_INFO) event class. These Incident Info events are used to derive incident-related insights in BMC Helix AIOps. Insights are available if there is an increasing trend of incident events over the period. You can see the trend of incidents over a period, with the average incident occurrences and the latest increase in the number of incidents between two consecutive dates.

Insights_Incidents.png


 
