Monitoring situations


As an operator or site reliability engineer (SRE), you can monitor ML-based and policy-based situations to:

  • Derive actionable insights by viewing all situations and similar situations.
  • Perform faster root cause analysis.
  • Investigate situation details and associated events, and monitor change requests.
  • Take corrective measures to reduce the event noise.
  • Improve the mean time to resolve (MTTR) based on the situation-driven workflow.
  • Lower the mean time to detect or discover (MTTD) and the time required for investigating tickets.

Situation naming convention

The situation names in BMC Helix AIOps are automatically derived from the associated service name and the message of the situation event with the highest impact score. The name is displayed in the following format: <Name of the service> - <Message of the situation event with the highest impact score>. For example: Railway_Ticketing_Service - Memory utilization is > 80% for 5 mins.
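
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the format, not product code: the SituationEvent class, its fields, the derive_situation_name function, the second event message, and the impact score values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SituationEvent:
    """Hypothetical stand-in for an event that belongs to a situation."""
    message: str
    impact_score: float


def derive_situation_name(service_name: str, events: list[SituationEvent]) -> str:
    """Build '<service name> - <message of the highest-impact event>'."""
    if not events:
        raise ValueError("a situation needs at least one event to derive a name")
    # Pick the event with the highest impact score; on a tie, max() keeps the first one.
    top_event = max(events, key=lambda event: event.impact_score)
    return f"{service_name} - {top_event.message}"


# Illustrative data only; the scores and the first message are made up.
events = [
    SituationEvent("Disk latency is high", impact_score=42.0),
    SituationEvent("Memory utilization is > 80% for 5 mins", impact_score=87.5),
]
print(derive_situation_name("Railway_Ticketing_Service", events))
```

With the sample data, the second event has the higher score, so its message becomes the second half of the name.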

Situations grouping

On the Situations page, you can monitor:

  • All Situations: Lists the independent situations and the groups of all primary situations. An independent situation is an ML-based situation that does not belong to any grouped situation. A primary situation reduces noise by grouping similar open situations that occurred due to a similar issue and impacted multiple services across the service hierarchy. Instead of troubleshooting each service and its situation separately, operators or SREs can troubleshoot the primary situation. BMC Helix AIOps uses AI/ML algorithms to find situation similarity based on temporal, topological, or knowledge graph relationships, which further reduces noise and improves MTTR.

    Primary situations use the following options defined on the Manage Situations page under Configurations:

    • Correlation Event Time Window (in mins) to configure the time limit to determine whether a correlation event will be part of a situation.
    • Situation Stability Window (in mins) to configure the time limit to add new correlation events to a situation.

  • Similar Situations: Lists groups of all similar situations from the same service node. Similar Situations gives operators or SREs insights to optimize service performance by acting proactively on problems that impact a service in their organization. BMC Helix AIOps uses AI/ML algorithms to group situations of a similar nature based on their repeated impact on the service in the past. Operators or SREs can perform historical analysis on problems by reviewing the number of incidents raised, automation runs, severity of situations, time of past occurrences, and other useful information, and then take meaningful actions. Similar Situations also helps in faster root cause isolation. For any open situation, it shows how a similar situation was resolved in the past, the actions taken to resolve it, and the root cause that was identified. Based on this context, operators or SREs can take similar actions to diagnose and fix the current situation and reduce MTTR.

    To form similar situation groups, BMC Helix AIOps uses the following options defined in the Advanced Settings section on the Manage Situations page under Configurations:

    • Expiry of Similar Situation Group (in days) to configure the maximum number of days a group of similar situations can remain idle before it expires.
    • Similar Situation Detection Window (in hours) to configure the interval, in hours, at which similar situation detection runs and forms groups.

Change request considerations in ML-based situations

For change requests from BMC Helix IT Service Management to be visible in ML-based situations in BMC Helix AIOps, the following conditions must be met:

  • The CIs associated with the change requests must be part of a service and the service must exist in BMC Helix Discovery.
  • The change request must be in one of the following stages to be included in situations. All other stages are ignored.
    • Implementation In Progress
    • Completed
    • Closed
  • The change request must have one of the following impact levels to be included in situations. All other impact levels are ignored.
    • 1-EXTENSIVE/WIDESPREAD
    • 2-SIGNIFICANT/LARGE
    • 3-MODERATE/LIMITED

For more information, see Stages of a change request and Impact levels. 
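
The three conditions above amount to a simple eligibility check. The following Python sketch shows one way to express it. It is a sketch under assumptions: the ChangeRequest class, its field names, and the is_visible_in_situations function are hypothetical and do not represent a product schema or API, and the exact string comparison is also an assumption.

```python
from dataclasses import dataclass

# Stages and impact levels listed in this topic; all other values are ignored.
ELIGIBLE_STAGES = {"Implementation In Progress", "Completed", "Closed"}
ELIGIBLE_IMPACTS = {"1-EXTENSIVE/WIDESPREAD", "2-SIGNIFICANT/LARGE", "3-MODERATE/LIMITED"}


@dataclass
class ChangeRequest:
    """Hypothetical record; the field names are illustrative only."""
    request_id: str
    stage: str
    impact: str
    # True when the associated CI is part of a service that exists in BMC Helix Discovery.
    ci_in_discovered_service: bool


def is_visible_in_situations(cr: ChangeRequest) -> bool:
    """Return True only when all three conditions from the list above hold."""
    return (
        cr.ci_in_discovered_service
        and cr.stage in ELIGIBLE_STAGES
        and cr.impact in ELIGIBLE_IMPACTS
    )


eligible = ChangeRequest("CR-1", "Completed", "2-SIGNIFICANT/LARGE", True)
wrong_stage = ChangeRequest("CR-2", "Some other stage", "2-SIGNIFICANT/LARGE", True)
print(is_visible_in_situations(eligible), is_visible_in_situations(wrong_stage))
```

A request that fails any one of the conditions, such as wrong_stage above, is left out.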

Controlling the Situations page display

Use the following UI elements to change the Situations page view:

  • Time period: Use the date and time filter to view situations for a selected period. By default, situations for the last 24 hours are displayed. A similar situation group is displayed only if at least two situations in that group were created within the selected date and time range.
     
  • Refresh: Click to refresh the page.
    By default, the Situations page is automatically refreshed every five minutes. To change the refresh interval, see Configuring general settings.

  • Save Preferences: Use this option to save the changes to the Advanced filter selections. This option is enabled only when you edit or reset the Advanced filter selections. The preferences are saved for the logged-in user until new preferences are saved.

  • Search by Situation name: Enter a Situation name (case-sensitive) in the search box and click Search.

  • Advanced filter: Use the filter to view situations based on the status, severity, priority, and type.
    By default, only open situations with the severities Critical, Major, Minor, Warning, and Information from the last 24 hours are displayed.

    Important

    When you click Apply filter, the selected filters are preserved in both the list and tile views until your session expires or you manually refresh the page.

  • Views: Use the hierarchical, list, or tile view. The hierarchical and tile views are available only for All Situations. The Investigate button is available only in the tile view. In the other views, open the situation link directly to investigate.

  • Column selector: Use this selector to clear or select the columns to be displayed.

  • Sort: Sorting is available for the Occurred, Severity, Priority, and Status columns.
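
The Time period rule for similar situation groups (at least two situations of the group created within the selected range) can be illustrated with a short Python sketch. The group_is_listed function and the sample timestamps are hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def group_is_listed(created_times: list[datetime], start: datetime, end: datetime) -> bool:
    """Return True when at least two situations of the group were created in [start, end]."""
    in_range = [t for t in created_times if start <= t <= end]
    return len(in_range) >= 2


# Illustrative data: a default "last 24 hours" range and three creation times.
now = datetime(2024, 1, 2, 12, 0)
last_24_hours = (now - timedelta(hours=24), now)
created = [now - timedelta(hours=30), now - timedelta(hours=5), now - timedelta(hours=1)]
print(group_is_listed(created, *last_24_hours))
```

In the sample data, two of the three situations fall inside the last 24 hours, so the group would be listed. With only one situation in range, it would not.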

To monitor all situations

  1. On the BMC Helix AIOps console, click Situations.
    By default, all situations are displayed in the hierarchical view. You can view the following information:
    • Situation name
    • Occurrence time
    • Number of similar situations
    • Number of correlated situations
    • Number of related events
    • Situation type
    • Severity
    • Priority
    • Status
    • Incident ID associated with the situation
    • Action

  2. View the independent situations and primary situations.
    The primary situations are indicated by the Primary Situation icon under the Type column.

  3. Perform the following actions:
    • For independent situations:
      1. (Optional) Perform actions on a situation; see To perform situation actions.
      2. (Optional) Click the situation to investigate the details; see Investigating ML-based situations.

    • For primary situations:
      1. (Optional) Click a primary situation to investigate situation details; see Investigating ML-based situations.
      2. Expand the group to identify the root-cause situations from the list of related situations.
        A root-cause situation is indicated by the Root-Cause icon before the situation name. There can be multiple root-cause situations under a primary situation.
      3. (Optional) Perform actions on a situation; see To perform situation actions.
      4. (Optional) Click the root-cause situation to investigate situation details; see Investigating ML-based situations.
      5. Collapse the group. 

To monitor similar situations

  1. On the BMC Helix AIOps console, click Similar Situations to view the stabilized ML-based situations, grouped by their similarities in message, impact, and node ID.
    In the list, you can view the following information:
    • Group name of similar situations with the number of situations under that group. 
    • Occurrence time
    • Number of related events
    • Situation type
    • Severity
    • Priority
    • Status
    • Incident ID associated with the situation
    • Action

  2. Expand a group.
    You can see all historical occurrences of a situation, both open and closed, irrespective of the selected advanced filter and time period.

  3. Perform the following actions:
    1. (Optional) Perform actions on a situation; see To perform situation actions.
    2. (Optional) Click a situation to investigate situation details; see Investigating ML-based situations.

  4. Collapse the group.

Situation actions

The following table describes the actions you can perform on situations:

Action | Description
--- | ---
Acknowledge Situation | Acknowledges the existence of an open situation. This operation changes the situation status from Open to Acknowledged.
Assign Situation | Assigns ownership of an open, acknowledged, or assigned situation to yourself or another person in the same account. This operation changes the situation status from Open or Acknowledged to Assigned, and the situation owner is updated with the selected user. If the situation status is Assigned, only the ownership changes to the selected user.
Close Situation | Disables any further operations on the situation. Closed situations are not considered for determining the status of a device. You can close situations with Open, Assigned, and Acknowledged statuses only.
Decline Ownership | Removes ownership of a situation in the Assigned state. This operation changes the situation status to Acknowledged.
Set Situation Priority | Assigns a priority level to the situation.
Take Ownership | Assigns ownership of an Open or Acknowledged situation to yourself.
Unacknowledge Situation | Changes a previously Acknowledged situation back to the Open state.
Add Notes | Adds notes to the situation.
Create Automation | Launches the Create Automation Policy page in BMC Helix Intelligent Automation to enable tenant administrators to create an automation policy. Requires the Intelligent Automations feature to be enabled from the Manage Product Features page under Configurations.
Request Automation | Displays the Request Automation dialog box. Requires the Intelligent Automations feature to be enabled from the Manage Product Features page under Configurations.
Trigger Automation | Displays the Run Automation dialog box that you can use to run automations for remediating the event. Requires the Intelligent Automations feature to be enabled from the Manage Product Features page under Configurations.
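
The status changes described for these actions can be read as a small state machine. The following Python sketch is illustrative only: the TRANSITIONS table and the apply_action function are hypothetical, and the status name Closed for a closed situation is an assumption. Take Ownership is left out because the resulting status is not stated in the description.

```python
# Status transitions as described above; keys are (action, current status).
TRANSITIONS = {
    ("Acknowledge Situation", "Open"): "Acknowledged",
    ("Unacknowledge Situation", "Acknowledged"): "Open",
    ("Assign Situation", "Open"): "Assigned",
    ("Assign Situation", "Acknowledged"): "Assigned",
    ("Assign Situation", "Assigned"): "Assigned",  # only the owner changes
    ("Decline Ownership", "Assigned"): "Acknowledged",
    ("Close Situation", "Open"): "Closed",  # "Closed" as a status name is an assumption
    ("Close Situation", "Assigned"): "Closed",
    ("Close Situation", "Acknowledged"): "Closed",
}


def apply_action(status: str, action: str) -> str:
    """Return the new status, or raise ValueError if the action is not listed for this status."""
    try:
        return TRANSITIONS[(action, status)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not available for a situation in status {status!r}"
        ) from None


status = "Open"
for action in ("Acknowledge Situation", "Assign Situation", "Decline Ownership", "Close Situation"):
    status = apply_action(status, action)
    print(f"{action} -> {status}")
```

Because no transition is defined from Closed, any further action on a closed situation raises an error in this sketch, which mirrors the statement that closing disables further operations.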

Where to go from here

Click a policy-based or an ML-based situation tile to investigate and remediate the situation: