Investigating ML-based situations


As an operator or a site reliability engineer (SRE), view ML-based situations and perform the following actions for faster root cause analysis of the problem: 

  • Investigate primary and related situations
  • Investigate independent and similar situations
  • View the situation summary, the suggested best action recommendations, and log insights generated by BMC HelixGPT and use the Ask HelixGPT virtual agent to analyze and investigate the situation
  • View or create incidents (requires BMC Helix IT Service Management to be enabled)
  • View and analyze causal events


To investigate a primary situation

A primary situation consists of a group of open situations that occurred due to a similar issue and impacted multiple services across the service hierarchy. Instead of troubleshooting each service and its situation separately, operators or SREs can investigate the primary situation and take relevant actions, which helps reduce situation noise. 

Important

The Similar Situations, Situation Explanation, and CI Topology Analysis sections and the BMC HelixGPT-based capabilities are available only for independent situations, not for primary situations. Situation highlights are also not available for primary situations.

  1. On the BMC Helix AIOps console, click Situations.
    All situations that occurred in the last 24 hours are displayed in a hierarchical view. Primary situations are indicated by the Primary Situation icon.
    To learn more about primary situations, see Situations-overview.
  2. Expand the primary situation group to view all related situations and identify the root cause situations.
    A root cause situation is indicated by the root cause situation (target) icon. There can be multiple root cause situations under a primary situation.
  3. Click the primary situation and view the following details on the situation details page:
    • Situation name, severity, priority, last modified date, and status
    • Incident ID: Click to open the incident in BMC Helix IT Service Management.
      If an incident is not created, a Create Incident link is displayed. Click the link to create an incident in BMC Helix IT Service Management (requires a subscription to BMC Helix IT Service Management).
    • Name of the impacted service: Click the name to open the service details in a new tab. 
    • List of related situations
  4. In the Related Situations section, identify the root cause situation (indicated by the Root Cause Situation icon) and analyze attributes such as the time of occurrence, number of similar situations, number of related events, type, severity, priority, status, and incident ID.
    1. (Optional) Click the situation name or the related events link to open the situation in a new tab.
    2. (Optional) Click the Incident ID link to open the incident in BMC Helix IT Service Management.
    3. (Optional) Click the action menu (kebab icon) to perform actions on a situation.
      For more information, see Performing-situation-actions.
  5. Continue with To investigate an independent situation.
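
The relationship between a primary situation and its related situations can be pictured as a small data model. The following Python sketch is illustrative only: the class and field names (Situation, is_root_cause, related) and the sample situation names are assumptions made for this example and do not reflect the BMC Helix AIOps data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Situation:
    """Hypothetical stand-in for a situation listed on the Situations page."""
    name: str
    is_root_cause: bool = False  # assumed flag mirroring the root cause icon
    related: List["Situation"] = field(default_factory=list)


def root_cause_situations(primary: Situation) -> List[Situation]:
    """Return the related situations that are flagged as root causes.

    A primary situation can contain more than one root cause situation.
    """
    return [s for s in primary.related if s.is_root_cause]


# Invented sample data: one primary situation with three related situations.
primary = Situation(
    name="Primary: checkout degradation",
    related=[
        Situation("DB latency high", is_root_cause=True),
        Situation("API errors"),
        Situation("Disk queue saturated", is_root_cause=True),
    ],
)
root_names = [s.name for s in root_cause_situations(primary)]
```

In this sketch, investigating the primary situation means looking at two root cause situations instead of three separate ones, which is the noise reduction that the procedure describes.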


To investigate an independent situation

  1. Click an open, independent situation and view the following details:
    • Situation name, severity, priority, incident ID, status, and the name of the impacted service. 
      If Proactive Service Resolution is enabled in BMC Helix Intelligent Automation, the same incident ID is displayed against the situation and the events. If an incident is not created, the Create Incident option is displayed. 

    • Situation highlight: Number of events from the top three impacted hosts.
    • Similar situations (if more than one similar situation is available)
    • Situation explanation
    • CI topology and analysis
    • If BMC HelixGPT is enabled:
      • A human-readable AI-generated summary of the situation.
      • Best action recommendations, a list of suggested steps that can be used to remediate the situation. Additionally, a BMC HelixGPT-driven wizard offers sample code to accomplish individual steps in different languages such as Ansible, Python, and Bash.
      • Log insights derived from logs in BMC Helix Log Analytics, which help pinpoint the root cause of the problem.
      • An integrated virtual agent, Ask HelixGPT, which uses the BMC HelixGPT generative AI capabilities and lets you ask questions to investigate and remediate the situation more effectively. To learn more about BMC HelixGPT capabilities, see Situations-overview.

        Important

        To enable BMC HelixGPT, contact BMC Support.

  2. Continue with To view best action recommendations.


To view best action recommendations

Best action recommendations are available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.  

  1. On the situation details page, review the AI-generated summary (short problem statement, brief summary, and detailed problem context).

    Important

    If you disagree with the root cause, use the like and dislike options to provide feedback so that you get more accurate and specific recommendations in the future. When you click dislike, you are prompted to select the exact causal CI.

  2. Click Show remediation steps.
    The recommended steps are displayed for a situation.
    For example, for a High CPU Utilization issue, steps to remediate the high CPU utilization are suggested.
  3. (If available) Click Code wizard.
    The code that can be used to run the recommended step is displayed. For some manual steps, the code wizard might not be displayed. 
    1. Select your preferred language (Ansible, Python, or Bash). The code is displayed in the selected language.
    2. Click Copy to clipboard and use the code in your existing script to run the recommended remediation step.
    3. Close the code wizard.
  4. Continue with To view log insights. 


To view log insights

Log insights are available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.  

BMC HelixGPT connects with your log repository, including BMC Helix Log Analytics, and shows an in-depth analysis of the time-sliced runtime logs from diverse systems to identify the root cause of the situations.

For a situation, operators or SREs can view the BMC HelixGPT-generated summary of the logs in the Log Insights section, which helps in identifying the root cause of the situation. Use the cross-launch link to view the log details in BMC Helix Log Analytics.

To use the Ask HelixGPT virtual agent to get more information about the situation

The Ask HelixGPT virtual agent is available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.  

Use the Ask HelixGPT virtual agent to ask questions within the context of an open or past situation. Using the BMC HelixGPT capabilities, operators can get information about diverse topics regarding infrastructure, service health, and near real-time predictions.

  1. Click Ask HelixGPT.
    The interactive virtual agent dialog box displays the following predefined questions:
    • What is the impact of the issue?
    • Which team can solve this issue?
    • Has this situation happened in the past?
    • Are there any change windows active during this situation?
  2. Click any question to get additional information about the situation.
    BMC HelixGPT generates the answer by evaluating the following information: incidents created for similar situations in BMC Helix IT Service Management, time stamps and patterns of similar situations that occurred in the past, the service health score of the impacted service, and the change requests associated with the situation.
    For example, if you click What is the impact of the issue?, an answer that describes the impact of the situation is displayed.
  3. (Optional) Click any other question to obtain more details about the situation. 
  4. Continue with To view similar situations.


To view similar situations

The Similar Situations section is displayed if at least two occurrences of similar situations are identified within a 24-hour window. This section is not displayed for a primary situation. The Aggregated View grid is displayed if there are situations available for a week, and the Detailed View grid is available if there are situations for a month. 
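
The display condition above (at least two occurrences within a 24-hour window) can be expressed as a simple check over occurrence time stamps. This is a minimal sketch, not the product's implementation: the function name has_similar_within_window is hypothetical, and treating a gap of exactly 24 hours as inside the window is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def has_similar_within_window(
    occurrences: List[datetime],
    window: timedelta = timedelta(hours=24),
) -> bool:
    """Return True if at least two occurrences fall within one window.

    Assumption: a gap of exactly `window` counts as inside the window.
    """
    ordered = sorted(occurrences)
    # Comparing adjacent time stamps is sufficient: if any two occurrences
    # are within the window, every adjacent gap between them is too.
    return any(later - earlier <= window
               for earlier, later in zip(ordered, ordered[1:]))


# Invented sample data.
clustered = [
    datetime(2024, 5, 1, 9, 0),
    datetime(2024, 5, 1, 21, 30),  # 12.5 hours after the first occurrence
    datetime(2024, 5, 5, 8, 0),
]
sparse = [datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 3, 9, 0)]

show_for_clustered = has_similar_within_window(clustered)
show_for_sparse = has_similar_within_window(sparse)
```

With this sample data, the first list satisfies the condition because two occurrences are 12.5 hours apart, and the second list does not because its occurrences are 48 hours apart.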

  1. In the Similar Situations section, view the first and the most recent occurrences of similar situations. 
  2. To analyze the pattern and trend of similar situations associated with the same service node, use Aggregated View or Detailed View:
    • Aggregated View: Shows the occurrences of similar situations against the days of the week. The Y-axis represents the days of the week starting from Sunday to Saturday. The X-axis shows the hourly time slot for 24 hours.
    • Detailed View: Shows the occurrences of similar situations in the last 30 days.
      The Y-axis represents the day and date, and the X-axis represents the hourly time slot for 24 hours. This view captures data between the first and the most recent occurrence for the last 30 days. The detailed view is displayed even if the data is available for a single day. 
  3. Click Show More to view similar situation details such as the time of occurrence, number of related events, type, severity, priority, status, and incident ID.
    Clicking the Incident ID link opens the incident in BMC Helix IT Service Management (requires a subscription to BMC Helix IT Service Management).
  4. (Optional) Click the action menu (kebab icon) to perform actions on a situation.
    For more information, see Performing-situation-actions.
  5. Continue with To view the situation explanation.
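
The Aggregated View described in step 2 plots occurrences on a grid of weekdays (Sunday to Saturday) against hourly slots. The following sketch shows one way such a grid could be computed from occurrence time stamps. It is an illustration under assumptions, not the product's logic: the function name aggregate_by_day_and_hour and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

# Row order matches the Y-axis described for the Aggregated View.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def aggregate_by_day_and_hour(
    occurrences: List[datetime],
) -> Dict[Tuple[str, int], int]:
    """Count occurrences per (weekday name, hour of day) cell."""
    counts: Counter = Counter()
    for ts in occurrences:
        # datetime.weekday() returns Monday=0 .. Sunday=6; shift so Sunday=0.
        day = DAYS[(ts.weekday() + 1) % 7]
        counts[(day, ts.hour)] += 1
    return dict(counts)


# Invented sample data: two Sunday occurrences in the 02:00 slot and one
# Monday occurrence in the 14:00 slot.
grid = aggregate_by_day_and_hour([
    datetime(2024, 5, 5, 2, 15),   # Sunday
    datetime(2024, 5, 12, 2, 40),  # Sunday, one week later
    datetime(2024, 5, 6, 14, 5),   # Monday
])
```

A cell with a high count, such as the Sunday 02:00 slot in this sample, is the kind of recurring pattern that the Aggregated View is meant to surface.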


To view the situation explanation

  1. In the Situation Explanation section, use the Root Cause View or List View to analyze the root cause events associated with the situation.
    • Root Cause View: Shows the impact flow of events in a situation in a graphical format.
      Based on the temporal and topological relationships between various causal events in the situation, the ML algorithm determines the root cause event and consequent events. Each event in the graph is aligned against the corresponding CI kind. The direction in the graph indicates the impact flow from the root cause event. You can see the impact score percentage displayed with the event. The total impact score from all the events adds up to 100 percent.
      • Hover over an event to view the impacted node details and the corresponding CI or CI kind highlighted in the CI topology and analysis section.
      • Click an event to view additional details on the Situation Details pane.
    • List View: Displays all causal events and details such as the event messages, impacted host, occurrence time, severity, priority, status, and incident ID. 
      If Proactive Service Resolution is enabled in BMC Helix Intelligent Automation, the same incident ID is displayed against the situation and the events.

  2. Click an event message to view the following details in the Event Details pane:
    • Event name, event score, severity, priority, status, and the More Details link to view the additional event details in BMC Helix Operations Management
    • Event assignee details
    • Date when the event first occurred or was last modified
    • Event summary showing the Class, Incident ID, Object Class, Object, and Host. Clicking the Incident ID link opens the incident in BMC Helix IT Service Management. 
      For more information about event classes and objects, see EVENT base event class.

    • Logs and notes history: All logs and notes for an event are displayed. Type a note in the text box and click Add Note to add any additional notes related to the event. Any note added for the event is reflected in the event in BMC Helix Operations Management.
    • Performance view: If the slot value for the event class is Alarm, the time-series data collected from the key attributes of the causal events of ML-based situations is displayed.
  3. Click the action menu (kebab icon) to perform event actions.

  4. Continue with To view CI topology and analysis.
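
In the Root Cause View, the impact scores of all events in a situation add up to 100 percent. The sketch below shows only that property, by scaling hypothetical raw scores into percentages. It does not represent the ML algorithm that determines the scores: the function name impact_percentages, the event names, and the raw values are all assumptions.

```python
from typing import Dict


def impact_percentages(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative raw scores so that they total 100 percent."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one event must have a positive score")
    return {event: 100.0 * score / total
            for event, score in raw_scores.items()}


# Invented raw scores for two causal events.
percentages = impact_percentages({"cpu_high": 3.0, "queue_backlog": 1.0})
```

Here the cpu_high event carries 75 percent of the impact and queue_backlog carries 25 percent, so the event with the largest share is the natural starting point for the investigation.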


To view CI topology and analysis

  1. In the CI topology and analysis map, view the topology map of the situation, the impacted CIs, and the probability of the impact on the connected CIs. 
  2. Use the following options to view the map based on your requirements:
    1. Views: Switch between the Organic and Hierarchic views to view the impact flow.
      In the organic view, nodes are placed close to their adjacent nodes, which saves space. In the hierarchic view, nodes are distributed into layers, which makes it easier to identify dependencies and relationships among the nodes.
    2. Advanced filters: Apply filters to view the impact based on the selected criteria.
    3. Grouping: Click Enable Grouping by CI Kind to view the topology map grouped by the CI kind.
    4. Search: If there are many CIs in the hierarchy, use the search box to locate a particular CI. 
    5. Legend: Click to view the legends used for the topological map.
    6. Use the other tools to zoom in, zoom out, or view the map on a full screen. 
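
The hierarchic view distributes nodes into layers. As a generic illustration of that idea, the following sketch assigns each CI a layer equal to its distance from a starting node by using a breadth-first traversal. The layout algorithm used by the product is not described here, so this is an assumption-laden example: the function name layer_nodes and the sample topology are hypothetical.

```python
from collections import deque
from typing import Dict, List


def layer_nodes(edges: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Assign each reachable node a layer equal to its distance from root."""
    layers = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in layers:  # keep the first (shortest) distance
                layers[child] = layers[node] + 1
                queue.append(child)
    return layers


# Invented topology: a service that depends on two application CIs,
# which share one database CI.
topology = {
    "service": ["app-1", "app-2"],
    "app-1": ["db"],
    "app-2": ["db"],
}
layers = layer_nodes(topology, "service")
```

In this sample, the service lands in layer 0, both application CIs in layer 1, and the shared database in layer 2, which makes the shared dependency easy to spot.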


Where to go from here

To perform additional actions on a situation or on the events included in the situation, see Performing-situation-actions.

 
