Investigating ML-based situations


As an operator or site reliability engineer (SRE), view and investigate ML-based situations for faster root cause analysis and perform the following actions:

  • Investigate primary and related situations
  • Investigate independent and similar situations
  • View the situation summary generated by BMC HelixGPT, view suggested Best Action Recommendations, Log Insights, and use the virtual agent – Ask HelixGPT to analyze and investigate the situation
  • View or create incidents (requires BMC Helix IT Service Management to be enabled)
  • View and analyze causal events and event details and perform event operations


To investigate a primary situation

Primary situations consist of a group of open situations that occurred due to a similar issue and impacted multiple services across the service hierarchy. Instead of troubleshooting each service and its situation separately, operators or SREs can investigate the primary situation and take relevant actions, which helps reduce situation noise. The Similar Situations, Situation Explanation, CI Topology Analysis sections, and BMC HelixGPT-based capabilities are not available for primary situations. 

  1. On the BMC Helix AIOps console, click Situations
    All situations that occurred in the last 24 hours are displayed in a hierarchical view. The Primary Situation.png icon indicates a primary situation. 
    To learn more about primary situations, see Situations-overview.
  2. Expand the primary situation group to view all related situations and identify the root-cause situations.
    A root-cause situation is indicated by the icon_rootcauseSituation.png icon. There can be multiple root-cause situations under a primary situation.
  3. Click the primary situation and on the situation details page, view the following details:
    • Situation name, severity, priority, last modified date, and status
    • Incident ID: Click to open the incident in BMC Helix IT Service Management.
      If not created, a Create Incident link is displayed. Click to create an incident in the connected IT Service Management system. (Requires subscription to BMC Helix IT Service Management)
    • Name of the impacted service: Click to open the service details in a new tab. 
      Primary Situation_24102.png

      Important

      Situation highlights are not available for a primary situation.

    • List of related situations
  4. In Related Situations, identify the root cause situation and analyze the attributes such as time of occurrence, number of similar situations, number of related events, type, severity, priority, status, and incident ID.
    1. (Optional) Click the situation name or the related events link to open the situation in a new tab. 
    2. (Optional) Click the Incident ID link to open the incident in BMC Helix IT Service Management – SmartIT.
    3. (Optional) Click the action menu Kebab Menu.png to perform actions for a situation.
      For more information, see Performing-situation-actions.
      Situation_details_Related_1.png
  5. To investigate the situation and view its details, continue with the next sections. 
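
The relationship between a primary situation, its related situations, and the root-cause situations among them can be pictured as a small data structure. The following Python sketch is only an illustration: the class names, field names, and sample values are hypothetical and do not reflect the BMC Helix AIOps data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Situation:
    # Hypothetical fields, chosen only for this illustration.
    name: str
    severity: str
    is_root_cause: bool = False


@dataclass
class PrimarySituation:
    # A primary situation groups related open situations.
    name: str
    related: List[Situation] = field(default_factory=list)

    def root_causes(self) -> List[Situation]:
        """Return the related situations flagged as root cause (there can be several)."""
        return [s for s in self.related if s.is_root_cause]


primary = PrimarySituation(
    name="Checkout degradation",
    related=[
        Situation("Database latency", "CRITICAL", is_root_cause=True),
        Situation("API timeouts", "MAJOR"),
        Situation("Web tier errors", "MINOR"),
    ],
)

root_cause_names = [s.name for s in primary.root_causes()]
print(root_cause_names)
```

Investigating the group once and starting from the flagged root-cause entries mirrors the workflow in the preceding steps, instead of troubleshooting each related situation separately.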


To investigate an independent ML-based situation

  1. Click an open situation and view the following details:
    1. Situation name, severity, priority, incident ID, status, and the name of the impacted service. 
      If an incident is not created, the Create Incident option is displayed. 
    2. Situation highlight: Number of events from the top three impacted hosts.
    3. Similar situations (if more than one similar situation is available).
    4. Situation explanation
    5. CI topology and analysis
    6. If BMC HelixGPT is enabled:
      • A human-readable AI-generated summary of the situation.
      • Best Action Recommendations, split into logical numbered steps that can be used to remediate the situation. Additionally, the BMC HelixGPT-driven code wizard offers generated code to accomplish the individual steps of the recommended action, in different languages such as Ansible, Python, and Bash.
      • Log insights collected from logs generated in BMC Helix Log Analytics that help identify more accurate root causes of the events.
      • An integrated virtual agent that leverages the BMC HelixGPT generative AI capabilities and lets you ask questions to investigate and remediate the situation. To learn more about BMC HelixGPT capabilities, see Situations-overview.
  2. Continue to view the next sections. 


To view Best Action Recommendations

Best Action Recommendations are available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.  

  1. On the situation details page, review an AI-generated summary (short problem statement, brief summary, and detailed problem context). 
    Situation Summary_HelixGPT_24102.png

    Note

    If you disagree with the root cause, use the Like Dislike Buttons_24102.png options to provide feedback so that you get more accurate and specific recommendations in the future. When you click dislike, you are prompted to select the exact causal CI.

  2. Click Show remediation steps.
  3. Depending on the situation, the recommended steps are displayed.
    For example, for a High CPU Utilization issue, the following actions are suggested:
    BAR_IncreaseCPU_HelixGPT_24102.png

  4. (If available) Click Code wizard.
    The code that can be used to run the recommended step is displayed. For some manual actions, the code wizard might not be displayed. 
    CodeW2_BAR_IncreaseCPU_HelixGPT_24102.png
    1. Select your preferred language (Ansible, Python, Bash), and the code is displayed based on the selected language.
    2. Click Copy to clipboard and use the code in your existing script to run the recommended remediation step.
    3. Close the code wizard. 
  5. Continue to the next section to view the log insights for a situation. 


To view Log Insights

Log Insights are available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.  

BMC HelixGPT connects with your log repository, including BMC Helix Log Analytics, and shows an in-depth analysis of the time-sliced runtime logs from diverse systems to identify the root cause of the situations.

For a situation, operators or SREs can view the BMC HelixGPT-generated summary of the log data and additional information in Log Insights, which helps identify the root cause of the situation. Use the cross-launch link to view the log details in BMC Helix Log Analytics.

Log Insights_Light Mode_24102.png

To use Ask HelixGPT virtual agent to get more information about the situation

The Ask HelixGPT virtual agent is available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.  

Use the Ask HelixGPT virtual agent to ask questions within the context of an open or past situation. Using the BMC HelixGPT capabilities, operators can get information about diverse topics regarding infrastructure, service health, and near-real-time predictions.


To view similar situations

The Similar Situations section is displayed only if at least two occurrences of similar situations are identified within a 24-hour window. This section is not displayed for a primary situation.

  From the Similar Situations section, analyze the pattern and trend of similar situations associated with the same service node:
  1. Click Aggregated View to view the number of occurrences against the day of the week.
    The Y-axis represents the days of the week starting from Sunday to Saturday. The X-axis shows the hourly time slot for 24 hours. The Aggregated View is displayed if there are situations available for more than one week. 
    Similar_Situations_Aggregated_View.png
  2. Click Detailed View to view the number of occurrences in the last 30 days.
    The Y-axis represents the day and date and the X-axis represents the hourly time slot for 24 hours. This view captures data between the first occurrence date and the most recent occurrence date for the last 30 days. The detailed view is displayed even if the data is available for a single day. 
    Similar_Situations_Detailed_View.png
  3. Click Show More to view similar situation details such as the time of occurrence, number of related events, type, severity, priority, status, and incident ID.
    Clicking the Incident ID link opens the incident in BMC Helix IT Service Management – SmartIT (requires the appropriate permissions). 
  4. (Optional) Click the action menu Kebab Menu.png to perform actions for a situation.
    For more information, see Performing-situation-actions.
  5. To investigate the situation further by viewing the situation explanation, continue to the next section.
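
The Aggregated View described above is a grid of occurrence counts, with days of the week (Sunday to Saturday) on one axis and hourly slots on the other. The following Python sketch shows one way such a grid could be computed from a list of occurrence timestamps. It is an illustration only: the function name and sample timestamps are hypothetical, and the product's own aggregation logic is not documented here.

```python
from datetime import datetime
from typing import Iterable, List

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def aggregate_by_weekday_hour(occurrences: Iterable[datetime]) -> List[List[int]]:
    """Count occurrences in a 7 x 24 grid: rows are days (Sunday first), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in occurrences:
        # datetime.weekday() returns Monday=0 .. Sunday=6; shift so that Sunday=0.
        row = (ts.weekday() + 1) % 7
        grid[row][ts.hour] += 1
    return grid


# Hypothetical occurrence times of similar situations.
sample = [
    datetime(2024, 6, 2, 9, 15),   # Sunday
    datetime(2024, 6, 9, 9, 40),   # Sunday, one week later
    datetime(2024, 6, 3, 14, 5),   # Monday
]

grid = aggregate_by_weekday_hour(sample)
for day, row in zip(DAYS, grid):
    busy = {hour: count for hour, count in enumerate(row) if count}
    if busy:
        print(day, busy)
```

A recurring pattern, such as the two Sunday occurrences in the 09:00 slot in this sample, shows up as a cell with a count greater than one.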


To view the situation explanation

  1. From the Situation Explanation section, use the Root Cause View or List View to analyze the root cause events associated with the situation.
  2. Root Cause View: Shows the impact flow of events in a situation in graphical format.
    Based on the temporal and topological relationships between the causal events in the situation, the ML algorithm determines the root cause event and the consequent events. Each event in the graph is aligned against the corresponding CI Kind. The direction of the graph indicates the impact flow from the root cause event. The impact score is displayed as a percentage with each event, and the impact scores of all events add up to 100 percent.
    • Hover over an event to view the impacted node details and the corresponding CI or CI Kind highlighted in the CI topology and analysis section.
    • Click an event to view additional details on the Situation Details pane.
      root_cause_situation_234.png
  3. List View: Displays all causal events.
    In the List view, view the event messages, impacted host, occurrence time, severity, priority, status, and incident ID. 

    Automated remediation action

  4. Click an event message to view the Event Details pane with the following details:
    • Event name, event score, severity, priority, status, and the More Details button to open the event details page in BMC Helix Operations Management. 
    • Event assignee details.
    • Date when the event first occurred or was last modified.
    • Event summary showing the Class, Incident ID, Object Class, Object, and Host. Clicking the Incident ID link opens the incident in BMC Helix IT Service Management – SmartIT. 
      For more information about event classes and objects, see EVENT base event class.

    • Logs and Notes History: All logs and notes for an event are displayed. Type a note in the text box and click Add Note to add any additional notes related to the event. Any note added for the event is reflected for the event in BMC Helix Operations Management.
    • Performance View: If the slot value for the event class is Alarm, the time-series data collected from the key attributes of the causal events of ML-based situations is displayed.
      SituationDetails_Event_1.png
  5. Click the action menu Kebab Menu.png to perform event actions.
  6. To investigate the situation further by viewing and analyzing the CI topology and analysis, continue to the next section. 
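
The Root Cause View shows an impact score percentage per event, and the scores of all events total 100 percent. The following Python sketch illustrates only that property by normalizing raw per-event scores into percentage shares. How the ML algorithm actually scores events is not described in this topic, so the function name, event names, and raw values are hypothetical.

```python
from typing import Dict


def impact_percentages(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale raw per-event scores so that the resulting shares add up to 100 percent."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one event must have a positive score")
    return {event: 100.0 * score / total for event, score in raw_scores.items()}


# Hypothetical raw scores for three causal events.
raw = {"disk_full": 6.0, "db_slow": 3.0, "api_errors": 1.0}

shares = impact_percentages(raw)
likely_root_cause = max(shares, key=shares.get)

for event, pct in shares.items():
    print(f"{event}: {pct:.0f}%")
print("highest impact:", likely_root_cause)
```

In this sketch the event with the largest share is treated as the most likely root cause. The product also weighs temporal and topological relationships, which this illustration does not model.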


To view CI topology and analysis

  1. In the CI topology and analysis map, view the topology map of the situation, the impacted CIs, and the probability of the impact on the connected CIs. 
  2. Use the following options to view the map based on your requirements:
    1. Views: Switch between the Organic or Hierarchic view to view the impact flow. 
      In the organic view, nodes are placed close to their adjacent nodes, which saves space. In the hierarchic view, the nodes are distributed into layers, which makes it easier to identify dependencies and relationships among the nodes.
      CI Topology_Analysis_Situation_Organic_24101.png
    2. Advanced filters: Apply filters to filter and view the impact based on the selected filters.
    3. Grouping: Click Enable Grouping_CI_Situation_24102.png to view the topology map grouped by the CI kind. 
    4. Search: If you have a large number of CIs in the hierarchy, use the search box to locate a particular CI. 
    5. Legend: Click to view the legends used for the topological map.
    6. Use the other tools to zoom in, zoom out, or view the map on a full screen. 
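
The hierarchic view distributes nodes into layers so that dependencies are easier to follow. The following Python sketch shows a generic longest-path layering of a dependency graph, which is one common way to produce such layers. It is not the layout algorithm used by the product, and the function name, CI names, and edges are hypothetical.

```python
from collections import deque
from typing import Dict, List


def assign_layers(edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Place each node one layer below its deepest parent (longest-path layering)."""
    nodes = set(edges)
    for children in edges.values():
        nodes.update(children)

    indegree = {node: 0 for node in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1

    layer = {node: 0 for node in nodes}
    queue = deque(sorted(node for node in nodes if indegree[node] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in edges.get(node, []):
            layer[child] = max(layer[child], layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if visited != len(nodes):
        raise ValueError("the graph contains a cycle and cannot be layered")
    return layer


# Hypothetical topology: each key impacts or depends on the listed CIs.
topology = {
    "service": ["app-server", "load-balancer"],
    "load-balancer": ["app-server"],
    "app-server": ["database"],
}

layers = assign_layers(topology)
for name in sorted(layers, key=lambda n: (layers[n], n)):
    print(layers[name], name)
```

An organic layout, by contrast, positions nodes by proximity to their neighbors and does not assign layers, which is why it tends to use less space.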


Where to go from here

Performing-situation-actions

 
