Investigating ML-based independent situations
As an operator or a site reliability engineer (SRE), use independent situations to:
Investigate isolated anomalies or service issues without the noise of correlated data.
Quickly assess the impact by reviewing severity, priority, and the affected CI or service.
Perform root cause analysis by drilling into the CI’s performance metrics, logs, and events.
Identify and address recurring issues by reviewing historical data and patterns for the same CI.
Trigger appropriate remediation actions such as automation or incident creation.
Identify cross-site network issues with BGP-based situations that group events from BGP-connected routers and services to highlight shared infrastructure problems.
An independent situation is an ML-based situation formed by correlating events within a single service, based on event patterns, signatures, anomalies, and topological connections between event nodes. It reflects issues that are limited to one service and do not involve dependencies on other services.
Independent situations allow operators or SREs to focus on targeted troubleshooting and resolution. They are especially useful for investigating one-off issues or monitoring anomalies that have not yet shown broader impact across the environment.
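The definition above can be illustrated with a short Python sketch. It assumes a hypothetical `Event` record that carries a service and CI name, and a hypothetical `ci_dependencies` map from each CI to the services it depends on. BMC Helix AIOps does not expose these structures, and its ML correlation is not reproduced here. The sketch only checks the outcome that the definition describes: all correlated events belong to one service, and none of their CIs depend on another service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape for illustration; not a BMC Helix schema.
    event_id: str
    service: str
    ci: str


def is_independent(events, ci_dependencies):
    """Return True if the correlated events stay inside one service.

    ci_dependencies (hypothetical) maps a CI name to the set of services
    that the CI depends on, as discovered from topology.
    """
    services = {e.service for e in events}
    if len(services) != 1:
        return False  # no events, or events span several services
    (service,) = services
    for e in events:
        # A dependency on any service other than the situation's own
        # service means the issue is not isolated.
        if ci_dependencies.get(e.ci, set()) - {service}:
            return False
    return True


# Example 1: a CPU spike on a login service server.
login_events = [
    Event("evt-1", "login", "login-srv-01"),
    Event("evt-2", "login", "login-srv-01"),
]
print(is_independent(login_events, {}))  # True
```

If the same login server CI were discovered to depend on a payments service, the check would return False, and the situation would no longer be independent.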
Example 1
A spike in CPU usage is detected on a login service server due to a configuration update.
This anomaly triggers a situation that affects only the login service node. No related issues are detected in other services, and the situation is not linked to any shared CI or topological dependency. The situation is therefore treated as an independent situation and is not grouped into a primary, multiservice, or similar category.
Example 2
Two routers, Router_A and Router_B, are located in the Pune data center but belong to different autonomous systems (AS1 and AS2).
These routers establish a BGP session to exchange routing information across internal network domains.
Suddenly, the BGP session between Router_A and Router_B goes down, triggering events on both routers.
BMC Helix AIOps receives the events and identifies them as BGP events generated by the BGP session failure. Based on the discovered BGP IP-to-IP connectivity, it recognizes that these events are related, even though the routers are on the same site. A BGP-based situation is formed that correlates the two events and links them through the discovered BGP relationship between the IPs. The situation identifies the broken BGP session as the root cause.
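The BGP correlation in Example 2 can be approximated by treating each discovered BGP IP-to-IP session as an edge and grouping events whose source IPs fall in the same connected component. The following Python sketch uses a union-find structure for that grouping. The event tuples, IP addresses, and the `group_bgp_events` function are hypothetical and do not reflect how BMC Helix AIOps implements BGP-based situations.

```python
from collections import defaultdict


def group_bgp_events(events, bgp_links):
    """Group events whose source IPs are connected by discovered BGP sessions.

    events: list of (event_id, ip) tuples (hypothetical shape).
    bgp_links: iterable of (ip_a, ip_b) peerings (hypothetical discovery output).
    Returns one sorted list of event IDs per connected group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in bgp_links:
        union(a, b)

    groups = defaultdict(list)
    for event_id, ip in events:
        groups[find(ip)].append(event_id)
    return sorted(sorted(g) for g in groups.values())


router_events = [
    ("evt-A", "10.1.1.1"),  # raised on Router_A (hypothetical IP)
    ("evt-B", "10.2.2.2"),  # raised on Router_B (hypothetical IP)
    ("evt-C", "10.9.9.9"),  # unrelated device
]
bgp_sessions = [("10.1.1.1", "10.2.2.2")]
print(group_bgp_events(router_events, bgp_sessions))
# [['evt-A', 'evt-B'], ['evt-C']]
```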
To investigate an independent situation
- Click an open, single-service situation and view the following details:
- Situation name, severity, priority, incident ID, status, and the name of the impacted service.
If Proactive Service Resolution is enabled in BMC Helix Intelligent Automation, the same incident ID is displayed against the situation and the events. If an incident has not been created, the Create Incident option is displayed.
- Show Notes: Opens the Logs and Notes panel, which displays the notes added to a situation.
- Situation highlight: Number of events from the top three impacted hosts.
- Similar situations (if more than one similar situation is available).
- Situation explanation
- CI topology and analysis
- If BMC HelixGPT is enabled:
- A human-readable AI-generated summary of the situation.
- Best action recommendations, a list of suggested steps that can be used to remediate the situation. Additionally, a BMC HelixGPT-driven wizard offers sample code to accomplish individual steps in different languages such as Ansible, Python, and Bash.
- Log insights, collected from logs generated in BMC Helix Log Analytics, that help you accurately identify the root cause of the problem.
- Ask HelixGPT, an integrated virtual agent that uses BMC HelixGPT generative AI capabilities. Ask it questions to investigate and remediate the situation. To learn more about BMC HelixGPT capabilities, see Situations overview.
- Continue with To view best action recommendations.
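The situation highlight amounts to counting events per host and keeping the three hosts with the most events. A minimal Python sketch, assuming a hypothetical list that holds the host name of each event in the situation (the host names are illustrative):

```python
from collections import Counter


def top_impacted_hosts(event_hosts, n=3):
    """Count events per host and return the n hosts with the most events.

    event_hosts is a hypothetical list holding the host name of each event
    in the situation. Ties keep the order in which hosts first appear.
    """
    return Counter(event_hosts).most_common(n)


hosts = ["web-01", "db-01", "web-01", "cache-01", "web-01", "db-01", "app-01"]
print(top_impacted_hosts(hosts))
# [('web-01', 3), ('db-01', 2), ('cache-01', 1)]
```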
To view best action recommendations
BMC Helix AIOps with BMC HelixGPT enables you to connect to the following ITSM data sources to generate best action recommendations:
- BMC Helix IT Service Management
- ServiceNow (Controlled availability customers only)
- Jira (Controlled availability customers only)
Administrators must configure third-party data sources in BMC HelixGPT to generate recommendations. To configure third-party data sources, see Adding data sources in BMC HelixGPT in the BMC HelixGPT documentation.
- On the situation details page, review an AI-generated summary (short problem statement, brief summary, and detailed problem context).
- Click Show remediation steps.
The recommended steps are displayed for a situation.
For example, for a High CPU Utilization issue, the following steps are suggested:
- (If available) Click Code wizard.
The code that can be used to run the recommended step is displayed. For some manual steps, the code wizard might not be displayed.
- Select your preferred language (Ansible, Python, or Bash). The code is displayed in the selected language.
- Click Copy to clipboard and use the code in your existing script to run the recommended remediation step.
- Close the code wizard.
- Continue with To view log insights.
To view log insights
BMC Helix AIOps with BMC HelixGPT enables you to connect to the following log data sources to generate log insights:
- BMC Helix Log Analytics (no configuration required)
- Splunk Enterprise
- Elasticsearch
Administrators must configure third-party data sources in BMC HelixGPT Manager to generate insights. To configure third-party data sources, see Adding data sources in BMC HelixGPT in the BMC HelixGPT documentation.
- Click Ask HelixGPT and then click Log Insights. The first time that you view log insights for a situation, a progress bar shows the progress of the log summary generation. If you view log insights for the same situation again, the summary loads without delay. Depending on the log source configured in BMC HelixGPT, actionable insights from the logs related to the configuration item are displayed, which help you identify the root cause of the situation.
- Use the cross-launch link to view the log details in BMC Helix Log Analytics.
To use the Ask HelixGPT virtual agent to get more information about the situation
The Ask HelixGPT virtual agent is available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.
Use the Ask HelixGPT virtual agent to ask questions within the context of an open or past situation. Using the BMC HelixGPT capabilities, operators can get information about diverse topics regarding infrastructure, service health, and near real-time predictions.
- Click Ask HelixGPT.
The interactive virtual agent dialog box displays the following predefined questions:
- What is the impact of the issue?
- Which team can solve this issue?
- Has this situation happened in the past?
- Are there any change windows active during this situation?
- Click any question to get additional information about the situation.
BMC HelixGPT generates the answer by evaluating incidents created for similar situations in BMC Helix IT Service Management, analyzing the time stamps and patterns of similar situations that occurred in the past, analyzing the service health score of the situation's impacted service, and reviewing the change requests associated with the situation.
For example, if you click What is the impact of the issue?, BMC HelixGPT displays an answer that describes the impact of the situation.
- (Optional) Click any other question to obtain more details about the situation.
- Continue with To view similar situations.
To view similar situations
The Similar Situations section appears only when at least two similar situations are detected within the configured detection window. If sufficient data is available, the following views are displayed to help with historical analysis:
Aggregated View: Displayed when similar situations exist for at least one week.
Detailed View: Displayed when similar situations are available for a month or more.
Example
The payment portal service experiences recurring high memory usage every Friday evening during peak transaction hours.
Each occurrence triggers a new situation due to memory threshold breaches on the same service node. Although these situations happen on different days, they are similar in nature, with the same impacted service, the same event pattern, and the same root cause (a memory spike due to peak-hour usage).
BMC Helix AIOps detects this recurring pattern by using ML and groups these situations as similar situations.
In the Similar Situations section, view the first and the most recent occurrences of similar situations.
To analyze the pattern and trend of similar situations associated with the same service node, use Aggregated View or Detailed View:
Aggregated View: Shows the occurrences of similar situations by day of the week. The Y-axis represents the days of the week from Sunday through Saturday, and the X-axis shows the 24 hourly time slots of the day.
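The Aggregated View can be thought of as a 7 x 24 grid of counts. The following Python sketch buckets hypothetical occurrence timestamps by day of the week (Sunday first, as on the Y-axis) and by hour of the day (the X-axis). The function name and sample times are illustrative and are not part of the product.

```python
from datetime import datetime

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def aggregated_grid(occurrences):
    """Count occurrences in a 7 x 24 grid: rows are days (Sunday first), columns are hours."""
    grid = [[0] * 24 for _ in DAYS]
    for ts in occurrences:
        # datetime.weekday() returns Monday=0 ... Sunday=6; shift so Sunday is row 0.
        row = (ts.weekday() + 1) % 7
        grid[row][ts.hour] += 1
    return grid


# Hypothetical Friday-evening memory spikes, as in the payment portal example.
spikes = [
    datetime(2024, 5, 3, 19, 15),
    datetime(2024, 5, 10, 19, 40),
    datetime(2024, 5, 17, 20, 5),
]
grid = aggregated_grid(spikes)
print(grid[DAYS.index("Friday")][19], grid[DAYS.index("Friday")][20])  # 2 1
```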
Detailed View: Shows the occurrences of similar situations in the last 30 days.
The Y-axis represents the day and date, and the X-axis represents the hourly time slot for 24 hours. This view captures data between the first and the most recent occurrence for the last 30 days. The detailed view is displayed even if the data is available for a single day.
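Similarly, the Detailed View can be sketched as counts per date and hour, limited to the last 30 days, with one row for each date from the first to the most recent occurrence. The `detailed_grid` function and the sample data below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def detailed_grid(occurrences, now, days=30):
    """Count occurrences per (date, hour) within the last `days` days.

    Returns (rows, counts): rows lists every date from the first to the
    most recent occurrence in the window, so a single day of data still
    yields a one-row view.
    """
    cutoff = now - timedelta(days=days)
    counts = defaultdict(int)
    for ts in occurrences:
        if cutoff <= ts <= now:
            counts[(ts.date(), ts.hour)] += 1
    if not counts:
        return [], {}
    dates_with_data = [day for day, _ in counts]
    first, last = min(dates_with_data), max(dates_with_data)
    rows = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return rows, dict(counts)


now = datetime(2024, 5, 31, 12, 0)
spikes = [
    datetime(2024, 4, 1, 19, 0),   # older than 30 days, so it is ignored
    datetime(2024, 5, 3, 19, 15),
    datetime(2024, 5, 10, 19, 40),
]
rows, counts = detailed_grid(spikes, now)
print(len(rows))  # 8 (May 3 through May 10)
```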
Click Show More to view similar situation details such as the time of occurrence, number of related events, type, severity, priority, status, and incident ID.
Clicking the Incident ID link opens the incident in BMC Helix IT Service Management (requires a subscription to BMC Helix IT Service Management).
(Optional) Click the action menu to perform actions on a situation.
For more information, see Performing situation actions.
Continue with To view the situation explanation.
To view the situation explanation
In the Situation Explanation section, use the Root Cause View to analyze the root cause events associated with the situation.
Root Cause View: Shows the impact flow of events in a situation in a graphical format.
Based on the temporal and topological relationships between the causal events in the situation, the ML algorithm determines the root cause event and the consequent events. Each event in the graph is aligned against its corresponding CI kind, and the direction of the graph indicates the impact flow from the root cause event. The impact score percentage is displayed with each event, and the impact scores of all events add up to 100 percent.
Show root cause candidates: Select this option to display all configuration items (CIs) that are identified as contributing to the impact of the situation. When the option is selected, all root causes that contribute to the issue are highlighted. When it is cleared, only the most probable root cause is shown. Use this option for complex issues in which multiple components might be failing or affecting each other and causing service issues.
Hover over an event to view the impacted node details and the corresponding CI or CI kind highlighted in the CI topology and analysis section.
Click an event to view additional details in the Situation Details pane.
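The 100 percent rule for impact scores corresponds to normalizing per-event weights by their sum. The following Python sketch shows that normalization and the difference between showing only the most probable root cause and showing root cause candidates. The raw weights, event names, and the `min_share` cutoff are hypothetical; the ML algorithm that actually assigns impact scores is not documented here and is not reproduced.

```python
def impact_percentages(raw_scores):
    """Scale raw per-event impact weights so that they add up to 100 percent."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one event needs a positive impact weight")
    return {event: 100.0 * score / total for event, score in raw_scores.items()}


def root_causes(percentages, show_candidates=False, min_share=10.0):
    """Return the top event, or every event whose share meets min_share.

    min_share is a hypothetical cutoff for what counts as a candidate.
    """
    ranked = sorted(percentages, key=percentages.get, reverse=True)
    if not show_candidates:
        return ranked[:1]  # most probable root cause only
    return [event for event in ranked if percentages[event] >= min_share]


scores = impact_percentages({"db_latency": 6, "app_errors": 3, "lb_5xx": 1})
print(scores)  # {'db_latency': 60.0, 'app_errors': 30.0, 'lb_5xx': 10.0}
print(root_causes(scores))  # ['db_latency']
print(root_causes(scores, show_candidates=True, min_share=20))  # ['db_latency', 'app_errors']
```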
Events: Displays all causal events and details such as the event message, impacted host, occurrence time, impact score, severity, priority, status, and incident ID. Perform actions on a situation by clicking the action menu.
If Proactive Service Resolution is enabled in BMC Helix Intelligent Automation, the same incident ID is displayed against the situation and the events.
Changes: Displays the top three changes and details such as the change ID, summary, impacted host, occurrence time, status, priority, and impact.
Click an event message to view the following details in the Event Details pane:
Event name, event score, severity, priority, status, and the More Details link to view the additional event details in BMC Helix Operations Management
Event assignee details
Date when the event first occurred or was last modified
Event summary showing the Class, Incident ID, Object Class, Object, and Host. Clicking the Incident ID link opens the incident in BMC Helix IT Service Management.
For more information about event classes and objects, see EVENT base event class.
Logs and notes history: All logs and notes for an event are displayed. Type a note in the text box and click Add Note to add any additional notes related to the event. Any note added for the event is reflected in the event in BMC Helix Operations Management.
Performance view: If the slot value for the event class is Alarm, the time-series data collected from the key attributes of the causal events of ML-based situations is displayed.
Click the action menu to perform actions on the event. For more information, see Performing situation actions.
Continue with To view the CI topology and analysis.
To view the CI topology and analysis
In the CI topology and analysis section, view the topology map of the situation, the impacted CIs, and the probability of impact on the connected CIs.
Use the following options to view the map based on your requirements:
Views: Switch between the Organic and Hierarchic views to view the impact flow.
In the organic view, nodes are placed close to their adjacent nodes, which saves space. In the hierarchic view, nodes are distributed into layers, which makes it easier to identify dependencies and relationships among the nodes.
Grouping: Click Enable Grouping by CI Kind to view the topology map grouped by CI kind.
Search: If there are many CIs in the hierarchy, use the search box to locate a particular CI.
Legend: Click to view the legend for the topology map.
Use the other tools to zoom in, zoom out, or view the map on a full screen.
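A common way to distribute nodes into layers, as in the hierarchic view, is longest-path layering: every node is placed one layer below its deepest parent. The following Python sketch applies that technique to a hypothetical list of (parent, child) edges and assumes that the graph has no cycles. It is a generic layered-drawing approach, not necessarily the layout algorithm that BMC Helix AIOps uses.

```python
from collections import deque


def assign_layers(edges):
    """Longest-path layering of an acyclic graph given as (parent, child) edges.

    Nodes that sit on a cycle are never released from the queue and keep layer 0.
    """
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)  # roots start at layer 0
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # Push the child below its deepest parent seen so far.
            layer[child] = max(layer[child], layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return layer


topology = [("service", "app"), ("app", "host"), ("service", "host")]
print(sorted(assign_layers(topology).items(), key=lambda kv: kv[1]))
# [('service', 0), ('app', 1), ('host', 2)]
```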
The right-hand pane provides a summary of the configuration items (CIs) related to the situation. It includes the following fields:
Name: The unique name or identifier of the CI. This helps you recognize the specific component involved in the situation (e.g., BMC Pune, BMC Phoenix, BMC USA).
Kind: Indicates the type of CI, such as business service, host, software pod, or other infrastructure or application components. This classification helps you understand the CI’s function within the service topology.
Location: Expand the CI name to see the physical or logical location of the CI, such as data center, region, or availability zone. In this example, all services are located in Amsterdam (AMS1). This information helps in identifying location-based issues or regional dependencies.
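Grouping the CI summary by kind or by location can help when looking for location-based issues or regional dependencies. A minimal Python sketch, assuming a hypothetical `CISummary` record that mirrors the Name, Kind, and Location fields; the kind values and the `pune-web-01` host are illustrative:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CISummary:
    # Mirrors the Name, Kind, and Location fields of the right-hand pane (illustrative only).
    name: str
    kind: str
    location: str


def group_by(cis, field):
    """Group CI names by a field such as "kind" or "location"."""
    groups = defaultdict(list)
    for ci in cis:
        groups[getattr(ci, field)].append(ci.name)
    return {key: sorted(names) for key, names in groups.items()}


cis = [
    CISummary("BMC Pune", "Business Service", "AMS1"),
    CISummary("BMC Phoenix", "Business Service", "AMS1"),
    CISummary("pune-web-01", "Host", "AMS1"),
]
print(group_by(cis, "kind"))
# {'Business Service': ['BMC Phoenix', 'BMC Pune'], 'Host': ['pune-web-01']}
print(group_by(cis, "location"))
# {'AMS1': ['BMC Phoenix', 'BMC Pune', 'pune-web-01']}
```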
Where to go from here
To perform additional actions on a situation or on the events included in the situation, see Performing situation actions.