Investigating ML-based independent situations


As an operator or a site reliability engineer (SRE), use independent situations to:

Investigate isolated anomalies or service issues without the noise of correlated data.
Quickly assess the impact by reviewing severity, priority, and the affected CI or service.
Perform root cause analysis by drilling into the CI’s performance metrics, logs, and events.
Identify and address recurring issues by reviewing historical data and patterns for the same CI.
Trigger appropriate remediation actions such as automation or incident creation.
Identify cross-site network issues with BGP-based situations that group events from BGP-connected routers and services to highlight shared infrastructure problems.

An independent situation is an ML-based situation formed by correlating events within a single service, based on event patterns, signatures, anomalies, and topological connections between event nodes. It reflects issues that are limited to one service and do not involve dependencies on other services.

Independent situations allow operators or SREs to focus on targeted troubleshooting and resolution. They are especially useful for investigating one-off issues or monitoring anomalies that have not yet shown broader impact across the environment.

Example 1

A spike in CPU usage is detected on a login service server due to a configuration update.

This anomaly triggers a situation that affects only the login service node. There are no related issues detected in other services, and the situation is not linked to any shared CI or topological dependency. The situation is thus treated as an independent situation, not grouped into any primary, multiservice, or similar category.
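The definition of an independent situation and Example 1 can be restated as a simple check. This is a minimal sketch, not BMC Helix AIOps code: the `Event` and `Situation` types, their fields, and `is_independent` are hypothetical names, and the sketch assumes that the ML correlation (event patterns, signatures, anomalies, and topology) has already grouped the events.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, simplified event record (field names are illustrative)."""
    event_id: str
    service: str  # service that the event's CI belongs to
    node_id: str  # CI (node) that raised the event


@dataclass
class Situation:
    """Hypothetical situation: already-correlated events plus discovered dependencies."""
    events: list[Event]
    # Names of other services this situation depends on, for example through a shared CI.
    cross_service_dependencies: set[str] = field(default_factory=set)


def is_independent(situation: Situation) -> bool:
    """Return True when every event belongs to one service and no other service is involved."""
    services = {event.service for event in situation.events}
    return len(services) == 1 and not situation.cross_service_dependencies


# Example 1: a CPU spike on a login service server, with no shared CI or dependency.
login_cpu_spike = Situation(events=[Event("evt-1", "login", "login-srv-01")])
print(is_independent(login_cpu_spike))  # True
```

Adding an event from another service, or a dependency on another service, makes the check return False, which corresponds to the situation no longer being limited to one service.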

Example 2

Two routers, Router_A and Router_B, are located in the Pune data center, but belong to different autonomous systems (AS1 and AS2).

These routers establish a BGP session to exchange routing information across internal network domains.

Suddenly, the BGP session between Router_A and Router_B goes down, triggering events on both routers.

BMC Helix AIOps receives the events and identifies them as BGP events generated by the BGP failure. Based on the discovered BGP IP-to-IP connectivity, it recognizes that these events are related, even though the routers belong to different autonomous systems. A BGP-based situation is formed, correlating the two events and linking them through the discovered BGP relationship between the IPs. The situation identifies the broken BGP session as the root cause.
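The idea of linking events through discovered BGP IP-to-IP connectivity can be sketched as finding connected groups of router IPs. This is an illustration under stated assumptions, not the product's correlation algorithm: the `(event_id, router_ip)` event format, the `bgp_peerings` pairs, the function name `group_bgp_events`, and the IP addresses are all hypothetical.

```python
from collections import defaultdict


def group_bgp_events(events, bgp_peerings):
    """Group BGP events whose router IPs are linked by discovered BGP sessions.

    events: list of (event_id, router_ip) tuples (hypothetical format).
    bgp_peerings: iterable of (ip_a, ip_b) pairs, standing in for the discovered
        BGP IP-to-IP connectivity.
    Returns a list of sets of event IDs; each set is one candidate BGP-based situation.
    """
    # Undirected adjacency map: a BGP session links both peers.
    neighbors = defaultdict(set)
    for ip_a, ip_b in bgp_peerings:
        neighbors[ip_a].add(ip_b)
        neighbors[ip_b].add(ip_a)

    # Events raised on each router IP.
    events_by_ip = defaultdict(list)
    for event_id, ip in events:
        events_by_ip[ip].append(event_id)

    groups, visited = [], set()
    for start_ip in events_by_ip:
        if start_ip in visited:
            continue
        # Depth-first walk over BGP links, collecting events on every reachable IP.
        group, stack = set(), [start_ip]
        visited.add(start_ip)
        while stack:
            ip = stack.pop()
            group.update(events_by_ip.get(ip, []))
            for peer in neighbors.get(ip, ()):
                if peer not in visited:
                    visited.add(peer)
                    stack.append(peer)
        groups.append(group)
    return groups


# Example 2: Router_A (AS1) and Router_B (AS2) share one BGP session; the IPs are made up.
events = [("bgp-down-A", "10.0.0.1"), ("bgp-down-B", "10.0.0.2")]
peerings = [("10.0.0.1", "10.0.0.2")]
print(group_bgp_events(events, peerings))
```

For Example 2, the result is a single group containing both router events; an event on an IP with no BGP link to either router would end up in a separate group.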

The BGP-based situation correlation feature is available under controlled availability. To use this capability, contact BMC Support.

To investigate an independent situation

Click an open, single-service situation and view the following details:
- Situation name, severity, priority, incident ID, status, and the name of the impacted service.
  If Proactive Service Resolution is enabled in BMC Helix Intelligent Automation, the same incident ID is displayed against the situation and the events. If an incident is not created, the Create Incident option is displayed.
  When Intelligent Automation Proactive Service Resolution is configured with on-premises ITSM, incidents are created in on-premises ITSM, and the incident ID is displayed in the situation details in BMC Helix AIOps. However, the incident ID cross-launch link from BMC Helix AIOps does not open the corresponding on-premises ITSM incident.
- Show Notes: Opens the Logs and Notes panel to display the notes added to a situation.
- Situation highlight: Number of events from the top three impacted hosts.
- Similar situations (if more than one similar situation is available)
- Situation explanation
- CI topology and analysis
- If BMC HelixGPT is enabled:
  - A human-readable AI-generated summary of the situation.
  - Best action recommendations, a list of suggested steps that can be used to remediate the situation. Additionally, a BMC HelixGPT-driven wizard offers sample code to accomplish individual steps in different languages such as Ansible, Python, and Bash.
  - Log insights collected from logs generated in BMC Helix Log Analytics, which help you determine an accurate root cause of the problem.
  - An integrated virtual agent, Ask HelixGPT, which leverages the BMC HelixGPT generative AI capabilities and helps you ask questions to better investigate and remediate the situation. To learn more about BMC HelixGPT capabilities, see Situations-overview.
Continue with To view best action recommendations.

To view best action recommendations

BMC Helix AIOps with BMC HelixGPT enables you to connect to the following ITSM data sources to generate best action recommendations:

BMC Helix IT Service Management
ServiceNow (Controlled availability customers only)
Jira (Controlled availability customers only)

Administrators must configure third-party data sources in BMC HelixGPT to generate recommendations. To configure third-party data sources, see Adding data sources in BMC HelixGPT in BMC HelixGPT documentation.

On the situation details page, review an AI-generated summary (short problem statement, brief summary, and detailed problem context).
Click Show remediation steps.
The recommended steps are displayed for a situation.
For example, remediation steps are suggested for a High CPU Utilization issue.
(If available) Click Code wizard.
The code that can be used to run the recommended step is displayed. For some manual steps, the code wizard might not be displayed.
1. Select your preferred language (Ansible, Python, Bash), and the code is displayed based on the selected language.
2. Click Copy to clipboard and use the code in your existing script to run the recommended remediation step.
3. Close the code wizard.
Continue with To view log insights.

To view log insights

BMC Helix AIOps with BMC HelixGPT enables you to connect to the following log data sources to generate log insights:

BMC Helix Log Analytics (no configuration required)
Splunk Enterprise
Elasticsearch

Administrators must configure third-party data sources in BMC HelixGPT Manager to generate insights. To configure third-party data sources, see Adding data sources in BMC HelixGPT in BMC HelixGPT documentation.

Click Ask HelixGPT and then click Log Insights. The first time that you view log insights for a situation, a progress bar shows the progress of the log summary generation. If you view log insights for the same situation again, the summary loads without delay. Depending on the log source configured in BMC HelixGPT, actionable insights from the logs related to the configuration item are displayed, which helps you identify the root cause of the situation.
Use the cross-launch link to view the log details in BMC Helix Log Analytics.
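The observable behavior described above (a slow first load, then an immediate load for the same situation) matches a per-situation cache. The following is a minimal sketch that assumes the summary is stored after its first generation; the page does not describe the actual mechanism, and `LogInsightsCache`, `generate_summary`, and the situation IDs are hypothetical.

```python
class LogInsightsCache:
    """Hypothetical per-situation cache: generate a log summary once, then reuse it."""

    def __init__(self, generate_summary):
        # generate_summary(situation_id) -> str stands in for the slow summarization step.
        self._generate = generate_summary
        self._summaries = {}

    def get(self, situation_id):
        if situation_id not in self._summaries:
            # First view of this situation: the summary is generated (the UI shows a progress bar).
            self._summaries[situation_id] = self._generate(situation_id)
        # Later views: the stored summary is returned without being regenerated.
        return self._summaries[situation_id]


calls = []


def fake_summarizer(situation_id):
    calls.append(situation_id)  # record each (expensive) generation
    return f"Log summary for {situation_id}"


cache = LogInsightsCache(fake_summarizer)
cache.get("SIT-1001")
cache.get("SIT-1001")
print(len(calls))  # 1
```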

Important

If a situation has multiple root causes, BMC HelixGPT Log Insights retrieves logs only for the CI with the highest impact score. As a result, log data for other contributing CIs might not appear in Log Insights.
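The selection rule in this note amounts to picking the maximum impact score. In this minimal sketch, the input dictionary, the CI names, and the function name `ci_for_log_insights` are hypothetical; the sketch only restates which CI's logs are retrieved.

```python
def ci_for_log_insights(root_cause_impact):
    """Return the one CI whose logs are retrieved when a situation has several root causes.

    root_cause_impact: dict mapping CI name -> impact score (hypothetical input).
    Only the highest-scoring CI is returned, so logs for other contributing CIs are skipped.
    """
    if not root_cause_impact:
        return None
    return max(root_cause_impact, key=root_cause_impact.get)


print(ci_for_log_insights({"db-01": 55.0, "app-02": 30.0, "lb-03": 15.0}))  # db-01
```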

Can I configure multiple data sources to generate log insights?

Yes. If you are using third-party log applications as data sources, you can select more than one data source, and BMC Helix AIOps displays insights from all relevant logs for an impacted configuration item.

Can I use the cross-launch link to view the log data even if I am not using BMC Helix Log Analytics?

Yes, the cross-launch link opens the logs page in the log application configured in BMC HelixGPT Manager.

What other log data sources are supported by BMC Helix AIOps?

For a list of supported third-party data sources, see Data-sources-for-BMC-Helix-AIOps.

To use the Ask HelixGPT virtual agent to get more information about the situation

The Ask HelixGPT virtual agent is available if BMC HelixGPT is enabled. To enable BMC HelixGPT, contact BMC Support.

Use the Ask HelixGPT virtual agent to ask questions within the context of an open or past situation. Using the BMC HelixGPT capabilities, operators can get information about diverse topics regarding infrastructure, service health, and near real-time predictions.

Click Ask HelixGPT.
The interactive virtual agent dialog box displays the following predefined questions:
- What is the impact of the issue?
- Which team can solve this issue?
- Has this situation happened in the past?
- Are there any change windows active during this situation?
Click any question to get additional information about the situation.
BMC HelixGPT generates the answer by evaluating information from incidents created for similar situations in BMC Helix IT Service Management, analyzing the time stamps and patterns of similar situations that occurred in the past, analyzing the service health score of the situation's impacted service, and reviewing the change requests associated with the situation.
For example, if you click What is the impact of the issue?, BMC HelixGPT displays an answer that describes the impact of the situation.
(Optional) Click any other question to obtain more details about the situation.
Continue with To view similar situations.

To view similar situations

The Similar Situations section appears only when at least two similar situations are detected within the configured detection window. If sufficient data is available, the following views are displayed to help with historical analysis:

Aggregated View: Displays when similar situations exist for at least one week.
Detailed View: Displays when similar situations are available for a month or more.

Example

The payment portal service experiences recurring high memory usage every Friday evening during peak transaction hours.

Each occurrence triggers a new situation due to memory threshold breaches on the same service node. Although these situations happen on different days, they are similar in nature, with the same impacted service, the same event pattern, and the same root cause (a memory spike due to peak-hour usage).

BMC Helix AIOps detects this recurring pattern by using ML and groups these situations as similar situations.

In the Similar Situations section, view the first and the most recent occurrences of similar situations.
To analyze the pattern and trend of similar situations associated with the same service node, use Aggregated View or Detailed View:
- Aggregated View: Shows the occurrences of similar situations by day of the week. The Y-axis represents the days of the week, from Sunday through Saturday. The X-axis shows 24 hourly time slots.
- Detailed View: Shows the occurrences of similar situations in the last 30 days.
  The Y-axis represents the day and date, and the X-axis shows 24 hourly time slots. This view captures data between the first and the most recent occurrence within the last 30 days. The detailed view is displayed even if data is available for only a single day.
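The two views can be thought of as counting occurrences in grid cells: (day of week, hour) for the Aggregated View and (date, hour) for the Detailed View. This minimal sketch assumes that each occurrence is a timestamp and that cells are filled by simple counting; time zone handling and the exact 30-day window used by the product are not covered, and the function names and timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Aggregated View rows in the order this page describes: Sunday through Saturday.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def aggregated_view(occurrences):
    """Count occurrences per (day of week, hour) cell."""
    cells = Counter()
    for ts in occurrences:
        # datetime.weekday() returns Monday=0 ... Sunday=6; shift so that Sunday=0.
        day = WEEKDAYS[(ts.weekday() + 1) % 7]
        cells[(day, ts.hour)] += 1
    return cells


def detailed_view(occurrences, now):
    """Count occurrences per (date, hour) cell, keeping only the last 30 days."""
    cutoff = now - timedelta(days=30)
    cells = Counter()
    for ts in occurrences:
        if cutoff <= ts <= now:
            cells[(ts.date(), ts.hour)] += 1
    return cells


# Payment portal example: high memory usage every Friday evening (made-up timestamps).
fridays = [datetime(2025, 1, 3, 19, 5), datetime(2025, 1, 10, 19, 40), datetime(2025, 1, 17, 19, 10)]
print(aggregated_view(fridays)[("Friday", 19)])  # 3
print(len(detailed_view(fridays, now=datetime(2025, 1, 20))))  # 3
```

In the Aggregated View, the three Friday-evening occurrences stack up in one cell, which is what makes the weekly pattern stand out; in the Detailed View, they remain in three separate date rows.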
Click Show More to view similar situation details such as the time of occurrence, number of related events, type, severity, priority, status, and incident ID.
Clicking the Incident ID link opens the incident in BMC Helix IT Service Management (requires a subscription to BMC Helix IT Service Management).
(Optional) Click the action menu to perform actions on a situation.
For more information, see Performing-situation-actions.
Continue with To view the situation explanation.

To view the situation explanation

In the Situation Explanation section, use the Root Cause View to analyze the root cause events associated with the situation.
- Root Cause View: Shows the impact flow of events in a situation in a graphical format.
  Based on the temporal and topological relationships between the causal events in the situation, the ML algorithm determines the root cause event and the consequent events. Each event in the graph is aligned with its corresponding CI kind, and the direction of the graph indicates the impact flow from the root cause event. The impact score percentage is displayed with each event; the impact scores of all events add up to 100 percent.
  - Show root cause candidates: Select this option to display all configuration items (CIs) that are identified as contributing to the impact of the situation. When this option is selected, all root causes that contribute to the issue are highlighted; when it is cleared, only the most probable root cause is shown. Use this option for complex issues in which multiple components might be failing or affecting each other and causing service issues.
  - Hover over an event to view the impacted node details and the corresponding CI or CI kind highlighted in the CI topology and analysis section.
  - Click an event to view additional details on the Situation Details pane.
- Events: Displays all causal events and details such as the event messages, impacted host, occurrence time, impact score, severity, priority, status, and incident ID. To perform actions on a situation, click the action menu.
  
  If Proactive Service Resolution is enabled in BMC Helix Intelligent Automation, the same incident ID is displayed against the situation and the events.
  Information
  Automated remediation action
  The Automations column displays the matching automation actions for the event. To run an automation, see Running-an-existing-automation.
  (Optional) You can request a new automation or, if you have the necessary permissions, create automation for events yourself. For more information, see Requesting-for-a-new-automation and Creating-automation-policies.
- Changes: Displays the change requests that are most likely contributing to the situation, based on correlation with impacted service nodes. Change requests from BMC Helix ITSM are displayed in situations. Change requests from ServiceNow are not supported on the Situations page.
  - For each change, the change ID, summary, impacted host or CI, occurrence time (start date/time), status (e.g., Scheduled, In Progress, Completed), priority, and impact level are shown.
  - Click a change entry to view more details in BMC Helix ITSM, where you can review the entire change history, implementation notes, approvals, and associated incidents.
    While change events do not contribute to service health score calculations, displaying them in context helps operators investigate more effectively.
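The Root Cause View's percentages and the Show root cause candidates option can be illustrated with a small sketch: raw scores are scaled so that they sum to 100 percent, and either the top event or all contributing events are returned. This is an assumption-based illustration, not the ML algorithm: how raw scores are derived is not described on this page, the sketch ranks events rather than CIs for simplicity, and `impact_percentages`, `root_causes`, and the event IDs are hypothetical.

```python
def impact_percentages(raw_scores):
    """Scale non-negative raw per-event scores so that they add up to 100 percent.

    raw_scores: dict mapping event ID -> raw score (hypothetical input; how the ML
    algorithm derives these scores is not described on this page).
    """
    total = sum(raw_scores.values())
    if total == 0:
        # Nothing to distribute; every event gets 0 percent.
        return {event_id: 0.0 for event_id in raw_scores}
    return {event_id: 100.0 * score / total for event_id, score in raw_scores.items()}


def root_causes(percentages, show_candidates=False):
    """Return the most probable root cause, or every contributing candidate ranked by impact."""
    ranked = sorted(
        (event_id for event_id, pct in percentages.items() if pct > 0),
        key=percentages.get,
        reverse=True,
    )
    return ranked if show_candidates else ranked[:1]


scores = impact_percentages({"cpu-high": 3, "mem-high": 1})
print(scores)  # {'cpu-high': 75.0, 'mem-high': 25.0}
print(root_causes(scores))  # ['cpu-high']
print(root_causes(scores, show_candidates=True))  # ['cpu-high', 'mem-high']
```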
Click an event message to view the following details in the Event Details pane:
- Event name, event score, severity, priority, status, and the More Details link to view the additional event details in BMC Helix Operations Management
- Event assignee details
- Date when the event first occurred or was last modified
- Event summary showing the Class, Incident ID, Object Class, Object, and Host. Clicking the Incident ID link opens the incident in BMC Helix IT Service Management.
  For more information about event classes and objects, see EVENT base event class.
- Logs and notes history: All logs and notes for an event are displayed. Type a note in the text box and click Add Note to add any additional notes related to the event. Any note added for the event is reflected in the event in BMC Helix Operations Management.
- Performance view: If the slot value for the event class is Alarm, the time-series data collected from the key attributes of the causal events of ML-based situations is displayed.
Click the action menu to perform actions on the situation. For more information, see Performing-situation-actions.
Continue with To view the CI topology and analysis.

To view the CI topology and analysis

In the CI topology and analysis map, view the topology map of the situation, the impacted CIs, and the probability of the impact on the connected CIs.
Use the following options to view the map based on your requirements:
1. Views: Switch between the Organic and Hierarchic view to view the impact flow.
  In the organic view, nodes are placed close to their adjacent nodes, which saves space. In the hierarchic view, nodes are distributed into layers, which makes it easier to identify dependencies and relationships among the nodes.
2. Grouping: Click Enable Grouping by CI Kind to view the topology map grouped by the CI kind.
3. Search: If there are many CIs in the hierarchy, use the search box to locate a particular CI.
4. Legend: Click Legend to view the symbols used in the topology map.
5. Use the other tools to zoom in, zoom out, or view the map on a full screen.
6. The right-hand pane provides a summary of the configuration items (CIs) related to the situation. It includes the following fields:
  - Name: The unique name or identifier of the CI. This helps you recognize the specific component involved in the situation (e.g., BMC Pune, BMC Phoenix, BMC USA).
  - Kind: Indicates the type of CI, such as business service, host, software pod, or other infrastructure or application components. This classification helps you understand the CI’s function within the service topology.
  - Location: Expand the CI name to see the physical or logical location of the CI, such as a data center, region, or availability zone. For example, the location might be Amsterdam (AMS1). This information helps in identifying location-based issues or regional dependencies.
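Grouping the map by CI kind, or scanning the right-hand pane by location, is essentially grouping CI records by one field. In this minimal sketch, the `ConfigurationItem` type, the `group_cis` function, and the sample CIs are hypothetical; only the Name, Kind, and Location fields come from this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ConfigurationItem:
    """Hypothetical CI record with the Name, Kind, and Location fields shown in the right-hand pane."""
    name: str
    kind: str
    location: str


def group_cis(cis, attribute):
    """Group CI names by an attribute: "kind" (like Enable Grouping by CI Kind) or "location"."""
    groups = defaultdict(list)
    for ci in cis:
        groups[getattr(ci, attribute)].append(ci.name)
    return dict(groups)


cis = [
    ConfigurationItem("BMC Pune", "business service", "AMS1"),
    ConfigurationItem("web-host-01", "host", "AMS1"),
    ConfigurationItem("db-host-02", "host", "AMS1"),
]
print(group_cis(cis, "kind"))
# {'business service': ['BMC Pune'], 'host': ['web-host-01', 'db-host-02']}
print(group_cis(cis, "location"))
# {'AMS1': ['BMC Pune', 'web-host-01', 'db-host-02']}
```

When every impacted CI falls into a single location group, that is a hint to check for a location-based issue.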

To prioritize the root cause on the same node

When multiple events occur on the same node within a situation, the first detected event might not always represent the true causal event. This can happen when polling intervals vary across monitoring solutions, so that a later event better represents the actual cause.

To explicitly indicate which event should be considered the priority causal event, apply prioritization by using an enrichment or refinement policy in BMC Helix Operations Management. For information about enrichment policies, see Advanced, time-based, and dynamic enrichment policies.

Prioritization can be applied only to events on the same node within a service. This prioritization applies only to future situations, not to situations that have already been created.

To indicate priority causal events:

Create an event enrichment or refinement policy in BMC Helix Operations Management.
In the policy, set the tags slot of the event to include the priority-causal-root tag.

When an event is marked with this tag and occurs on the causal node, BMC Helix AIOps prioritizes it over other events on the same node.

Example

Consider a node where both application and network monitoring are enabled through different monitoring solutions.

The application monitoring detects a slowdown first and raises an event.
A few seconds later, the network monitoring raises an event indicating that a critical network port on the node is down, which actually represents the cause of the slowdown.

Without prioritization, the application event may be incorrectly treated as the root cause. By tagging the network event with the priority-causal-root tag through a refinement policy, the port issue detected by network monitoring is correctly prioritized as the causal event.
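The example can be expressed as a small sketch: a refinement rule adds the priority-causal-root tag to the network event, and the causal event on the node is then the tagged event if one exists, otherwise the earliest event. Only the tag name comes from this page. The `Event` type, the `refine` rule (tagging every network-monitoring event), `causal_event`, and the timings are hypothetical, and real policies are configured in BMC Helix Operations Management, not in Python.

```python
from dataclasses import dataclass, field

PRIORITY_TAG = "priority-causal-root"  # tag name documented on this page


@dataclass
class Event:
    """Hypothetical event; the tags slot is modeled as a list of strings."""
    event_id: str
    node_id: str
    occurred_at: float  # seconds since some reference point; smaller is earlier
    source: str  # monitoring solution that raised the event
    tags: list[str] = field(default_factory=list)


def refine(event):
    """Hypothetical refinement rule: tag network-monitoring events as priority causal events."""
    if event.source == "network" and PRIORITY_TAG not in event.tags:
        event.tags.append(PRIORITY_TAG)
    return event


def causal_event(events, causal_node):
    """On the causal node, prefer a tagged event; otherwise fall back to the earliest event."""
    on_node = [e for e in events if e.node_id == causal_node]
    if not on_node:
        return None
    tagged = [e for e in on_node if PRIORITY_TAG in e.tags]
    return min(tagged or on_node, key=lambda e: e.occurred_at)


def scenario():
    # The application slowdown is detected first; the port-down event arrives seconds later.
    return [
        Event("app-slowdown", "node-1", 0.0, "application"),
        Event("port-down", "node-1", 5.0, "network"),
    ]


print(causal_event(scenario(), "node-1").event_id)  # app-slowdown (no prioritization)
print(causal_event([refine(e) for e in scenario()], "node-1").event_id)  # port-down
```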

FAQs

How can I exclude specific events from a situation?

To exclude specific events from being included in the ML-based situations, you can use event enrichment policies to tag them appropriately.

Create an event enrichment or refinement policy in BMC Helix Operations Management.
In the policy, set the tags slot of the event to include the ExcludeFromSituation tag.
When this tag is applied, BMC Helix AIOps automatically excludes the event from being correlated into any ML-based situation.
Verify the exclusion by checking the event's tags slot and confirming that it does not appear in any active situation.

You can refine event inclusion logic by using the advanced enrichment or refinement policies. For information about enrichment policies, see Advanced, time-based, and dynamic enrichment policies.

For example, exclude events from known noisy CIs or event sources.
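The exclusion steps can be sketched as a tagging rule followed by a filter applied before correlation, plus the verification check from the last step. Only the ExcludeFromSituation tag name comes from this page. Modeling events as dictionaries with `ci` and `tags` keys, the `tag_noisy` rule for known noisy CIs, and the other function names are hypothetical.

```python
EXCLUDE_TAG = "ExcludeFromSituation"  # tag name documented on this page


def tag_noisy(event, noisy_cis):
    """Hypothetical refinement rule: tag events from known noisy CIs for exclusion."""
    if event.get("ci") in noisy_cis:
        tags = event.setdefault("tags", [])
        if EXCLUDE_TAG not in tags:
            tags.append(EXCLUDE_TAG)
    return event


def correlatable_events(events):
    """Keep only events whose tags slot does not contain ExcludeFromSituation."""
    return [e for e in events if EXCLUDE_TAG not in e.get("tags", [])]


def verify_excluded(event_id, active_situations):
    """Check that an event ID appears in none of the active situations (lists of event IDs)."""
    return all(event_id not in situation for situation in active_situations)


events = [tag_noisy(e, {"noisy-01"}) for e in [{"id": "e1", "ci": "noisy-01"}, {"id": "e2", "ci": "db-01"}]]
print([e["id"] for e in correlatable_events(events)])  # ['e2']
print(verify_excluded("e1", [["e2"]]))  # True
```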

Why must a situation causal event have a single unique node ID?

When a situation is created and noise consolidation is enabled, the causal event must be associated with only one unique node ID. If multiple node IDs exist, it might not be possible to determine which node to associate with the incident, and as a result, the incident might not be created in BMC Helix IT Service Management.
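The rule in this answer can be sketched as a check on the node IDs associated with the causal event: exactly one distinct node ID can be attached to the incident, while none or several leave the node ambiguous. The function name `incident_node_id`, its input, and the node IDs are hypothetical.

```python
def incident_node_id(node_ids):
    """Return the one node ID to attach to an incident, or None when it is ambiguous or missing.

    node_ids: node IDs associated with the situation's causal event (hypothetical input).
    """
    unique = set(node_ids)
    # Exactly one distinct node ID: the incident can be tied to that node.
    if len(unique) == 1:
        return next(iter(unique))
    # Zero or several distinct node IDs: no node can be chosen, so the incident might not be created.
    return None


print(incident_node_id(["node-7", "node-7"]))  # node-7
print(incident_node_id(["node-7", "node-9"]))  # None
```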

Where to go from here

To perform additional actions on a situation or on the events included in the situation, see Performing situation actions.

BMC Helix AIOps 25.3
