Situations overview


BMC Helix AIOps correlates events collected from multiple sources into situations. These sources could be infrastructure, applications, network resources, and monitoring solutions from different vendors. The events from the same or different hosts are correlated based on their occurrence, message, signature, topology, or a combination of these factors. 

Watch the following video (2:58) to get an overview of ML-based situations:

Watch the YouTube video about Overview of ML-based situations.

Situations are generated by correlating events and, in some cases, existing situations based on a combination of intelligent analysis and machine learning. Some of the key factors considered during this correlation include:

  • Problem patterns and event signatures

  • Event signature, which is derived from the event message and the name of the host where the event originated

  • Content analysis of the event message to detect known issues or anomalies

  • Sequence and timing of events occurring on the same node or across multiple nodes within a service hierarchy

  • Topological relationships between CIs, such as BGP-based connections between routers across sites

  • Temporal factors, such as the time of day or day of the week when the events occur

Based on the type of correlation, situations are categorized into independent, multi-service, and primary situations. 

The following figure shows how situations are created in BMC Helix AIOps:

situations_overview.png

Types of situations

Policy-based situations: Events that are correlated and aggregated based on predefined event correlation policies created in BMC Helix Operations Management. On the Situations page, the list icon indicates a policy-based situation.

For more information, see Policy-based situations.

ML-based situations: Events that are automatically correlated and aggregated using AI/ML algorithms are known as ML-based situations. These situations are based on topological relationships, time proximity, event patterns, and severity. Based on the type of correlation, ML-based situations are further categorized into independent, multi-service, and primary situations.

Prerequisites for viewing ML-based situations

Before you view ML-based situations, make sure that the following prerequisites are met:

  • A service model is created either in BMC Helix AIOps or BMC Helix Discovery.
    For more information, see Building service models.
  • BMC Helix AIOps is able to receive events from the required data sources.
    For information about the sources from which BMC Helix AIOps can receive events, see Data sources for BMC Helix AIOps.
  • The AIOps Situations feature is enabled in BMC Helix AIOps.
    For more information, see Enabling the AIOps features.
  • Situations are formed according to various settings available on the Manage Situations > Configurations page. Update the default settings if required.

For more information about the settings, see Configuring ML-based situations.

Naming convention for ML-based situations

The name of an ML-based situation is derived from the associated service name and the message of the situation event with the highest impact score.

The situation name is displayed in the following format: <Name of the service> - <The message of a situation event with the highest impact score>.

For example, Railway_Ticketing_Service - Memory utilization is > 80% for 5 mins.
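The naming rule above can be illustrated with a minimal sketch. The `SituationEvent` class, its field names, and the impact score values are hypothetical and used only for illustration; they are not part of the product API.

```python
from dataclasses import dataclass


@dataclass
class SituationEvent:
    """Hypothetical event record; field names are illustrative only."""
    message: str
    impact_score: float


def situation_name(service_name: str, events: list[SituationEvent]) -> str:
    """Build '<service name> - <message of the highest-impact event>'."""
    if not events:
        raise ValueError("a situation needs at least one event")
    # Pick the event with the highest impact score.
    top_event = max(events, key=lambda e: e.impact_score)
    return f"{service_name} - {top_event.message}"


events = [
    SituationEvent("Disk latency is high", impact_score=40.0),
    SituationEvent("Memory utilization is > 80% for 5 mins", impact_score=85.0),
]
print(situation_name("Railway_Ticketing_Service", events))
```

In this sketch, the second event has the higher score, so its message becomes the second half of the situation name.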

On the Situations page, you can view All Situations and Similar Situations. The All Situations section lists independent, multi-service, and primary situations.

Independent situations

An independent situation is an ML-based situation formed by correlating events within a single service, based on event patterns, signatures, anomalies, and topological connections between event nodes. It reflects issues that are limited to one service and do not involve dependencies on other services.

sit_overview_independent_situations_253.png

Multi-service situations

A multi-service situation is an ML-based situation that is formed by correlating events from multiple services that are topologically connected and impact each other. It includes events from both the defined service context and all external nodes that have a topological relationship and impact on the service. 

Multi-service situations follow the direction of impact, linking services affected by a common issue in shared infrastructure. These situations can be created with just one event on each service, as long as the services are topologically connected through a shared infrastructure component.

BMC Helix AIOps uses AI/ML algorithms to correlate these events based on topological relationships and shared infrastructure. This correlation enables faster root cause identification, improved visibility, and reduced Mean Time to Resolution (MTTR).

sit_overview_multiservice_situations_253.png

The Multi-service situations feature is under controlled availability to select customers. To use this capability, contact BMC Support.

Primary situations

Primary situations reduce noise by grouping similar open situations that occurred due to a similar issue and impacted multiple services across a service hierarchy. These situations are created when multiple independent situations within a service hierarchy are linked through common or connected nodes, regardless of impact direction.

For a primary situation to be created, at least two events are required on each participating service. The resulting situations are then grouped into a primary situation based on topological links.
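The event-count requirements stated for multi-service situations (one event on each service) and primary situations (at least two events on each participating service) can be sketched as follows. The function name, the dictionary layout, and the service names are assumptions made for illustration; the sketch checks only the event counts and does not evaluate topological connections.

```python
# Minimum events per participating service, taken from the text above.
MIN_EVENTS_PER_SERVICE = {"multi-service": 1, "primary": 2}


def meets_event_minimum(situation_type: str, events_per_service: dict[str, int]) -> bool:
    """Check only the event-count rule; topology is not evaluated here."""
    required = MIN_EVENTS_PER_SERVICE[situation_type]
    # Both situation types involve more than one service.
    if len(events_per_service) < 2:
        return False
    return all(count >= required for count in events_per_service.values())


counts = {"Payments": 1, "Checkout": 3}  # hypothetical service names
print(meets_event_minimum("multi-service", counts))  # True
print(meets_event_minimum("primary", counts))        # False
```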

Instead of troubleshooting each service and its situation separately, operators or SREs can troubleshoot the primary situation. BMC Helix AIOps leverages AI/ML algorithms to find situation similarity due to temporal, topological, or knowledge graph relationships. This information helps reduce the event noise and improves MTTR. 

Primary situations are formed based on the following options defined on the Manage Situations page under Configurations:

  • Correlation Event Time Window (in mins): Configure the time limit to determine whether a correlation event will be a part of a situation.
  • Situation Stability Window (in mins): Configure the time limit to add new correlation events to a situation.

sit_overview_primary_situations_253.png

Similar situations

Groups of similar situations from the same or different services, based on similarity of the causal nodes, are listed in the Similar Situations section.

BMC Helix AIOps uses AI/ML algorithms to group situations of a similar nature based on their repeated impact on a service in the past. Operators or SREs can perform historical analysis on problems, look at the number of incidents raised, automation runs, the severity of situations, and the time of past occurrences, and take meaningful actions.

Similar situations also help in faster root cause isolation. For any open situation, similar situations provide much-needed context to understand how a similar situation was resolved in the past, the actions taken to resolve it, and the root cause that was identified. Based on this contextual information, operators or SREs can perform similar actions to diagnose and remediate the situation and therefore reduce MTTR.

Similar situations are formed based on the following options defined on the Manage Situations page under Configurations:

  • Expiry of Similar Situation Group (in days): Configure the maximum number of days a group of similar situations can remain idle before expiring.
  • Similar Situation Detection Window (in hours): Configure the interval, in hours, at which the detection of similar situations runs and forms groups.
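As a rough illustration of the Expiry of Similar Situation Group setting, the following sketch treats a group as expired once it has been idle for longer than the configured number of days. The function name, the `last_activity` timestamp, and the assumption that "idle" means no activity since that timestamp are hypothetical; the product's actual expiry logic is not described here.

```python
from datetime import datetime, timedelta


def is_group_expired(last_activity: datetime, now: datetime, expiry_days: int) -> bool:
    """Return True when the idle time exceeds the configured number of days."""
    idle_time = now - last_activity
    return idle_time > timedelta(days=expiry_days)


now = datetime(2025, 1, 31, 12, 0)
print(is_group_expired(datetime(2025, 1, 1, 12, 0), now, expiry_days=14))   # True
print(is_group_expired(datetime(2025, 1, 25, 12, 0), now, expiry_days=14))  # False
```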

sit_overview_similar_situations_253.png

Differences between situation types

The following table summarizes the key characteristics of each situation type in BMC Helix AIOps to help you understand how they are triggered, correlated, and visualized in the UI.

| Situation type | Trigger scope | When does it occur? | Benefit |
|---|---|---|---|
| Independent | Triggered by correlating events within a service. | Occurs when events from a single service model are correlated based on topological connections within that service, and all involved nodes are explicitly part of the same service definition. | Helps focus on issues that are limited to a single service. |
| Multi-service | Triggered by correlating events impacting multiple topologically connected services. | Occurs when events from different service models are correlated based on shared infrastructure (like a database or router), even if the node is not explicitly included in all services. | Helps identify cross-service dependencies and root causes that impact several services at once. |
| Primary | Multiple open situations across different services or nodes. All the involved situations must be linked through shared CIs or relationships. | Groups related open situations that impact multiple services and involve shared or related configuration items. All situations must share CIs or relationships. | Reduces alert noise by grouping similar situations, enabling operators to analyze one root issue instead of many. |
| Similar | Situations that resemble past ones on the same or different services, based on the similarity of the causal nodes. | Formed by analyzing past situations for recurring patterns, causes, and resolutions on the same node. Not based on current topology but on history. | Provides resolution context from past occurrences; helps repeat successful fixes and reduces MTTR. |

BMC HelixGPT-based summary, best action recommendations, log insights, and a virtual agent – Ask HelixGPT

The following video (5:05) provides an overview of the transformative power of BMC Helix AIOps and observability enhanced by BMC HelixGPT: 

Watch the YouTube video about Unlocking AIOps and Observability with BMC HelixGPT.

BMC Helix AIOps connects with BMC HelixGPT to leverage the generative AI capabilities that help operators or SREs understand the root cause of a situation faster by providing a human-readable AI-generated situation summary. This summary includes a short problem classification, a brief root cause summary, and a causal summary explaining the complete context.

bar_nwbandwidth_situations_summary_24.1.02.png

Best action recommendations

By using the generative AI capabilities, BMC HelixGPT provides a step-by-step action plan for remediating the situation. These remediation steps are called best action recommendations and can be used by the operators or SREs to resolve the events. Best action recommendations help close situations faster and improve the mean time to resolve (MTTR).

BMC HelixGPT generates these recommendations by evaluating information from the following sources:

  • If similar situations occurred in the past, BMC HelixGPT looks for the resolution notes that might have been added during the closing of incidents in BMC Helix IT Service Management for these similar situations.
  • If no similar situations are found, BMC HelixGPT evaluates the resolution notes added in incidents of related events.
  • If no incidents are available, event messages of the root cause events are evaluated, and recommendations are suggested.
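The order in which these sources are evaluated can be sketched as a simple fallback chain. The function name, parameter names, and source labels below are hypothetical and only model the priority described in the list; they do not represent the BMC HelixGPT API.

```python
def pick_recommendation_source(
    similar_situation_notes: list[str],
    related_event_notes: list[str],
    root_cause_messages: list[str],
) -> tuple[str, list[str]]:
    """Return the first available source, in the priority order listed above."""
    # 1. Resolution notes from incidents of similar past situations.
    if similar_situation_notes:
        return ("similar_situation_incidents", similar_situation_notes)
    # 2. Resolution notes from incidents of related events.
    if related_event_notes:
        return ("related_event_incidents", related_event_notes)
    # 3. Event messages of the root cause events.
    return ("root_cause_event_messages", root_cause_messages)


source, inputs = pick_recommendation_source(
    similar_situation_notes=[],
    related_event_notes=["Restarted the interface after clearing the queue"],
    root_cause_messages=["Network bandwidth utilization is high"],
)
print(source)
```

In this example, no similar situations exist, so the sketch falls back to the resolution notes of incidents for related events.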

BMC Helix AIOps can connect with BMC Helix IT Service Management, Jira Service Management, or ServiceNow ITSM through BMC HelixGPT to generate the best action recommendations based on the incidents in these supported incident management systems. For more information about configuring incident sources through agents in BMC HelixGPT, see Adding agents for BMC Helix AIOps.

Along with the remediation steps, a code wizard provides sample scripts in Ansible, Python, and Bash that can be used to perform the recommended step. For example, if a situation is created for a network bandwidth utilization issue identified on a host, one of the recommended actions could be to increase the network bandwidth. The code wizard generates a script to perform this action, which can be used to implement automation and resolve similar issues in the future.

CodeW2_BAR_IncreaseCPU_HelixGPT_24102.png

Log insights

BMC Helix AIOps connects with the supported log repositories through BMC HelixGPT to analyze, summarize, and derive actionable insights from the logs related to a service. Apart from connecting with BMC HelixGPT, BMC Helix AIOps also connects with external log data sources such as Splunk Enterprise and Elasticsearch to leverage your existing log repositories. For more information about configuring the data sources, see Adding data sources in BMC HelixGPT.

The actionable insights are displayed under the Log Insights option and can be used to identify the root cause of situations. Operators or SREs can use the cross-launch link to view the detailed logs in the supported log repositories. 

Ask HelixGPT

An integrated virtual agent, Ask HelixGPT, leverages the BMC HelixGPT generative AI capabilities to answer the following predefined questions about a situation: 

  • Recurring?
  • Logs?
  • Change windows?
  • Impact?
  • Team to solve? 


BMC HelixGPT generates answers to these questions by evaluating the incidents created for similar situations in BMC Helix IT Service Management, the timestamps and patterns of similar situations that occurred in the past, the service health score of the service impacted by the situation, and the change requests associated with the situation.

By using these BMC HelixGPT capabilities, operators can improve operational efficiency, derive insights from all connected sources, and reduce manual errors by implementing automation to resolve problems faster.

Incident management for situations

BMC Helix AIOps connects with BMC Helix IT Service Management to show incidents created for situations. If Proactive Service Resolution is configured in BMC Helix Intelligent Automation, a single consolidated incident is created for a situation instead of separate incidents for the situation and its events. For more information, see Overview of Proactive Service Resolution.

Where to go from here

To view the primary, independent, and similar ML-based situations, see Monitoring-situations.
