Situations overview


BMC Helix AIOps correlates events collected from multiple sources into situations. These sources can be infrastructure, applications, network resources, and monitoring solutions from different vendors. Events from the same host or from different hosts are correlated based on their occurrence, message, signature, topology, or a combination of these factors.

Watch the following video (2:58) to get an overview of ML-based situations:

Watch the YouTube video about Overview of ML-based situations.

Situations are formed by correlating events and other situations based on some of the following factors:

  • Problem pattern and event signatures.
    An event's signature is derived from the name of the host from which the event originated and the event message.
  • Analysis of the event message 
  • Sequence of the events from the same node or different nodes in a service hierarchy
  • Time and day of occurrence of the events
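The exact signature algorithm that BMC Helix AIOps uses is not documented here. As a minimal sketch, assuming that a signature is a stable hash of the source host name and a normalized event message (with numbers masked so that recurring messages with different values match), the idea can be illustrated as follows. The `normalize_message` and `event_signature` functions and the normalization rule are hypothetical.

```python
import hashlib
import re


def normalize_message(message: str) -> str:
    """Lowercase the message and mask numbers (hypothetical normalization rule)."""
    return re.sub(r"\d+(\.\d+)?", "<num>", message.strip().lower())


def event_signature(host: str, message: str) -> str:
    """Return a hypothetical signature derived from the host name and event message."""
    key = f"{host.strip().lower()}|{normalize_message(message)}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


a = event_signature("db-01", "Memory utilization is 85% for 5 mins")
b = event_signature("db-01", "Memory utilization is 91% for 5 mins")
c = event_signature("web-02", "Memory utilization is 85% for 5 mins")
print(a == b)  # True: same host, same message pattern
print(a == c)  # False: different host
```

Masking numbers is one possible design choice: it lets repeated occurrences of the same problem on the same host share a signature even when the reported values differ.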

Based on the type of correlation, situations are categorized into independent, primary, or similar situations.

The following figure shows how situations are created in BMC Helix AIOps:

Figure: How situations are created in BMC Helix AIOps

Types of situations

Situations formed by correlating and aggregating events based on the event correlation policies created in BMC Helix Operations Management are known as policy-based situations. For more information, see Policy-based situations.

Situations formed by correlating and aggregating events based on AI/ML algorithms are known as ML-based situations.

On the Situations page, the policy-based situation icon indicates a policy-based situation, and the ML-based situation icon indicates an ML-based situation.

Naming convention for ML-based situations

The name of an ML-based situation is derived from the associated service name and the message of the situation event with the highest impact score.

The situation name is displayed in the following format: <Name of the service> - <The message of a situation event with the highest impact score>.

For example, Railway_Ticketing_Service - Memory utilization is > 80% for 5 mins.
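The naming rule can be sketched as selecting the situation event with the highest impact score and prefixing its message with the service name. This is a minimal illustration, not the product's implementation: the `SituationEvent` type, its field names, and the tie-breaking behavior (when scores are equal, the first event wins, because Python's `max` returns the first maximal element) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SituationEvent:
    message: str
    impact_score: float  # hypothetical field name


def situation_name(service_name: str, events: list[SituationEvent]) -> str:
    """Build '<service> - <message of the highest-impact event>'."""
    if not events:
        raise ValueError("a situation needs at least one event")
    top = max(events, key=lambda e: e.impact_score)
    return f"{service_name} - {top.message}"


events = [
    SituationEvent("Disk latency is high", 42.0),
    SituationEvent("Memory utilization is > 80% for 5 mins", 87.5),
]
print(situation_name("Railway_Ticketing_Service", events))
# Railway_Ticketing_Service - Memory utilization is > 80% for 5 mins
```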

Prerequisites for viewing ML-based situations

Before you view ML-based situations, make sure that the following prerequisites are met:

  • A service model is created in either BMC Helix AIOps or BMC Helix Discovery.
    For more information, see Building service models.
  • BMC Helix AIOps is able to receive events from the required data sources.
    For information about the sources from where BMC Helix AIOps can receive events, see Data sources for BMC Helix AIOps.
  • The AIOps Situations feature is enabled in BMC Helix AIOps.
    For more information, see Enabling the AIOps features.
  • Situations are formed according to various settings available on the Manage Situations > Configurations page. Update the default settings if required.
    For more information about the settings, see Configuring ML-based situations.

Independent, primary, and similar situations


On the Situations page, you can view:

  • All Situations: Lists independent situations, groups of primary situations, and similar situations. An independent situation is an ML-based situation that does not belong to any grouped situation. Primary situations reduce noise by grouping open situations that occurred due to a similar issue and impacted multiple services across a service hierarchy. Instead of troubleshooting each service and its situation separately, operators or site reliability engineers (SREs) can troubleshoot the primary situation, which is indicated by the primary situation icon. BMC Helix AIOps uses AI/ML algorithms to find similarities between situations based on temporal, topological, or knowledge graph relationships. This grouping reduces event noise and the mean time to resolve (MTTR).

    Primary situations are formed based on the following options defined on the Manage Situations > Configurations page:
    • Correlation Event Time Window (in mins): Configure the time limit to determine whether a correlation event will be a part of a situation.
    • Situation Stability Window (in mins): Configure the time limit to add new correlation events to a situation.
  • Similar Situations: Lists groups of similar situations from the same service node. BMC Helix AIOps uses AI/ML algorithms to group situations of a similar nature based on their repeated impact on a service in the past. Operators or SREs can perform historical analysis of problems by reviewing the number of incidents raised, automation runs, the severity of situations, and the time of past occurrences, and then take meaningful actions. Similar situations also help isolate the root cause faster. For any open situation, similar situations provide context about how a similar situation was resolved in the past, the actions taken to resolve it, and the root cause that was identified. Based on this context, operators or SREs can take similar actions to diagnose and remediate the situation and therefore reduce MTTR.
    Similar situations are formed based on the following options defined on the Manage Situations > Configurations page:
    • Expiry of Similar Situation Group (in days): Configure the maximum number of days a group of similar situations can remain idle before it expires.
    • Similar Situation Detection Window (in hours): Configure the interval, in hours, at which similar-situation detection runs and forms groups.
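The documentation does not specify exactly how the two primary-situation windows interact. The following minimal sketch shows one plausible interpretation, stated as assumptions: an event can join a situation only if it occurs within the Correlation Event Time Window of the latest event already in the situation, and no events are added after the Situation Stability Window, measured from situation creation, has elapsed. The `Situation` type and the `can_add_event` function are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Situation:
    created_at: datetime
    event_times: list[datetime] = field(default_factory=list)


def can_add_event(
    situation: Situation,
    event_time: datetime,
    correlation_window_min: int,
    stability_window_min: int,
) -> bool:
    """Hypothetical check that combines the two window settings."""
    # Assumption: the stability window is measured from situation creation;
    # after it elapses, the situation accepts no new correlation events.
    stability_deadline = situation.created_at + timedelta(minutes=stability_window_min)
    if event_time > stability_deadline:
        return False
    # Assumption: an event correlates only if it occurs within the correlation
    # event time window of the latest event already in the situation.
    latest = max(situation.event_times, default=situation.created_at)
    return abs(event_time - latest) <= timedelta(minutes=correlation_window_min)


start = datetime(2024, 1, 1, 10, 0)
s = Situation(created_at=start, event_times=[start])
print(can_add_event(s, start + timedelta(minutes=8), 10, 30))   # True
print(can_add_event(s, start + timedelta(minutes=15), 10, 30))  # False: outside correlation window
print(can_add_event(s, start + timedelta(minutes=35), 60, 30))  # False: stability window elapsed
```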
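The two similar-situation settings can be illustrated with a minimal sketch, assuming that a group expires once it has been idle for longer than the configured number of days, and that detection runs periodically at the configured hourly interval. The `is_group_expired` and `detection_run_times` functions are hypothetical and only model the settings, not the product's scheduler.

```python
from datetime import datetime, timedelta


def is_group_expired(last_activity: datetime, now: datetime, expiry_days: int) -> bool:
    """Return True once a group has been idle for longer than expiry_days (assumption)."""
    return now - last_activity > timedelta(days=expiry_days)


def detection_run_times(start: datetime, end: datetime, window_hours: int) -> list[datetime]:
    """List the times at which a periodic detection job would run between start and end."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += timedelta(hours=window_hours)
    return runs


now = datetime(2024, 1, 31)
print(is_group_expired(datetime(2024, 1, 1), now, 30))    # False: idle exactly 30 days
print(is_group_expired(datetime(2023, 12, 31), now, 30))  # True: idle 31 days
print(len(detection_run_times(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 12), 6)))  # 3
```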

BMC HelixGPT-based summary, best action recommendations, log insights, and a virtual agent – Ask HelixGPT

The following video (5:05) provides an overview of the transformative power of BMC Helix AIOps and observability enhanced by BMC HelixGPT: 

Watch the YouTube video about Unlocking AIOps and Observability with BMC HelixGPT.

BMC Helix AIOps connects with BMC HelixGPT to use its generative AI capabilities, which help operators or SREs understand the root cause of a situation faster by providing a human-readable, AI-generated situation summary. The summary includes a short problem classification, a brief root cause summary, and a causal summary that explains the complete context.

Figure: Example situation summary generated by BMC HelixGPT

Best action recommendations

By using its generative AI capabilities, BMC HelixGPT provides a step-by-step action plan for remediating the situation. These remediation steps are called best action recommendations, and operators or SREs can use them to resolve the situation. Best action recommendations help close situations faster and reduce the mean time to resolve (MTTR).

BMC HelixGPT generates these recommendations by evaluating information from the following sources:

  • If similar situations occurred in the past, BMC HelixGPT looks for the resolution notes that might have been added during the closing of incidents in for these similar situations.
  • If no similar situations are found, BMC HelixGPT evaluates the resolution notes added to the incidents of related events.
  • If no incidents are available, BMC HelixGPT evaluates the messages of the root cause events and suggests recommendations based on them.
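The order of preference above can be sketched as a simple fallback chain. This is a minimal illustration under assumptions: the `SituationContext` fields and the `select_recommendation_source` function are hypothetical, and the sketch treats "similar situations found" as "resolution notes from similar situations are available", which may differ from how BMC HelixGPT actually decides.

```python
from dataclasses import dataclass, field


@dataclass
class SituationContext:
    # Hypothetical inputs; the real data model is not documented here.
    similar_situation_notes: list[str] = field(default_factory=list)
    related_incident_notes: list[str] = field(default_factory=list)
    root_cause_messages: list[str] = field(default_factory=list)


def select_recommendation_source(ctx: SituationContext) -> tuple[str, list[str]]:
    """Return the first available evidence source, in the documented order of preference."""
    if ctx.similar_situation_notes:
        return "similar_situation_resolution_notes", ctx.similar_situation_notes
    if ctx.related_incident_notes:
        return "related_incident_resolution_notes", ctx.related_incident_notes
    return "root_cause_event_messages", ctx.root_cause_messages


ctx = SituationContext(
    related_incident_notes=["Restarted the NIC driver"],
    root_cause_messages=["Packet loss > 5%"],
)
print(select_recommendation_source(ctx)[0])  # related_incident_resolution_notes
```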

BMC Helix AIOps can connect with , Jira Service Management, or  ServiceNow ITSM through BMC HelixGPT to generate best action recommendations based on the incidents in these supported incident management systems. For more information about configuring incident sources through agents in BMC HelixGPT, see Adding agents for BMC Helix AIOps.

Along with the remediation steps, a code wizard provides sample scripts in Ansible, Python, and Bash that can be used to perform the recommended step. For example, if a situation is created for a network bandwidth utilization issue identified on a host, one of the recommended actions could be to increase the network bandwidth. The code wizard generates a script to increase the network bandwidth, which can be used to implement automation and resolve similar issues in the future.

Figure: Sample script generated by the code wizard for a best action recommendation

Log insights

BMC Helix AIOps connects with the supported log repositories through BMC HelixGPT to analyze, summarize, and derive actionable insights from the logs related to a service. In addition to connecting with BMC HelixGPT, BMC Helix AIOps connects with external log data sources, such as Splunk Enterprise and Elasticsearch, so that you can use your existing log repositories. For more information about configuring the data sources, see Adding data sources in BMC HelixGPT.

The actionable insights are displayed under the Log Insights option and can be used to identify the root cause of situations. Operators or SREs can use the cross-launch link to view the detailed logs in the supported log repositories. 

Ask HelixGPT

An integrated virtual agent, Ask HelixGPT, leverages the BMC HelixGPT generative AI capabilities to answer the following predefined questions about a situation: 

  • Recurring?
  • Logs?
  • Change windows?
  • Impact?
  • Team to solve? 

Figure: Ask HelixGPT questions

BMC HelixGPT generates answers to these questions by evaluating information from the incidents created for similar situations in , analyzing the timestamps and patterns of similar situations that occurred in the past, analyzing the health score of the service impacted by the situation, and evaluating the change requests associated with the situation.

By using these BMC HelixGPT capabilities, operators can improve operational efficiency, derive insights from all connected sources, and reduce manual errors by implementing automation to resolve problems faster.

Incident management for situations

BMC Helix AIOps connects with  to show incidents created for situations. If Proactive Service Resolution is configured in BMC Helix Intelligent Automation, a single consolidated incident is created for a situation instead of separate incidents for the situation and its events. For more information, see Overview of Proactive Service Resolution.
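The effect of Proactive Service Resolution on incident creation can be sketched as follows. This is a hypothetical model, not a BMC API: the `incidents_to_create` function and the `INC-` key format are invented for illustration, and the sketch assumes that without Proactive Service Resolution, the situation and each of its events get separate incidents.

```python
def incidents_to_create(situation_id: str, event_ids: list[str], psr_enabled: bool) -> list[str]:
    """Return hypothetical incident keys for a situation (not a real BMC API)."""
    situation_incident = f"INC-{situation_id}"
    if psr_enabled:
        # Proactive Service Resolution: one consolidated incident for the situation.
        return [situation_incident]
    # Assumption: otherwise, the situation and each of its events get separate incidents.
    return [situation_incident] + [f"INC-{event_id}" for event_id in event_ids]


print(incidents_to_create("SIT-1", ["EV-1", "EV-2"], psr_enabled=True))
# ['INC-SIT-1']
print(incidents_to_create("SIT-1", ["EV-1", "EV-2"], psr_enabled=False))
# ['INC-SIT-1', 'INC-EV-1', 'INC-EV-2']
```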

Where to go from here

To view primary, independent, and similar ML-based situations, see Monitoring situations.
