This documentation supports the releases of BMC Helix Operations Management up to December 31, 2021. To view the documentation for the latest version, select 23.1 from the Product version picker.

Event correlation for aggregating related events


Aggregate related events into actionable situations that are displayed on BMC Helix AIOps. Correlated events are also displayed on BMC Helix Operations Management. Detect event patterns by defining correlation conditions and uncover issues that require immediate attention. Eliminate many hours of manual sorting and correlation.

Operators need to analyze, prioritize and triage a large number of events in order to resolve problems. They need a way of quickly managing event storms to detect problems even before business is impacted.

A correlation policy can help reduce the event storm by combining multiple matching events into a single aggregated event.

This policy correlates incoming events and then aggregates them based on the following:

  • Event selection criteria: Acts as the first filter for selecting events.   
  • Correlation conditions: Conditions specified while creating the correlation policy determine which events must be matched.

 The selected events are aggregated under a single aggregated event on the Monitoring > Events page.
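The two-stage flow above (selection criteria first, then correlation conditions) can be sketched in Python as a conceptual model. This is not how BMC Helix Operations Management is implemented or configured; the event dictionaries, the slot names `source_hostname`, `msg`, and `severity`, and the functions `select_events` and `aggregate` are hypothetical. For simplicity, the correlation condition is modeled as an equality match on a single slot, which is used as a grouping key.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical event model: slot name -> slot value


def select_events(events: List[Event], criteria: Callable[[Event], bool]) -> List[Event]:
    """Stage 1: the event selection criteria act as the first filter."""
    return [event for event in events if criteria(event)]


def aggregate(events: List[Event], key: Callable[[Event], Hashable]) -> Dict[Hashable, List[Event]]:
    """Stage 2: events that share the same correlation key are grouped under one aggregated event."""
    groups: Dict[Hashable, List[Event]] = defaultdict(list)
    for event in events:
        groups[key(event)].append(event)
    return dict(groups)


incoming = [
    {"source_hostname": "host-a", "msg": "Database unreachable", "severity": "MAJOR"},
    {"source_hostname": "host-a", "msg": "Web server not responding", "severity": "MAJOR"},
    {"source_hostname": "host-b", "msg": "Disk usage at 91%", "severity": "WARNING"},
    {"source_hostname": "host-a", "msg": "Heartbeat missed", "severity": "INFO"},
]

# Hypothetical selection criteria: ignore informational events.
selected = select_events(incoming, lambda e: e["severity"] != "INFO")
# Hypothetical correlation condition: events with the same host name belong together.
aggregated = aggregate(selected, key=lambda e: e["source_hostname"])

for host, related in aggregated.items():
    print(f"{host}: {len(related)} related event(s)")
```

In this sketch, the informational event never reaches the correlation stage, and the two remaining `host-a` events end up in a single group.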


Use cases for correlating events

Use case 1: Suppose a host goes down and, as a result, you receive numerous events from the various applications set up on that host.

In this scenario, you can create a correlation policy to aggregate all the events with the same host name. 

Use case 2: Suppose you receive various events related to failed logins. A failed login can indicate that unauthorized users are trying to access the system.

In this scenario, you can create a correlation policy to aggregate all the events:

  • Originating from the particular host or source address 
  • With a message that contains the string "failed login"

You can generate a new aggregated event with critical severity and high priority to help prevent a security breach.
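Use case 2 can be sketched as a matching predicate over an incoming event (`$NEW`) and an existing event (`$OLD`), plus a template for the generated aggregated event. This is a hypothetical illustration only: in the product, these settings are defined in the correlation policy, not in code. The slot names `source_address` and `msg`, the function names, the case-insensitive string check, and the `"HIGH"` priority value are all assumptions.

```python
from typing import Dict, List

Event = Dict[str, str]  # hypothetical event model: slot name -> slot value


def is_failed_login(event: Event) -> bool:
    # Assumption: the "failed login" check is case-insensitive.
    return "failed login" in event.get("msg", "").lower()


def matches(new: Event, old: Event) -> bool:
    # Similar in spirit to a condition such as $NEW.source_address == $OLD.source_address,
    # applied only to events that report a failed login.
    return (
        is_failed_login(new)
        and is_failed_login(old)
        and new.get("source_address") == old.get("source_address")
    )


def build_aggregated_event(related: List[Event]) -> Event:
    # The generated aggregated event gets critical severity and high priority.
    return {
        "source_address": related[0]["source_address"],
        "msg": f"Possible unauthorized access: {len(related)} failed logins",
        "severity": "CRITICAL",
        "priority": "HIGH",  # hypothetical priority value
    }


existing = {"source_address": "10.0.0.5", "msg": "Failed login for user admin"}
incoming = {"source_address": "10.0.0.5", "msg": "failed login for user root"}
unrelated = {"source_address": "10.0.0.9", "msg": "Failed login for user guest"}

print(matches(incoming, existing))   # True: same source address, both are failed logins
print(matches(unrelated, existing))  # False: different source address
print(build_aggregated_event([existing, incoming])["msg"])
```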


Correlation benefits

Event correlation can aggregate related events to reduce event noise. It also reduces the operator's mean time to detect (MTTD) and the time spent investigating tickets.

The following images compare how aggregation reduces event noise for a specific use case.

Before aggregation

The following image shows a large number of events received from various event sources in a time window of 15 minutes on a certain day. An operator would need to analyze and prioritize each of these events.

[Image: Events page before aggregation]

After aggregation

Suppose you suspect a problem pattern related to the event source. In that case, you can create a correlation policy that aggregates events based on the host name.

The following image shows a reduced number of events from various event sources in a time window of 15 minutes after the correlation policy is created. The aggregated events reduce event noise and help the operator focus only on those events that really matter.

[Image: Events page after aggregation]


Correlation conditions

Incoming events are correlated based on the following conditions:

  • Matching criteria: The process of building the condition for matching criteria is similar to building the event selection criteria. You can add a condition to find new incoming events that match existing events of interest based on slot values. For example, you can find incoming events with a host name that is the same as the host name of one or more existing events. All incoming events with the matching host name can then be correlated and aggregated into a single event.
    While building the condition, you can specify slots prefixed with $NEW and $OLD. Slots prefixed with ‘$NEW’ refer to slots of incoming events and slots prefixed with ‘$OLD’ refer to slots of existing events. 
  • Correlation time (in minutes): Correlation occurs only within a time window of the specified number of minutes. The window starts when the correlation policy is created. After the window lapses, correlation stops.
  • Minimum event count: When the minimum count specified for matching events is met, correlation begins. 
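The interaction of these three conditions can be modeled as a small state machine. This is a simplified sketch, not the product's implementation. The class `CorrelationWindowSketch`, its methods, and the `match` callable that stands in for a `$NEW`/`$OLD` condition are hypothetical. The sketch assumes that the window starts when the policy is created, that an incoming event joins the first group that contains a matching existing event (otherwise it starts a new group), and that a group is reported once its size reaches the minimum event count. The `reset()` method reflects the behavior described in the Important note below, where saving an edited policy resets the time window and the minimum event count.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]                      # hypothetical event model: slot name -> value
MatchFn = Callable[[Event, Event], bool]    # (new, old) -> True if the events correlate


class CorrelationWindowSketch:
    """Hypothetical model of a correlation policy's time window and minimum event count."""

    def __init__(self, match: MatchFn, window_minutes: int, min_count: int, created_at: datetime):
        self.match = match
        self.window = timedelta(minutes=window_minutes)
        self.min_count = min_count
        self.reset(created_at)

    def reset(self, now: datetime) -> None:
        # Creating the policy, or saving an edited one, restarts the window and the counts.
        self.started_at = now
        self.groups: List[List[Event]] = []

    def process(self, new: Event, now: datetime) -> Optional[int]:
        """Correlate an incoming event; return its group size once the minimum count is met."""
        if now - self.started_at > self.window:
            return None  # the correlation time window has lapsed: correlation stops
        # Join the first group that contains an existing ($OLD) event matching this $NEW event.
        target = next(
            (group for group in self.groups if any(self.match(new, old) for old in group)),
            None,
        )
        if target is None:
            target = []
            self.groups.append(target)
        target.append(new)
        return len(target) if len(target) >= self.min_count else None


t0 = datetime(2021, 11, 1, 9, 0)
same_host = lambda new, old: new["source_hostname"] == old["source_hostname"]
policy = CorrelationWindowSketch(same_host, window_minutes=15, min_count=3, created_at=t0)

print(policy.process({"source_hostname": "host-a"}, t0 + timedelta(minutes=1)))   # None: 1 of 3
print(policy.process({"source_hostname": "host-a"}, t0 + timedelta(minutes=2)))   # None: 2 of 3
print(policy.process({"source_hostname": "host-a"}, t0 + timedelta(minutes=3)))   # 3: minimum met
print(policy.process({"source_hostname": "host-a"}, t0 + timedelta(minutes=20)))  # None: window lapsed
```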

These conditions can be defined while configuring an event policy of the Correlation type. For more information, see Creating and enabling event policies vNov_2021.

Important

When you edit a correlation policy to change the policy description or correlation settings and save the policy, the correlation time window and minimum event count are reset for correlating incoming events.


How aggregated events are displayed

The new aggregated event (primary event) is generated with the event details specified while creating the correlation policy. 

The following image displays an example of how an aggregated event appears on the Events page.

[Image: Example of an aggregated event on the Events page]

The individual matching events used for aggregation are no longer displayed on the Events page. Instead, these events are displayed as related events (secondary events) in the event details of the primary event.

You can access the related events by clicking the number displayed in the aggregated event message. The number represents the count of events that were combined to form the new aggregated event. This count can increase as more matching events arrive, until the correlation time window lapses.

If an aggregated event closes within the time window, a new aggregated event is generated for matching events and displayed on the Events page. If you close the primary event, either through the close event operation or from the event policy, the secondary events close after a delay of 10 minutes. Similarly, if you close secondary events, either through the close event operation or from the event policy, the primary event closes after a delay of 10 minutes.
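The lifecycle of an aggregated event within one correlation time window can be sketched as follows. This is a hypothetical model, not the product's implementation: the class `AggregationLifecycleSketch`, its methods, and the event IDs are illustrative. It assumes that the minimum event count is already met, so every matching event is attached to an open primary event immediately. It models only the primary-to-secondary closure direction with the 10-minute delay described above. The reverse direction is left out because this page does not state whether closing one secondary event or all of them triggers closure of the primary event.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Delay stated in this documentation before linked events are closed.
LINKED_CLOSE_DELAY = timedelta(minutes=10)


class AggregationLifecycleSketch:
    """Hypothetical model of one correlation group inside one correlation time window."""

    def __init__(self, window_end: datetime):
        self.window_end = window_end
        self.primaries: List[List[str]] = []  # each primary holds its secondary event IDs
        self.primary_open = False

    def add_matching_event(self, event_id: str, now: datetime) -> Optional[int]:
        """Attach a matching event; return the related-event count shown in the message."""
        if now > self.window_end:
            return None  # the correlation time window has lapsed
        if not self.primary_open:
            # No open aggregated event (none yet, or the previous one was closed),
            # so a new aggregated event is generated for the matching events.
            self.primaries.append([])
            self.primary_open = True
        self.primaries[-1].append(event_id)
        return len(self.primaries[-1])

    def close_primary(self, now: datetime) -> Dict[str, datetime]:
        """Close the open primary event; return when each secondary event is expected to close."""
        if not self.primary_open:
            return {}
        self.primary_open = False
        close_at = now + LINKED_CLOSE_DELAY
        return {event_id: close_at for event_id in self.primaries[-1]}


t0 = datetime(2021, 11, 1, 9, 0)
group = AggregationLifecycleSketch(window_end=t0 + timedelta(minutes=30))
group.add_matching_event("evt-1", t0)                          # count 1
group.add_matching_event("evt-2", t0 + timedelta(minutes=5))   # count 2
closures = group.close_primary(t0 + timedelta(minutes=10))     # evt-1 and evt-2 close at 09:20
group.add_matching_event("evt-3", t0 + timedelta(minutes=12))  # new aggregated event, count 1
print(len(group.primaries))                                    # 2
```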

The aggregated events are also displayed as situations on the BMC Helix AIOps console, and the event status is synchronized with BMC Helix AIOps. For more information, see Understanding situations.
