
Event correlation for aggregating related events


Aggregate related events into actionable situations that are displayed on . Correlated events are also displayed on BMC Helix Operations Management. Detect event patterns by defining correlation conditions and uncover issues that require immediate attention. Eliminate many hours of manual sorting and correlation.

Operators need to analyze, prioritize and triage a large number of events in order to resolve problems. They need a way of quickly managing event storms to detect problems even before business is impacted.

A correlation policy can help reduce the event storm by combining multiple matching events into a single aggregated event.

This policy correlates and then aggregates incoming events based on the following conditions:

  • Event selection criteria: Acts as the first filter for selecting events.   
  • Correlation conditions: Conditions specified while creating the correlation policy determine which events must be matched.

 The selected events are aggregated under a single aggregated event on the Monitoring > Events page.
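The two filtering stages above can be pictured as a small pipeline. The following is only a minimal sketch of the idea, not the product's implementation: the event dictionaries, the slot names (`severity`, `host`), and the severity values are assumptions chosen for illustration.

```python
from collections import defaultdict


def selection_criteria(event):
    # Stage 1 (hypothetical filter): only higher-severity events are considered.
    return event["severity"] in {"CRITICAL", "MAJOR"}


def correlation_key(event):
    # Stage 2 (hypothetical condition): events match when they share a host name.
    return event["host"]


def aggregate(events):
    """Group the selected events under one entry per correlation key."""
    groups = defaultdict(list)
    for event in events:
        if selection_criteria(event):
            groups[correlation_key(event)].append(event)
    return dict(groups)


events = [
    {"id": 1, "severity": "CRITICAL", "host": "web01"},
    {"id": 2, "severity": "MAJOR", "host": "web01"},
    {"id": 3, "severity": "INFO", "host": "web01"},  # dropped by stage 1
    {"id": 4, "severity": "CRITICAL", "host": "db01"},
]
groups = aggregate(events)
```

In this sketch, each key of `groups` stands for one aggregated event and each value holds the events aggregated under it.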

Important

Aggregated events (primary events) generated by the correlation policy are not suppressed even if the event matches the conditions or event selection criteria configured in the suppression policy.


Scenarios for correlating events

Scenario 1: Suppose a host goes down and you receive numerous events related to the various applications running on that host.

In this scenario, you can create a correlation policy to aggregate all the events with the same host name. 

Scenario 2: Suppose you received various events related to a failed login. A failed login can indicate unauthorized users trying to access the system.

In this scenario, you can create a correlation policy to aggregate all the events:

  • Originating from the particular host or source address 
  • With the message containing the string "failed login"

You can generate a new aggregated event with critical severity and high priority to prevent a security breach.
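As a rough illustration of Scenario 2, the matching logic could look like the sketch below. The slot names (`source`, `msg`), the sample address, and the severity and priority strings are hypothetical; in the product, the condition is built in the policy editor rather than written as code.

```python
def is_failed_login(event, source):
    """True when the event comes from `source` and mentions a failed login."""
    return event["source"] == source and "failed login" in event["msg"].lower()


def build_aggregated_event(matches):
    """Create one high-visibility event that stands in for all matches."""
    return {
        "severity": "CRITICAL",
        "priority": "HIGH",
        "msg": f"{len(matches)} failed login events",
        "related": [e["id"] for e in matches],
    }


incoming = [
    {"id": 10, "source": "10.0.0.5", "msg": "Failed login for user admin"},
    {"id": 11, "source": "10.0.0.5", "msg": "Disk usage at 91%"},
    {"id": 12, "source": "10.0.0.9", "msg": "failed login for user root"},
    {"id": 13, "source": "10.0.0.5", "msg": "failed login for user guest"},
]
matches = [e for e in incoming if is_failed_login(e, "10.0.0.5")]
aggregated = build_aggregated_event(matches)
```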


Correlation benefits

Event correlation aggregates related events to reduce event noise. It also reduces the operator's mean time to detect (MTTD) and the time required to investigate tickets.

The following images compare how aggregation reduces event noise for a specific use case.

Before aggregation

The following image shows a large number of events received from various event sources in a time window of 15 minutes on a certain day. An operator would need to analyze and prioritize each of these events.

Before aggregation.png

After aggregation

Suppose you suspect a problem pattern related to the source. You can create a correlation policy to aggregate events based on the host name.

The following image shows a reduced number of events from various event sources in a time window of 15 minutes after the correlation policy is created. The aggregated events reduce event noise and help the operator focus only on those events that really matter.

After aggregation.png


Correlation settings

Incoming events are correlated based on the following conditions:

  • Matching criteria: Building the condition for the matching criteria is similar to building the event selection criteria. You can add a condition to find new incoming events that match existing events of interest based on slot values. For example, you can find incoming events with a host name that is the same as the host name of one or more existing events. All incoming events with the matching host name can then be correlated and aggregated into a single event.
    While building the condition, you can specify slots prefixed with $NEW and $OLD. Slots prefixed with ‘$NEW’ refer to slots of incoming events and slots prefixed with ‘$OLD’ refer to slots of existing events. 

    Best practice
    If you specify the minimum event count as 1, we recommend that you compare incoming event values with static values in the event selection criteria instead of the correlation criteria. This approach prevents incorrect event correlation.

    For example, specify the following condition in the event selection criteria instead of the correlation criteria:

    $NEW.severity_var Equals LOW

  • Correlation time (in minutes): Correlation happens for the time duration specified in minutes. The time calculation begins after the correlation policy is created. After the time window passes, correlation stops. You can specify a correlation time window ranging from 2 minutes to 720 minutes.
  • Minimum event count: Correlation begins when the number of matching events reaches the specified minimum count. You can specify a minimum event count of up to 100.
  • Generate new aggregated event: Existing events that satisfy the matching criteria and the minimum event count for the correlation time are aggregated into a single event. Specify the following inputs for the aggregated event:

    Input: Description

    • Event Class: Event class for the aggregated event.
      Default: Alarm
      Important: You cannot generate aggregated events for the following event classes:
        • Anomaly
        • Prediction
        • Situation
      What happens to policies that use these restricted classes?
      Existing policies that use the Anomaly, Prediction, and Situation classes are not impacted. However, if you update an existing policy, the policy fails because these restricted classes are not allowed.
      You must remove these restricted classes when updating an existing policy.
    • Event Severity: Severity for the aggregated event.
      If you select the Match event severity option, the aggregated event severity is assigned as the severity of the first secondary event that matches the correlation criteria.
      Default: Minor
    • Event Priority: Priority for the aggregated event.
      If you select the Match event priority option, the aggregated event priority is assigned as the priority of the first secondary event that matches the correlation criteria.
      Default: Lowest
    • Event Status: Status for the aggregated event.
      If you select the Match event status option, the aggregated event status is assigned as the status of the first secondary event that matches the correlation criteria.
      Default: Open
    • Message: Message that you want to include in the aggregated event. Use the % character to include slot placeholders. When the policy is applied, the placeholders are replaced with values from the incoming event.
    • Location: Location for the aggregated event.

You can define these conditions while configuring an event policy of the type Correlation. For more information, see Creating and enabling event policies.
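To make the interplay of the settings concrete, the following sketch models a correlation policy in a simplified way. It is an illustration under stated assumptions, not the product's engine: the `CorrelationPolicy` class, the `offer` method, and the dictionary-based events are hypothetical. The `matches(new, old)` callable plays the role of a condition written with `$NEW` and `$OLD` slots. Only the limits (2 to 720 minutes, a minimum count of up to 100) and the aggregated event defaults come from the settings described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional

Event = dict  # hypothetical event representation: slot name -> value


@dataclass
class CorrelationPolicy:
    created_at: datetime
    matches: Callable[[Event, Event], bool]  # matches(new, old), like $NEW vs $OLD
    window_minutes: int = 15
    min_event_count: int = 2
    groups: List[List[Event]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 2 <= self.window_minutes <= 720:
            raise ValueError("correlation time must be between 2 and 720 minutes")
        if not 1 <= self.min_event_count <= 100:
            raise ValueError("minimum event count must be between 1 and 100")

    def offer(self, new: Event, now: datetime) -> Optional[Event]:
        """Return an aggregated event once the group of `new` is large enough."""
        # The window is counted from policy creation; afterwards correlation stops.
        if now > self.created_at + timedelta(minutes=self.window_minutes):
            return None
        # Find a group that already holds an existing event matching the new one.
        group = next(
            (g for g in self.groups if any(self.matches(new, old) for old in g)),
            None,
        )
        if group is None:
            group = []
            self.groups.append(group)
        group.append(new)
        if len(group) < self.min_event_count:
            return None
        # Defaults taken from the inputs listed above.
        return {"class": "Alarm", "severity": "Minor", "priority": "Lowest",
                "status": "Open", "related_count": len(group)}


start = datetime(2024, 1, 1, 12, 0)
policy = CorrelationPolicy(
    created_at=start,
    matches=lambda new, old: new["host"] == old["host"],
)
first = policy.offer({"host": "web01"}, start + timedelta(minutes=1))
second = policy.offer({"host": "web01"}, start + timedelta(minutes=2))
other = policy.offer({"host": "db01"}, start + timedelta(minutes=3))
late = policy.offer({"host": "web01"}, start + timedelta(minutes=20))
```

Here `first` and `other` yield no aggregated event because their groups are below the minimum count, `second` yields an aggregated event with two related events, and `late` is ignored because the 15-minute window has passed.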

Important

When you edit a correlation policy to change the policy description or correlation settings and save the policy, the correlation time window and minimum event count are reset for correlating incoming events.


How aggregated events are displayed

The new aggregated event (primary event) is generated with the event details specified while creating the correlation policy. 

The following image displays an example of how an aggregated event appears on the Events page.

correlated event example.png

The individual matching events used for aggregation are no longer displayed on the Events page. Instead, these events are displayed as related events (secondary events) in the event details of the primary event.

You can access the related events by clicking the number displayed in the aggregated event message. The number represents the count of events that were combined to form the new aggregated event. This count can increase as more matching events arrive, until the correlation time window elapses. If an aggregated event closes within the time window, a new aggregated event is generated for matching events and displayed on the Events page.

If you close the primary event, either through the close event operation or from the event policy, the secondary events close after a delay of 10 minutes. Similarly, if you close the secondary events, either through the close event operation or from the event policy, the primary event closes after a delay of 10 minutes. If you delete a related (secondary) event, the event count in the aggregated (primary) event does not change.

The system processes aggregated events as new events by using the event processing phases.

If you delete aggregated (primary) events that are closed, the related (secondary) events that are closed are not directly visible on the Events page, but you can still find them by searching for them.
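The close and delete rules described above can be summarized in a toy model. This is a sketch only: the `AggregatedEvent` class and its methods are hypothetical, and it assumes that the primary event is scheduled to close once every secondary event has been closed. The 10-minute delay and the unchanged count after a delete come from the description above.

```python
from datetime import datetime, timedelta

CLOSE_DELAY = timedelta(minutes=10)  # delay stated in the text above


class AggregatedEvent:
    """Toy model of a primary event and its related (secondary) events."""

    def __init__(self, secondary_ids):
        self.status = "Open"
        self.secondaries = {sid: "Open" for sid in secondary_ids}
        self.count = len(self.secondaries)  # number shown in the primary message
        self.primary_closes_at = None
        self.secondaries_close_at = None

    def close_primary(self, now):
        self.status = "Closed"
        self.secondaries_close_at = now + CLOSE_DELAY

    def close_secondary(self, sid, now):
        self.secondaries[sid] = "Closed"
        # Assumption: the primary closes only after all secondaries are closed.
        if all(s == "Closed" for s in self.secondaries.values()):
            self.primary_closes_at = now + CLOSE_DELAY

    def delete_secondary(self, sid):
        del self.secondaries[sid]  # self.count is deliberately left unchanged

    def tick(self, now):
        """Apply any delayed close whose time has come."""
        if self.secondaries_close_at is not None and now >= self.secondaries_close_at:
            for sid in self.secondaries:
                self.secondaries[sid] = "Closed"
        if self.primary_closes_at is not None and now >= self.primary_closes_at:
            self.status = "Closed"


t0 = datetime(2024, 1, 1, 12, 0)
agg = AggregatedEvent(["e1", "e2"])
agg.close_secondary("e1", t0)
agg.close_secondary("e2", t0)
agg.tick(t0 + timedelta(minutes=5))
status_after_5 = agg.status    # primary is still open inside the delay
agg.tick(t0 + timedelta(minutes=10))
status_after_10 = agg.status   # primary has auto-closed
agg.delete_secondary("e1")     # the displayed count stays the same
```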

Important

If you want to close and delete all secondary events and want the primary event to auto-close, perform one of the following actions:

  • Close all secondary events, wait for 10 minutes, and then delete the secondary events.
  • Close the primary event directly and delete the secondary events.

The aggregated events are also displayed as situations on the  console. Also, the event status is synced into . For more information, see Monitoring and investigating situations.

 
