Event correlation for aggregating related events
Aggregate related events into actionable situations that are displayed on . Correlated events are also displayed on BMC Helix Operations Management. Detect event patterns by defining correlation conditions and uncover issues that require immediate attention. Eliminate many hours of manual sorting and correlation.
Operators must analyze, prioritize, and triage large numbers of events to resolve problems. They need a way to manage event storms quickly and detect problems before the business is impacted.
A correlation policy can help reduce the event storm by combining multiple matching events into a single aggregated event.
This policy correlates and then aggregates incoming events based on the following conditions:
- Event selection criteria: Acts as the first filter for selecting events.
- Correlation conditions: Conditions specified while creating the correlation policy determine which events must be matched.
The selected events are aggregated under a single aggregated event on the Monitoring > Events page.
Scenarios for correlating events
Scenario 1: Suppose a host goes down and you receive numerous events related to the applications set up on that host.
In this scenario, you can create a correlation policy to aggregate all the events with the same host name.
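As an illustration only, the grouping that such a policy performs can be sketched in a few lines of Python. The event records and the `host` and `msg` field names are hypothetical and do not reflect the product's slot names or API.

```python
from collections import defaultdict

# Hypothetical event records; the field names are illustrative only.
events = [
    {"host": "web01", "msg": "Application A is not responding"},
    {"host": "web01", "msg": "Application B health check failed"},
    {"host": "db02", "msg": "Disk usage above threshold"},
    {"host": "web01", "msg": "Application C connection refused"},
]


def group_by_host(events):
    """Group events that share a host name, preserving arrival order."""
    groups = defaultdict(list)
    for event in events:
        groups[event["host"]].append(event)
    return dict(groups)


groups = group_by_host(events)
for host, related in groups.items():
    print(f"{host}: {len(related)} related event(s)")
```

Here the three events from the failed host collapse into one group, which is the role that the single aggregated event plays on the Events page.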
Scenario 2: Suppose you received various events related to a failed login. A failed login can indicate unauthorized users trying to access the system.
In this scenario, you can create a correlation policy to aggregate all the events:
- Originating from the particular host or source address
- With the message containing the string "failed login"
You can generate a new aggregated event with critical severity and high priority to prevent a security breach.
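A minimal sketch of this scenario follows. The `source_address` and `msg` field names, the case-insensitive match, and the shape of the aggregated record are assumptions made for illustration; they are not the product's API.

```python
def is_failed_login(event, source_address):
    """True when the event comes from the given source and mentions a failed login."""
    return (
        event["source_address"] == source_address
        # Case-insensitive matching is an assumption of this sketch.
        and "failed login" in event["msg"].lower()
    )


def aggregate_failed_logins(events, source_address):
    """Build one aggregated record for all failed-login events from one source."""
    matches = [e for e in events if is_failed_login(e, source_address)]
    if not matches:
        return None
    return {
        "severity": "CRITICAL",  # raised deliberately, as in the scenario
        "priority": "HIGH",
        "msg": f"{len(matches)} failed login events from {source_address}",
        "related": matches,
    }


sample = [
    {"source_address": "10.0.0.5", "msg": "Failed login for user admin"},
    {"source_address": "10.0.0.5", "msg": "Failed login for user root"},
    {"source_address": "10.0.0.9", "msg": "Failed login for user guest"},
    {"source_address": "10.0.0.5", "msg": "Session opened for user ops"},
]
aggregated = aggregate_failed_logins(sample, "10.0.0.5")
print(aggregated["msg"])
```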
Correlation benefits
Event correlation aggregates related events to reduce event noise. It also reduces the operator's mean time to detect (MTTD) and the time required to investigate tickets.
The following images compare how aggregation reduces event noise for a specific use case.
Before aggregation
The following image shows a large number of events received from various event sources in a time window of 15 minutes on a certain day. An operator would need to analyze and prioritize each of these events.
After aggregation
Suppose you suspect a problem pattern related to the source. You can create a correlation policy to aggregate events based on the host name.
The following image shows a reduced number of events from various event sources in a time window of 15 minutes after the correlation policy is created. The aggregated events reduce event noise and help the operator focus only on those events that really matter.
Correlation settings
Incoming events are correlated based on the following conditions:
- Matching criteria: The process of building the condition for matching criteria is similar to building the event selection criteria. You can add a condition to find new incoming events that match existing events of interest based on slot values. For example, you can find incoming events with a host name that is the same as the host name of one or more existing events. Thus, all incoming events with the matching host name can be correlated and aggregated into a single event.
  While building the condition, you can specify slots prefixed with $NEW and $OLD. Slots prefixed with $NEW refer to slots of incoming events, and slots prefixed with $OLD refer to slots of existing events.
- Correlation time (in minutes): Correlation happens for the time duration specified in minutes. The time calculation begins after the correlation policy is created. After the time window passes, correlation stops. You can specify a correlation time window ranging from 2 minutes to 720 minutes.
- Minimum event count: Correlation begins when the number of matching events reaches the minimum count that you specify. The maximum value that you can specify is 100.
Generate new aggregated event: Existing events that satisfy the matching criteria and the minimum event count for the correlation time are aggregated into a single event. Specify the following inputs for the aggregated event:
- Event Class: Event class for the aggregated event. Default: Alarm.
  Important: You cannot generate aggregated events for the following event classes: Anomaly, Prediction, and Situation.
  What happens to policies that use these restricted classes? Existing policies that use the Anomaly, Prediction, and Situation classes are not impacted. However, if you update an existing policy, the policy fails because these restricted classes are not allowed. You must remove these restricted classes when updating an existing policy.
- Event Severity: Severity for the aggregated event. If you select the Match event severity option, the aggregated event severity is assigned as the severity of the first secondary event that matches the correlation criteria. Default: Minor.
- Event Priority: Priority for the aggregated event. If you select the Match event priority option, the aggregated event priority is assigned as the priority of the first secondary event that matches the correlation criteria. Default: Lowest.
- Event Status: Status for the aggregated event. If you select the Match event status option, the aggregated event status is assigned as the status of the first secondary event that matches the correlation criteria. Default: Open.
- Message: Message that you want to include in the aggregated event. Use the % character to include slot placeholders. When the policy is applied, the placeholders are replaced with values from the incoming event.
- Location: Location for the aggregated event.
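The placeholder substitution described for the Message input can be pictured roughly as follows. The `%slot_name%` syntax, the slot names, and the choice to leave unknown placeholders unchanged are assumptions of this sketch; check the product documentation for the exact placeholder format.

```python
import re

# Assumed placeholder syntax: %slot_name% (the actual product syntax may differ).
PLACEHOLDER = re.compile(r"%(\w+)%")


def render_message(template, event):
    """Replace each %slot% placeholder with the matching value from the event.

    Placeholders that name a slot the event does not have are left unchanged.
    """
    def substitute(match):
        slot = match.group(1)
        return str(event[slot]) if slot in event else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical incoming event and message template.
incoming = {"host": "web01", "severity": "MAJOR"}
message = render_message(
    "Multiple events on %host% (%severity%), see %location%", incoming
)
print(message)
```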
You can define these conditions while configuring an event policy of the type Correlation. For more information, see Creating and enabling event policies.
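To show how the settings interact, the following sketch models a correlation policy in plain Python. It is a simplified illustration, not the product's implementation: the `CorrelationPolicy` class, its field names, and the rule of comparing each incoming event only with the first event of a group are all assumptions. The `matches` callable stands in for a matching condition written with $NEW and $OLD slots, and the aggregated record uses the documented defaults.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class CorrelationPolicy:
    matches: Callable[[dict, dict], bool]  # (new, old) -> bool, mirrors $NEW / $OLD
    created_at: datetime
    window_minutes: int
    min_event_count: int
    groups: list = field(default_factory=list)

    def __post_init__(self):
        # Documented limits: window of 2 to 720 minutes, count of at most 100.
        if not 2 <= self.window_minutes <= 720:
            raise ValueError("correlation time must be between 2 and 720 minutes")
        if self.min_event_count > 100:
            raise ValueError("minimum event count cannot exceed 100")

    def process(self, event, now):
        """Return an aggregated record once enough matching events arrive, else None."""
        deadline = self.created_at + timedelta(minutes=self.window_minutes)
        if now > deadline:
            return None  # the window has passed, so correlation stops
        for group in self.groups:
            if self.matches(event, group[0]):  # simplification: compare with first event
                group.append(event)
                break
        else:
            group = [event]
            self.groups.append(group)
        if len(group) < self.min_event_count:
            return None
        return {"class": "Alarm", "severity": "Minor", "priority": "Lowest",
                "status": "Open", "count": len(group), "related": group}


created = datetime(2024, 1, 1, 12, 0)
policy = CorrelationPolicy(
    matches=lambda new, old: new["host"] == old["host"],
    created_at=created, window_minutes=15, min_event_count=3,
)
results = [
    policy.process({"host": "web01"}, created + timedelta(minutes=m))
    for m in (1, 2, 3)
]
late = policy.process({"host": "web01"}, created + timedelta(minutes=20))
```

In this run the first two events return nothing, the third produces an aggregated record with a count of 3, and the event that arrives after the 15-minute window is not correlated.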
How aggregated events are displayed
The new aggregated event (primary event) is generated with the event details specified while creating the correlation policy.
The following image displays an example of how an aggregated event appears on the Events page.
The individual matching events used for aggregation are no longer displayed on the Events page. Instead, these events are displayed as related events (secondary events) in the event details of the primary event.
You can access the related events by clicking the number displayed in the aggregated event message. The number represents the count of events that were combined to form the new aggregated event. This count can increase as the matching events increase, until the correlation time window lapses.
The following behavior applies to aggregated events:
- If an aggregated event closes within the time window, a new aggregated event is generated for matching events and displayed on the Events page.
- If you close the primary event either through the close event operation or from the event policy, the secondary events close after a delay of 10 minutes.
- Similarly, if you close secondary events either through the close event operation or from the event policy, the primary event closes after a delay of 10 minutes.
- If you delete a related (secondary) event, the event count in the aggregated (primary) event does not change.
- The system processes aggregated events as new events by using the event processing phases.
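The relationship between a primary event and its secondary events can be modeled roughly as shown here. The `PrimaryEvent` class and its method names are hypothetical; the sketch captures only the behavior described above, namely that the count grows with matching events, stays unchanged when a secondary event is deleted, and that closing the primary event leads to the secondary events closing 10 minutes later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

CLOSE_DELAY = timedelta(minutes=10)  # documented delay before related events close


@dataclass
class PrimaryEvent:
    related: list = field(default_factory=list)
    count: int = 0
    closed_at: Optional[datetime] = None

    def add_secondary(self, event):
        """A new matching event increases the displayed count."""
        self.related.append(event)
        self.count += 1

    def delete_secondary(self, event):
        """Deleting a secondary event leaves the displayed count unchanged."""
        self.related.remove(event)

    def close(self, now):
        """Close the primary event; return when the secondary events should close."""
        self.closed_at = now
        return now + CLOSE_DELAY


primary = PrimaryEvent()
for event_id in (1, 2, 3):
    primary.add_secondary({"id": event_id})
primary.delete_secondary({"id": 2})

closed = datetime(2024, 1, 1, 12, 0)
secondaries_close_at = primary.close(closed)
print(primary.count, len(primary.related), secondaries_close_at)
```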
If you delete aggregated (primary) events that are closed, the related (secondary) events that are closed are not directly visible on the Events page, but you can still find them by searching for them.
The aggregated events are also displayed as situations on the console. Also, the event status is synced into . For more information, see Monitoring and investigating situations.