Generating alerts from logs
Types of alert policies
You can create the following types of alert policies:
Static Thresholds: When you are aware of the conditions for which you want to be alerted and you also know where these conditions will occur, use static thresholds. For example, while analyzing logs, you come across a status 401 (authentication failure) for which you want to be notified. Let's say you notice that the status is reported multiple times in a short time period. You want to be notified if it occurs again. So, you create alert policies that generate events when the conditions configured in the policies occur in the logs. Here are a few more examples:
- An exception in the application server log
- Error log level in the database log
- Unexpected token in the application log
Anomaly Detection: Logs contain anomalies that represent potential system faults, which makes the logs critical to debugging application performance and errors. BMC Helix Log Analytics provides automated analysis with machine learning (ML)-based anomaly detection of abnormal or rare log patterns (or anomalies) that indicate any deviation from the normal behavior. This analysis helps you find concerns proactively before they become a problem and helps you troubleshoot errors when they arise.
Use anomaly detection when you want to be alerted if an anomalous log record is generated in a certain type of log, such as database logs. For example, you want to be alerted if an anomalous log message is generated in the Kubernetes microservice logs. Here are a few more examples:
- An anomaly in a specific service in a Kubernetes environment
- An anomaly in a specific service of Amazon Web Services
- An anomaly in Windows event for a particular host or VM
Alert policy details
An alert policy consists of the following details:
- Name, description, and precedence.
- Policy selection criteria or the conditions that generate an event. Configure the policy selection criteria based on the fields available in the logs. The operators that you can use are Equals, Not Equal to, and Contains. Combine these conditions with the AND and OR logical operators. Optionally, group these conditions on a particular field, such as when status Equals 401 for a particular host. In this case, you group the condition on the host field. Next, define the time period for these conditions to be true. For example, generate an event if status Equals 401 occurs at least 5 times in the past 10 minutes.
- Host name, which can be either a static value that you type or a field in the logs that you select. If you select a log field, ensure that you select the same log field in the Group by field.
- Additional Details are the values from the logs that are added to the fields of the generated event. These values can be either static values that you type or a field in the logs that you select. The additional details that you can add to the event are described as slots on this page: Log Alert event class. Fields of type Enum accept only preconfigured values. If you enter a value that is not preconfigured, the default value is added to the slot in the event.
To add custom fields to an event, see Event management endpoints in the REST API.
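The way selection criteria and the time period combine can be pictured with a minimal Python sketch. It is an illustration only, not the product's implementation: the `Condition` class, the `should_alert` helper, and the record layout (a dictionary with a `timestamp` key) are hypothetical names chosen for this example, and the conditions are combined with AND only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Condition:
    """One selection condition (hypothetical model, not a BMC Helix API)."""
    field: str
    operator: str  # "Equals", "Not Equal to", or "Contains"
    value: str

    def matches(self, record):
        # Compare as strings so that a numeric status such as 401 matches "401".
        actual = str(record.get(self.field, ""))
        if self.operator == "Equals":
            return actual == self.value
        if self.operator == "Not Equal to":
            return actual != self.value
        if self.operator == "Contains":
            return self.value in actual
        raise ValueError(f"Unsupported operator: {self.operator}")


def should_alert(records, conditions, now, minutes, minimum_count):
    """Return True if at least minimum_count records from the last `minutes`
    minutes satisfy every condition (AND semantics)."""
    window_start = now - timedelta(minutes=minutes)
    hits = [
        record for record in records
        if window_start <= record["timestamp"] <= now
        and all(condition.matches(record) for condition in conditions)
    ]
    return len(hits) >= minimum_count


# Usage: status Equals 401 at least 5 times in the past 10 minutes.
now = datetime(2024, 1, 1, 12, 0)
records = [
    {"status": 401, "host": "host1", "timestamp": now - timedelta(minutes=i)}
    for i in range(5)
]
policy = [Condition("status", "Equals", "401")]
print(should_alert(records, policy, now, minutes=10, minimum_count=5))  # True
```

Grouping on a field would repeat the same count once per distinct value of that field instead of once over all records.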
To create an alert policy
The following video (2:34) illustrates the steps to create an alert policy for static thresholds.
In BMC Helix Log Analytics, click Alerts > Alert Policies > Create and perform the following steps:
- Add the policy information by performing the following steps:
- Enter a unique name and description for the alert.
- In the Precedence field, set a precedence for the policy.
The precedence number defines the priority for executing the policy. A policy with a lower precedence number is executed first.
- In the Policy Selection Criteria section, perform the following steps:
- Configure the condition for which the event will be generated.
For example, enter status Equals 401 AND filename Equals BMC_Apache_SantaClara.log. When you click in the box, you are prompted to make a selection. Each time you make a selection, you are progressively prompted to make another selection.
The selection criteria consist of an opening parenthesis, followed by the slot name, the operator, the slot value (which can be a string based on the type of slot selected), and the closing parenthesis. You can optionally select the logical operator AND or OR to add additional conditions. Specifying the opening and closing parentheses is optional.
- To group occurrences of a condition, perform one of the following actions:
- In the Group by field, enter the values by which you want to group occurrences of a condition.
For example, to group all occurrences of status 401 on a particular host name, enter the host name. You can enter a maximum of three values, but one must be the host name.
- Click in the Group by field and select an appropriate option.
The default value is log_source_host.
- Perform one of the following steps:
- If you want to create an alert for a static threshold, select the Static Thresholds button and perform the following steps:
- In the Alert Condition field, decide how many times the condition must occur in a time period to generate the event.
- Enter the status of the event.
- Enter or select the values in the Minutes and Minimum count is fields.
For example, when status 401 is reported a minimum of 50 times within a 5-minute period, a critical event is generated.
- If you want to detect anomalies from logs, select the Anomaly Detection button and perform the following steps:
- From the Log Attribute list, select the field that contains the log message.
If it is not Message or Log, select Custom and in the Log Attribute Value field, enter the field that contains the log message.
- Select the type of event that you want to create.
For more information about anomaly detection, see Detecting-anomalies-from-logs.
- In the Alert Parameters section, complete the following steps:
- To add a host name to the event, perform one of the following actions:
This value helps you correlate events in BMC Helix AIOps.
- In the Hostname field, enter a host name.
- Click in the Hostname field and select the appropriate option.
The default value is log_source_host.
- In the Message field, change the default message, if required.
To use a log field value in the message, put double curly brackets around the field name, such as {{ $.location }}.
- In Additional Details, configure additional event parameters such as source identifier.
These values are set for the generated event.
- For data-level access control, select one or more user groups from the User Group list.
With this setting, the system generates alerts, and only the selected user groups can access them.
- Enable and save the policy by performing the following steps:
- Select Enable Policy.
You can choose to enable the alert policy later.
- Click Save.
View all your policies on the Alert Policies page.
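The {{ $.field }} placeholder syntax used in the Message field can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the product's template engine: the `render_message` function is a hypothetical helper, and leaving unknown fields untouched is a choice made for this example, not documented product behavior.

```python
import re

# Matches placeholders such as {{ $.location }}, following the syntax shown
# in the Message field example. Field names are assumed to be alphanumeric
# with underscores.
PLACEHOLDER = re.compile(r"\{\{\s*\$\.([A-Za-z0-9_]+)\s*\}\}")


def render_message(template, record):
    """Replace each {{ $.field }} placeholder with the value from the log record.

    Assumption: a placeholder whose field is missing from the record is left
    unchanged.
    """
    def substitute(match):
        field = match.group(1)
        return str(record[field]) if field in record else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


print(render_message("Status 401 reported at {{ $.location }}",
                     {"location": "Santa Clara"}))
# Status 401 reported at Santa Clara
```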
To edit an alert policy
- In BMC Helix Log Analytics, navigate to Alerts > Alert Policies.
- Click the Action menu of the policy that you want to edit.
- Make your changes and save the policy.
To understand the number of events generated
The following examples show how many events are generated for an alert policy for static thresholds. For an alert policy that detects anomalies, one event is generated. For one minute, the event does not change, regardless of whether more anomalies are reported for the alert policy. After that minute, if another anomaly is identified for the same alert policy, the Repeated count value of the same event is updated. This value is updated only once per minute.
Configurations in an alert policy | Incoming logs | Number of events generated | Details |
---|---|---|---|
Policy selection criteria: status Equals 401; Group by: blank; Hostname: blank or static value; For last: 5 minutes; When minimum count is: 10 | The condition is satisfied in the last 5 minutes. | 1 | The event is generated after the criteria are satisfied in the last five minutes, with a repeat count of zero. When the criteria are satisfied again in the next five minutes, the Repeated count field of the same event is updated to 1. |
Policy selection criteria: status Equals 401; Group by: hostname; Hostname: $.hostname; For last: 5 minutes; When minimum count is: 10 | The condition is satisfied for host 1 and host 2 in the last 5 minutes. | 2 | |
Policy selection criteria: status Equals 401; Group by: blank; Hostname: host 1 (static value); For last: 5 minutes; When minimum count is: 10 | The condition is satisfied for host 1 in the last 5 minutes. | 1 | The event is generated for host 1 because the criteria are satisfied in the last five minutes, with a repeat count of zero. |
Policy selection criteria: status Equals 401; Group by: city; Hostname: host 1 (static value); For last: 5 minutes; When minimum count is: 10 | The condition is satisfied for host 1 and city 1 and for host 1 and city 2 in the last 5 minutes. | 2 | |
Policy selection criteria: status Equals 401; Group by: hostname, city, and country; Hostname: $.hostname; For last: 5 minutes; When minimum count is: 10 | The condition is satisfied for host 1, city 1, and country 1 and for host 2, city 2, and country 2 in the last 5 minutes. | 2 | |
Policy selection criteria: status Equals 401; Group by: log_source_host; Hostname: $.log_source_host; User Group: Administrator; For last: 5 minutes; When minimum count is: 10 | The condition is satisfied in the last 5 minutes. The log data must meet the selection criteria and must be accessible to the associated user group. | 1 | The event is generated after the criteria are satisfied in the last five minutes, with a repeat count of zero. When the criteria are satisfied again in the next five minutes, the Repeated count field of the same event is updated to 1. |
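The pattern in the table, one event per distinct combination of Group by values, can be sketched in Python as follows. The `count_events` function is a hypothetical helper, not part of the product, and it assumes that the minimum count applies separately to each group, which the table implies but does not state outright.

```python
from collections import Counter


def count_events(matching_records, group_by, minimum_count):
    """Return the number of events generated in one evaluation window.

    matching_records: log records that already satisfy the policy selection
        criteria and fall inside the "For last" window.
    group_by: list of field names; an empty list means no grouping.
    minimum_count: the "When minimum count is" value, assumed to apply
        to each group separately.
    """
    # Each distinct tuple of Group by values is one candidate event.
    # With no grouping, every record maps to the same empty tuple.
    counts = Counter(
        tuple(record.get(field) for field in group_by)
        for record in matching_records
    )
    return sum(1 for total in counts.values() if total >= minimum_count)


# Ten matching records for each of two hosts.
records = [{"hostname": "host 1"}] * 10 + [{"hostname": "host 2"}] * 10
print(count_events(records, [], 10))            # 1
print(count_events(records, ["hostname"], 10))  # 2
```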
To view the generated events
- Click the Alerts menu.
- Select Events.
The Events page in BMC Helix Operations Management is displayed. The class of these events is Log Event. Filter the events by the Log Event class to view events generated by using alert policies. For more information about events, see Monitoring and managing events.
To view these events in BMC Helix Dashboards, navigate to Dashboards > Manage Dashboards > Log Analytics, and click the Self Monitoring dashboard.
After an alert policy is executed, events are generated in BMC Helix Operations Management based on the selection criteria and user groups in the policy.
To use events to analyze logs
When you configure alert policies and the condition configured in a policy is satisfied in the logs, events are generated in BMC Helix Operations Management. The class of these events is Log Event. To continuously track such events, use the Self Monitoring dashboard available in the Log Analytics folder in BMC Helix Dashboards.
In the Search Parameters field of the event, under Others, there is a link to launch BMC Helix Log Analytics. When you click this link, the Explorer in BMC Helix Log Analytics opens and shows the associated logs. These logs are filtered based on the criteria specified in Policy Selection Criteria and the fields selected in the Group by field of the alert policy.
If the host name is present as a configuration item (CI) for a service in BMC Helix AIOps, you can monitor the generated events in BMC Helix AIOps. For a CI of a service or the host name, these events are correlated in BMC Helix AIOps.
To close the events automatically
The generated events are not closed automatically. Use event policies in BMC Helix Operations Management to close the events automatically. For example, create an event policy with time-based configurations to close the events that have not been modified in the last two hours. For more information, see Closing events automatically.
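The time-based rule in the example above amounts to a simple filter, sketched here in Python. This is not how event policies are configured in BMC Helix Operations Management; the `events_to_close` function and the `status` and `last_modified` keys are hypothetical names used only to show the logic.

```python
from datetime import datetime, timedelta


def events_to_close(events, now, max_idle=timedelta(hours=2)):
    """Return open events that have not been modified within max_idle."""
    cutoff = now - max_idle
    return [
        event for event in events
        if event["status"] != "CLOSED" and event["last_modified"] < cutoff
    ]


now = datetime(2024, 1, 1, 12, 0)
events = [
    {"id": "e1", "status": "OPEN", "last_modified": now - timedelta(hours=3)},
    {"id": "e2", "status": "OPEN", "last_modified": now - timedelta(minutes=30)},
]
print([event["id"] for event in events_to_close(events, now)])  # ['e1']
```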
Learn more
To read more about automated log analysis that uses machine learning (ML)-based anomaly detection to process log contents and find abnormal entries and behavior patterns in logs, see Predictive Log Alerting with ML Anomaly Detection.