Detecting anomalies from logs

Related Topics

Generating alerts from logs

Anomalies are rare patterns or abnormalities that indicate a deviation from the normal behavior of system performance. BMC Helix Log Analytics provides automated analysis with machine learning (ML)-based anomaly detection of abnormal or rare log patterns. You can analyze anomalous logs to debug application errors and ensure optimum performance. You can proactively find concerns or errors before they become a problem.

Scenario:
Sarah is an administrator in Apex Global, which uses BMC Helix Log Analytics for log collection and analysis. Sarah wants to be alerted if there are any deviations from the normal behavior in the system. She knows that small deviations often lead to bigger failures, and she wants to debug these deviations before they become problems. How can Sarah receive such alerts?
Sarah configures alert policies to receive anomaly notifications. When an anomaly is detected, notifications are generated in the form of events. These events are generated in BMC Helix Operations Management. Sarah can also view these events in BMC Helix AIOps and BMC Helix Dashboards.

The following video (2:10) provides a high-level summary of the anomaly detection feature in BMC Helix Log Analytics.

https://www.youtube.com/watch?v=jG-Owi1tusc

Process overview

The following procedure is used by BMC Helix Log Analytics to detect anomalous logs:

Anomaly detection_updated.png

Processing the ingested logs.
An administrator configures alert policies to identify anomalies in the logs. These logs are processed to remove dates, special characters, and unnecessary keywords. The cleaning process uses regular expressions to remove dates and filter characters.
Detecting anomalies.
- Abnormal log records
- Based on the log volume
BMC Helix Log Analytics can detect anomalies in the following ways:
Analyzing anomalous logs.
Use the Explorer tab to analyze log anomalies.
Because of the anomaly detection alert policies, log anomaly events are generated in BMC Helix Operations Management. Use the Events page in BMC Helix Operations Management to analyze the log anomaly events.
For more information, see Analyzing anomalous logs and anomaly events.

Anomaly detection based on abnormal log patterns

BMC Helix Log Analyticsuses hierarchical clustering and cross-similarity techniques to detect anomalies in logs. Anomalies are detected by using the following method:

The hierarchical clustering identifies anomalies from the first batch of logs.
BMC Helix Log Analytics starts detecting anomalies after a default initial training period of 15 minutes. The training period is used to determine false positives.
Important Logs are ingested as a single file for any policy. If multiple log files match the selection criteria defined in the policy, the logs from all files are combined in a single file.
The identified anomalies are stored in a tags table in the binary format for further analysis.
For every subsequent batch, the system performs a cross-similarity check between newly identified anomalies and the data stored in the tags binary.
The default anomaly tag limit is 2000. That means when the tag count reaches the maximum limit, 20% of the oldest tags are removed.
The recommended range for the anomaly tag limit is between 1000 and 2000. If the limit exceeds this range, it can lead to issues in generating log anomalies.
The anomalies use the context extraction technique for quick insights, where the top three keywords are extracted from the anomalies.
Any newly identified anomalies that differ from the previous data are considered new anomalies.
The default aging period of an anomaly is two hours. That means if an anomaly is detected and reappears within two hours, the behavior is not considered anomalous. However, if an anomaly is detected and reappears after two hours, the behavior is flagged as anomalous.
The complete list of anomalies includes both new and existing anomalies in tags
Important Logs that contain the INFO, DEBUG, and TRACE keywords are not considered for anomaly detection. The keywords are not case-sensitive.

Anomalous log record examples

This section provides examples of anomalous logs.

	Anomalous logs	Non-anomalous logs
Apache	[Fri Mar 24 01:04:31 2023] [error] [client 218.62.18.218] Directory index forbidden by rule: /var/www/html/ [Fri Mar 24 20:47:17 2023] [error] jk2_init() Can't find child 2087 in scoreboard	[Fri Mar 24 04:47:44 2023] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties [Fri Mar 24 04:51:08 2023] [notice] jk2_init() Found child 6725 in scoreboard slot 10
Linux	Mar 24 06:06:29 combo kernel: audit(1138278101.749:164014): avc: denied { ioctl } for pid=594 exe=/usr/lib/vte/gnome-pty-helper path=/dev/pts/0 dev= ino=2 scontext=system_u:system_r:kernel_t tcontext=system_u:object_r:devpts_t tclass=chr_file Mar 24 06:06:29 combo kernel: audit(1138278101.766:164029): avc: denied { setattr } for pid=594 exe=/usr/lib/vte/gnome-pty-helper name=0 dev= ino=2 scontext=system_u:system_r:kernel_t tcontext=system_u:object_r:devpts_t tclass=chr_file	Mar 24 06:06:21 combo kernel: usbcore: registered new driver hub Mar 24 06:06:23 combo kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx Mar 24 06:06:28 combo kernel: PCI: Found IRQ 11 for device 0000:00:1f.2 Mar 24 06:06:29 combo kernel: uhci_hcd 0000:00:1f.2: new USB bus registered, assigned bus number 1
Windows	2022-07-28 04:30:31, Warn CBS No startup processing required, TrustedInstaller service was not set as autostart, or else a reboot is still pending. 2022-07-29 00:00:47, Error CBS Failed to create backup log cab. [HRESULT = 0x80070001 - ERROR_INVALID_FUNCTION]	2022-07-28 04:30:31, Info CBS SQM: Initializing online with Windows opt-in: False 2022-07-28 04:30:31, Info CBS SQM: Cleaning up report files older than 10 days. 2022-07-28 04:30:31, Info CBS SQM: Requesting upload of all unsent reports. 2022-07-29 00:00:46, Info CBS Startup processing thread terminated normally

Configuring anomaly detection

As an administrator, create alert policies in BMC Helix Log Analytics to receive anomaly notifications. After you enable an alert policy, you can use the Explorer tab to analyze anomalous logs.

Best practice
You can configure a single anomaly detection alert policy for logs that belong to the same service or application. However, we recommend that you configure separate alert policies for logs that belong to different services or applications.

To create an alert policy for anomaly detection

In BMC Helix Log Analytics, navigate to the Alerts > Alert Policies page and click Create.
Add the policy information by performing the following steps:
1. Enter a unique name and description for the alert.
2. In the Precedence field, set a precedence for the policy.
  The precedence number defines the priority for executing the policy. A policy with a lower precedence number is executed first.
In the Policy Selection Criteria section, perform the following steps:
1. Configure the condition for which the event will be generated.
  For example, enter status Equals 401 AND filename EQUALS BMC_Apache_SantaClara.log. When you click in the box, you are prompted to make a selection. Each time you make a selection, you are progressively prompted to make another selection.
  The selection criteria consist of an opening parenthesis, followed by the slot name, the operator, the slot value (which can be a string based on the type of slot selected), and the closing parenthesis. You can optionally select the logical operator AND or OR to add additional conditions. Specifying the opening and closing parentheses is optional.
  Important
  - The values that you enter for a field in the selection criteria are case-sensitive. For example, if the host name is WebServer.example.com, add the selection criteria as (host_name Equals WebServer.example.com). If you enter (host_name Equals webserver.example.com), the event is not generated.
  - Ideally, each policy should be for a single data source.
2. To group occurrences of a condition, perform one of the following actions:
  - In the Group by field, enter the values by which you want to group occurrences of a condition.
    For example, to group all occurrences of status 401 on a particular host name, enter the host name. You can enter a maximum of three values, but one must be the host name.
  - Click in the Group by field and select an appropriate option.
    The default value is log_source_host .
3. Select the Anomaly Detection button and perform the following steps:
  1. From the Log Attribute list, select the field that contains the log message.
  2. Select the type of event that you want to create.
    If it is not Message or Log, select Custom and in the Log Attribute Value field, enter the field that contains the log message.
4. In the Alert Parameters section, complete the following steps:
  1. To add host name to the event, in the Alert Parameters section, perform one of the following actions:
    This value helps you correlate events in BMC Helix AIOps.
    - In the Hostname field, enter a host name.
    - Click in the Hostname field and select the appropriate option.
      The default value is log_source_host
  2. In the Message field, change the default message, if required.
  3. To use a log field value in the message, put double curly brackets around the field name such as {{ $.location }} .
  4. In Additional Details, configure additional event parameters such as source identifier.
    These values are set for the generated event.
For data-level access control, select one or more user groups from the User Group list.
Important
Make sure that you select the same user group that you selected in the collection policy.
With this setting, the system generates alerts, and only the selected user group can access them.
Alerts are not generated if you select different user groups in the collection and alert policies.
Enable and save the policy by performing the following steps:
1. Select Enable Policy.
  You can choose to enable the collection policy later.
2. Click Save.
  View all your policies on the Alert Policies page.

You can copy an alert policy and duplicate an existing policy with all its parameters and reuse it as needed. The copy option helps you save time and create consistent alert policies across the IT environment.

For instructions about editing an alert policy, see Managing-alert-policies.

Configuring anomaly detection settings

BMC Helix Log Analytics provides anomaly detection settings to help you accurately identify unusual patterns and potential issues in your log data. Use these settings to configure the following parameters that determine how anomalies are detected.

With these settings, you can configure the parameters that are described in the following table:

Parameter	Description
Training period	A defined duration during which the system analyzes log data to establish the system behavior. During the training period, BMC Helix Log Analytics collects and processes log data to understand typical patterns and frequencies within the log data. You can enhance the accuracy and reliability of anomaly detection by setting an appropriate training period. The default training period is 15 minutes.
Aging period	A specific duration during which new anomaly events are not created for a recurring issue. The aging period helps to prevent the generation of redundant alerts for repeated anomalies. By configuring an appropriate aging period, you reduce the number of alerts, helping you focus on addressing unique and critical issues and improving the efficiency of your log analysis and response processes. The default aging period is 2 hours.
Anomaly tag limit	The maximum number of anomaly tags that can be stored within the system. After this limit is reached, older anomaly tags are pruned by 20% to make space for new tags. You can maintain effective anomaly detection by managing the anomaly tag limit effectively. With this parameter, only the most relevant and recent anomalies are prioritized. The default anomaly tag limit is 5000.
Volume Anomaly	A configuration that detects unusual spikes or drops in log volume and generates an anomaly. Set this parameter based on your monitoring needs. Volume Anomaly configuration setting enables or disables volume‑based anomaly detection. When enabled, the system analyzes log volume for unusual spikes or drops and generates an anomaly. Disabling this option skips the volume analysis to reduce processing overhead. By setting this parameter appropriately, you can optimize anomaly detection performance while ensuring that significant deviations in log volume are captured when relevant. By default, the Volume Anomaly is enabled.
Anomaly Severity	A threshold that controls which anomaly events are forwarded to the alerting pipeline for display or notification. BMC Helix Log Analytics classifies anomalies as low, medium, or high severity, and the selected threshold determines which of these are allowed to pass through: Low: Forwards low, medium, and high severity anomalies Medium: Forwards medium and high severity anomalies High: Forwards only high-severity anomalies By selecting an appropriate severity level, you can reduce alert noise and focus on events that require operational attention. The default Anomaly Severity is Low.

To configure anomaly detection settings, perform the following steps as an administrator:

From the Configurations menu, select Anomaly Detection Settings.
On the Anomaly Detection Settings page, modify Training Period, Aging Period, and Anomaly Tag Limit according to your requirements.
(Optional) Enable Volume Anomaly and select Anomaly Severity from the drop-down list.
Click Save.

To set the above values to their default values, click Reset to Default.

Analyzing anomalous logs and anomaly events

Use BMC Helix Log Analytics to analyze anomalous logs. Use BMC Helix Operations Management to analyze the related anomaly events.

For more information, see Analyzing anomalous logs and anomaly events.

Visualizing log anomaly events in BMC Helix Dashboards

Use the Self Monitoring dashboard in BMC Helix Dashboards to visualize anomalous log data.

The following image displays the Self Monitoring dashboard:

Use the Search parameter column to open the Explorer tab in BMC Helix Log Analytics.

Use the Event Details column to open the Event Details page in BMC Helix Operations Management.

For more information about the Self Monitoring dashboard, see Self-monitoring dashboard in the BMC Helix Dashboards documentation

Where to go from here

For information about creating a static alert policy, see Managing static alerts.

For information about log events, see Log events.