Detecting anomalies from logs


Anomalies are rare patterns or abnormalities that indicate a deviation from the normal behavior. BMC Helix Log Analytics provides automated analysis with machine learning (ML)-based anomaly detection of abnormal or rare log patterns. You can analyze anomalous logs to debug application errors and ensure optimum performance. You can proactively find concerns or errors before they become a problem.

Example

Sarah is an administrator in Apex Global, which uses BMC Helix Log Analytics for log collection and analysis. Sarah wants to be alerted if there are any deviations from the normal behavior in the system. She knows that small deviations often lead to bigger failures, and she wants to debug these deviations before they become problems. How can Sarah receive such alerts?

Sarah configures alert policies to receive anomaly notifications. When an anomaly is detected, notifications are generated in the form of events. These events are generated in BMC Helix Operations Management. Sarah can also view these events in BMC Helix AIOps and BMC Helix Dashboards.

The following video (2:10) provides a high-level summary of the anomaly detection feature in BMC Helix Log Analytics.

 

icon-play@2x.pnghttps://youtu.be/jG-Owi1tusc

The following procedure is used by BMC Helix Log Analytics to detect anomalous logs:

Anomaly detection_updated.png

  1. Processing the ingested logs.
    An administrator configures alert policies to identify anomalies in the logs. These logs are processed to remove dates, special characters, and unnecessary keywords. The cleaning process uses regular expressions to remove dates and filter characters.
  2. Detecting anomalies.
    BMC Helix Log Analytics uses hierarchical clustering and cross-similarity techniques to detect anomalies in logs. Anomalies are detected by using the following method:
    1. The hierarchical clustering identifies anomalies from the first batch of logs.
      BMC Helix Log Analytics starts detecting anomalies after a default initial training period of 15 minutes. The training period is used to determine false positives.

      Important

      Logs are ingested as a single file for any policy. If there are multiple log files that match the selection criteria defined in a policy, logs from all files are combined in a single file at the time of ingestion.
      For better system performance and accuracy in anomaly detection, create a policy to ingest logs from a single source.

       

    2. The identified anomalies are stored in a tags table in the binary format for further analysis.
    3. For every subsequent batch, the system performs a cross-similarity check between newly identified anomalies and the data stored in the tags binary.
    1. Any newly identified anomalies that differ from the previous data are considered new anomalies.
      The default aging period of an anomaly is two hours. That means, if an anomaly is detected, and the anomaly reappears within two hours, the behavior is not considered anomalous. However, if an anomaly is detected, and the anomaly reappears after two hours, the behavior is flagged as anomalous.
      The complete list of anomalies includes both new and existing anomalies in tags.
    2. The default anomaly tag limit is 2000. That means when the tag count reaches the maximum limit, 20% of the oldest tags are removed.
      The recommended range for the anomaly tag limit is between 1000 and 2000. If the limit exceeds this range, it can lead to issues in generating log anomalies.
    1. The anomalies use the context extraction technique for quick insights, where the top three keywords are extracted from the anomalies.

      Important

      Logs that contain the INFO, DEBUG, and TRACE keywords are not considered for anomaly detection. The keywords are not case-sensitive.

  1. Analyzing anomalous logs.
    Use the Explorer tab to analyze log anomalies.
    Because of the anomaly detection alert policies, log anomaly events are generated in 
    BMC Helix Operations Management. Use the Events page in BMC Helix Operations Management to analyze the log anomaly events.
    For more information, see Analyzing anomalous logs and anomaly events.

Anamolous log record examples

Example of anomalous Apache logs

Non-anomalous logs

[Fri Mar 24 04:47:44 2023] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties

[Fri Mar 24 04:51:08 2023] [notice] jk2_init() Found child 6725 in scoreboard slot 10

Anomalous logs

[Fri Mar 24 01:04:31 2023] [error] [client 218.62.18.218] Directory index forbidden by rule: /var/www/html/

[Fri Mar 24 20:47:17 2023] [error] jk2_init() Can't find child 2087 in scoreboard
Example of anomalous Linux logs

Non-anomalous logs

Mar 24 06:06:21 combo kernel: usbcore: registered new driver hub

Mar 24 06:06:23 combo kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx

Mar 24 06:06:28 combo kernel: PCI: Found IRQ 11 for device 0000:00:1f.2

Mar 24 06:06:29 combo kernel: uhci_hcd 0000:00:1f.2: new USB bus registered, assigned bus number 1

Anomalous Records

Mar 24 06:06:29 combo kernel: audit(1138278101.749:164014): avcdenied  { ioctl } for  pid=594 exe=/usr/lib/vte/gnome-pty-helper path=/dev/pts/0 dev= ino=2 scontext=system_u:system_r:kernel_t tcontext=system_u:object_r:devpts_t tclass=chr_file

Mar 24 06:06:29 combo kernel: audit(1138278101.766:164029): avcdenied  { setattr } for  pid=594 exe=/usr/lib/vte/gnome-pty-helper name=0 dev= ino=2 scontext=system_u:system_r:kernel_t tcontext=system_u:object_r:devpts_t tclass=chr_file
Example of anomalous Windows logs

Non-anomalous logs

2022-07-28 04:30:31, Info  CBS  SQM: Initializing online with Windows opt-in: False

2022-07-28 04:30:31, Info  CBS  SQM: Cleaning up report files older than 10 days.

2022-07-28 04:30:31, Info  CBS  SQM: Requesting upload of all unsent reports.

2022-07-29 00:00:46, Info  CBS  Startup processing thread terminated normally

Anomalous logs

2022-07-28 04:30:31, Warn CBS No startup processing required, TrustedInstaller service was not set as autostart, or else a reboot is still pending.

2022-07-29 00:00:47, Error CBS Failed to create backup log cab. [HRESULT = 0x80070001 - ERROR_INVALID_FUNCTION]

 

Configuring anomaly detection

As an administrator, create alert policies in BMC Helix Log Analytics to receive anomaly notifications. After you enable an alert policy, you can use the Explorer tab to analyze anomalous logs.

Best practice
You can configure a single anomaly detection alert policy for logs that belong to the same service or application. However, we recommend that you configure separate alert policies for logs that belong to different services or applications.

For instructions about creating an alert policy, see Generating-alerts-from-logs.

BMC Helix Log Analytics provides anomaly detection settings to help you accurately identify unusual patterns and potential issues in your log data. Use these settings to configure the following parameters that determine how anomalies are detected.

With these settings, you can configure the parameters that are described in the following table:

To configure anomaly detection settings

As an administrator:

  1. From the Configurations menu, select Anomaly Detection Settings.
  2. On the Anomaly Detection Settings page, modify Training Period, Aging Period, and Anomaly Tag Limit according to your requirements.
  3. Click Save.
    To set these values to the default, click Reset to Default.

    image-2025-1-31_16-42-42.png

 

Analyzing anomalous logs and anomaly events

Use BMC Helix Log Analytics to analyze  anomalous logs. Use BMC Helix Operations Management to analyze the related anomaly events.

To analyze anomalous logs

  1. In BMC Helix Log Analytics, on the Explorer tab, select the logml-* index pattern to view all the anomalous log messages.
  2. Analyze the anomaly score of the logs to understand the anomaly strength.
    Each anomalous record contains the Anomaly and Anomaly_Score fields. The value of the Anomaly field is set to 1.0. The Anomaly_Score field represents the anomaly strength and has a value between 0 and 1. If the score is higher, the anomaly strength of the record is high.
    image-2024-6-7_18-36-6.png

BMC Helix Log Analytics automatically assigns severity to anomalous log messages depending on the keywords that the message contains.For example, if a message contains the words error or critical, the anomaly is assigned the High severity.The following table displays the severity level and its associated keywords:

Severity

Associated keywords

High

error, critical, crash, failure, fatal, exception, outage, deadlock, segfault, corrupt, unreachable

Medium

warn, warning, timeout, degraded, inconsistency, stale, inefficiency, bottleneck, throttle

Low

All anomalous log messages that do not contain the keywords listed for High and Medium severities are granted Low severity.

Severity

Associated keywords

High

error, critical, crash, failure, fatal, exception, outage, deadlock, segfault, corrupt, unreachable

Medium

warn, warning, timeout, degraded, inconsistency, stale, inefficiency, bottleneck, throttle

Low

All anomalous log messages that do not contain the keywords listed for High and Medium severities are granted Low severity.

Severity

Associated keywords

High

error, critical, crash, failure, fatal, exception, outage, deadlock, segfault, corrupt, unreachable

Medium

warn, warning, timeout, degraded, inconsistency, stale, inefficiency, bottleneck, throttle

Low

All anomalous log messages that do not contain the keywords listed for High and Medium severities are granted Low severity.

Severity

Associated keywords

High

error, critical, crash, failure, fatal, exception, outage, deadlock, segfault, corrupt, unreachable

Medium

warn, warning, timeout, degraded, inconsistency, stale, inefficiency, bottleneck, throttle

Low

All anomalous log messages that do not contain the keywords listed for High and Medium severities are granted Low severity.

To analyze log anomaly events

In BMC Helix Operations Management, use the Events page to analyze log anomaly events. Click the event to view the event details and analyze it.

The following procedure explains how you can go to BMC Helix Log Analytics from BMC Helix Operations Management.

  1. On the Events page in BMC Helix Operations Management, click an anomaly event to view the event details.
    Log anomaly events are generated with the Log Event class. You can hover over Class Event class icon.png to see the class of the event.

    Tip: Filter log anomaly events to quickly analyze them

    To filter all events of the Log Event class, use advanced filter in BMC Helix Operations Management. For more information about advanced filters, see Filtering events.

  2. Click the Others tab.
  3. In the Search Parameters field, click the Review Logs link to open the Explorer tab in BMC Helix Log Analytics.

search_para_link.png

The Explorer tab opens in BMC Helix Log Analytics and the logs that generated the event are displayed. The anomalous logs are shown in the index pattern that begins with logml.
The anomalous log events are further processed for creating situations. For more information about situations, see Monitoring and investigating situations in the BMC Helix AIOps documentation.

 

Visualizing log anomaly events in BMC Helix Dashboards

Use the Self Monitoring dashboard in BMC Helix Dashboards to visualize anomalous log data.

The following image displays the Self Monitoring dashboard:

self_monitoring_dashboard_emphasised.png

Use the Search parameter column to open the Explorer tab in BMC Helix Log Analytics.

Use the Event Details column to open the Event Details page in BMC Helix Operations Management.

For more information about the Self Monitoring dashboard, see Self-monitoring dashboard in the BMC Helix Dashboards documentation.

 

 

 

 

 

 

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*