Detecting anomalies

Detecting anomalies can help you find unusual patterns in the data that you are monitoring. Anomalies not only tell you whether the usual pattern has changed, but also show you, both visually and numerically, how much the new pattern deviates from the usual pattern. Additionally, they show you infrequent occurrences of events.

This information can help you gain important insights that in turn help you uncover new problems and detect the root cause of a problem more quickly.

To get an overview and understand the use cases for viewing anomalies, see Getting-started-with-anomaly-detection.

This topic provides the following information about viewing and understanding anomalies:

Enabling anomaly detection

By default, anomaly detection is disabled. To start detecting anomalies, you must manually enable a setting. You need to enable this setting only on the Console Server.

Note

Anomaly detection requires additional resources (in terms of CPU and maximum Java heap size). Therefore, before you enable anomaly detection in a production environment, BMC recommends that you contact BMC Support to help you tune your setup and get optimal product performance.

To enable or disable anomaly detection

To enable or disable anomaly detection from the Console Server, follow these steps:

1. Navigate to the olaengineCustomConfig.properties file and edit it.
   For more information, see Modifying-the-configuration-files.
2. If not already present, add the property baseline.enabled.
3. Proceed in one of the following ways:
   - To enable anomaly detection, change the property value to true.
   - To disable anomaly detection, ensure that the property value is set to false (default value).
   For example, baseline.enabled=true.
4. Restart the Console Server.
   For more information, see Starting-or-stopping-product-services.
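
The property edit in the preceding steps can also be scripted. The following is a minimal sketch, not a BMC-provided tool: the helper name set_property is hypothetical, and it assumes that the file uses plain key=value lines. It works on the file contents as text, so you still need to read and write the olaengineCustomConfig.properties file yourself and restart the Console Server afterward.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key set to value, appending it if absent.

    Hypothetical helper: assumes simple "key=value" lines and leaves
    comments (lines starting with # or !) and blank lines untouched.
    """
    lines = text.splitlines()
    updated = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue  # comment or blank line
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Enable anomaly detection in an example file body.
original = "# custom settings\nbaseline.enabled=false\n"
print(set_property(original, "baseline.enabled", "true"))
```

Passing "false" instead of "true" restores the default, disabled state.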

Viewing anomalies

Anomalies can be viewed in the context of a search. To view anomalies, you do not require any specific domain knowledge about the data being monitored. For example, you do not need to search for a specific error or warning message to view anomalies in a given time period. Typically, a time range and, optionally, a tag value are sufficient to isolate anomalies. This means that when you perform a search, you can select a time range and a tag value, and then see anomalies based on this search context. For more information about tags, see Understanding-tags.

Anomaly detection takes into consideration the hosts available under the Filters panel (during the search). When you view anomalies, the host information is identified and displayed next to the anomalous events to enable you to get better insights.

To view anomalies

1. Ensure that anomaly detection is already enabled.
2. Navigate to the Search tab and perform a search.
3. On the top-left of the page, click the vertical three dots (indicating a menu) next to All Data and select Analyze Data.
   The Coalesce page is displayed.
4. From the Coalesce page, turn on the Anomalies setting to view anomalies.
   This results in a summary of anomalous events categorized as out-of-range events and rare events.

Understanding how anomalies are detected

If you try to view anomalies for any new type of data collected, all the events (coming from that data) might be indicated as infrequent occurrences (or rare events). For example, if you try to view anomalies after enabling anomaly detection for the first time or after adding a new type of data collector, it is possible that initially all the events are indicated as rare events. This is because the product is still forming the common pattern based on which anomalies are detected. Over time, the pattern is adjusted and slowly you can expect to see more accurate results.

By default, the anomaly job runs every hour. This means that if you search for a period just after the anomaly job last ran, you might not see current results. In such a scenario, a message to that effect is displayed on the Anomaly page.

Anomalies are detected based on the common patterns learnt. To learn common patterns, IT Data Analytics isolates a window of seven days before the search time. This means that when you search, anomalies are detected based on the common pattern formed in the seven days before the search time context. For example, if you search for the last 24 hours, anomalies are detected based on the seven days that precede that 24-hour period.
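
The seven-day learning window can be worked out with simple date arithmetic. The following is a minimal sketch under one assumption: that the window ends where the search time range begins. The function name baseline_window and the sample timestamps are hypothetical and only illustrate the calculation.

```python
from datetime import datetime, timedelta

BASELINE_DAYS = 7  # length of the learning window stated in this topic


def baseline_window(search_start: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the window used to learn common patterns.

    Assumption: the window ends at the start of the search time range.
    """
    return search_start - timedelta(days=BASELINE_DAYS), search_start


# A "last 24 hours" search run at a sample point in time.
now = datetime(2024, 5, 15, 12, 0)
search_start = now - timedelta(hours=24)
start, end = baseline_window(search_start)
print(f"Search range:    {search_start} to {now}")
print(f"Baseline window: {start} to {end}")
```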

The common patterns learnt are preserved in the system until the maximum data retention period expires (available under Administration > System Settings), and for an additional seven days. For example, if your maximum data retention period is set to 14 days, the common patterns learnt are preserved in the system for 14 + 7 = 21 days.

Based on the number of times the common pattern occurs, the product establishes a normal range. Any event that falls outside this range is indicated as an out-of-range event, and any event that cannot be included in the range (because it was never seen before or was seen infrequently) is indicated as a rare event.

Out-of-range events

Out-of-range events occur when the number (or count) of events matching a specific pattern type (that is, the common pattern) is outside the normal range for a given period.

For example, suppose that 20-30 events matching pattern X usually occur, but in the last hour, 200 events of pattern X occurred. Such an instance is considered an out-of-range event. The current count of events (200) deviates from the common pattern (20-30) on the higher side. Therefore, this deviation is visually plotted on the bar (representing the deviation factor) under the Deviation column, on the right side of the splitting line.

Example

Suppose you are monitoring logs that provide information about daily logins and user activity for an application (with anomaly detection enabled). Suppose the normal range for the common pattern identified for this kind of information is 40. This means around 40 users log on to the application on a daily basis.

Scenario 1: If only 10 users log on to the application on a particular day, this is indicated as an anomaly on the Out Of Range tab. This is because the count of 10 deviates from the normal range of 40 (on the lower side). By looking at the Out Of Range tab, you can infer that perhaps some users are facing issues in logging on to the application, or that many people who work on the application are on leave.

Scenario 2: If there are 1000 attempts to log on to the application on a particular day, this is indicated as an anomaly on the Out Of Range tab. This is because the count of 1000 deviates greatly from the normal range of 40 (on the higher side). By looking at the Out Of Range tab, you can infer that perhaps an attacker is attempting a brute-force attack to cause denial of service for a large number of user accounts.
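
The comparison behind these examples can be sketched in a few lines. This is an illustration only and does not reproduce the product's calculation: the function name classify_count is hypothetical, and the deviation factor used here (a simple ratio against the nearest bound of the normal range) is an assumption, because this topic does not state how the product computes the value shown under the Deviation column.

```python
def classify_count(count: int, low: int, high: int) -> tuple[str, float]:
    """Compare an event count with a normal range [low, high].

    Returns (direction, factor). The factor is a hypothetical deviation
    measure: how many times the count exceeds the upper bound, or how many
    times it falls short of the lower bound.
    """
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    if count > high:
        return "higher", count / high
    if count < low:
        # A count of zero is infinitely far below any positive lower bound.
        return "lower", (low / count) if count > 0 else float("inf")
    return "within", 1.0


# Pattern X: normally 20-30 events, 200 observed in the last hour.
print(classify_count(200, 20, 30))
# Daily logins: normally about 40, with 10 and then 1000 observed.
print(classify_count(10, 40, 40))
print(classify_count(1000, 40, 40))
```

A count inside the range returns "within", which corresponds to an event that is not reported as out of range.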

The following table describes the information available under the Out Of Range tab:

Rare events

Rare events occur when events matching a specific pattern type (that is, the common pattern) have occurred fewer than five times in a given time period. Rare events are considerably different from the rest of the data.

Example

Suppose you are monitoring logs that provide information about the features that are most used and least used in an application (with anomaly detection enabled).

In this scenario, you can look at the Rare tab to find features that were least used or never used before.
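
The "fewer than five times" rule can be illustrated with a simple count per pattern. The following is a minimal sketch: the function name rare_patterns and the feature names are hypothetical sample data, and the sketch counts occurrences only within the list that it is given. It does not model how the product learns patterns over time, so a feature that never appears in the data is not listed.

```python
from collections import Counter

RARE_THRESHOLD = 5  # "fewer than five times", as described in this topic


def rare_patterns(events: list[str], threshold: int = RARE_THRESHOLD) -> list[str]:
    """Return the patterns that occur fewer than threshold times, sorted by name."""
    counts = Counter(events)
    return sorted(name for name, n in counts.items() if n < threshold)


# Hypothetical feature-usage events extracted from application logs.
events = ["search"] * 40 + ["export"] * 12 + ["bulk_delete"] * 2 + ["audit_view"]
print(rare_patterns(events))
```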

The following table describes the information available for the rare events:

Changing the matching factor for viewing anomalies

Anomalies are identified based on the coalesced search results. By default, you can view anomalies based on a matching factor of 70%. This means that anomalous records are grouped based on at least 70% similarity (in other words, a maximum of 30% variation), which is the same as the coalescence factor. The common pattern (or baseline) used for detecting anomalies also matches incoming data based on 70% similarity (the same as the coalescence factor).

Therefore, changing the coalescence factor can affect not only the coalesced results but also what you see as anomalies. Changing the coalescence factor sets the matching factor to the same level. For example, if the coalescence factor is set at 80% similarity, the matching factor (for anomalies) is automatically set at 80%. You can then change the matching factor and override the automatic setting. However, to view more accurate results, it is recommended that you keep the coalescence factor and the matching factor at the same level.

The matching factor works in the same way as the coalescence factor. As you move the slider from left to right, the variation allowed when grouping anomalous records decreases (as shown in the following figure). In other words, the anomalous records (which are coalesced records) contain individual records that are more similar to each other. Therefore, when you move the slider towards the right, you see more results than when you move the slider towards the left. For more information about the coalescence factor, see Viewing-coalesced-results.

matching factor.png
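
The effect of the matching factor on grouping can be sketched as follows. This is an illustration only: the product's similarity algorithm is not described in this topic, so the sketch substitutes the ratio from Python's difflib.SequenceMatcher, and the function name group_records and the sample messages are hypothetical. Each record joins the first group whose first member it matches at or above the factor. Otherwise, it starts a new group.

```python
from difflib import SequenceMatcher


def group_records(records: list[str], factor: float) -> list[list[str]]:
    """Greedily group records whose similarity to a group's first member
    is at least factor (0.0 to 1.0). Stand-in for the product's matching."""
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if SequenceMatcher(None, group[0], record).ratio() >= factor:
                group.append(record)
                break
        else:
            groups.append([record])  # no group was similar enough
    return groups


# Hypothetical log messages.
messages = [
    "disk full on host01",
    "disk full on host02",
    "user login failed",
]
for factor in (0.70, 0.99):
    print(factor, len(group_records(messages, factor)), "group(s)")
```

In this sketch, a higher factor typically splits the records into more, smaller groups, which is consistent with seeing more results when you move the slider towards the right.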