Detecting anomalies

Detecting anomalies can help you find unusual patterns in the data that you are monitoring. Anomalies tell you not only whether the usual pattern has changed, but also how much the new pattern deviates from the usual pattern, both visually and numerically. Anomaly detection also shows you infrequent occurrences of events.

This information can help you gain important insights that in turn help you uncover new problems and detect the root cause of a problem more quickly.

To get an overview and understand the use cases for viewing anomalies, see Getting-started-with-anomaly-detection.

This topic describes how to enable or disable anomaly detection, how to view anomalies, how anomalies are detected, and how to change the matching factor for viewing anomalies.

Enabling or disabling anomaly detection

Anomaly detection is enabled by default. If you are upgrading from a previous version of TrueSight IT Data Analytics, your existing anomaly detection settings are carried forward to the new version. If you disabled anomaly detection earlier and now want to enable it, or vice versa, use the configureanomaly CLI command. For more information about using the command line to enable or disable anomaly detection, see configureanomaly CLI command.

Note

Anomaly detection requires additional resources (CPU and maximum Java heap size). BMC recommends that you contact BMC Support for help tuning your setup to get optimal product performance.

Viewing anomalies

Anomalies are viewed in the context of a search. You do not need any specific domain knowledge about the monitored data to view anomalies. For example, you do not need to search for a specific error or warning message to view anomalies in a given time period. Typically, a time range, optionally combined with a tag value, is sufficient to isolate anomalies: when you perform a search, you select a time range and optionally a tag value, and anomalies are shown based on this search context. For more information about tags, see Understanding-tags.

Anomaly detection takes into account the hosts available in the Filters panel during the search. When you view anomalies, the host information is displayed next to the anomalous events to give you better insight.

To view anomalies

1. Ensure that anomaly detection is enabled.
2. Navigate to the Search tab and perform a search.
3. In the top-left corner of the page, click the three vertical dots (the menu icon) next to All Data and select Analyze Data.
   The Coalesce page is displayed.
4. On the Coalesce page, turn on the Anomalies setting.
   A summary of anomalous events is displayed, categorized as outlier (out-of-range) events and rare events.

Understanding how anomalies are detected

If you view anomalies for a new type of collected data, all the events from that data might initially be indicated as infrequent occurrences (rare events). For example, after you enable anomaly detection for the first time or add a new type of data collector, all the events might initially be indicated as rare events. This is because the product is still learning the common pattern against which anomalies are detected. Over time, the pattern is refined and the results become more accurate.

By default, the anomaly job runs every 15 minutes. If your search time range includes data received after the anomaly job last ran, you might not see the most current results. In that case, a message to this effect is displayed on the Anomaly page.
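The timing check described above can be sketched as follows. This is a minimal illustration only: it assumes the job runs on a fixed 15-minute schedule counted from a known first run time, and the function names `last_job_run` and `results_may_be_incomplete` are hypothetical, not part of the product.

```python
from datetime import datetime, timedelta

# Default anomaly job interval stated in this topic.
JOB_INTERVAL = timedelta(minutes=15)


def last_job_run(now: datetime, first_run: datetime,
                 interval: timedelta = JOB_INTERVAL) -> datetime:
    """Hypothetical: most recent scheduled run at or before `now`,
    assuming runs happen every `interval` starting at `first_run`."""
    completed_intervals = (now - first_run) // interval  # timedelta // timedelta -> int
    return first_run + completed_intervals * interval


def results_may_be_incomplete(search_end: datetime, last_run: datetime) -> bool:
    """True when the search range extends past the last anomaly job run,
    so the newest data has not been analyzed yet."""
    return search_end > last_run


first = datetime(2024, 1, 1, 0, 0)
now = datetime(2024, 1, 1, 10, 7)
last = last_job_run(now, first)
print(last)                                  # 2024-01-01 10:00:00
print(results_may_be_incomplete(now, last))  # True
```

A search ending at 09:55 on the same day would return False here, because that range was fully covered by the 10:00 run.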

Anomalies are detected based on learned common patterns. To learn common patterns, TrueSight IT Data Analytics uses a seven-day window before the search time range. When you search, anomalies are detected based on the common pattern formed in the seven days preceding the search time context. For example, if you search for the last 24 hours, anomalies are detected based on the seven days that precede those 24 hours.
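The seven-day learning window can be computed from the start of the search time range, as in the minimal sketch below. The helper name `baseline_window` is hypothetical; only the seven-day length comes from this topic.

```python
from datetime import datetime, timedelta

# Learning window length stated in this topic.
BASELINE_LENGTH = timedelta(days=7)


def baseline_window(search_start: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the seven-day window that immediately
    precedes the search time range."""
    return search_start - BASELINE_LENGTH, search_start


# Example: "last 24 hours" search ending at 2024-03-08 12:00.
search_end = datetime(2024, 3, 8, 12, 0)
search_start = search_end - timedelta(hours=24)
start, end = baseline_window(search_start)
print(start, "to", end)  # 2024-02-29 12:00:00 to 2024-03-07 12:00:00
```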

The learned common patterns are preserved in the system until the maximum data retention period (set under Administration > System Settings) expires, plus an additional seven days. For example, if your maximum data retention period is 14 days, the learned common patterns are preserved for 14 + 7 = 21 days.
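The retention rule above is simple arithmetic. The hypothetical helper below only restates it; it does not read any product setting.

```python
# Extra days that learned patterns outlive the data, as stated in this topic.
EXTRA_PATTERN_DAYS = 7


def pattern_retention_days(max_data_retention_days: int) -> int:
    """Days the learned common patterns are kept in the system."""
    return max_data_retention_days + EXTRA_PATTERN_DAYS


print(pattern_retention_days(14))  # 21
```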

Based on how often the common pattern occurs, the product establishes a normal range. Any event whose count falls outside this range is indicated as an outlier event, and any event that cannot be placed within a range (because the event was never seen before or was seen only infrequently) is indicated as a rare event.
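This topic does not specify how the normal range is computed. As a purely hypothetical sketch, the range could be taken as the lowest and highest per-period counts of a pattern during the seven-day baseline, with no range formed for a pattern that never appeared (such events would fall into the rare category). The `normal_range` function is illustrative only and is not the product's algorithm.

```python
from typing import Optional


def normal_range(baseline_counts: list[int]) -> Optional[tuple[int, int]]:
    """Hypothetical normal range for one pattern.

    baseline_counts holds the pattern's count in each period (for example,
    each hour) of the seven-day baseline. Returns (low, high), or None if the
    pattern never occurred, in which case no range can be formed.
    """
    if sum(baseline_counts) == 0:
        return None
    return min(baseline_counts), max(baseline_counts)


print(normal_range([22, 25, 30, 20, 27]))  # (20, 30)
print(normal_range([0, 0, 0]))             # None
```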

Outlier events

Outlier events occur when the number (count) of events matching a specific pattern type (that is, the common pattern) falls outside the normal range for a given period.

For example, suppose 20 to 30 events matching pattern X usually occur per hour, but in the last hour, 200 events of pattern X occurred. This is considered an outlier event: the current count (200) deviates from the common pattern (20 to 30) on the higher side. The deviation is therefore plotted as a bar (representing the deviation factor) in the Deviation column, to the right of the dividing line.

Example

Suppose you are monitoring logs that record daily logins and user activity for an application (with anomaly detection enabled), and the normal range for the common pattern identified in this data is around 40. This means that around 40 users log on to the application every day.

Scenario 1: If only 10 users log on to the application on a particular day, this is indicated as an anomaly on the Outlier tab, because the count of 10 deviates from the normal range of around 40 on the lower side. From the Outlier tab, you can infer that some users might be having trouble logging on to the application, or that many of the application's users are on leave.

Scenario 2: If there are 1000 attempts to log on to the application on a particular day, this is indicated as an anomaly on the Outlier tab, because the count of 1000 deviates greatly from the normal range of around 40 on the higher side. From the Outlier tab, you can infer that an attacker might be attempting a brute-force attack that could cause a denial of service for a large number of user accounts.
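The outlier check can be sketched as a comparison of a period's count against the normal range, with the direction matching the higher-side and lower-side deviations described above. How the product calculates its deviation factor is not documented here, so the ratio used below (count divided by the upper bound, or lower bound divided by count) is an assumption for illustration, and `check_outlier` is a hypothetical name.

```python
import math


def check_outlier(count: int, low: int, high: int) -> tuple[str, float]:
    """Compare a period's event count with the normal range [low, high].

    Returns (direction, factor). direction is "higher", "lower", or "normal".
    factor is a hypothetical deviation ratio (1.0 when within range).
    """
    if count > high:
        return "higher", count / high if high else math.inf
    if count < low:
        return "lower", low / count if count else math.inf
    return "normal", 1.0


print(check_outlier(200, 20, 30))   # ('higher', 6.666666666666667)
print(check_outlier(10, 40, 40))    # ('lower', 4.0)
print(check_outlier(1000, 40, 40))  # ('higher', 25.0)
```

The last two calls treat the login example's normal range of around 40 as the single value 40, which reproduces Scenario 1 (lower side) and Scenario 2 (higher side).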

The following table describes the information available under the Outlier tab:

Rare events

Rare events occur when events matching a specific pattern type (that is, the common pattern) have occurred fewer than five times in a given time period. Rare events are those that differ considerably from the rest of the data.

Example

Suppose you are monitoring logs that provide information about the features that are most used and least used in an application (with anomaly detection enabled).

In this scenario, you can look at the Rare tab to find features that were least used or never used before.
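The rare-event rule (fewer than five occurrences in the time period) can be sketched by counting occurrences per pattern. The sketch assumes each event has already been mapped to a pattern label; `rare_patterns` and the feature names are hypothetical.

```python
from collections import Counter

# Threshold stated in this topic: fewer than five occurrences is rare.
RARE_THRESHOLD = 5


def rare_patterns(event_patterns: list[str],
                  threshold: int = RARE_THRESHOLD) -> set[str]:
    """Return the patterns that occur fewer than `threshold` times."""
    counts = Counter(event_patterns)
    return {pattern for pattern, n in counts.items() if n < threshold}


# Hypothetical feature-usage events for one search time range.
events = ["open_dashboard"] * 40 + ["export_pdf"] * 2 + ["legacy_report"]
print(sorted(rare_patterns(events)))  # ['export_pdf', 'legacy_report']
```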

The following table describes the information available for the rare events:

Changing the matching factor for viewing anomalies

Anomalies are identified based on the coalesced search results. By default, anomalous records are grouped using a matching factor of 70%; that is, records are grouped together when they are at least 70% similar (a maximum of 30% variation).
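To illustrate what a 70% matching factor means, the sketch below groups records greedily by similarity. The product's similarity measure is not described in this topic; Python's `difflib.SequenceMatcher.ratio()` is used here as a stand-in, and `group_records` is a hypothetical name.

```python
from difflib import SequenceMatcher

# Default matching factor from this topic: at least 70% similarity.
MATCHING_FACTOR = 0.70


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0.0 and 1.0 (character-based)."""
    return SequenceMatcher(None, a, b).ratio()


def group_records(records: list[str],
                  factor: float = MATCHING_FACTOR) -> list[list[str]]:
    """Put each record into the first group whose first record is at least
    `factor` similar to it; otherwise start a new group."""
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= factor:
                group.append(record)
                break
        else:  # no existing group was similar enough
            groups.append([record])
    return groups


records = ["User alice logged in", "User bob logged in", "Disk /dev/sda1 is full"]
for group in group_records(records):
    print(group)
# ['User alice logged in', 'User bob logged in']
# ['Disk /dev/sda1 is full']
```

Raising the factor makes grouping stricter: with a factor of 0.8, the two login records (similarity of about 0.79 under this stand-in measure) would fall into separate groups.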

You can change the matching factor by using the configurematchingfactor CLI command. For more information, see configurematchingfactor-CLI-command.

For more information about the matching factor, see Viewing-coalesced-results.