Detecting anomalies can help you find unusual patterns in the data that you are monitoring. Anomalies tell you not only whether the usual pattern has changed but also how much the new pattern deviates from it, both visually and numerically. Additionally, they show you infrequent occurrences of events.


This information can help you gain important insights that in turn help you uncover new problems and detect the root cause of a problem more quickly.

To get an overview and understand the use cases for viewing anomalies, see Getting started with anomaly detection.

This topic provides the following information about viewing and understanding anomalies:

Enabling anomaly detection

By default, anomaly detection is disabled, and you must enable it manually. You need to enable anomaly detection only on the Console Server.

Note

Anomaly detection requires additional resources (CPU and maximum Java heap size). Therefore, before you enable anomaly detection in a production environment, BMC recommends that you contact BMC Support to help you tune your setup and get optimal product performance.

To enable or disable anomaly detection

To enable or disable anomaly detection from the Console Server, follow these steps:

  1. Navigate to the olaengineCustomConfig.properties file and edit it.
    For more information, see Modifying the configuration files.
  2. If it is not already present, add the baseline.enabled property.
  3. Proceed in one of the following ways:
    • To enable anomaly detection, change the property value to true.
    • To disable anomaly detection, ensure that the property value is set to false (default value).
    For example, baseline.enabled=true.
  4. Restart the Console Server.
    For more information, see Starting or stopping product services.
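
If you prefer to script the edit to olaengineCustomConfig.properties rather than change it by hand, the following is a minimal sketch in Python. The helper names set_property and enable_anomaly_detection are hypothetical and are not part of the product. The sketch assumes that the file uses plain key=value lines with # or ! comment lines, and it leaves the location of the file to you. You still need to restart the Console Server afterward.

```python
from pathlib import Path


def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with key set to value, adding the key if absent."""
    lines = text.splitlines()
    updated = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comment lines and anything that is not a key=value pair.
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def enable_anomaly_detection(path: Path, enabled: bool = True) -> None:
    """Hypothetical helper: rewrite the file at path with baseline.enabled set."""
    original = path.read_text() if path.exists() else ""
    value = "true" if enabled else "false"
    path.write_text(set_property(original, "baseline.enabled", value))


# Works on text, so it can be tried without touching a real installation.
print(set_property("some.other.property=1\n", "baseline.enabled", "true"))
```

Call enable_anomaly_detection with the actual path to olaengineCustomConfig.properties in your installation. Back up the file first, because the sketch rewrites it in place.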

Viewing anomalies

Anomalies can be viewed in the context of a search. To view anomalies, you do not need any specific domain knowledge about the data being monitored. For example, you do not need to search for a specific error or warning message to view anomalies in a given time period. Typically, a time range and, optionally, a tag value are sufficient to isolate anomalies. When you perform a search, you select a time range and a tag value, and you see anomalies based on this search context. For more information about tags, see Understanding tags.

Anomaly detection takes into consideration the hosts available under the Filters panel (during the search). When you view anomalies, the host information is identified and displayed next to the anomalous events to enable you to get better insights.

To view anomalies

  1. Ensure that anomaly detection is already enabled.
  2. Navigate to the Search tab and perform a search.
  3. On the top-left of the page, click the vertical three dots (indicating a menu) next to All Data and select Analyze Data.
    The Coalesce page is displayed.
  4. From the Coalesce page, turn on the Anomalies setting to view anomalies.
    This results in a summary of anomalous events categorized as out-of-range events and rare events.

Understanding how anomalies are detected

If you try to view anomalies for any new type of data collected, all the events coming from that data might be indicated as infrequent occurrences (or rare events). For example, if you view anomalies after enabling anomaly detection for the first time or after adding a new type of data collector, it is possible that initially all the events are indicated as rare events. This is because the product is still forming the common pattern based on which anomalies are detected. Over time, the pattern is adjusted, and you can expect the results to become gradually more accurate.

By default, the anomaly job runs every hour. This means that if you run a search for a period just after the anomaly job last ran, you might not see current results. In such a scenario, a message to that effect is displayed on the Anomaly page.

Anomalies are detected based on the common patterns learnt. To learn common patterns, IT Data Analytics isolates a window of seven days before the search time. This means that when you search, anomalies are detected based on the common pattern formed in the seven days before the search time context. For example, if you search for the last 24 hours, anomalies are detected based on the seven days that precede that 24-hour search window.
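
The seven-day window can be illustrated with a short Python sketch. The function name baseline_window and the sample dates are hypothetical. The sketch assumes, based on the preceding example, that the window ends where the search time range begins.

```python
from datetime import datetime, timedelta

BASELINE_DAYS = 7  # learning window stated in the text


def baseline_window(search_start: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the seven days that precede the search time range."""
    return search_start - timedelta(days=BASELINE_DAYS), search_start


# A search for the last 24 hours, run at an arbitrary sample time.
now = datetime(2024, 3, 15, 12, 0)
search_start = now - timedelta(hours=24)
start, end = baseline_window(search_start)
print(f"Search window:   {search_start} to {now}")
print(f"Baseline window: {start} to {end}")
```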

The common patterns learnt are preserved in the system until the maximum data retention period (available under Administration > System Settings) expires, plus an additional seven days. For example, if your maximum data retention period is set to 14 days, the common patterns learnt are preserved in the system for 14 + 7 = 21 days.
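
The same arithmetic, written as a small Python sketch so that you can try other retention settings. The function name pattern_retention_days is hypothetical. The seven-day addition comes from the preceding paragraph.

```python
EXTRA_DAYS = 7  # additional days stated in the text


def pattern_retention_days(max_data_retention_days: int) -> int:
    """Days for which learnt common patterns are kept, per the rule above."""
    return max_data_retention_days + EXTRA_DAYS


print(pattern_retention_days(14))  # 14 + 7 = 21
```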

Based on the number of times the common pattern occurs, the product establishes a normal range. Any event that falls outside this range is indicated as an out-of-range event, and any event that cannot be included in the range (because it was never seen before or was seen infrequently) is indicated as a rare event.

Out-of-range events

Out-of-range events occur when the number (or count) of events matching a specific pattern type (that is, the common pattern) is outside the normal range for a given period.

For example, suppose 20-30 events matching pattern X usually occur, but in the last hour, 200 events of pattern X occurred. Such an instance is considered an out-of-range event. The current count of events (200) deviates from the common pattern (20-30) on the higher side. Therefore, this deviation is visually plotted on the bar (representing the deviation factor) under the Deviation column, on the right side of the splitting line.
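
The following Python sketch illustrates the idea with the numbers from this example. It is not the product's algorithm. The names NormalRange, classify_count, and deviation_percent are hypothetical, and the percentage formula (distance from the nearest bound of the range, relative to that bound) is an assumption made only for illustration, because this topic does not state how the deviation factor is calculated.

```python
from dataclasses import dataclass


@dataclass
class NormalRange:
    low: int
    high: int


def classify_count(count: int, normal: NormalRange) -> str:
    """Return 'higher', 'lower', or 'normal' for an observed event count."""
    if count > normal.high:
        return "higher"
    if count < normal.low:
        return "lower"
    return "normal"


def deviation_percent(count: int, normal: NormalRange) -> float:
    """Assumed formula: distance from the nearest bound, as a percentage of that bound."""
    if count > normal.high:
        return (count - normal.high) / normal.high * 100
    if count < normal.low:
        return (normal.low - count) / normal.low * 100
    return 0.0


pattern_x = NormalRange(low=20, high=30)
print(classify_count(200, pattern_x))               # higher side, right of the splitting line
print(round(deviation_percent(200, pattern_x), 1))  # value under the assumed formula only
```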

Example

Suppose you are monitoring logs that provide information about daily logins and user activity for an application (with anomaly detection enabled). Suppose the normal range for the common pattern identified for this kind of information is around 40. This means that around 40 users log on to the application daily.

Scenario 1: If only 10 users log on to the application on a particular day, this is indicated as an anomaly on the Out Of Range tab. This is because the count of 10 deviates from the normal range of 40 (on the lower side). By looking at the Out Of Range tab, you can infer that perhaps some users are facing issues in logging on to the application, or that many people who work on the application are on leave.

Scenario 2: If there are 1000 attempts to log on to the application on a particular day, this is indicated as an anomaly on the Out Of Range tab. This is because the count of 1000 deviates substantially from the normal range of 40 (on the higher side). By looking at the Out Of Range tab, you can infer that perhaps an attacker is attempting a brute-force attack to cause denial of service for a large number of user accounts.

The following table describes the information available under the Out Of Range tab:

Column name: Description
Deviation

The bar displayed under this column visually represents the deviation factor.

The deviation factor tells you whether the anomalous record deviates from the common pattern on the lower side or the higher side, and the amount by which it deviates, that is, how far the anomalous records lie from the normal range.

The bar representing the deviation factor is split into two parts by a vertical line. If the deviation is on the higher side, the right side of the line is colored and if the deviation is on the lower side, the left side of the line is colored (as shown in the following figure).

The amount by which the deviation occurs is indicated by the portion of the bar colored after the splitting line and the deviation factor represented in terms of percentage on top of the bar. This visual representation can also help you compare deviations across various anomalous records.

When you click the bar or the numeric figure (deviation factor) on top of the bar, by default, you can see the top ten records occurring in the coalesced anomalous record (as shown in the following figure).

To see the bottom ten records, click the three vertical dots menu next to Top Records, and select Show Bottom 10 (by count). Conversely, to return to the top ten records list, click the three vertical dots menu and select Show Top 10 (by count). To search with the message of the individual record and see the results arising, click Launch Search next to the message.

You can also sort the anomalous records in ascending or descending order by selecting the Sort Ascending or Sort Descending option from the three dots menu on this column (as shown in the following figure). Based on the sorting order that you select, an associated arrow is displayed next to the column name.

Hosts

The host information corresponds to the search query based on which the anomalies are detected.

The hosts selected under the Filters panel during the search determine the hosts displayed next to the anomalous records. This information can give you better insight into the anomalous records and help you make better decisions.

You can narrow down the anomalous records displayed by selecting hosts from the three dots menu on this column.

Anomalous records

Because anomalies are detected based on the coalesced results, in the table, you can see that the anomalous records display only the common (coalesced) message with ellipsis (...) (as shown in the following figure). The ellipsis indicates varying portions in the message. This coalesced message gives you a high-level idea of the records deviating from the normal range.

To drill down into the common (or coalesced) message, click the bar (right portion) displayed under the Deviation column or click the numeric figure (deviation factor) displayed on top of the bar, next to the anomalous record. By drilling down, you can see the most common (top ten records) and the least common (bottom ten records) occurring for the coalesced anomalous record.

Launch Search

Runs a search with the anomalous record message and shows the results on the All Data page.


Rare events

Rare events occur when events matching a specific pattern type (that is, the common pattern) have occurred fewer than five times in a given time period. Rare events are considerably different from the rest of the data.
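
As a rough illustration only, the following Python sketch applies the threshold of five to a set of pattern counts and produces labels like those displayed on the Rare tab (Never seen before, Seen 1 time, and so on). The function name previous_occurrence_label, the pattern names, and the counts are hypothetical. The sketch assumes that the counts cover the seven days before the search time.

```python
from typing import Optional

RARE_THRESHOLD = 5  # fewer than five occurrences counts as rare, per the text


def previous_occurrence_label(pattern: str, baseline_counts: dict) -> Optional[str]:
    """Return a Rare-tab style label, or None if the pattern is not rare."""
    seen = baseline_counts.get(pattern, 0)
    if seen >= RARE_THRESHOLD:
        return None
    if seen == 0:
        return "Never seen before"
    return f"Seen {seen} time" + ("" if seen == 1 else "s")


# Hypothetical occurrence counts from the seven days before the search time.
baseline_counts = {"login ok": 40, "export report": 2, "reset cache": 1}

for pattern in ("login ok", "export report", "reset cache", "purge archive"):
    print(pattern, "->", previous_occurrence_label(pattern, baseline_counts))
```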

Example

Suppose you are monitoring logs that provide information about the features that are most used and least used in an application (with anomaly detection enabled).

In this scenario, you can look at the Rare tab to find features that were least used or never used before.

The following table describes the information available for the rare events:

Column name: Description
Previous occurrences

The bar displayed under this column visually represents the count of the rare event occurrences. This count is also represented on top of the bar, for example, Never seen before, Seen 1 time, and so on.

The bar displayed is colored based on the count of rare occurrences. If the count is high, a larger portion of the bar is colored and vice versa. This visual representation can help you compare the count across other rare events (anomalous records).

Note that the count of rare occurrences refers to the number of times the same pattern (anomalous message) was seen in the last seven days before the search time. The messages matching the common pattern might have occurred 1000 times overall. However, if this pattern was seen on only two occasions in the last seven days, the number of previous occurrences is represented as Seen 2 times (as shown in the following figure).

When you click the bar or the notation of the previous occurrence on top of the bar (for example, Seen 1 time), by default, you can see the top ten records occurring in the coalesced anomalous record (as shown in the following figure). To see the bottom ten records, click the three vertical dots menu next to Top Records, and select Show Bottom 10 (by count). Conversely, to return to the top ten records list, click the three vertical dots menu and select Show Top 10 (by count). To search with the message of the individual record and see the results arising, click Launch Search next to the message.

You can also sort the anomalous records in ascending or descending order by selecting the Sort Ascending or Sort Descending option from the three dots menu on this column (as shown in the following figure). Based on the sorting order that you select, an associated arrow is displayed next to the column name.

Hosts

The host information corresponds to the search query based on which the anomalies are detected.

The hosts selected under the Filters panel during the search determine the hosts displayed next to the anomalous records. This information can give you better insight into the anomalous records and help you make better decisions.

You can narrow down the anomalous records displayed by selecting hosts from the three dots menu on this column.

Anomalous records

Because anomalies are detected based on the coalesced results, in the table, you can see that the anomalous records display only the common (coalesced) message with ellipsis (...) (as shown in the following figure). The ellipsis indicates varying portions in the message.

The common message gives you a high-level idea of the anomalous event. To drill down into the common (or coalesced) message, click the bar displayed under the Previous occurrences column or click the notation displayed on top of the bar (for example, Seen 1 time), next to the anomalous record. By drilling down, you can see the most common (top ten) and the least common (bottom ten) records occurring for the coalesced anomalous record.

Launch Search

Runs a search with the anomalous record message and shows the results on the All Data page.


Changing the matching factor for viewing anomalies

Anomalies are identified based on the coalesced search results. By default, you view anomalies based on a matching factor of 70%. This means that anomalous records are grouped based on at least 70% similarity (in other words, a maximum of 30% variation), which is the same as the coalescence factor. The common pattern (or baseline) used for detecting anomalies also matches incoming data based on 70% similarity (the same as the coalescence factor).

Therefore, changing the coalescence factor affects not only the coalesced results but also what you see as anomalies. Changing the coalescence factor sets the matching factor to the same level. For example, if the coalescence factor is set at 80% similarity, the matching factor (for anomalies) is automatically set at 80%. You can then change the matching factor and override the automatic setting. However, to see more accurate results, it is recommended that you keep the coalescence factor and the matching factor at the same level.

The matching factor works in the same way as the coalescence factor. As you move the slider from left to right, the variation allowed when grouping anomalous records decreases (as shown in the following figure). In other words, the anomalous records (which are coalesced records) contain individual records that are more similar to each other. Therefore, when you move the slider towards the right, you see more results than when you move the slider towards the left. For more information about the coalescence factor, see Viewing coalesced results.
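
To see why a higher factor produces more results, consider the following Python sketch. It is not the product's coalescing algorithm. The token-by-token similarity measure, the greedy grouping, the function names, and the sample messages are all assumptions chosen only to show the effect of the threshold.

```python
def token_similarity(a: str, b: str) -> float:
    """Fraction of token positions at which two messages agree (assumed measure)."""
    ta, tb = a.split(), b.split()
    matches = sum(1 for x, y in zip(ta, tb) if x == y)
    return matches / max(len(ta), len(tb), 1)


def group_messages(messages, factor):
    """Greedily place each message in the first group whose first member is similar enough."""
    groups = []
    for msg in messages:
        for group in groups:
            if token_similarity(group[0], msg) >= factor:
                group.append(msg)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([msg])
    return groups


messages = [
    "User alice logged in from 10.0.0.1",
    "User bob logged in from 10.0.0.2",
    "User alice logged in from 10.0.0.2",
    "Disk sda1 usage at 91 percent",
    "Disk sda1 usage at 95 percent",
]

# A higher factor (slider moved right) tolerates less variation, so more groups appear.
for factor in (0.5, 0.7, 0.9):
    print(f"factor {factor:.0%}: {len(group_messages(messages, factor))} groups")
```

With these sample messages, the number of groups grows as the factor rises, which mirrors the behavior of the slider described above.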
