Getting started with anomaly detection
Anomaly detection involves identifying unusual patterns in the data that you are monitoring.
This topic provides the following information about the anomaly detection capability: what anomaly detection is, how it works, and use cases for detecting anomalies.
What is anomaly detection?
Just before a problem occurs, you can typically observe various symptoms, such as a sudden spike or dip in event frequency, or rare occurrences of events. Such changes in the monitored data offer important clues to potential problems. Anomaly detection is the identification of such unusual patterns in the data that you are monitoring.
Anomaly detection can enable you to:
- Resolve problems faster by identifying the root cause sooner
- Detect problems before they have significant impact
- Confirm expected behavior for changes applied to the system
By detecting anomalies, you can add a new level of confidence to your activities. For example, after you upgrade to a new version of a software product, you can review anomalies to check whether there are any unusual changes in the event pattern.
How does anomaly detection work?
IT Data Analytics uses a combination of unsupervised machine learning techniques and baselining to detect anomalies.
After you install or upgrade the product, it takes a few days for the system to learn the common patterns in the indexed data. Over time, the system is trained to detect deviations from these learned patterns and infrequent occurrences in the data. Deviations are shown as out-of-range events, and infrequent occurrences are shown as rare events. The learned patterns are continuously adjusted, so you can expect more accurate results over time. For more information, see Detecting anomalies.
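This topic does not describe the product's exact algorithm. As a rough illustration only, the following Python sketch swaps in two simple, well-known techniques: a statistical baseline (mean plus or minus k standard deviations of per-interval event counts) to flag out-of-range counts, and a frequency-share threshold to flag rare message patterns. The function names, the `k=3.0` band, and the `min_share=0.01` threshold are hypothetical choices, not IT Data Analytics settings.

```python
from statistics import mean, pstdev


def learn_baseline(history):
    """Hypothetical helper: learn a baseline from per-interval event counts
    observed during the learning period. Returns (mean, standard deviation)."""
    return mean(history), pstdev(history)


def is_out_of_range(count, baseline, k=3.0):
    """Flag a count that falls outside the band mean +/- k * std_dev.
    k=3.0 is an assumed illustration value."""
    mu, sigma = baseline
    return abs(count - mu) > k * sigma


def is_rare(pattern, learned_counts, min_share=0.01):
    """Flag a message pattern whose share of all learned events is below
    min_share (an assumed threshold). Unseen patterns have a share of 0,
    so they are always treated as rare."""
    total = sum(learned_counts.values())
    share = learned_counts.get(pattern, 0) / total if total else 0.0
    return share < min_share
```

For example, with a learned history of `[100, 98, 103, 101, 99, 97, 102]` events per hour (mean 100, standard deviation 2), an hour with 250 events falls outside the band and is flagged as out of range, while an hour with 104 events is not.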
Use cases for detecting anomalies
The following use cases illustrate how you can use anomaly detection to get additional insight into the data that you are monitoring or analyzing.
Root cause analysis
While identifying potential causes of an application event or infrastructure alarm, you can use anomaly detection to isolate log messages that are rare or that occur at an unusual frequency. Suppose an alarm is raised through TrueSight Infrastructure Management (or any other monitoring tool) and is assigned to an owner who is responsible for resolving the issue. When the cause cannot be found, the owner often falls back on a simple, common remedy such as restarting the application. With anomaly detection, you can instead get to the root cause of the alarm more quickly and accurately.
As part of your troubleshooting, you can use various other features, such as search commands, the coalesce data capability, or the compare data capability. However, anomaly detection is designed specifically to help you focus only on rare or unusual log messages recorded at a specific time. While troubleshooting, you can isolate a time window close to when the alarm was raised and view a quick summary of the log messages that are rare or occurring at an unexpected frequency in that window. Often, this gives you additional insight into the problem.
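The following Python sketch illustrates the idea of summarizing a time window around an alarm. It is not the product's API. It assumes, hypothetically, that events are available as `(timestamp, message_pattern)` pairs and that an expected count per pattern for a window of the same length (`expected_counts`) was learned earlier; the plus or minus 15-minute window and the factor of 3 are arbitrary illustration values.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_alarm_window(events, alarm_time, expected_counts,
                           half_width=timedelta(minutes=15), factor=3.0):
    """Hypothetical helper: summarize rare or unusually frequent patterns in
    the window [alarm_time - half_width, alarm_time + half_width].

    events: iterable of (datetime, message_pattern) pairs.
    expected_counts: assumed learned count per pattern for a window of the
    same total length (2 * half_width).
    """
    start, end = alarm_time - half_width, alarm_time + half_width
    observed = Counter(pattern for ts, pattern in events if start <= ts <= end)

    summary = {}
    for pattern, count in observed.items():
        expected = expected_counts.get(pattern, 0)
        if expected == 0:
            # Never seen during the learning period
            summary[pattern] = "rare"
        elif count > factor * expected or count < expected / factor:
            # Much higher or much lower than the learned frequency
            summary[pattern] = "unusual frequency"
    return summary
```

This sketch only reports patterns that occur inside the window; a pattern that is expected but missing entirely from the window is not reported.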
Proactive change management
As part of a change process, administrators who apply new software (or make configuration changes) can add another level of confidence to their process by validating that the change they just applied does not cause unexpected behavior.
As an administrator, you can use anomaly detection to isolate and view unexpected or rare log messages around the same time or just after a change was applied. This can give you an early warning if the change caused a negative or unexpected behavior in the application.
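One way to picture this check is to compare the log messages in a period just after the change with an equally long period just before it. The sketch below is a hypothetical illustration, not a product feature: the helper name, the one-hour window, and the factor of 3 are assumptions. It reports message patterns that are new after the change or whose frequency rose sharply.

```python
from collections import Counter
from datetime import datetime, timedelta


def compare_around_change(events, change_time, window=timedelta(hours=1), factor=3.0):
    """Hypothetical helper. events: iterable of (datetime, message_pattern).
    Returns (new_patterns, increased_patterns), each sorted."""
    events = list(events)  # allow iterating twice
    before = Counter(p for ts, p in events if change_time - window <= ts < change_time)
    after = Counter(p for ts, p in events if change_time <= ts < change_time + window)

    # Patterns that appear after the change but never in the preceding window
    new = sorted(p for p in after if p not in before)
    # Patterns seen before the change that are now much more frequent
    increased = sorted(p for p in after if p in before and after[p] > factor * before[p])
    return new, increased
```

A pattern that shows up only after the change is often the most useful early warning, because the period before the change acts as its own baseline.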