Viewing anomaly events
If you have enabled autoanomaly event generation, the system generates an anomaly event when a metric value breaches all the baselines and the metric deviation (in sigma) value for a specific duration, as shown in the following images:
If you have enabled the auto-close option in the variate policy settings, the system closes the anomaly event after the metric value returns to a normal state. The event also closes after you perform one of the following actions:
- Change the metrics associated with the variate policy
- Delete the variate policy
- Delete the PATROL Agent associated with the variate policy
- Enable the auto-close option in the variate policy settings
If the auto-close option is not enabled in the variate policy settings, you must manually close an autoanomaly event.
To find out when an autoanomaly event was closed, perform the following steps:
- From the Events page, select the Closed filter to view the list of all closed events.
- Click the anomaly event icon to view the Anomaly Details page.
- Hover over the green icon to view the time at which the event was closed, as shown in the following screenshot. For more information, see Viewing event details.
Anomaly details
On the Events page under Monitoring, click the anomaly event icon to view the Anomaly Details page. The icon is available only for events that belong to a multivariate policy. The Anomaly Details page shows an anomaly event graph with data points for each metric.
Note: You can customize the Events page table view to show/hide or reorder the columns specific to anomaly events.
Here is an annotated screenshot of the Anomaly Details page:
You can see the following variate policy details on the page:
- Variate policy details: Name, Severity (Minor, Major, or Critical), Priority, Status (Open or Closed), Occurred (at time stamp), Modified (at time stamp)
- First metric and graph: For a multivariate policy, the metric with the highest contribution to the anomaly score is displayed on top.
- Other contributing metrics: All other contributing metrics are listed, each with its own graph.
- Last metric and graph: The metric with the least contribution to the anomaly score is displayed at the bottom.
- Click Show Non-contributing Metrics to view the non-contributing metrics. This expander is not shown if all of the top 10 metrics in the policy contribute to the anomaly score. For more information, see Understanding the multivariate anomaly score.
- Band of normality: A grey band that marks the normal distribution range, so that you can see whether the data points fall within it.
- Anomaly indicator: Indicates the point at which the anomaly occurred.
- Anomaly duration indicator: The graphical indicator of the time duration specified in the policy. An anomaly event is generated only if the anomaly persists for the specified duration.
- Hover texts: Displays the metric and anomaly details in text bubbles.
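The generation rule described on this page (a breach of the baselines and of the sigma deviation that persists for the configured duration) can be illustrated with a minimal Python sketch. This is not the product's implementation. The names `Sample`, `is_breach`, and `first_event_time`, the use of a single low/high band to stand in for "all the baselines", and the timestamp unit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    timestamp: int  # seconds (hypothetical unit)
    value: float


def is_breach(value: float, low: float, high: float,
              mean: float, std: float, sigma_threshold: float) -> bool:
    """Assumed rule: the value lies outside the baseline band AND deviates
    from the mean by more than sigma_threshold standard deviations."""
    if std <= 0:
        raise ValueError("std must be positive")
    outside_band = value < low or value > high
    deviation = abs(value - mean) / std
    return outside_band and deviation > sigma_threshold


def first_event_time(samples: List[Sample], low: float, high: float,
                     mean: float, std: float, sigma_threshold: float,
                     duration: int) -> Optional[int]:
    """Return the timestamp at which an event would be generated, or None
    if no breach persists for at least `duration` seconds."""
    breach_start = None
    for s in samples:
        if is_breach(s.value, low, high, mean, std, sigma_threshold):
            if breach_start is None:
                breach_start = s.timestamp  # a new run of breaches begins
            if s.timestamp - breach_start >= duration:
                return s.timestamp
        else:
            breach_start = None  # value returned to normal; reset the run
    return None


values = [50, 95, 96, 50, 97, 98, 99, 97]
samples = [Sample(timestamp=i * 60, value=v) for i, v in enumerate(values)]
event_at = first_event_time(samples, low=20, high=80, mean=50, std=10,
                            sigma_threshold=3, duration=120)
```

In this example, the first run of breaches is interrupted before 120 seconds have elapsed, so it produces no event. Only the second run lasts long enough, which mirrors the role of the anomaly duration indicator on the graph.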
Managing autoanomaly generation
As an administrator, use the Manage Autoanomaly Generation option on the Events page to minimize anomaly events and reduce event noise. When you configure autoanomaly generation for a metric by selecting one or more events, the configuration is applied to the metric type. This means that all anomaly events that contain the metric are affected.
For example, one of the monitored devices has low CPU usage, and you do not want to generate an anomaly event for it. Use the Manage Autoanomaly Generation option to generate autoanomalies only when the CPU usage is high. This setting applies to the CPU usage metric type in all anomaly events.
To manage autoanomaly generation
- In , select Monitoring > Events.
- Click the action menu of any autoanomaly event and select Manage Autoanomaly Generation.
- In the Manage Autoanomaly Generation window, select one of the following options:
- Generate autoanomaly events only for low baseline violation.
- Generate autoanomaly events only for high baseline violation.
- Don't generate autoanomaly events.
- Click Apply.
To update the settings for autoanomaly generation, use the Manage Metrics option on the Manage Auto Anomaly Event Generation page. For more information, see Manage Metrics.
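The effect of the three options can be pictured as a per-metric-type filter. The following Python sketch is only an illustration: `GenerationMode`, `should_generate`, and the settings dictionary are hypothetical names, not part of the product. The behavior for a metric type that has no configuration (events for both low and high violations) is an assumption.

```python
from enum import Enum
from typing import Dict


class GenerationMode(Enum):
    LOW_ONLY = "low"    # only for low baseline violation
    HIGH_ONLY = "high"  # only for high baseline violation
    NONE = "none"       # don't generate autoanomaly events


def should_generate(settings: Dict[str, GenerationMode],
                    metric_type: str, violation: str) -> bool:
    """Decide whether an autoanomaly event is generated.

    `violation` is "low" or "high". The setting is keyed by metric type,
    so it affects every anomaly event that contains that metric.
    """
    if violation not in ("low", "high"):
        raise ValueError("violation must be 'low' or 'high'")
    mode = settings.get(metric_type)
    if mode is None:
        return True  # assumption: unconfigured metric types are not filtered
    if mode is GenerationMode.NONE:
        return False
    return mode.value == violation


# The CPU usage example from this section: generate only for high usage.
settings = {"CPU usage": GenerationMode.HIGH_ONLY}
high_cpu = should_generate(settings, "CPU usage", "high")
low_cpu = should_generate(settings, "CPU usage", "low")
```

Keying the setting by metric type rather than by event reflects the statement above that the configuration applies to all anomaly events that contain the metric.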
Understanding the multivariate anomaly score
In a multivariate policy, all the metrics configured in the policy are analyzed together, and a single anomaly is detected. The anomaly events are displayed in a stacked graphical format.
- If there are more than 10 metrics in the policy, only the top 10 metrics that contribute to the anomaly are displayed on the Anomaly Details page.
- If multiple metrics contribute to the anomaly score, they are displayed in descending order of contribution, with the highest contributor at the top and the lowest contributor at the bottom.
If some metrics within the top 10 do not contribute to the anomaly score, they are contained within an expander (Show Non-contributing Metrics). You can expand it to view the details of those metrics.
For example, if you have configured a policy with 10 different metrics (as shown in the figure), all 10 metrics are analyzed at the same time, a single anomaly score is computed, and a single anomaly event is generated.
In this example, the anomaly score for the anomaly event is 1.6976399. The following table shows the contribution from each metric to arrive at the anomaly score:
- The metric in row 10 (Total) is a non-contributor as its score is zero.
- The metric in row 2 (Used) in the table has a spike value of approximately 10.47, but all the other metrics are behaving normally at the same time. Hence, it is not an anomalous data point.
- The metrics from rows 5 to 9 in the table are only contributing minimally to the overall abnormality score.
| Row | Metric identifier | Metric score | Contributor |
|---|---|---|---|
| 1 | `__name__=vmUsed,entityId=a4c0e83f-ac6f-497b-86cd-c646b90d7f89:NUK_Memory:NUK_Memory,hostname=ai-ml-host94.abc.com` | 0.6220732844489766 | ✅️ |
| 2 | `__name__=Used,entityId=a4c0e83f-ac6f-497b-86cd-c646b90d7f89:NUK_Memory:NUK_Memory,hostname=ai-ml-host94.abc.com` | 0.5837307306884533 | ✅️ |
| 3 | `__name__=Free,entityId=a4c0e83f-ac6f-497b-86cd-c646b90d7f89:NUK_Memory:NUK_Memory,hostname=ai-ml-host94.abc.com` | 0.5813340459335296 | ✅️ |
| 4 | `__name__=vmUtilization,entityId=a4c0e83f-ac6f-497b-86cd-c646b90d7f89:NUK_CPU:NUK_CPU,hostname=ai-ml-host94.abc.com` | 0.0203010773791622 | ✅️ |
| 5 | `__name__=Utilization,entityId=a4c0e83f-ac6f-497b-86cd-c646b90d7f89:NUK_CPU:NUK_CPU,hostname=ai-ml-host94.abc.com` | 0.005336915030485834 | ✅️ |
| 6 | `__name__=Load,entityId=6ad14b2c-c69f-446c-8053-c0153b0f6043:NUK_CPU:NUK_CPU,hostname=ai-ml-host94.abc.com` | 0.0013492750500503055 | ✅️ |
| 7 | `__name__=vmUtilization,entityId=6ad14b2c-c69f-446c-8053-c0153b0f6043:NUK_CPU:NUK_CPU,hostname=ml-ai-host84.abc.com` | 0.0010230399553819848 | ✅️ |
| 8 | `__name__=IdleTime,entityId=6ad14b2c-c69f-446c-8053-c0153b0f6043:NUK_CPU:NUK_CPU,hostname=ai-ml-host94.abc.com` | 0.0010230394857828524 | ✅️ |
| 9 | `__name__=Utilization,entityId=6ad14b2c-c69f-446c-8053-c0153b0f6043:NUK_CPU:NUK_CPU,hostname=ml-ai-host84.abc.com` | 0.0002797324774806881 | ✅️ |
| 10 | `__name__=Total,entityId=a4c0e83f-ac6f-497b-86cd-c646b90d7f89:NUK_Memory:NUK_Memory,hostname=ai-ml-host94.abc.com` | 0.0 | ❌️ |
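The display rules in this section (at most 10 metrics, ordered from highest to lowest contribution, with zero-score metrics moved into the Show Non-contributing Metrics expander) can be sketched in Python using the scores from the table. The function name `arrange_metrics` and the shortened metric labels are hypothetical. The sketch assumes that scores are non-negative and does not attempt to reproduce how the overall anomaly score is computed.

```python
from typing import List, Tuple

MAX_DISPLAYED = 10  # only the top 10 metrics are shown on the page


def arrange_metrics(scores: List[Tuple[str, float]]
                    ) -> Tuple[List[str], List[str]]:
    """Return (contributing, non_contributing) metric labels.

    Metrics are ranked by score in descending order and cut to the top 10.
    A metric with a score of zero is treated as a non-contributor.
    """
    ranked = sorted(scores, key=lambda item: item[1], reverse=True)
    top = ranked[:MAX_DISPLAYED]
    contributing = [name for name, score in top if score > 0]
    non_contributing = [name for name, score in top if score <= 0]
    return contributing, non_contributing


# Scores from the table, deliberately listed out of order.
# Labels are shortened; the suffix is the start of the entityId.
scores = [
    ("Total/a4c0e83f", 0.0),
    ("Load/6ad14b2c", 0.0013492750500503055),
    ("Free/a4c0e83f", 0.5813340459335296),
    ("vmUsed/a4c0e83f", 0.6220732844489766),
    ("Utilization/6ad14b2c", 0.0002797324774806881),
    ("Used/a4c0e83f", 0.5837307306884533),
    ("IdleTime/6ad14b2c", 0.0010230394857828524),
    ("vmUtilization/a4c0e83f", 0.0203010773791622),
    ("Utilization/a4c0e83f", 0.005336915030485834),
    ("vmUtilization/6ad14b2c", 0.0010230399553819848),
]

contributing, non_contributing = arrange_metrics(scores)
```

With this input, `contributing` lists the nine metrics in the same order as rows 1 to 9 of the table, and `non_contributing` contains only the Total metric from row 10.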