BMC ProactiveNet thresholds are the elements that generate events based on infrastructure performance data. This document describes the various types of thresholds, their interaction with the BMC ProactiveNet environment, their usage, and some management practices.
Types of thresholds
There are three types of thresholds native to the BMC ProactiveNet Server: Absolute, Signature, and Abnormality thresholds. In addition, BMC ProactiveNet PATROL KMs have their own thresholds, which are not directly managed by the BMC ProactiveNet Server but can generate events that integrate with BMC ProactiveNet Server Analytics.
The term Intelligent Threshold is used liberally throughout product documentation. It simply means the usage of baselines in any of the three native BMC ProactiveNet threshold types.
Absolute thresholds are straightforward to understand: for an event to be generated, one hundred percent of the performance data points must satisfy the trigger condition.
A trigger condition is defined as:
- Data points that exceed the effective threshold, which is the maximum of:
  - the static threshold value
  - the baseline with applicable padding factors
- A duration (0 for immediate) during which the above condition remains true.
The event-triggering behavior of Absolute thresholds is easier to grasp because all performance data points must satisfy the trigger condition for the specified duration before an event is created. Likewise, the absence of the trigger condition for the same duration closes the event.
Having the baseline in the trigger condition allows a threshold to generate events based on learned behavior.
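The trigger condition described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BMC ProactiveNet's implementation: the helper names `effective_threshold` and `absolute_triggers`, the multiplicative `padding` factor, and expressing the duration as a number of consecutive samples at a fixed polling interval are all hypothetical.

```python
from typing import Optional, Sequence


def effective_threshold(threshold_value: float,
                        baseline: Optional[float] = None,
                        padding: float = 1.0) -> float:
    """Level the data must exceed: the maximum of the static threshold value
    and the padded baseline (multiplicative padding is an assumption)."""
    if baseline is None:
        return threshold_value
    return max(threshold_value, baseline * padding)


def absolute_triggers(samples: Sequence[float],
                      threshold_value: float,
                      duration_samples: int,
                      baseline: Optional[float] = None,
                      padding: float = 1.0) -> bool:
    """Absolute rule: every sample in the trailing window must exceed the
    effective threshold. duration_samples == 0 means 'immediate', so only
    the latest sample is checked."""
    needed = max(duration_samples, 1)
    if len(samples) < needed:
        return False  # not enough data yet to cover the duration
    level = effective_threshold(threshold_value, baseline, padding)
    return all(value > level for value in samples[-needed:])


# Static value 80, baseline 70 padded by 1.2 -> effective level of about 84.
print(absolute_triggers([85, 90, 88], 80, 3, baseline=70, padding=1.2))  # True
print(absolute_triggers([85, 82, 88], 80, 3, baseline=70, padding=1.2))  # False
```

In the second call, 82 exceeds the static value of 80 but not the padded baseline, so a single point is enough to keep the Absolute threshold from triggering.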
The following figure illustrates a typical trigger condition for an absolute threshold.
Absolute thresholds with outside baseline enabled
Absolute thresholds with outside baseline combine an absolute threshold (a static threshold value) with a dynamic threshold (one that uses the baseline). You still set the threshold value, but the baseline acts as an additional criterion for alarm generation. These thresholds:
- Reduce the number of false alarms compared with a static threshold alone.
- Allow more control than Signature thresholds.
Absolute thresholds without baselines suit availability metrics, or other metrics that have discrete, state-based values; for example, 0=OK, 1=Down, 2=Admin Down.
Absolute thresholds with baselines suit user-influenced performance metrics that have normalized values (0-100) or set definitions of values, such as CPU utilization, memory utilization, and process utilization.
Absolute thresholds tend to fail to create events when the performance data behave more transiently, which makes the trigger condition less obvious. Signature thresholds can be used in these cases because they can judge when enough data points have met the trigger condition to generate an event, as illustrated below.
Due to the non-deterministic nature of the performance data, it might take longer than anticipated for a Signature threshold to generate an event.
The difference between Absolute and Signature thresholds lies in the percentage of data points that must meet the trigger condition: 100% for Absolute thresholds, and just enough for Signature thresholds.
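This document does not say how the Signature engine decides that "just enough" data points have breached. The sketch below only contrasts the two rules by assuming a fixed fraction of a sliding window; the `required_fraction` parameter and both function names are hypothetical, and the real Signature algorithm is certainly more involved.

```python
from typing import Sequence


def breach_fraction(samples: Sequence[float], level: float) -> float:
    """Fraction of samples that exceed the given level."""
    if not samples:
        return 0.0
    return sum(1 for v in samples if v > level) / len(samples)


def signature_like_triggers(samples: Sequence[float], level: float,
                            required_fraction: float = 0.6) -> bool:
    """Hypothetical stand-in for a Signature threshold: trigger when at least
    required_fraction of the window breaches. An Absolute threshold is the
    special case required_fraction == 1.0."""
    return bool(samples) and breach_fraction(samples, level) >= required_fraction


# Transient response times (ms) checked against a 500 ms level.
window = [620, 480, 700, 510, 450, 650]
print(breach_fraction(window, 500))               # 4 of 6 samples breach
print(signature_like_triggers(window, 500))       # True: about 0.67 >= 0.6
print(signature_like_triggers(window, 500, 1.0))  # False under the 100% rule
```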
Signature thresholds suit performance metrics that have no set concept of a good or bad value, where what counts as normal depends entirely on the attribute and the instance being monitored (transaction response time, ping response time, and so on).
For example, a ping response time of 900 ms is poor for a box in the same data center across a gigabit switch, but good for a box across the continent.
Absolute and Signature thresholds generate actionable events with a severity of MINOR, MAJOR, or CRITICAL. Abnormality thresholds, in contrast, create events that are meant to be invisible and are labeled with severity INFO. Otherwise, Abnormality thresholds behave identically to Signature thresholds.
PATROL thresholds are implemented directly by the PATROL KMs. The KMs generate the events and forward them directly to the BMC ProactiveNet Server.
Internal events and their life cycles
Events that are generated by the BMC ProactiveNet Server (by the three native thresholds) are called Internal events. They are marked with icons having a double wrench. The key characteristics of Internal events are:
- Thresholds of the same type (Absolute, Signature, or Abnormality) for a metric of an instance operate on the same event, A.
- Event A has a long life span, maintaining the same mc_ueid.
- Besides creations and closures, state changes such as severity demotion/promotion are also legitimate changes for event A.
For example, define three thresholds for the attribute named Response Time:
- Threshold A: Response Time > 5 sec : MINOR
- Threshold B: Response Time > 10 sec : MAJOR
- Threshold C: Response Time > 20 sec : CRITICAL
Over time, all three thresholds trigger one after the other with their respective severities, in order of increasing severity (MINOR, then MAJOR, then CRITICAL), and then release in decreasing order of severity. The sequence of actions over time is:
- Response Time = 6. Threshold A creates an event with mc_ueid dev1-alr-3322 and MINOR severity.
- Response Time = 11. Threshold B modifies dev1-alr-3322 to severity MAJOR.
- Response Time = 25. Threshold C modifies dev1-alr-3322 to severity CRITICAL.
- Response Time = 12. Threshold C closes. Threshold B modifies dev1-alr-3322 back to MAJOR.
- Response Time = 8. Threshold B closes. Threshold A modifies dev1-alr-3322 back to MINOR.
- Response Time = 1. Threshold A closes. The severity is left unchanged.
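The sequence above can be replayed with a small simulation that keeps a single shared event and lets the highest exceeded threshold set its severity. It is a sketch under the example's assumptions (immediate triggering, strict "greater than" comparisons), not BMC ProactiveNet's event engine; the `simulate` function and its output format are hypothetical.

```python
from typing import List, Optional, Tuple

# The three thresholds from the example, ordered by increasing severity:
# (threshold name, level in seconds, severity).
THRESHOLDS: List[Tuple[str, float, str]] = [
    ("A", 5, "MINOR"),
    ("B", 10, "MAJOR"),
    ("C", 20, "CRITICAL"),
]


def simulate(samples: List[float]) -> List[Tuple[float, Optional[str], str]]:
    """Replay response-time samples against one shared Internal event.

    The highest threshold currently exceeded sets the event's severity;
    when no threshold is exceeded, an open event closes and keeps its last
    severity. Behavior after a closure is not covered by the example and
    is not modeled faithfully here.
    Returns (sample, severity, status) after each sample.
    """
    severity: Optional[str] = None
    status = "NONE"  # no event created yet
    log: List[Tuple[float, Optional[str], str]] = []
    for rt in samples:
        exceeded = [sev for _, level, sev in THRESHOLDS if rt > level]
        if exceeded:
            severity = exceeded[-1]  # highest exceeded threshold wins
            status = "OPEN"          # create, promote, or demote
        elif status == "OPEN":
            status = "CLOSED"        # severity left unchanged
        log.append((rt, severity, status))
    return log


for rt, sev, status in simulate([6, 11, 25, 12, 8, 1]):
    print(f"Response Time = {rt}: dev1-alr-3322 {status} {sev}")
```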
All events that are not generated by the BMC ProactiveNet Server's native thresholds are considered external and marked with icons having a single wrench.
When to use thresholds
Metrics that show unambiguous behavior such as up/down, availability, or capacity violation are good candidates for Absolute thresholds.
Signature thresholds are more appropriate for more transient metrics, such as response time, packets per second, and so on, where the data can exhibit big swings against a generally upward or downward trend.
PATROL thresholds are appropriate to use for scenarios that are clear-cut, such as device availability, critical capacity overloads, and so on, which do not need to make use of advanced, server-side analytic capabilities. PATROL thresholds have the advantage of quickly triggering actionable events without having to wait for the data to be collected and passed along to the BMC ProactiveNet Server.
Abnormality thresholds generate informational events when key metrics go into exceptional states. These events become useful during troubleshooting with Probable Cause Analysis (PCA), but are otherwise ignored.
By default, you do not have to do any customization for Abnormality thresholds since they have already been created for all KPIs. However, if you customize the KPI list, ensure that you create new Abnormality thresholds.
KPIs, baselines, and thresholds
Baseline generation requirements
Key Performance Indicators (KPIs) are essential metrics for monitoring an infrastructure. They have a direct impact on whether or not baseline computation takes place for corresponding metrics. The following figures show how KPIs may affect baseline generation where the checked boxes indicate that baseline generation gets carried out for those combinations.
Figure: Baseline generation when KPI mode is active (BL only for KPI), with and without Abnormality thresholds
Figure: Baseline generation when KPI mode is not active (BL for all metrics), with and without Abnormality thresholds
Abnormality and Signature thresholds require baseline data to function correctly. If baselines are not generated for a metric, these thresholds will not work, which can lead to support issues. In such cases, check whether the baseline is being generated for the metrics in question.
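The check described above amounts to cross-referencing each baseline-dependent threshold against the metrics that actually have baselines. Below is a minimal sketch over a hypothetical inventory; the `ThresholdInfo` record, its field names, and the set of metrics with baselines are illustrative, and exporting that data from BMC ProactiveNet is not covered here. It also assumes that an Absolute threshold configured to use a baseline has the same dependency.

```python
from dataclasses import dataclass
from typing import List, Set

# Threshold types that, per the text above, cannot work without baseline data.
NEEDS_BASELINE = {"Signature", "Abnormality"}


@dataclass
class ThresholdInfo:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    threshold_type: str          # "Absolute", "Signature", or "Abnormality"
    metric: str
    uses_baseline: bool = False  # an Absolute threshold may also use a baseline


def missing_baselines(thresholds: List[ThresholdInfo],
                      metrics_with_baseline: Set[str]) -> List[str]:
    """Names of thresholds that depend on a baseline that is not being
    generated for their metric."""
    problems = []
    for t in thresholds:
        needs_baseline = t.threshold_type in NEEDS_BASELINE or t.uses_baseline
        if needs_baseline and t.metric not in metrics_with_baseline:
            problems.append(t.name)
    return problems


inventory = [
    ThresholdInfo("resp-sig", "Signature", "Response Time"),
    ThresholdInfo("cpu-abn", "Abnormality", "CPU Utilization"),
    ThresholdInfo("avail-abs", "Absolute", "Availability"),
    ThresholdInfo("mem-abs-bl", "Absolute", "Memory Utilization", uses_baseline=True),
]
print(missing_baselines(inventory, {"CPU Utilization"}))
```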
Which Baseline to Use
The BMC ProactiveNet Server automatically computes three different types of baselines (Hourly, Daily, and Weekly) to be used by the thresholds. In most cases, when defining thresholds, it is adequate to use Auto Baseline, where the BMC ProactiveNet Server determines the best type of baseline to use for any given metric.
However, if it is known that certain metrics have clear, repeatable hourly patterns (for example, 10 AM on Tuesday behaves in the same way as 10 AM on Wednesday), then you can select Hourly Baseline as the baseline type to use for those corresponding thresholds. Similarly, Daily and Weekly baselines can be used by thresholds if you know that their metrics behave accordingly.
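One simplified way to see what a "clear, repeatable hourly pattern" means is to group historical samples by hour of day and check how much each hour varies from day to day. This is an illustrative heuristic, not how BMC ProactiveNet computes or selects its baselines; `hourly_profile` and the synthetic data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def hourly_profile(samples: List[Tuple[datetime, float]]) -> Dict[int, Tuple[float, float]]:
    """Group samples by hour of day and return {hour: (mean, spread)}.
    A small spread for every hour suggests the metric repeats the same
    hourly pattern from day to day (10 AM Tuesday ~ 10 AM Wednesday)."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        by_hour[ts.hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in sorted(by_hour.items())}


# Two days of synthetic hourly data that peak at 10:00 on both days.
start = datetime(2024, 1, 2)  # a Tuesday
data = [(start + timedelta(hours=i), 80.0 if i % 24 == 10 else 20.0)
        for i in range(48)]
profile = hourly_profile(data)
print(profile[10], profile[3])
```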
Seasonality baselines are useful for infrastructures that have recurring periods in which (part of) the infrastructure behaves very differently, and where these behaviors should not be factored into the normal baselines; for example, a major backup on the last Friday of every month, or financial number crunching at the end of every quarter.
For Seasonality baselines to work properly, the BMC ProactiveNet administrator must ensure that the baseline retention period reflects the special recurring period. For example, if the recurring period is twelve months long, the baseline retention period must be at least as long.
Contact BMC Support when you extend retention periods, as longer retention can severely degrade BMC ProactiveNet system performance.
How data affect thresholds
Thresholds are data-driven: the more data points are available, the sooner thresholds can generate events, especially thresholds that use baselines. However, shorter polling intervals (more frequent polling) increase the system load on the BMC ProactiveNet Server.
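The trade-off is simple arithmetic: a shorter polling interval puts more data points into any evaluation window, at the cost of more polls. The 15-minute window and the 5- and 1-minute intervals below are example figures, not product defaults, and polls per day is only a rough proxy for server load.

```python
def samples_in_window(window_minutes: float, poll_interval_minutes: float) -> int:
    """Data points collected in a window at a fixed polling interval."""
    return int(window_minutes // poll_interval_minutes)


def polls_per_day(poll_interval_minutes: float) -> int:
    """Polls per day for one monitored metric."""
    return int(24 * 60 // poll_interval_minutes)


for interval in (5, 1):
    print(f"{interval}-min polling: {samples_in_window(15, interval)} points "
          f"per 15-min window, {polls_per_day(interval)} polls/day")
# 5-min polling: 3 points per 15-min window, 288 polls/day
# 1-min polling: 15 points per 15-min window, 1440 polls/day
```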
Advanced Signature threshold configuration
When creating a Signature threshold, you may want to fine-tune its behavior. As shown in the following image, four additional fields become visible when you select the advanced view while creating a Signature threshold.
The field descriptions are:
Minimum Sampling Window: The minimum span of time, as marked by collected data points, that the Signature threshold engine requires before it starts evaluation.
Specify this value if you do not want the algorithm to trigger on trivial conditions. For example, if the baseline is low (around 3% to 5%), specify a high threshold value so that the Signature threshold triggers only when data values are higher than 80% and above the baseline.
Use this to expand the baseline value range, typically to reduce the sensitivity of the Signature threshold.
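The field descriptions above can be illustrated together in one sketch: no evaluation until the minimum sampling window is covered, a high threshold value that ignores trivial excursions above a low baseline, and padding that widens the baseline range. It assumes the threshold value and padded baseline combine as a maximum (as in the trigger condition described earlier) and treats padding as an additive amount; the real padding semantics, the function `qualifying_points`, and its parameters are hypothetical. The sketch only selects breaching points; whether "enough" of them exist remains the Signature engine's decision.

```python
from typing import List, Optional, Tuple


def qualifying_points(points: List[Tuple[float, float]],
                      min_window_s: float,
                      threshold_value: float,
                      baseline: float,
                      padding: float = 0.0) -> Optional[List[float]]:
    """points: (timestamp in seconds, value), ordered by time.

    Returns None while the collected points span less than the minimum
    sampling window (no evaluation yet). Otherwise returns the values that
    exceed both the threshold value and the padded baseline.
    """
    if not points or points[-1][0] - points[0][0] < min_window_s:
        return None
    level = max(threshold_value, baseline + padding)  # additive padding: an assumption
    return [v for _, v in points if v > level]


# CPU utilization (%) with a low baseline of about 4%.
pts = [(0, 10.0), (300, 12.0), (600, 85.0), (900, 9.0)]
print(qualifying_points(pts[:2], 600, 0, 4))         # None: span 300 s < 600 s
print(qualifying_points(pts, 600, 0, 4))             # [10.0, 12.0, 85.0, 9.0]
print(qualifying_points(pts, 600, 0, 4, padding=6))  # [12.0, 85.0]
print(qualifying_points(pts, 600, 80, 4))            # [85.0]
```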
The Prediction feature gives early warnings of certain exceptional situations. It is used to issue warnings if there is an aggressive trend towards the threshold. While it can be used in a wide number of scenarios, it is most effective in capacity-type scenarios, especially for those metrics which exhibit clear hourly patterns.
Some of the points to remember when using this feature are:
- Only applicable to Absolute thresholds
- Baseline is required
- May require more frequent polling (a shorter polling interval), which makes trends easier and quicker to establish
- Uses more run-time resources; examine the system load if a large number of thresholds have Prediction enabled
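This document does not describe how Prediction computes its trend. As a stand-in, the sketch below fits an ordinary least-squares line to recent samples and estimates how long until it reaches an upper threshold; `time_to_threshold` is a hypothetical helper, not a BMC ProactiveNet function, and linear extrapolation is a deliberately simple substitute for whatever the product uses.

```python
from typing import Optional, Sequence


def time_to_threshold(times: Sequence[float], values: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Fit value = intercept + slope * t by ordinary least squares and return
    how long after the last sample the fitted line reaches an upper
    `threshold` (0.0 if it already has). Returns None if there is too little
    data or the trend is flat or falling."""
    n = len(times)
    if n < 2:
        return None
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        return None
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / sxx
    if slope <= 0:
        return None  # not trending towards an upper threshold
    intercept = v_mean - slope * t_mean
    t_hit = (threshold - intercept) / slope
    return max(0.0, t_hit - times[-1])


# Disk usage (%) sampled hourly, rising 2 points per hour towards 90%.
print(time_to_threshold([0, 1, 2, 3], [70, 72, 74, 76], 90))  # 7.0 hours
```

An early warning would then be raised when the estimate falls within some chosen horizon; that horizon is likewise an assumption.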
How to Persist Thresholds
Use the pw threshold checkpoint command to save the state of thresholds while customizing a deployment.