How baselines, thresholds and Key Performance Indicators work together to generate events in the BMC ProactiveNet Operations Console

BMC Software provides many monitoring solutions to help you monitor your environment. A monitoring solution is a pre-defined set of metrics that monitor the health and performance of a specific device or service. For example, BMC PATROL for VMware vSphere is a monitoring solution. BMC monitoring solutions are composed of monitor types and attributes. A monitor type is a way of classifying the data that is to be collected. For example, some of the monitor types included with BMC PATROL for VMware vSphere are VMware Host CPU, VMware Datastore, VMware Disk Performance, VMware Resource Pool, and many others. Each monitor type has attributes that further classify the monitor type into types of data. For example, the attributes for the VMware CPU monitor type are CPU utilization and processor time. If you are familiar with BMC PATROL nomenclature, a monitor type is similar to a Knowledge Module application class and an attribute is similar to an application class parameter. A monitor instance refers to a monitor type on a specific device.

BMC ProactiveNet uses a combination of Key Performance Indicators, thresholds, and baselines to determine when to send an event to the BMC ProactiveNet Operations Console. Events can have a severity level of Critical, Major, Minor, or Informational. The severity of the event is determined by the administrator when thresholds are set and should correlate to the severity of the problem or impending problem.

Key Performance Indicators

Key Performance Indicators (KPIs) are attributes that you and BMC Software have determined have the most impact on the health and performance of your environment and your service-level agreements. BMC Software has configured some KPIs by default for each monitoring solution, such as BMC PATROL for VMware vSphere. For example, the KPI attributes for the VMware Datastore monitor type for BMC PATROL for VMware vSphere are the Free Disk Space attribute and the Datastore disk usage in percent attribute.

In addition to the default KPIs specified by BMC Software, an administrator can designate any attribute as a KPI. To configure an attribute as a KPI, use the BMC ProactiveNet Administration Console: from the Tools menu, choose KPI Administration and enable the attribute as a KPI. An above-baseline abnormality threshold is then automatically enabled for that attribute.

Baselines

A baseline represents the normal operating parameters for the device or service that you are monitoring. To calculate the baseline, BMC ProactiveNet collects the values for the attributes of a monitor type over a specified time period and establishes a low baseline value (the 10th percentile of all the values for that period) and a high baseline value (the 90th percentile of all the values for that period). It then takes a weighted average of these values over time, giving a higher weight to the latest data. You need to collect data from an attribute for at least a week to establish a reliable baseline. The longer the attribute has been collecting data, the more reliable the baseline becomes.
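
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's actual algorithm: the function names, the sample data, and the smoothing factor `weight` are hypothetical, and exponential smoothing is used here simply as one common way to give the latest data a higher weight.

```python
from statistics import quantiles


def percentile_band(values):
    """Return the (10th percentile, 90th percentile) of one period's samples."""
    # quantiles(n=10) yields the nine cut points that split the data into deciles.
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def update_baseline(previous, current, weight=0.5):
    """Blend the newest (low, high) band into the running baseline.

    `weight` is a hypothetical smoothing factor: the newest period counts
    for `weight`, and every older period counts for progressively less.
    """
    if previous is None:
        return current
    low = (1 - weight) * previous[0] + weight * current[0]
    high = (1 - weight) * previous[1] + weight * current[1]
    return low, high


# Made-up CPU utilization samples for three consecutive periods.
periods = [
    [20, 22, 25, 30, 35, 40, 45, 50, 55, 60],
    [25, 27, 30, 35, 40, 45, 50, 55, 60, 65],
    [30, 32, 35, 40, 45, 50, 55, 60, 65, 70],
]

baseline = None
for samples in periods:
    baseline = update_baseline(baseline, percentile_band(samples))

print(baseline)
```

Each new period shifts the low and high baseline values toward the most recent behavior while older periods fade out gradually, which is why the baseline becomes steadier as more data is collected.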

BMC ProactiveNet will only generate baselines for attributes that are specified as Key Performance Indicators (KPIs). Data will be collected for attributes that are not KPIs, but attributes that are not KPIs will not have baselines.

BMC ProactiveNet captures the following baseline patterns:

Hourly: Each hour of the day has a high and a low value that is tracked. This baseline tracks the pattern for a metric on an hourly basis, and is repeated for each day. An hourly baseline is initialized after the monitor instance is created and 24 hours of data collection has occurred.
Daily: A high and a low value are derived from the moving average of each consecutive day. This range is taken from a larger number of data values and consequently will be a wider range than the hourly baseline. A daily baseline is initialized after the monitor instance is created and 24 hours of data collection has occurred.
Weekly: A high and a low value are derived from the moving average of each consecutive week. This range is taken from a larger number of data values and consequently will be a wider range than the hourly or daily baselines. A weekly baseline is initialized after the monitor instance is created and seven 24-hour periods of data collection have occurred.
All: A combination of the Hourly, Daily, and Weekly baselines.
Hourly and Daily: A combination of the Hourly baseline and the Daily baseline.
Auto: Allows BMC ProactiveNet to determine the best baseline type for the selected attribute.

By default, Auto baseline is enabled for all KPI monitor type attributes. This means that BMC ProactiveNet automatically detects abnormality in each KPI monitor type attribute and determines the best baseline to be used depending on the behavior of the monitor instance. You can change the baseline type enabled for any KPI to one of the baseline types described above.

Different monitor attributes have different data patterns. For example, for monitor attributes that change frequently, such as CPU usage, a data pattern captured at hourly intervals may be the best way of establishing a baseline. Hourly baselines represent a smaller number of data points and will have a tighter range, which is best suited for capturing frequent changes. Other monitor attributes, such as disk utilization, do not change as frequently. Fewer data points are required to establish a baseline for these attributes, so a daily or weekly baseline might be the best way to capture and identify changes for these attributes.

For baselines to be generated for an attribute, that attribute must have an active abnormality threshold. An active abnormality threshold means that the threshold exists and is not suppressed. Additionally, if the Key Performance Indicator (KPI) mode is active, only those attributes that have an active abnormality threshold and are also KPI attributes will have baselines generated for them. Absolute thresholds (with "outside baseline") or signature thresholds do not satisfy these requirements.
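
The conditions in the preceding paragraph amount to a small decision rule, sketched below. The `Attribute` class and its field names are hypothetical and exist only for this illustration; they do not correspond to any BMC ProactiveNet API.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    # Hypothetical model of an attribute's configuration.
    is_kpi: bool
    has_abnormality_threshold: bool
    threshold_suppressed: bool = False


def baseline_generated(attr, kpi_mode_active):
    """Return True if a baseline would be generated for this attribute."""
    # "Active" means the abnormality threshold exists and is not suppressed.
    active = attr.has_abnormality_threshold and not attr.threshold_suppressed
    if kpi_mode_active:
        # In KPI mode the attribute must also be a KPI.
        return active and attr.is_kpi
    return active


cpu = Attribute(is_kpi=True, has_abnormality_threshold=True)
disk = Attribute(is_kpi=False, has_abnormality_threshold=True)
print(baseline_generated(cpu, kpi_mode_active=True))
print(baseline_generated(disk, kpi_mode_active=True))
```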

After a reliable baseline has been established, you can use these baselines in conjunction with thresholds to generate events when the data values from a monitor fall either above or below the normal baseline range or threshold for a statistically significant number of points.

Thresholds

Thresholds define acceptable high and low values for the data collected. You can set thresholds for each attribute for each monitor type. Global thresholds are applied to all instances of a monitor type. Instance thresholds are applied only to a specific instance of a monitor type. Instance thresholds take precedence over global thresholds.
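
The precedence rule can be pictured as a two-step lookup. The sketch below is an assumption-laden illustration: the dictionary layout, the instance name `datastore-01`, and the threshold values are all invented for the example.

```python
def effective_threshold(global_thresholds, instance_thresholds,
                        monitor_type, instance, attribute):
    """Return the threshold that applies to one monitor instance."""
    # An instance threshold, if present, overrides the global one.
    instance_key = (monitor_type, instance, attribute)
    if instance_key in instance_thresholds:
        return instance_thresholds[instance_key]
    # Otherwise fall back to the global threshold for the monitor type.
    return global_thresholds.get((monitor_type, attribute))


# Hypothetical threshold values, for illustration only.
global_thresholds = {("VMware Datastore", "Free Disk Space"): 20}
instance_thresholds = {("VMware Datastore", "datastore-01", "Free Disk Space"): 10}

print(effective_threshold(global_thresholds, instance_thresholds,
                          "VMware Datastore", "datastore-01", "Free Disk Space"))
print(effective_threshold(global_thresholds, instance_thresholds,
                          "VMware Datastore", "datastore-02", "Free Disk Space"))
```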

BMC ProactiveNet allows you to set abnormality thresholds, signature thresholds, and absolute thresholds. Thresholds can be created as part of a policy that can be applied to multiple monitor types on multiple BMC PATROL Agents.

Abnormality thresholds

Abnormality thresholds are high or low thresholds determined by the specified baseline (Hourly, Daily, Weekly, and so on) data collected by BMC ProactiveNet for an attribute. Abnormality thresholds are enabled by default for Key Performance Indicator (KPI) attributes. Only abnormality thresholds are enabled by default. All other thresholds must be created manually.

If you want to create an abnormality threshold for an attribute that is not a default KPI, you can configure that attribute to be a KPI using the BMC ProactiveNet Administration Console. From the Tools menu, choose KPI Administration and enable the attribute to be a KPI. An above-baseline abnormality threshold is automatically enabled for that attribute. A below-baseline abnormality threshold is not created automatically. If you want one, you must create it manually for that attribute.

If an abnormality threshold is breached, it always generates an event with a severity of Informational in the BMC ProactiveNet Operations Console. You can specify that an event will be generated when the baseline is breached either above or below the baseline.

You can reduce the number of Informational events generated by an abnormality threshold by specifying any or all of the following conditions when you create the threshold:

Duration: The length of time that the data values must breach the threshold before an event is generated. For example, you could specify that the attribute data value must remain above the baseline for 10 minutes before an event is generated.
Threshold: An absolute value above and beyond the baseline value. If you set a threshold, an abnormality is generated only when the data value violates both the set baseline and the threshold value.
Minimum Sampling Window: The number of data points that must be collected or the amount of time that must pass (depending on the attribute) before an event is generated. At least five data points are required before an event is generated for any threshold type. If your minimum sampling window is too small to allow at least five data points, BMC ProactiveNet waits until five data points have been collected before generating the event, regardless of the value entered for the sampling window. For example, if you set a minimum sampling window of 10 minutes on a specific monitor attribute but the polling rate of that monitor is 5 minutes, 25 minutes must pass before an abnormality is generated. For this reason, BMC recommends using shorter polling intervals for monitors.
Absolute deviation: An absolute number above or below a baseline that the collected data values must reach before an event is generated. For example, if you set a daily abnormality threshold for the Total Number of Disk I/O Request attribute for the VMware Disk Performance monitor type, and you set the absolute deviation to 3, then an abnormality event would not be generated until the number of VMware disk I/O requests reached the Daily baseline plus three additional requests. If the Daily baseline for VMware disk I/O requests was 45, then an abnormality event would be generated when 48 VMware disk I/O requests were made.
Percentage deviation: A percentage of the baseline value, above or below the baseline, that the collected data values must reach before an event is generated. For example, if you set an hourly abnormality threshold for the Average Disk Throughput attribute for the VMware Disk Performance monitor type, and you set the percentage deviation to 10, then an abnormality event would not be generated until the value of the data collected reached the Hourly baseline for the average VMware disk throughput plus an additional 10 percent of the baseline value. If the Hourly baseline for average VMware disk throughput was 2500 KB per second, then an abnormality event would be generated when the VMware disk throughput reached 2750 KB per second.
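
The arithmetic in the deviation and sampling-window examples above can be checked with a short sketch. The function names are hypothetical, and the code only reproduces the numbers given in the text. It does not model how BMC ProactiveNet combines several conditions.

```python
def with_absolute_deviation(baseline, deviation):
    """Value an above-baseline reading must reach with an absolute deviation."""
    return baseline + deviation


def with_percentage_deviation(baseline, percent):
    """Value an above-baseline reading must reach with a percentage deviation."""
    return baseline + baseline * percent / 100


def minimum_wait_minutes(sampling_window, polling_interval, required_points=5):
    """Minutes that must pass before an abnormality can be generated.

    At least `required_points` data points are needed, so a short sampling
    window is effectively stretched to cover that many polls.
    """
    return max(sampling_window, required_points * polling_interval)


print(with_absolute_deviation(45, 3))        # Daily baseline 45, deviation 3
print(with_percentage_deviation(2500, 10))   # Hourly baseline 2500 KB/s, 10 percent
print(minimum_wait_minutes(10, 5))           # 10-minute window, 5-minute polling
```

The last call shows why the polling interval matters: with a 5-minute interval, five data points take 25 minutes to collect, so the 10-minute sampling window cannot be honored.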

Abnormalities are closed automatically when the number of out-of-range data points in the last minimum sampling window is no longer considered significant. For example, if six out of seven data points out of range are statistically significant, then as soon as the count of out-of-range points drops to five out of seven, the abnormality is closed. By default, even if no explicit global or instance signature threshold is set, BMC ProactiveNet will generate abnormalities for above baseline conditions.
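
The open-and-close behavior in the example above can be sketched with a sliding window. The window size of seven and the significance count of six come from that example. The real statistical significance test is not described here, so the fixed count below is an assumption, as are the function name and the sample values.

```python
from collections import deque


def abnormality_states(values, baseline_high, window=7, significant=6):
    """Return, per data point, whether an above-baseline abnormality is open."""
    recent = deque(maxlen=window)  # breach flags for the last `window` points
    states = []
    for value in values:
        recent.append(value > baseline_high)
        # Open while enough of the recent points are out of range.
        states.append(sum(recent) >= significant)
    return states


# Seven readings above a baseline of 50, followed by two normal readings.
readings = [60] * 7 + [40] * 2
print(abnormality_states(readings, baseline_high=50))
```

In this run the abnormality opens at the sixth out-of-range reading and closes at the second normal reading, when only five of the last seven points remain out of range.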

Signature thresholds

Signature thresholds are dynamic thresholds that an administrator can set that use the baseline data collected by BMC ProactiveNet for a KPI attribute as the threshold value. The baseline itself forms the signature threshold. Because signature thresholds use the learned behavior of the service or device, the signature thresholds change as the service or device attributes change. The longer a signature threshold is in place, the more accurate it becomes.

When a signature threshold is breached, it sends an event to the BMC ProactiveNet Operations Console. This event indicates the severity of the threshold breach. An administrator can define the severity level as Critical, Major, or Minor when the threshold is set. Use a signature threshold when an attribute is important enough that a breach of the baseline should generate an event with a severity higher than Informational. Signature thresholds are useful for performance attributes such as response time, utilization, and errors.

Absolute thresholds

Absolute thresholds are static thresholds set by an administrator. You can specify that a breach occurs when the data value is greater than, less than, greater than or equal to, less than or equal to, or equal to the specified threshold value. When an absolute threshold is breached, it generates an event in the BMC ProactiveNet Operations Console (Critical, Major, or Minor). You can enable an absolute threshold with any of the baseline types (Hourly, Daily, Weekly, and so on). For example, you can enable an absolute threshold to use the Weekly baseline. An absolute threshold with this configuration would not generate an event until the Weekly baseline had been breached in addition to the absolute threshold value. Combining the absolute threshold with a baseline reduces the number of events received by the BMC ProactiveNet Operations Console. If the absolute threshold is not enabled with a baseline, then an event is generated whenever the threshold value is breached, regardless of the baseline.

You can reduce the number of events generated by an absolute threshold by specifying any of the following conditions when you create the absolute threshold:

Duration: The length of time that the data values must breach the threshold before an event is generated. For example, you could specify that the attribute data value must remain above the threshold for 10 minutes before an event is generated.
Outside baseline enabled: By selecting a baseline to combine with the absolute threshold, you ensure that an event is not generated unless the data value breaches both the threshold value and the selected baseline (Hourly, Daily, Weekly, Daily and Hourly, or All baselines).
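
The effect of combining an absolute threshold with a baseline can be illustrated as follows. This is a minimal sketch: the function name, the `(low, high)` baseline tuple, and the sample values are assumptions made for the example, not product settings.

```python
import operator

# The comparison types that an absolute threshold can use.
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def breaches(value, threshold, comparison=">", baseline=None):
    """Return True if `value` should count as a breach.

    `baseline` is an optional (low, high) tuple. When it is given, the value
    must also fall outside that range, as with "outside baseline" enabled.
    """
    if not COMPARISONS[comparison](value, threshold):
        return False
    if baseline is None:
        return True
    low, high = baseline
    return value < low or value > high


print(breaches(85, 80))                     # threshold only
print(breaches(85, 80, baseline=(40, 90)))  # inside the baseline range
print(breaches(95, 80, baseline=(40, 90)))  # outside the baseline range
```

With the baseline supplied, a reading of 85 exceeds the threshold of 80 but is still normal for this attribute, so it does not count as a breach. This is how the combination reduces the number of events.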

If desired, when you create an absolute threshold, you can specify that BMC ProactiveNet should use the threshold to predict possible issues. In that case, an early-warning predictive event is generated 2-3 hours before the absolute threshold is breached. For example, if disk space slowly fills up over time, the predictive event gives you 2-3 hours of lead time before the threshold is breached.

When creating an absolute threshold, the administrator can specify that events generated by a breach of that threshold are closed automatically when the collected data values no longer breach the threshold. If Auto Close is not enabled, events generated by a breach of an absolute threshold can be closed manually by a BMC ProactiveNet Operator in the BMC ProactiveNet Operations Console.

Monitoring policies

Using Central Monitoring Administration, you can define and manage monitoring policies. A monitoring policy defines a set of selection criteria that is applied to incoming data to determine which data are processed and how the selected data are processed. Monitoring policies are defined in Central Monitoring Administration and are applied to PATROL Agents that match the agent conditions.

A monitoring policy allows you to perform the following tasks:

Specify general policy details, such as the policy name. For instructions, see Defining general policy details.
Configure the VMware vSphere monitor. During this process, you will specify the details for the vCenter or ESX Server environment that you want to monitor. You can specify whether you are using vCenter or ESX Server in an enterprise environment or a Cloud environment. For instructions, see Configuring a monitor.
Configure the filters to identify the agents on which the policy can be applied. For instructions, see Configure filter options.
Set agent thresholds for the VMware vSphere monitor type attributes. For instructions about setting monitor thresholds, see Configuring BMC PATROL Agent thresholds.
Set server thresholds for the monitor instances on the BMC ProactiveNet Server. For instructions, see Configuring a server threshold.
Configure the properties of the BMC PATROL Agent and specify the action that the BMC PATROL Agent must perform when the policy is applied. For information, see Configuring a BMC PATROL Agent.
Specify the action that the BMC ProactiveNet Servers must perform when the policy is applied. For information, see Specifying server configuration.

After the policy has been created, it is automatically applied to all BMC PATROL Agents that match the agent conditions.
