BMC Software provides many monitoring solutions to help you monitor your environment. A monitoring solution is a predefined set of metrics that monitor the health and performance of a specific device or service. For example, BMC PATROL for VMware vSphere is a monitoring solution. BMC monitoring solutions are composed of monitor types and attributes. A monitor type is a way of classifying the data to be collected. For example, some of the monitor types included with BMC PATROL for VMware vSphere are VMware Host CPU, VMware Datastore, VMware Disk Performance, and VMware Resource Pool, among many others. Each monitor type has attributes that further divide its data into specific measurements. For example, the attributes for the VMware Host CPU monitor type are CPU utilization and processor time. If you are familiar with BMC PATROL nomenclature, a monitor type is similar to a Knowledge Module application class, and an attribute is similar to an application class parameter. A monitor instance is a monitor type on a specific device.
Infrastructure Management uses a combination of Key Performance Indicators, thresholds, and baselines to determine when to send an event to the operator console. Events can have a severity level of Critical, Major, Minor, or Informational. The severity of the event is determined by the administrator when thresholds are set and should correlate to the severity of the problem or impending problem.
Key Performance Indicators (KPIs) are the attributes that you and BMC Software have identified as having the most impact on the health and performance of your environment and on your service-level agreements. BMC Software configures some KPIs by default for each monitoring solution, such as BMC PATROL for VMware vSphere. For example, the KPI attributes for the VMware Datastore monitor type in BMC PATROL for VMware vSphere are the Free Disk Space attribute and the Datastore disk usage in percent attribute.
In addition to the default KPIs specified by BMC Software, an administrator can designate other attributes as KPIs. To configure an attribute to be a KPI, you must use the administration console: from the Tools menu, choose KPI Administration and enable the attribute to be a KPI. An above-the-baseline abnormality threshold is then automatically enabled for that attribute.
A baseline represents the normal operating parameters for the device or service that you are monitoring. The baseline is calculated by collecting the values of a monitor type's attributes over a specified time period and establishing a low baseline value (the 10th percentile of all the values for that period) and a high baseline value (the 90th percentile of all the values for that period). These values are combined with earlier baseline values as a weighted average over time, with a higher weight given to the most recent data. As a result, the accuracy of the baseline improves over time. You need to collect data for an attribute for at least a week to establish a reliable baseline, and the longer the attribute has been collecting data, the more reliable the baseline becomes.
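The calculation described above can be sketched as follows. This is a minimal illustration, not Infrastructure Management's actual algorithm: it assumes linear-interpolation percentiles and an exponentially weighted blend in which a hypothetical `weight` parameter sets the share given to the newest period. The function names are also hypothetical.

```python
import math


def percentile(values, pct):
    """Return the pct-th percentile (0-100) of values, using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty sequence")
    rank = (pct / 100) * (len(ordered) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def update_baseline(previous, period_values, weight=0.3):
    """Blend one period's low (10th) and high (90th) percentiles into the baseline.

    previous: (low, high) tuple from earlier periods, or None if none exists yet.
    weight: hypothetical share (0 < weight <= 1) given to the newest period,
            so recent data counts more than older data.
    """
    low = percentile(period_values, 10)
    high = percentile(period_values, 90)
    if previous is None:
        return (low, high)
    prev_low, prev_high = previous
    return (
        weight * low + (1 - weight) * prev_low,
        weight * high + (1 - weight) * prev_high,
    )
```

For example, `update_baseline(None, range(1, 11))` starts a baseline of roughly (1.9, 9.1). Each later call shifts the range toward the newest period's percentiles by the chosen weight, which is why the baseline becomes more representative the longer data is collected.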
Infrastructure Management generates baselines only for attributes that are specified as Key Performance Indicators (KPIs). Data is still collected for attributes that are not KPIs, but those attributes do not have baselines.
Infrastructure Management captures the following baseline patterns:
By default, Auto baseline is enabled for all KPI monitor type attributes. This means that Infrastructure Management automatically detects abnormalities in each KPI monitor type attribute and determines the best baseline to use based on the behavior of the monitor instance. You can change the baseline type enabled for any KPI to one of the baseline types described above.
Different monitor attributes have different data patterns. For example, for monitor attributes that change frequently, such as CPU usage, a data pattern captured at hourly intervals may be the best way of establishing a baseline. Hourly baselines represent a smaller number of data points and will have a tighter range, which is best suited for capturing frequent changes. Other monitor attributes, such as disk utilization, do not change as frequently. Fewer data points are required to establish a baseline for these attributes, so a daily or weekly baseline might be the best way to capture and identify changes for these attributes.
For baselines to be generated for an attribute, that attribute must have an active abnormality threshold. An active abnormality threshold means that the threshold exists and is not suppressed. Additionally, if the Key Performance Indicator (KPI) mode is active, only those attributes that have an active abnormality threshold and are also KPI attributes will have baselines generated for them. Absolute thresholds (with "outside baseline") or signature thresholds do not satisfy these requirements.
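The eligibility rules above can be expressed as a short check. This is a hypothetical sketch: the `Threshold` class and `baseline_generated` function are illustrative names, not part of any BMC API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Threshold:
    kind: str                # hypothetical labels: "abnormality", "signature", or "absolute"
    suppressed: bool = False


def baseline_generated(thresholds: Iterable[Threshold], is_kpi: bool,
                       kpi_mode_active: bool) -> bool:
    """Apply the rules above to decide whether an attribute gets a baseline."""
    # Only an abnormality threshold that exists and is not suppressed counts;
    # signature and absolute thresholds do not satisfy the requirement.
    has_active_abnormality = any(
        t.kind == "abnormality" and not t.suppressed for t in thresholds
    )
    # When KPI mode is active, the attribute must also be a KPI.
    return has_active_abnormality and (is_kpi or not kpi_mode_active)
```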
After a reliable baseline has been established, you can use these baselines in conjunction with thresholds to generate events when the data values from a monitor fall either above or below the normal baseline range or threshold for a statistically significant number of points.
Thresholds define acceptable high and low values for the data collected. You can set thresholds for each attribute for each monitor type. Global thresholds are applied to all instances of a monitor type. Instance thresholds are applied only to a specific instance of a monitor type. Instance thresholds take precedence over global thresholds.
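The precedence rule can be sketched as a lookup that checks instance thresholds before falling back to global ones. This is a simplified illustration: each threshold is reduced to a single hypothetical numeric value, and the dictionary layout and function name are assumptions.

```python
from typing import Dict, Optional, Tuple

GlobalKey = Tuple[str, str]          # (monitor type, attribute)
InstanceKey = Tuple[str, str, str]   # (monitor type, instance, attribute)


def effective_threshold(global_thresholds: Dict[GlobalKey, float],
                        instance_thresholds: Dict[InstanceKey, float],
                        monitor_type: str, instance: str,
                        attribute: str) -> Optional[float]:
    """Return the threshold that applies to one attribute of one monitor instance."""
    instance_key = (monitor_type, instance, attribute)
    # An instance threshold takes precedence over the global threshold.
    if instance_key in instance_thresholds:
        return instance_thresholds[instance_key]
    # Otherwise the global threshold for the monitor type applies, if any.
    return global_thresholds.get((monitor_type, attribute))
```

For example, with a global Free Disk Space threshold of 10 for VMware Datastore and an instance threshold of 5 for a hypothetical `datastore-01`, that datastore uses 5 and every other datastore uses 10.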
Infrastructure Management allows you to set abnormality thresholds, signature thresholds, and absolute thresholds. Thresholds can be created as part of a policy that can be applied to multiple monitor types on multiple BMC PATROL Agents.
Abnormality thresholds are high or low thresholds determined by the data that Infrastructure Management collects for an attribute for the specified baseline (Hourly, Daily, Weekly, and so on). Abnormality thresholds are enabled by default for Key Performance Indicator (KPI) attributes, and they are the only thresholds enabled by default. All other thresholds must be created manually.
If you want to create an abnormality threshold for an attribute that is not a default KPI, you can configure that attribute to be a KPI using the administration console. From the Tools menu, choose KPI Administration and enable the attribute to be a KPI. An above-the-baseline abnormality threshold is automatically enabled for that attribute. A below-the-baseline abnormality threshold is not enabled automatically; you must create it manually.
If an abnormality threshold is breached, it always generates an event with a severity of Informational in the operator console. You can specify whether an event is generated when the data falls above the baseline or below the baseline.
You can reduce the number of Informational events generated by an abnormality threshold by specifying any or all of the following conditions when you create the threshold:
Abnormalities are closed automatically when the number of exceeded data points in the most recent minimum sampling window is no longer considered significant. For example, if six out-of-range data points out of seven are statistically significant, the abnormality is closed as soon as the number of exceeded points drops to five out of seven. By default, even if no explicit global or instance signature threshold is set, Infrastructure Management generates abnormalities for above-baseline conditions.
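The open-and-close behavior in the example above can be sketched with a sliding window. This sketch assumes that "statistically significant" is modeled as a fixed count (6 of the last 7 points) and that an abnormality opens only once the window is full; the real significance test is not described here, and `AbnormalityTracker` is a hypothetical name.

```python
from collections import deque
from typing import Optional


class AbnormalityTracker:
    """Track out-of-range points over a sliding window of recent samples."""

    def __init__(self, window_size: int = 7, significant_count: int = 6):
        self.window = deque(maxlen=window_size)  # oldest point drops off automatically
        self.significant_count = significant_count
        self.open = False

    def add_point(self, out_of_range: bool) -> Optional[str]:
        """Record one data point; return "opened", "closed", or None."""
        self.window.append(out_of_range)
        exceeded = sum(self.window)  # True counts as 1
        window_full = len(self.window) == self.window.maxlen
        if not self.open and window_full and exceeded >= self.significant_count:
            self.open = True
            return "opened"
        if self.open and exceeded < self.significant_count:
            # Fewer than 6 of the last 7 points exceeded: no longer significant.
            self.open = False
            return "closed"
        return None
```

After seven out-of-range points the abnormality opens. One in-range point leaves six of seven exceeded, so it stays open. A second in-range point drops the count to five of seven, and the abnormality closes.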
Signature thresholds are dynamic thresholds, set by an administrator, that use the baseline data collected by Infrastructure Management for a KPI attribute as the threshold value; the baseline itself forms the signature threshold. Because signature thresholds use the learned behavior of the service or device, they change as the service or device attributes change. The longer a signature threshold is in place, the more accurate it becomes.
When a signature threshold is breached, it sends an event to the operator console that indicates the severity of the breach. An administrator defines the severity level as Critical, Major, or Minor when the threshold is set. Use signature thresholds for attributes that are important enough that a breach of the baseline should generate an event with a severity higher than Informational. Signature thresholds are useful for performance attributes such as response time, utilization, and errors.
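The difference between abnormality and signature thresholds comes down to the severity of the event raised when data leaves the baseline range. The sketch below is a hypothetical simplification: it treats any value outside (low, high) as a breach and ignores the sampling-window conditions and the above/below direction setting.

```python
from typing import Optional, Tuple

SIGNATURE_SEVERITIES = {"Critical", "Major", "Minor"}


def baseline_event_severity(value: float, baseline: Tuple[float, float],
                            kind: str,
                            configured_severity: Optional[str] = None) -> Optional[str]:
    """Return the event severity for a value checked against a baseline, or None."""
    low, high = baseline
    if low <= value <= high:
        return None  # within the baseline: no event
    if kind == "abnormality":
        # Abnormality thresholds always produce Informational events.
        return "Informational"
    if kind == "signature":
        # Signature thresholds use the severity chosen by the administrator.
        if configured_severity not in SIGNATURE_SEVERITIES:
            raise ValueError("signature thresholds need Critical, Major, or Minor")
        return configured_severity
    raise ValueError(f"unknown threshold kind: {kind}")
```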
Absolute thresholds are static thresholds set by an administrator. You can specify that a breach occurs when the data value is greater than, less than, greater than or equal to, less than or equal to, or equal to the specified threshold value. When an absolute threshold is breached, it generates an event in the operator console with a severity of Critical, Major, or Minor. You can enable an absolute threshold with any of the baseline types (Hourly, Daily, Weekly, and so on). For example, you can enable an absolute threshold to use the Weekly baseline. An absolute threshold with this configuration does not generate an event until the Weekly baseline has been breached in addition to the absolute threshold value. Combining the absolute threshold with a baseline reduces the number of events received by the Infrastructure Management Console. If the absolute threshold is not enabled with a baseline, an event is generated whenever the threshold value is breached, regardless of the baseline.
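An absolute threshold check, optionally combined with a baseline, can be sketched as follows. The operator symbols and the `absolute_breach` function are hypothetical, and "breaching the baseline" is assumed here to mean falling outside the (low, high) baseline range.

```python
import operator
from typing import Optional, Tuple

# The five comparisons an absolute threshold can use.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def absolute_breach(value: float, op: str, threshold: float,
                    baseline: Optional[Tuple[float, float]] = None) -> bool:
    """Return True if the value breaches the absolute threshold.

    If a baseline (low, high) is supplied, the value must also fall outside
    the baseline range, which suppresses events for values that are normal.
    """
    if not COMPARATORS[op](value, threshold):
        return False
    if baseline is None:
        return True  # no baseline: any threshold breach generates an event
    low, high = baseline
    return value < low or value > high
```

With a threshold of `> 90` and a Weekly baseline of (80, 98), a value of 95 does not generate an event because it is still within normal weekly behavior, while 99 does.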
You can reduce the number of events generated by an absolute threshold by specifying any of the following conditions when you create the absolute threshold:
When you create an absolute threshold, you can optionally specify that Infrastructure Management should use the threshold to predict possible issues. In that case, an early-warning predictive event is generated 2-3 hours before the absolute threshold is breached. For example, if disk space slowly fills up over time, the predictive event gives you 2-3 hours of lead time before the threshold is breached.
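The prediction model used by Infrastructure Management is not described here. As a stand-in, the following sketch uses ordinary least-squares linear extrapolation to estimate how many hours remain before a rising attribute, such as disk usage, reaches its threshold. The function names and the 3-hour lead time parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def hours_until_breach(samples: List[Tuple[float, float]],
                       threshold: float) -> Optional[float]:
    """Estimate hours from the last sample until the value reaches threshold.

    samples: (hour, value) pairs in time order.
    Returns None if there is too little data or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None
    # Least-squares slope: change in value per hour.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
    if slope <= 0:
        return None
    last_value = samples[-1][1]
    if last_value >= threshold:
        return 0.0
    return (threshold - last_value) / slope


def predictive_warning(samples: List[Tuple[float, float]], threshold: float,
                       lead_hours: float = 3.0) -> bool:
    """Raise an early warning when the estimated breach is within lead_hours."""
    eta = hours_until_breach(samples, threshold)
    return eta is not None and eta <= lead_hours
```

For disk usage of 80, 82, 84, and 86 percent over four hourly samples, the slope is 2 percent per hour, so a 90 percent threshold is about 2 hours away and an early warning is raised.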
When creating an absolute threshold, the administrator can specify that events generated by a breach of that threshold are closed automatically when the collected data values are no longer in breach of the threshold. If Auto Close is not enabled, events generated by a breach of an absolute threshold can be closed manually by an Infrastructure Management Operator in the operator console.
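The Auto Close behavior can be sketched as a small event lifecycle. The class, field, and status names below are hypothetical illustrations, not BMC identifiers.

```python
from dataclasses import dataclass


@dataclass
class AbsoluteThresholdEvent:
    severity: str        # "Critical", "Major", or "Minor"
    auto_close: bool     # whether Auto Close was enabled on the threshold
    status: str = "OPEN"

    def on_new_value(self, still_breached: bool) -> None:
        """Close the event once data is back within the threshold, if Auto Close is on."""
        if self.status == "OPEN" and self.auto_close and not still_breached:
            self.status = "CLOSED"

    def close_manually(self) -> None:
        """Operator action in the operator console."""
        self.status = "CLOSED"
```

Without Auto Close, an event stays open even after the data recovers, until an operator closes it.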
Using Central Monitoring Administration, you can define and manage monitoring policies. A monitoring policy defines a set of selection criteria that is applied to incoming data to determine which data are processed and how the selected data are processed. Monitoring policies are applied to the PATROL Agents that match the policy's agent conditions.
A monitoring policy allows you to perform the following tasks:
After the policy has been created, it is automatically applied to all BMC PATROL Agents that match the agent conditions.
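Matching a policy to agents can be pictured as filtering agents by the policy's conditions. The actual agent conditions available in Central Monitoring Administration are not listed here, so this sketch assumes two hypothetical ones: an operating system and a set of required tags. All class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Agent:
    host: str
    os: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class MonitoringPolicy:
    name: str
    required_tags: Set[str] = field(default_factory=set)  # hypothetical condition
    os: Optional[str] = None                               # hypothetical condition

    def matches(self, agent: Agent) -> bool:
        """Return True if the agent satisfies every agent condition."""
        if self.os is not None and agent.os != self.os:
            return False
        return self.required_tags.issubset(agent.tags)


def agents_for_policy(policy: MonitoringPolicy, agents: List[Agent]) -> List[str]:
    """Return the hosts of all agents the policy would be applied to."""
    return [agent.host for agent in agents if policy.matches(agent)]
```

Because matching is evaluated against every registered agent, an agent that later satisfies the conditions would pick up the policy as well, which mirrors the automatic application described above.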