The basics: Understanding how events are generated
An event is an occurrence of a change in state on a monitored object. Administrators define the parameters that determine when the state change occurs. The TrueSight Infrastructure Management product provides tools that enable you to generate events so that you can identify problematic devices and critical events and diagnose problems.
Coupled with its rich catalog of monitoring solutions, the TrueSight Infrastructure Management product can use different combinations of key performance indicators, thresholds, and baselines to determine when to send an event to the console. Events can have a severity level of Critical, Major, Minor, or Informational. The severity of the event is determined by the administrator when thresholds are set and should correlate to the severity of the problem or impending problem.
Monitoring Solutions and Knowledge Modules
BMC Software provides many Monitoring Solutions, also known as Knowledge Modules, to help you monitor your environment. A monitoring solution is a pre-defined set of metrics that monitors the health and performance of a specific device or service. BMC monitoring solutions are composed of monitor types and attributes.
- A monitor type is a way of classifying the data that is to be collected.
- Each monitor type has attributes that further classify the monitor type into types of data.
- A monitor instance refers to a monitor type on a specific device.
If you are familiar with PATROL nomenclature, a monitor type is similar to a Knowledge Module application class and an attribute is similar to an application class parameter.
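The relationship between monitor types, attributes, and monitor instances can be sketched as a small data model. This is an illustration only: the class names, field names, and the sample "CPU" monitor type are hypothetical and do not reflect how the product stores this information.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MonitorType:
    """A classification of data to collect (similar to a PATROL application class)."""
    name: str
    attributes: tuple  # attribute names (similar to application class parameters)


@dataclass
class MonitorInstance:
    """A monitor type bound to one specific device."""
    monitor_type: MonitorType
    device: str
    values: dict = field(default_factory=dict)  # attribute name -> collected samples

    def record(self, attribute, value):
        # Only attributes declared by the monitor type can hold data.
        if attribute not in self.monitor_type.attributes:
            raise KeyError(f"{attribute!r} is not an attribute of {self.monitor_type.name}")
        self.values.setdefault(attribute, []).append(value)


# Hypothetical example: one monitor type, one instance on a device named "host01".
cpu = MonitorType("CPU", ("Utilization", "IdleTime"))
instance = MonitorInstance(cpu, device="host01")
instance.record("Utilization", 42.0)
```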
For more information, see Monitoring solutions.
Key Performance Indicators
Key Performance Indicators (KPIs) are attributes that you and BMC Software have determined have the most impact on the health and performance of your environment and your service-level agreements. BMC Software has configured some KPIs by default for each monitoring solution.
Using the operator console, an administrator can specify an attribute as a KPI in addition to the default KPIs specified by BMC Software. To configure an attribute to be a KPI, you must use the administrator console. From the Tools menu, choose KPI Administration and enable the attribute to be a KPI. An above-baseline abnormality threshold is automatically enabled for that attribute.
A baseline represents the normal operating parameters for the device or service that you are monitoring. The baseline is calculated by collecting the values for the attributes of a monitor type over a specified time period and establishing two values: a low baseline value (the 10th percentile of all the values for a given time period) and a high baseline value (the 90th percentile of all the values for a given time period). A weighted average of these values is then taken over time.
A higher weight is given to the latest data factored into the baseline average. Because the accuracy of the baseline improves over time, to establish a reliable baseline, you need to collect data from an attribute for at least a week. The longer the attribute collects data, the more reliable the baseline becomes.
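The calculation described above can be approximated in a few lines of Python. This is a minimal sketch, not the product's algorithm: the text does not say how the weights are chosen, so the linearly increasing weights used here (1 for the oldest period, n for the newest) are an assumption, as are the function names.

```python
import statistics


def percentile_band(values):
    """Return (low, high): the 10th and 90th percentiles of one period's samples."""
    # quantiles(n=10) yields the nine cut points at 10%, 20%, ..., 90%.
    # On most Python versions it needs at least two samples per period.
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    return cuts[0], cuts[-1]


def weighted_baseline(periods):
    """Average the per-period bands, giving newer periods a higher weight.

    `periods` is a list of sample lists, ordered oldest first.
    ASSUMPTION: weights grow linearly (1, 2, ..., n); the real weighting
    scheme is not documented in this topic.
    """
    bands = [percentile_band(samples) for samples in periods]
    weights = range(1, len(bands) + 1)
    total = sum(weights)
    low = sum(w * band[0] for w, band in zip(weights, bands)) / total
    high = sum(w * band[1] for w, band in zip(weights, bands)) / total
    return low, high


# Two periods of made-up samples; the second (newer) period counts twice as much.
older = list(range(0, 101))
newer = list(range(30, 131))
low, high = weighted_baseline([older, newer])
```

Because the newer period carries more weight, the resulting band sits closer to the newer period's percentiles than to the older period's.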
TrueSight Infrastructure Management generates baselines only for attributes that are specified as KPIs. Data is still collected for attributes that are not KPIs, but those attributes do not have baselines.
For detailed information about baselines, see Baselines.
Thresholds define acceptable high and low values for the data collected. You can set thresholds for each attribute for each monitor type, and events are generated when threshold values are breached.
- Global thresholds, established by Technology Specialists or Solution Administrators, are applied to all instances of a monitor type.
- Instance thresholds, set by Technology Specialists, are applied only to a specific instance of a monitor type. Instance thresholds (also known as server-side thresholds) take precedence over global thresholds and are defined in monitoring policies.
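The precedence rule above (an instance threshold, when present, overrides the global threshold for the same monitor type and attribute) can be sketched as a simple lookup. The dictionary layout, the function name, and the sample values are hypothetical and serve only to illustrate the rule.

```python
def effective_threshold(global_thresholds, instance_thresholds,
                        monitor_type, attribute, instance):
    """Return the threshold that applies to one monitor instance, or None."""
    # An instance threshold takes precedence when one is defined.
    instance_key = (monitor_type, attribute, instance)
    if instance_key in instance_thresholds:
        return instance_thresholds[instance_key]
    # Otherwise fall back to the global threshold for the monitor type.
    return global_thresholds.get((monitor_type, attribute))


# Hypothetical data: a global limit for all CPU instances, one override for db01.
global_thresholds = {("CPU", "Utilization"): 80.0}
instance_thresholds = {("CPU", "Utilization", "db01"): 95.0}

db01_limit = effective_threshold(global_thresholds, instance_thresholds,
                                 "CPU", "Utilization", "db01")
web01_limit = effective_threshold(global_thresholds, instance_thresholds,
                                  "CPU", "Utilization", "web01")
```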
TrueSight Infrastructure Management allows you to set abnormality thresholds, signature thresholds, and absolute thresholds. Thresholds can be created as part of a policy that can be applied to multiple monitor types on multiple PATROL Agents.
The approach TrueSight Infrastructure Management takes to detect abnormal behavior differs from the traditional threshold approach. The traditional approach requires the definition of hard thresholds that must be customized for each instance.
This exercise requires precise knowledge of the environment and is not very scalable in terms of administration. The following table describes the types of event thresholds available with TrueSight Infrastructure Management.
For more detailed information about threshold types, see Threshold management using operator console.
Types of event thresholds
Absolute thresholds are static thresholds that represent an absolute value, above or below which an event is generated. In general, an absolute threshold is specified for attributes that have commonly accepted values beyond which performance is known to degrade. Absolute thresholds are better suited for attributes that change status.
Example: Performance issues can arise when the total CPU utilization of a Solaris system exceeds 80%. In this case, you might specify an absolute threshold of 80% for this attribute.
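The Solaris example can be expressed as a short check. This is a sketch under stated assumptions: the class, its fields, the dictionary used to represent an event, and the choice of "Major" as the severity are all hypothetical; only the 80% limit and the four severity levels come from the text.

```python
from dataclasses import dataclass

SEVERITIES = ("Critical", "Major", "Minor", "Informational")


@dataclass(frozen=True)
class AbsoluteThreshold:
    attribute: str
    limit: float
    severity: str        # chosen by the administrator when the threshold is set
    above: bool = True   # True: breach when value > limit; False: when value < limit

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    def evaluate(self, value):
        """Return an event (a plain dict here) if the value breaches the limit, else None."""
        breached = value > self.limit if self.above else value < self.limit
        if not breached:
            return None
        return {"attribute": self.attribute, "severity": self.severity,
                "value": value, "limit": self.limit}


# Hypothetical threshold for total CPU utilization on a Solaris system.
cpu_threshold = AbsoluteThreshold("Total CPU Utilization", limit=80.0, severity="Major")
event = cpu_threshold.evaluate(91.5)      # breach: an event dict is returned
no_event = cpu_threshold.evaluate(75.0)   # within limits: None
```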
Signature thresholds are dynamic thresholds that use the baseline as the threshold. Users do not need to set a threshold value, because the baseline is autogenerated. Because of their dynamic nature, signature thresholds provide a much more scalable approach to managing thresholds.
Abnormality event thresholds
Abnormality thresholds operate in the same way as signature thresholds. However, instead of generating events, they generate abnormality events that are used by Probable Cause Analysis correlation. Abnormality thresholds are set automatically (out of the box) for all KPI attributes, so users do not need to do anything to start seeing the value of abnormalities in the context of Probable Cause Analysis correlation.
For more information about abnormalities, see How abnormalities are generated.
Signature and absolute thresholds can be combined to create an intelligent threshold that generates events when a metric value falls outside its baseline and is above its absolute threshold. Intelligent thresholds help to alleviate issues with absolute values that are set too low for the normal operating environment.
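The combined condition can be sketched as follows. The function name and the sample numbers are hypothetical; the logic simply mirrors the description above: an event is generated only when the value is both outside its baseline and above its absolute threshold.

```python
def intelligent_breach(value, baseline_low, baseline_high, absolute_limit):
    """Return True when the value is outside the baseline AND above the absolute limit."""
    outside_baseline = value < baseline_low or value > baseline_high
    above_absolute = value > absolute_limit
    return outside_baseline and above_absolute


# Hypothetical host that normally runs between 55% and 75% CPU, with an
# absolute limit of 50% that is too low for its normal operating range.
normal_load = intelligent_breach(70.0, 55.0, 75.0, 50.0)    # above 50, but within baseline
abnormal_load = intelligent_breach(85.0, 55.0, 75.0, 50.0)  # above 50 and outside baseline
```

With the absolute threshold alone, the 70% reading would generate an event even though it is normal for this host. The combined check suppresses it and still reports the 85% reading.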
Intelligent thresholds provide the following benefits:
Using the TrueSight console, you can define and manage monitoring policies. A monitoring policy defines a set of selection criteria that is applied to incoming data to determine which data are processed and how the selected data are processed. Among other things, you use monitoring policies to create server-side thresholds. Monitoring policies are applied to PATROL Agents that match the agent conditions.
After the policy has been created, it is automatically applied to all PATROL Agents that match the agent conditions. For more information, see Defining a monitoring policy.
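As a rough illustration of how a policy applies to every agent that matches its agent conditions, consider the sketch below. The condition format is not described in this topic, so everything here is an assumption: the `Agent` fields, the hostname wildcard, the optional operating system condition, and the policy names are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Agent:
    hostname: str
    os: str


@dataclass(frozen=True)
class MonitoringPolicy:
    name: str
    hostname_pattern: str = "*"   # hypothetical agent condition: shell-style wildcard
    os: Optional[str] = None      # hypothetical agent condition: exact OS name, or any

    def matches(self, agent):
        """Return True when the agent satisfies every condition of this policy."""
        if not fnmatchcase(agent.hostname, self.hostname_pattern):
            return False
        return self.os is None or self.os == agent.os


def policies_for(agent, policies):
    """Return the names of all policies whose agent conditions match the agent."""
    return [policy.name for policy in policies if policy.matches(agent)]


policies = [
    MonitoringPolicy("solaris-cpu", os="Solaris"),
    MonitoringPolicy("db-servers", hostname_pattern="db*"),
]
applied = policies_for(Agent("db01", "Solaris"), policies)
```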