Threshold management using operator console

Infrastructure Management thresholds are the elements that generate events based on infrastructure performance data. This topic describes the various types of thresholds, their interaction with the Infrastructure Management environment, their usage, and some management practices.

Types of thresholds

There are three types of thresholds that are native to the TrueSight Infrastructure Management Server: Absolute, Signature, and Abnormality thresholds. In addition, Infrastructure Management PATROL KMs have their own thresholds, which are not directly managed by the TrueSight Infrastructure Management Server, but which are capable of generating events that integrate with TrueSight Infrastructure Management Server Analytics.

Note

The term Intelligent Threshold is used liberally throughout product documentation. It simply means the usage of baselines in any of the three native Infrastructure Management threshold types.

Absolute thresholds

Absolute thresholds are simple, static sets of thresholds that represent absolute values above or below which an event is generated. In general, you specify absolute thresholds for attributes that have commonly accepted values beyond which performance is known to degrade. Absolute thresholds are better suited for attributes that change status.

For example, when the total CPU utilization of a Solaris system exceeds 80%, performance issues can occur. In this case, you would specify an absolute threshold of 80% for this attribute.

Note

A trigger condition is defined to be:

  1. Having data points that satisfy a defined threshold value, which is the maximum of:
    • Threshold value
    • Baseline with applicable padding factors
  2. A duration (0 for immediate) during which the above condition remains true.

Absolute thresholds' event-triggering behaviors are easier to grasp because of the requirement that all performance data points must exceed the trigger condition for a specified duration before an event gets created. The absence of the trigger condition for the same duration will close the event.

Having the baseline in the trigger condition allows a threshold to generate events based on learned behavior.
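The trigger condition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's implementation: the class and function names, the additive `padding` field, and the rule that an event stays unchanged when the window contains mixed data points are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


@dataclass
class AbsoluteThreshold:
    threshold: float                       # static threshold value
    duration_s: float                      # 0 means "immediate"
    baseline_high: Optional[float] = None  # learned baseline, if enabled
    padding: float = 0.0                   # hypothetical additive padding

    def trigger_level(self) -> float:
        """Maximum of the threshold value and the padded baseline."""
        if self.baseline_high is None:
            return self.threshold
        return max(self.threshold, self.baseline_high + self.padding)


def evaluate(points: List[Sample], th: AbsoluteThreshold, event_open: bool) -> bool:
    """Return the new open/closed state of the event after the latest sample."""
    now = points[-1][0]
    start = now - th.duration_s
    if points[0][0] > start:
        return event_open  # not enough history to cover the duration
    window = [v for t, v in points if t >= start]
    level = th.trigger_level()
    if all(v > level for v in window):
        return True        # every data point violates: open the event
    if all(v <= level for v in window):
        return False       # condition absent for the whole duration: close
    return event_open      # mixed window: assumed to leave the state unchanged


cpu = AbsoluteThreshold(threshold=80.0, duration_s=120.0)
samples = [(0, 70.0), (60, 85.0), (120, 90.0), (180, 88.0)]
print(evaluate(samples, cpu, event_open=False))  # True
```

With a baseline enabled, for example `AbsoluteThreshold(80.0, 120.0, baseline_high=85.0, padding=5.0)`, the same samples no longer open an event because the effective trigger level rises to 90.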

The following figure illustrates a typical trigger condition for an absolute threshold.

Absolute thresholds with outside baseline enabled

Absolute thresholds with outside baseline combine an absolute threshold (a static threshold value) with a dynamic threshold (which uses the baseline). The user still sets the threshold value, but the baseline acts as an additional criterion for alarm generation.

Advantages:
  • Fewer false alarms than with a static threshold alone.
  • Absolute thresholds with outside baseline allow more control than Signature thresholds.

  • Absolute thresholds without baselines are for availability metrics, or other metrics that have discrete, state-based values; for example, 0=OK, 1=Down, 2=Admin Down.
  • Absolute thresholds with baselines are for user-influenced performance metrics with normalized values (0-100) or with set definitions of values, such as CPU utilization, memory utilization, process utilization, and so on.

Signature thresholds

Absolute thresholds tend to fail to create events in scenarios where the performance data behave more transiently, rendering the trigger condition less obvious. Signature thresholds can be used here because they can judge when enough data points have met the condition to trigger an event, as illustrated below.

Due to the non-deterministic nature of the performance data, it might take longer than anticipated for a Signature threshold to generate an event.

Note

The difference between Absolute and Signature thresholds is in the percentage of data points that must meet the trigger condition: 100% for Absolute thresholds, and just enough for Signature thresholds.

Signature thresholds are for performance metrics that have no set concept of value; what counts as a good or bad value depends entirely on the attribute and the instances (transaction response time, ping response time, and so on) being monitored.

Example

For example, the ping response time is 900 ms. This response time is considered poor if it occurs in a setup within the same data center with a gigabit switch, but it is considered good if it occurs in a setup that extends across the continent.

Abnormality thresholds

Whereas Absolute and Signature thresholds generate actionable events with a severity of MINOR, MAJOR, or CRITICAL, Abnormality thresholds create events that are meant to be invisible and are labeled with severity INFO. Otherwise, Abnormality thresholds behave identically to Signature thresholds.

PATROL thresholds

These are thresholds that are implemented by the PATROL KMs directly. Events get generated by the KMs and get forwarded directly to the BMC TrueSight Infrastructure Management Server.

Internal events and their life cycles

Events that are generated by the TrueSight Infrastructure Management Server (by the three native thresholds) are called Internal events. They are marked with icons having a double wrench. The key characteristics of Internal events are:

  • Thresholds of the same type (Absolute, Signature, or Abnormality) for a metric of an instance operate on the same event, A.
  • Event A has a long life span, maintaining the same mc_ueid.
  • Besides creations and closures, state changes such as severity demotion/promotion are also legitimate changes for event A.

Example

Define three thresholds for the attribute named Response Time.

  1. Response Time > 5 sec : MINOR

  2. Response Time > 10 sec : MAJOR

  3. Response Time > 20 sec : CRITICAL

Over time, all three thresholds (referred to below as thresholds A, B, and C, in the order listed) trigger one after the other with their respective severity, in order of increasing severity (MINOR, then MAJOR, then CRITICAL), and then release in decreasing order of severity. The sequence of actions over time is:

  1. Response Time = 6. Threshold A creates event with mc_ueid, dev1-alr-3322 with MINOR severity.

  2. Response Time = 11. Threshold B modifies dev1-alr-3322 to severity MAJOR.

  3. Response Time = 25. Threshold C modifies dev1-alr-3322 to severity CRITICAL.

  4. Response Time = 12. Threshold C closes. Threshold B modifies dev1-alr-3322 back to MAJOR.

  5. Response Time = 8. Threshold B closes. Threshold A modifies dev1-alr-3322 back to MINOR.

  6. Response Time = 1. Threshold A closes. Severity is left unchanged.
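The sequence above can be replayed with a small Python sketch that keeps a single event object and only changes its severity or status. The class and function names are hypothetical, the sketch ignores trigger durations, and it stops handling samples once the event has closed because re-triggering is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SEVERITY_RANK = {"MINOR": 1, "MAJOR": 2, "CRITICAL": 3}
# Thresholds A, B, and C from the example: (limit in seconds, severity)
THRESHOLDS = [(5.0, "MINOR"), (10.0, "MAJOR"), (20.0, "CRITICAL")]


@dataclass
class InternalEvent:
    mc_ueid: str
    severity: str
    status: str = "OPEN"


def highest_severity(value: float) -> Optional[str]:
    """Severity of the most severe threshold that the value violates."""
    active = [sev for limit, sev in THRESHOLDS if value > limit]
    return max(active, key=SEVERITY_RANK.get) if active else None


def replay(values: List[float], mc_ueid: str = "dev1-alr-3322"):
    """Apply samples in order; one event is created, modified, then closed."""
    event: Optional[InternalEvent] = None
    log: List[Tuple[float, str, str]] = []
    for value in values:
        severity = highest_severity(value)
        if event is None:
            if severity is not None:
                event = InternalEvent(mc_ueid, severity)
                log.append((value, "created", severity))
        elif event.status == "OPEN":
            if severity is None:
                event.status = "CLOSED"  # severity is left unchanged
                log.append((value, "closed", event.severity))
            elif severity != event.severity:
                event.severity = severity  # same event, same mc_ueid
                log.append((value, "modified", severity))
        # Samples after closure are ignored in this sketch.
    return event, log


event, log = replay([6, 11, 25, 12, 8, 1])
for entry in log:
    print(event.mc_ueid, entry)
```

The log contains one creation, four severity modifications, and one closure, all against the same `mc_ueid`.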

Note

All events that are not generated by the BMC TrueSight Infrastructure Management Server native thresholds are considered external and marked with icons having a single wrench.

When to use thresholds

Absolute thresholds

Metrics that show unambiguous behavior such as up/down, availability, or capacity violation are good candidates for Absolute thresholds.

Signature thresholds

Signature thresholds are more appropriate for metrics that are more transient, such as response time, packets per second, and so on, where the data can exhibit big swings against a generally upward/downward trend.

PATROL thresholds

PATROL thresholds are appropriate to use for scenarios that are clear-cut, such as device availability, critical capacity overloads, and so on, which do not need to make use of advanced, server-side analytic capabilities. PATROL thresholds have the advantage of quickly triggering actionable events without having to wait for the data to be collected and passed along to the BMC TrueSight Infrastructure Management Server.

Abnormality thresholds

Abnormality thresholds generate informational events when key metrics go into exceptional states. These events become useful during troubleshooting scenarios using Probable Cause Analysis (PCA), but are otherwise ignored.

By default, you do not have to do any customization for Abnormality thresholds since they have already been created for all KPIs. However, if you customize the KPI list, ensure that you create new Abnormality thresholds.

KPIs, baselines, and thresholds

Baseline generation requirements

Key Performance Indicators (KPIs) are essential metrics for monitoring an infrastructure. They have a direct impact on whether or not baseline computation takes place for corresponding metrics. The following tables show how KPIs may affect baseline generation; "Yes" indicates that baseline generation is carried out for that combination.

When KPI mode is active (Baseline only for KPI) – Infrastructure Management 8.6, 9.0

|                             | KPI Attributes | Non-KPI Attributes |
|-----------------------------|----------------|--------------------|
| Have Abnormality Thresholds | Yes            | No                 |
| No Abnormality Thresholds   | Yes            | No                 |

When KPI mode is not active (Baseline for all metrics) – Infrastructure Management 8.6, 9.0

|                             | KPI Attributes | Non-KPI Attributes |
|-----------------------------|----------------|--------------------|
| Have Abnormality Thresholds | Yes            | Yes                |
| No Abnormality Thresholds   | No             | No                 |

To function correctly, Abnormality and Signature thresholds require baseline data. Because of this requirement, certain thresholds may not work when no baseline exists for their metrics. In such cases, verify that the baseline is being generated for the metrics in question.
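When checking whether a baseline should exist for a metric, the two KPI-mode matrices above reduce to a simple rule. The following Python sketch encodes that rule as read from the matrices; the function and parameter names are hypothetical and do not correspond to any product API.

```python
def baseline_generated(kpi_mode_active: bool,
                       is_kpi: bool,
                       has_abnormality_threshold: bool) -> bool:
    """Return True if a baseline is computed for this combination."""
    if kpi_mode_active:
        # Baseline only for KPI: the Abnormality threshold does not matter.
        return is_kpi
    # Baseline for all metrics: only the Abnormality threshold matters.
    return has_abnormality_threshold


for mode in (True, False):
    for kpi in (True, False):
        for abn in (True, False):
            print(f"kpi_mode={mode!s:5} kpi={kpi!s:5} abnormality={abn!s:5} "
                  f"-> baseline={baseline_generated(mode, kpi, abn)}")
```

For example, a non-KPI attribute in KPI mode never gets a baseline, so a Signature threshold defined on it would have no baseline data to work with.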

Which Baseline to Use

The TrueSight Infrastructure Management Server automatically computes three different types of baselines (Hourly, Daily, and Weekly) to be used by the thresholds. In most cases, when defining thresholds, it is adequate to use Auto Baseline, where the TrueSight Infrastructure Management Server determines the best type of baseline to use for any given metric.

However, if it is known that certain metrics have clear, repeatable hourly patterns (for example, 10 AM on Tuesday behaves in the same way as 10 AM on Wednesday), then you can select Hourly Baseline as the baseline type to use for those corresponding thresholds. Similarly, Daily and Weekly baselines can be used by thresholds if you know that their metrics behave accordingly.

Seasonality baselines

This feature is useful for infrastructures that have recurring periods during which (part of) the infrastructure behaves very differently, and where these behaviors should not be factored into the normal baselines. Examples include a major backup on the last Friday of every month, financial number crunching at the end of every quarter, and so on.

In order for Seasonality baselines to work properly, the Infrastructure Management administrator must ensure that the baseline retention period properly reflects the special recurring period. For example, if the recurring period is twelve months long, the baseline retention period has to be just as long.

Note

Contact BMC Customer Support when extending retention periods, as extended retention periods can severely degrade Infrastructure Management's system performance.

How data affect thresholds

Thresholds are data-driven: the more data points are available, the sooner thresholds can generate events, especially those that make use of baselines. However, more frequent polling increases the TrueSight Infrastructure Management Server's system load.

Advanced Signature threshold configuration

When creating a Signature threshold, you may want to fine-tune the behavior of the threshold. As shown in the following image, there are four additional fields that become visible when you select the advanced view when creating Signature thresholds.

The field descriptions are:

| Field Name | Description and usage |
|------------|-----------------------|
| Minimum Sampling Window | The minimum span of time, as marked by collected data points, required in order for the Signature threshold engine to initiate evaluation. |
| Threshold | Specify this value if you do not want the algorithm to trigger on trivial conditions. For example, if the baseline is low (around 3% to 5%), specify a high threshold value such as 80% so that Signature thresholds trigger only if data values are both higher than 80% and above the baseline. |
| Deviation | Use this field to expand the baseline value range; typically, to reduce the sensitivity of the Signature threshold. |
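To show how the fields described above could interact, here is a minimal Python sketch. It is an assumption-heavy illustration, not the Signature threshold engine: the actual rule for when "just enough" data points have violated the condition is not documented here, so the `required_fraction` parameter is purely hypothetical, as are the additive treatment of Deviation and all names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


@dataclass
class SignatureConfig:
    min_sampling_window_s: float   # Minimum Sampling Window
    threshold: float               # Threshold (guards against trivial triggers)
    deviation: float               # Deviation (assumed additive baseline padding)
    required_fraction: float = 0.6 # hypothetical stand-in for "just enough"


def signature_triggers(points: List[Sample],
                       baseline_high: float,
                       cfg: SignatureConfig) -> bool:
    """Return True if the collected points would trigger in this sketch."""
    if not points:
        return False
    span = points[-1][0] - points[0][0]
    if span < cfg.min_sampling_window_s:
        return False  # not enough collected time to start evaluating
    # A value must exceed both the threshold and the widened baseline.
    level = max(cfg.threshold, baseline_high + cfg.deviation)
    violating = sum(1 for _, value in points if value > level)
    return violating / len(points) >= cfg.required_fraction


cfg = SignatureConfig(min_sampling_window_s=600, threshold=80.0, deviation=2.0)
spiky = [(0, 85.0), (300, 4.0), (600, 90.0), (900, 95.0)]
mild = [(0, 20.0), (300, 30.0), (600, 25.0), (900, 40.0)]
print(signature_triggers(spiky, baseline_high=5.0, cfg=cfg))
print(signature_triggers(mild, baseline_high=5.0, cfg=cfg))
```

The `mild` series stays above the low baseline the whole time but never exceeds 80, which is the trivial condition the Threshold field is meant to suppress.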

Prediction

The Prediction feature gives early warnings of certain exceptional situations and is used to issue warnings if there is an aggressive trend towards the threshold. While it can be used in a wide number of scenarios, it is most effective in capacity-type scenarios, especially for those metrics which exhibit clear hourly patterns.

Some of the points to remember when using this feature include:

  • Only applicable to Absolute thresholds
  • Baseline is required
  • The polling frequency may need to be increased (that is, a shorter polling interval), which makes it easier and more timely to establish trends
  • Uses more run-time resources; examine the load if a large number of thresholds enable Prediction
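The idea of warning on an aggressive trend towards a threshold can be sketched with a simple linear extrapolation. The prediction algorithm the product actually uses is not described in this topic, so the least-squares fit, the function names, and the `horizon_s` parameter below are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def seconds_until_threshold(points: List[Sample],
                            threshold: float) -> Optional[float]:
    """Estimate the time until a fitted line reaches the threshold.

    Returns None when no upward trend can be established.
    """
    n = len(points)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / var_t
    if slope <= 0:
        return None  # flat or downward trend: nothing to predict
    intercept = mean_v - slope * mean_t
    fitted_now = intercept + slope * points[-1][0]
    if fitted_now >= threshold:
        return 0.0
    return (threshold - fitted_now) / slope


def should_warn(points: List[Sample], threshold: float, horizon_s: float) -> bool:
    """Warn if the trend is expected to reach the threshold within horizon_s."""
    eta = seconds_until_threshold(points, threshold)
    return eta is not None and eta <= horizon_s


disk_used = [(0, 50.0), (60, 55.0), (120, 60.0), (180, 65.0)]
print(seconds_until_threshold(disk_used, threshold=90.0))
print(should_warn(disk_used, threshold=90.0, horizon_s=600))
```

More samples per unit of time give the fit more points to work with, which is one way to read the advice about polling in the list above.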

How to Persist Thresholds

Use the pw threshold checkpoint command to save the states of thresholds while customizing a deployment.
