Constructing alarm generation conditions


As a tenant administrator, you construct alarm generation conditions when you create an alarm policy. These conditions determine the thresholds at which alarms are generated.

You can define alarm generation conditions for multiple instances or for all instances. The following image displays the configurations that you need to specify while constructing alarm generation conditions:

Start adding a condition by specifying these details:

Alarm generation conditions_severity_color.png

The following table describes the configurations:




Examples

Example: Condition based on instance name

According to the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose instance name starts with 'clm' crosses the threshold of 75% for a period of 15 minutes, a Major alarm is generated. After the Utilization returns to a normal state of below 75% and 15 minutes elapse, the generated alarm is automatically closed.

Alarm generation condition_severity_color.png
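The timing in this example can be modeled with a short Python sketch. It is an illustration only, not the product's implementation: the function name alarm_states, the one-sample-per-minute interval, and the rule that the value must stay on one side of the threshold for 15 consecutive samples are all assumptions.

```python
def alarm_states(samples, threshold=75.0, duration=15):
    """Return the alarm state (True = open) after each one-minute sample.

    Assumed model: the alarm opens once the value has been above the
    threshold for `duration` consecutive samples, and closes once it has
    been at or below the threshold for `duration` consecutive samples.
    """
    is_open = False
    streak = 0  # consecutive samples that contradict the current state
    states = []
    for value in samples:
        breaching = value > threshold
        if breaching != is_open:
            streak += 1
            if streak >= duration:
                is_open = breaching
                streak = 0
        else:
            streak = 0
        states.append(is_open)
    return states


# Hypothetical data: 20 minutes at 80% followed by 20 minutes at 60%
states = alarm_states([80.0] * 20 + [60.0] * 20)
print(states.index(True))  # index of the first sample at which the alarm is open
print(states[-1])          # alarm state at the end of the series
```

In this model, a short spike above 75% that lasts fewer than 15 samples never opens an alarm, and a short dip below 75% never closes one.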

You can add multiple conditions per metric. To add more conditions, click Add Condition. To create another condition with the same details, which you can modify later, click Duplicate Condition. To add conditions for a different metric, click Add Instance Policy.

If you have multiple conditions with varying threshold values and severity levels, an alarm is generated when the first condition is breached. When the next condition is breached, a new alarm is not generated. Instead, the severity of the first alarm changes.

Suppose you added these conditions:

  • If CPU utilization crosses 75% for a period of 15 minutes, generate a Major alarm.
  • If CPU utilization crosses 85% for a period of 15 minutes, generate a Critical alarm.

In this scenario, when the CPU utilization of a computer crosses 75% for a period of 15 minutes, a Major alarm is generated. When the CPU utilization of the same computer crosses 85%, the severity of the existing alarm changes from Major to Critical. If the CPU utilization then drops below 85% but remains above 75%, the alarm severity changes from Critical back to Major.
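The escalation behavior in this scenario can be sketched in Python as follows. This is a simplified model under stated assumptions, not the product's logic: the names CONDITIONS, current_severity, and severity_changes are hypothetical, and each reading stands for a value that has already been sustained for the configured 15 minutes.

```python
# Hypothetical condition list: (threshold in percent, severity)
CONDITIONS = [(75.0, "Major"), (85.0, "Critical")]


def current_severity(utilization, conditions=CONDITIONS):
    """Return the severity of the highest breached condition, or None."""
    breached = [(t, s) for t, s in conditions if utilization > t]
    if not breached:
        return None
    return max(breached)[1]  # the condition with the highest threshold wins


def severity_changes(readings):
    """Follow a single alarm and record each change in its severity.

    None in the result means that the alarm is closed.
    """
    history = []
    severity = None
    for value in readings:
        new = current_severity(value)
        if new != severity:
            history.append(new)
            severity = new
    return history


history = severity_changes([70, 80, 90, 80, 70])
print(history)
```

The sketch keeps one alarm per computer and only updates its severity, which mirrors the statement that a second breached condition does not generate a second alarm.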

Example: Condition based on agent tag and host name

According to the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose agent tag name is production and whose agent host name begins with clm crosses the threshold of 75% for a period of 15 minutes, a Major alarm is generated. After the Utilization returns to a normal state of below 75% and 15 minutes elapse, the generated alarm is automatically closed.

agent_tag_alarm_severity_color.png
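The scope of this condition combines two filters, and a computer is covered only if both hold. A minimal Python sketch of that selection follows. The Agent class, its fields, and the sample host names are hypothetical, and tags are modeled as plain strings for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    host_name: str
    tags: frozenset  # tag names attached to the agent (simplified model)


def in_scope(agent, tag="production", host_prefix="clm"):
    """True when the agent carries the tag AND its host name has the prefix."""
    return tag in agent.tags and agent.host_name.startswith(host_prefix)


# Hypothetical agents
agents = [
    Agent("clm-pun-01", frozenset({"production"})),
    Agent("clm-pun-02", frozenset({"staging"})),
    Agent("web-01", frozenset({"production"})),
]
matching = [a.host_name for a in agents if in_scope(a)]
print(matching)
```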

Example: Condition based on regular expressions

This example uses regular expressions to select multiple instances. According to the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose agent host name matches the regular expression AM-.* and whose instance name matches the regular expression [^0].* crosses the threshold of 75% for 15 minutes, a Major alarm is generated. After the Utilization returns to a normal state of below 75% and 15 minutes elapse, the generated alarm is automatically closed. The following lists show how these regular expressions filter instances based on agent host names and instance names.

  • Agent host names

    • hostAM-34TEST1
    • AM-clm-pun
    • AM-34TEST1
    • PSRTESTserver

    According to the agent host name regular expression AM-.*, only two host names (AM-clm-pun and AM-34TEST1) are selected from the preceding list. hostAM-34TEST1 is not selected because the expression must match from the start of the name.

  • Instance names

    • 04578
    • 0prodtest
    • qa0test
    • q10000

    According to the instance name regular expression [^0].*, only two instance names (qa0test and q10000) are selected from the preceding list because they do not begin with 0.

alarmPolicy_regex_severity_color.png
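The selections described in this example can be reproduced with Python's re module as a stand-in. Two assumptions apply: each expression must match the whole name (which is consistent with hostAM-34TEST1 not being selected), and the product's regular expression engine behaves like Python's for these simple patterns.

```python
import re

# Patterns taken from the example; whole-name matching is an assumption.
HOST_PATTERN = re.compile(r"AM-.*")
INSTANCE_PATTERN = re.compile(r"[^0].*")

hosts = ["hostAM-34TEST1", "AM-clm-pun", "AM-34TEST1", "PSRTESTserver"]
instances = ["04578", "0prodtest", "qa0test", "q10000"]

# fullmatch succeeds only when the pattern covers the entire string
selected_hosts = [h for h in hosts if HOST_PATTERN.fullmatch(h)]
selected_instances = [i for i in instances if INSTANCE_PATTERN.fullmatch(i)]

print(selected_hosts)
print(selected_instances)
```

Testing expressions this way against a sample of real host and instance names before saving a policy can help catch patterns that select more or fewer instances than intended.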

Example: Condition based on threshold and baseline violation

According to the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose instance name starts with 'CPU1' crosses the threshold of 75% for a period of 15 minutes and also violates the baseline determined for that event, a Major alarm is generated. After the Utilization returns to a normal state of below 75% and 15 minutes elapse, the generated alarm is automatically closed.

Static and baseline violation_severity_color.png
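The key point of this example is that both checks must fail before an alarm is generated. The following Python sketch shows that combination under simplifying assumptions: the baseline is represented by a single hypothetical upper bound (baseline_upper), the function name should_alarm is illustrative, and the 15-minute duration is left out.

```python
def should_alarm(value, static_threshold, baseline_upper):
    """True only when both the static threshold and the baseline are violated."""
    above_threshold = value > static_threshold
    outside_baseline = value > baseline_upper
    return above_threshold and outside_baseline


# Hypothetical readings for an instance whose baseline upper bound is 80%
results = {
    value: should_alarm(value, static_threshold=75.0, baseline_upper=80.0)
    for value in (70.0, 78.0, 90.0)
}
print(results)
```

With these sample values, a reading of 78% crosses the static threshold but stays within the assumed baseline, so it does not qualify on its own.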

Example: Condition based on only baseline violation

According to the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux servers violates the metric baseline value for 15 minutes, an Information event is generated. After the Utilization returns to a normal state, the generated information event is automatically closed.

Baseline violation_severity_color.png

 
