Configuring alarm policies


As a tenant administrator, use alarm policies to safeguard against abnormalities by configuring thresholds. Thresholds define an acceptable value above or below which an alarm is generated. 

You can configure alarm policies manually, or configure automatic anomaly event generation to detect anomalies. Alarms are events that are generated when the threshold values that you configure are violated. A threshold defines an acceptable value above or below which, or a range within or outside of which, an alarm is generated. Thresholds can be applied to multiple monitor types on multiple PATROL Agents.

You can perform the following operations on alarm policies:

  • Create, edit, view, and delete alarm policies.
  • Export all alarm policy data to a CSV file.
  • View a list of alarm policy conflicts, where alarm policies with the same severity are defined for the same metric.
  • View the audit trail of all updates made to all alarm policies.
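The conflict rule in the list above (the same severity defined for the same metric) can be illustrated with a minimal Python sketch. The `PolicyCondition` record, its fields, and the metric naming scheme are hypothetical simplifications rather than the product's data model, and the real policy conflicts report might apply additional criteria.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyCondition:
    # Hypothetical, simplified view of one condition in an alarm policy.
    policy: str
    metric: str    # for example "Linux/CPU/Utilization" (illustrative naming)
    severity: str  # for example "Major" or "Critical"


def find_conflicts(conditions):
    """Group policies that define the same severity for the same metric."""
    groups = defaultdict(set)
    for c in conditions:
        groups[(c.metric, c.severity)].add(c.policy)
    # Only groups that contain two or more distinct policies are conflicts.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


conditions = [
    PolicyCondition("cpu-prod", "Linux/CPU/Utilization", "Major"),
    PolicyCondition("cpu-all", "Linux/CPU/Utilization", "Major"),
    PolicyCondition("cpu-all", "Linux/CPU/Utilization", "Critical"),
]
print(find_conflicts(conditions))
# {('Linux/CPU/Utilization', 'Major'): ['cpu-all', 'cpu-prod']}
```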

Important

Make sure that the monitor.external_entity_types.view permission is assigned to a custom restricted-user role, so that the associated users can view external entities while configuring alarm policies.

Scenario 

Sarah, an administrator at Apex Global, wants to create an alarm policy that sends her an alarm event if the CPU utilization on Linux computers exceeds 85%.

Watch the following video (2:02) to see how Sarah creates the alarm policy.

To create an alarm policy

If you are creating an alarm policy for dynamic monitor types, make sure that monitor instances are available for a device in the Devices > Device Details > Monitors tab.

  1. Select the Configuration > Alarm Policies page and click Create.
  2. In the Policy Information section, enter a Name and Description for the policy.
  3. In the Precedence field, add a unique precedence number to the policy.
    You can add a custom value in this field, or use the arrows to increase or decrease the value. For more information, see Alarm policies.
  4. Use the Alarm Generation Conditions section to create the conditions under which the alarm is generated.
    Alarm Generation annotated.png
    Perform the following steps to create the conditions:
    1. In the For field, click find_button to select a monitor type. 
      You can select one of the following monitor types:
    2. In the On field, select one of the following options:
      • All Instances
      • Multi Instances: For multiple instances, you can define multiple conditions using parameters such as agent tag name, hostname, instance name, port, etc. You can also use regular expressions while defining multiple instances by using the Matches operator.
        You cannot create multiple alarm policies with the same metrics and conditions unless their instance types differ. The instance type can be All or Multiple.
        If you create multiple alarm policies with the same metrics and conditions but different instance types, the policy with the Multiple instance type takes precedence.
    3. In the next field, add the instance details.
      You can also copy the criteria by clicking Copy copy_button. The copied criteria can be reused in subsequent policies by pressing Ctrl+V in the selection criteria field.
      Important:
      • The instance name is case sensitive. Ensure that you use the instance name as displayed in the Device Details page while defining the selection criteria.
      • The instance name that you enter in the alarm policy is displayed in the object slot for an alarm event and not in the instancename slot on the Events page.
    4. (Optional) Click preview_button to view the devices that satisfy the selection criteria.
      Important:
      The Preview button is disabled if you specify the Device Host Name and Instance Name in the selection criteria.

    5. Use the If and Then fields to add the baseline condition, threshold value, violation duration, and the conditions under which the generated alarm is closed.
    6. (Optional) Use the Add Condition option to add another condition for the same metric, or use the Clone Condition option to clone and customize the current condition.

      Important: If you have multiple conditions with varying threshold values and severity levels, an alarm is generated when the first condition is breached. When the next condition is breached, a new alarm is not generated. Instead, the severity of the first alarm changes.

      For example, consider the following conditions:

      If CPU utilization exceeds 75% for 15 minutes, generate a Major alarm. If CPU utilization exceeds 85% for 15 minutes, generate a Critical alarm.

      In this scenario, when the CPU utilization of a system exceeds 75% for 15 minutes, a Major alarm is generated. When the CPU utilization of the same system then exceeds 85% for 15 minutes, no new alarm is generated; instead, the severity of the existing alarm changes from Major to Critical. If the CPU utilization then drops back below 85% while remaining above 75%, the alarm severity changes from Critical to Major.

    7. (Optional) Use the Add New Metric option to add an alarm generation condition for a different metric.
  5. (Optional) Select Enable Policy.
    You can enable or disable the policy at any time from the Alarm Policies page.
  6. Save the policy.
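The escalation behavior described in the note in step 4 (a single alarm whose severity changes, rather than one alarm per condition) can be modeled roughly as follows. This is a sketch under stated assumptions: each input value is treated as already sustained for the configured duration (15 minutes in the example), a value that breaches no condition is treated as closing the alarm, and the names `CONDITIONS`, `evaluate`, and `track` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Condition:
    threshold: float  # the metric must exceed this value
    severity: str


# Hypothetical conditions mirroring the example (>75% Major, >85% Critical),
# ordered from most to least severe so that the first match wins.
CONDITIONS = [Condition(85.0, "Critical"), Condition(75.0, "Major")]


def evaluate(value: float) -> Optional[str]:
    """Return the severity for a sustained value, or None if nothing is breached."""
    for cond in CONDITIONS:
        if value > cond.threshold:
            return cond.severity
    return None


def track(values: List[float]) -> Tuple[int, List[Optional[str]]]:
    """Return how many alarms were opened and the alarm severity after each value.

    The first breach opens an alarm; later breaches only change its severity.
    Simplification: a value that breaches no condition closes the alarm.
    """
    alarms_opened = 0
    current: Optional[str] = None
    history: List[Optional[str]] = []
    for value in values:
        severity = evaluate(value)
        if severity is not None and current is None:
            alarms_opened += 1
        current = severity
        history.append(current)
    return alarms_opened, history


print(track([80, 90, 80]))  # (1, ['Major', 'Critical', 'Major'])
```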

Examples for defining conditions

Example 1: CPU utilization threshold alarm for Linux instances using Instance Name prefix

As per the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose instance name starts with 'clm' crosses the threshold of 75% for a period of 15 minutes, a Major alarm will be generated. When Utilization falls below 75% and remains there for 15 minutes, the generated alarm will be automatically closed.

example 1
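The timing in Example 1 (open after 15 minutes above 75%, close after 15 minutes below 75%) can be sketched as a pair of consecutive-sample counters. The sketch assumes one sample per minute and ignores instance selection; the function name and event labels are hypothetical.

```python
def simulate(samples, threshold=75.0, duration=15):
    """Return (minute, event) pairs for one alarm driven by per-minute samples.

    Assumption: one sample per minute, so `duration` counts consecutive samples.
    """
    above = below = 0
    is_open = False
    events = []
    for minute, value in enumerate(samples):
        if value > threshold:
            above, below = above + 1, 0
        elif value < threshold:
            above, below = 0, below + 1
        else:
            above = below = 0  # assumption: exactly at the threshold, neither counter advances
        if not is_open and above >= duration:
            is_open = True
            events.append((minute, "open Major"))
        elif is_open and below >= duration:
            is_open = False
            events.append((minute, "close"))
    return events


print(simulate([80] * 15 + [60] * 15))
# [(14, 'open Major'), (29, 'close')]
print(simulate([80] * 14 + [60] + [80] * 14))
# [] because the breach is interrupted before it lasts 15 minutes
```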

Example 2: CPU utilization threshold alarm using Agent Tag and Host Name-based filtering

As per the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose agent tag name is production and agent host name begins with clm crosses the threshold of 75% for a period of 15 minutes, a Major alarm will be generated. Once Utilization falls below 75% and remains there for 15 minutes, the generated alarm is automatically closed.

Example_2_screenshot

Example 3: CPU utilization threshold alarm using regular expression-based matching on Host Name and Instance Name

As per the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose agent host name matches the regular expression AM-.* and whose instance name matches the regular expression ^0.* crosses the threshold of 75% for 15 minutes, a Major alarm is generated. When the Utilization drops below 75% and remains there for 15 minutes, the generated alarm is automatically closed. The image also shows how these regular expressions filter instances based on agent host names and instance names.

Example_3_screenshot
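The selection criteria in Examples 2 and 3 combine several tests that must all be true. The following Python sketch shows that logic with hypothetical field names (`agent_tag`, `host`, `instance`) and operator names. It assumes that the Matches operator performs a full-string regular-expression match; if the product uses partial (search-style) matching, a pattern such as AM-.* would also select host names that merely contain AM-. As with instance names in the product, comparisons are case sensitive.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical operator names; the labels in the product UI may differ.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "starts_with": lambda actual, expected: actual.startswith(expected),
    # Assumption: Matches is a full-string regular-expression match.
    "matches": lambda actual, expected: re.fullmatch(expected, actual) is not None,
}

Criterion = Tuple[str, str, str]  # (field, operator, value)


def selects(instance: Dict[str, str], criteria: List[Criterion]) -> bool:
    """Return True only if the instance satisfies every criterion (logical AND).

    String comparisons are case sensitive.
    """
    return all(OPERATORS[op](instance.get(fld, ""), value) for fld, op, value in criteria)


# Example 2: agent tag is production AND host name begins with clm.
example_2 = [("agent_tag", "equals", "production"), ("host", "starts_with", "clm")]
# Example 3: host name matches AM-.* AND instance name matches ^0.*.
example_3 = [("host", "matches", r"AM-.*"), ("instance", "matches", r"^0.*")]

instances = [
    {"agent_tag": "production", "host": "clm-app01", "instance": "0"},
    {"agent_tag": "production", "host": "AM-web01", "instance": "0"},
    {"agent_tag": "staging", "host": "AM-web02", "instance": "1"},
]
print([selects(i, example_2) for i in instances])  # [True, False, False]
print([selects(i, example_3) for i in instances])  # [False, True, False]
```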

Example 4: Baseline-aware CPU utilization threshold alarm for specific Linux CPU instances

As per the condition defined in the following image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), when the CPU Utilization of all Linux computers whose instance name starts with 'CPU1' crosses the threshold of 75% for a period of 15 minutes, and it violates the baseline determined for that event, a Major alarm will be generated. Once the Utilization falls below 75% and remains there for 15 minutes, the generated alarm is automatically closed.

Violates baselines.png

Example 5: CPU utilization threshold alarm using In‑Range condition with High Baseline Violation

As per the condition defined in the image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), an alarm is generated when the metric value is greater than or equal to the start range and less than or equal to the end range, and the condition is met for the configured duration.

If the start range is 50% and the end range is 70%:

  • When CPU utilization is between 50% and 70%, the alarm is generated.
  • When CPU utilization is 40%, no alarm is generated.
  • When CPU utilization is 80%, no alarm is generated.

This condition applies to Linux computers whose:

  • agent host name matches the regular expression AM-.*, and
  • instance name matches the regular expression ^0.*.

When the CPU utilization remains continuously in the 50% to 70% range for 15 minutes and exceeds the high baseline, a Minor alarm is generated.
If the utilization moves outside the specified range and remains outside for 15 minutes, the generated alarm is automatically closed.

Example 6: CPU utilization threshold alarm using Out‑Range condition with Low Baseline Violation

As per the condition defined in the image (Monitoring solution: Linux; Monitor type: CPU; Metric: Utilization), an alarm is generated when the metric value is less than the start range or greater than the end range, and the condition is met for the configured duration.

If the start range is 50% and the end range is 70%:

  • When CPU utilization is less than 50% or greater than 70%, the alarm is generated.
  • When CPU utilization is between 50% and 70%, no alarm is generated.

When the CPU utilization remains outside the 50% to 70% range continuously for 15 minutes and violates the low baseline, a Major alarm is generated.
If the utilization returns to the range between 50% and 70% and remains there for 15 minutes, the generated alarm is automatically closed.

Example_6_screenshot
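Examples 5 and 6 differ mainly in the range test. The following minimal Python sketch implements the two range operators as described above: inclusive ends for In-Range and strict comparisons for Out-Range. Baseline values are calculated by the product, so the baseline test is passed in as a caller-supplied predicate, and the baseline numbers used below are hypothetical. The sketch does not model the 15-minute duration.

```python
from typing import Callable


def in_range(value: float, start: float, end: float) -> bool:
    """In-Range: start <= value <= end (both ends inclusive)."""
    return start <= value <= end


def out_range(value: float, start: float, end: float) -> bool:
    """Out-Range: value < start or value > end (the exact complement of in_range)."""
    return value < start or value > end


def breaches(value: float, mode: str, start: float, end: float,
             baseline_violated: Callable[[float], bool]) -> bool:
    """Return True if the range test and the baseline test both hold for one sample."""
    range_hit = in_range(value, start, end) if mode == "in" else out_range(value, start, end)
    return range_hit and baseline_violated(value)


values = (40, 50, 60, 70, 80)
print([in_range(v, 50, 70) for v in values])   # [False, True, True, True, False]
print([out_range(v, 50, 70) for v in values])  # [True, False, False, False, True]

# Example 5 style: in range and above a hypothetical high baseline of 55%.
print(breaches(60, "in", 50, 70, lambda v: v > 55))   # True
# Example 6 style: out of range but not below a hypothetical low baseline of 45%.
print(breaches(80, "out", 50, 70, lambda v: v < 45))  # False
```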

To edit an alarm policy

  1. Select the Configuration > Alarm Policies page and do one of the following:
    • Select the policy and click Edit.
    • From the Actions menu of a policy, select Edit.
  2. Change the configuration details provided while creating the policy and click Save.

When you edit an alarm policy, threshold values are loaded incrementally, 25 at a time, as you scroll the page, to improve usability. You cannot search for thresholds that are not yet loaded on the page.

If you update the precedence of an existing policy, the system automatically closes the open alarms associated with the policy and generates new alarms for any subsequent metric threshold violations for that policy.
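The restriction on searching follows from the incremental loading of threshold values. The following Python sketch, with hypothetical names and data, loads values in batches of 25 the way successive page scrolls do and searches only what has been loaded so far; it illustrates the behavior and is not the product's implementation.

```python
from typing import Iterator, List

PAGE_SIZE = 25  # number of threshold values loaded per scroll


def pages(thresholds: List[str], size: int = PAGE_SIZE) -> Iterator[List[str]]:
    """Yield threshold values in batches, like successive page scrolls."""
    for start in range(0, len(thresholds), size):
        yield thresholds[start:start + size]


def search_loaded(loaded: List[str], term: str) -> List[str]:
    """Search only the values loaded so far; unloaded values cannot be found."""
    return [t for t in loaded if term in t]


thresholds = [f"threshold-{n}" for n in range(60)]  # hypothetical data
scroll = pages(thresholds)
loaded = next(scroll)  # first scroll: values 0 to 24
print(search_loaded(loaded, "threshold-40"))  # []
loaded += next(scroll)  # second scroll: values 25 to 49
print(search_loaded(loaded, "threshold-40"))  # ['threshold-40']
```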

To export alarm policies to a CSV file

  1. Make sure that you have the permission to read alarm policies.
  2. Select Configuration > Alarm Policies.
  3. Click the Download icon download_icon.png.
  4. Click Alarm policies report.
    The data of all alarm policies is downloaded in a CSV file. You can save the file on your local computer. 
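To check the downloaded report programmatically, you can read it with the Python standard `csv` module. The column names of the report are not documented in this topic, so the following sketch reads whatever header row the file contains; the sample data and the file name are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def summarize_report(text: str) -> Tuple[List[str], int]:
    """Return the header row and the number of non-empty data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = [row for row in reader if row]  # csv.reader yields [] for blank lines
    return header, len(rows)


# Hypothetical sample; the real report may have different columns.
sample = "Name,Precedence,Enabled\ncpu-prod,10,true\ncpu-all,20,false\n"
print(summarize_report(sample))  # (['Name', 'Precedence', 'Enabled'], 2)

# For a downloaded file (hypothetical name); utf-8-sig also accepts a byte-order mark:
# with open("alarm_policies.csv", newline="", encoding="utf-8-sig") as f:
#     print(summarize_report(f.read()))
```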

To view conflicting alarm policies

  1. Select Configuration > Alarm Policies.
  2. Click the Download icon download_icon.png.
  3. Click Policy conflicts report.
    A policy conflict report that contains the list of conflicting alarm policies is downloaded as a PDF file.

Important

If your language is set to Chinese (Simplified), do not use Adobe Acrobat Reader to view the PDF. Instead, use a web browser such as Microsoft Edge, Mozilla Firefox, or Google Chrome, or any other PDF reader.

The policy conflict report might take some time to generate. While the PDF file is downloading, you can continue to use other functions in the user interface.

To copy an alarm policy

  1. Select Configuration > Alarm Policies.
  2. From the Actions menu of the policy that you want to copy, select Copy.
    The Create Alarm Policy page is displayed with the configurations of the copied policy. 
  3. Modify the configurations according to your requirements to create a new policy quickly. 
  4. (Optional) Select Enable Policy.
    You can enable or disable the policy any time from the Alarm Policies page.
  5. Save the policy.

To view the list of alarm policies

On the Configuration > Alarm Policies page, view the list of alarm policies.

By default, the policies are sorted by Name. To sort on a different column, click the column heading.

To enable or disable an alarm policy

On the Configuration > Alarm Policies page, do one of the following:

  • Select the policy and click Enable or Disable.
  • From the Actions menu of a policy, select Enable or Disable.
  • Edit the policy and select or clear the Enable Policy check box.
What happens if I enable or disable a policy?
  • What happens when I disable an enabled policy?
    All alarms that are created based on the policy are closed.
  • What happens when I enable a disabled policy?
    New alarms are created based on the policy criteria with new event IDs.
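The two answers above can be summarized as a small state model. This Python sketch is a hypothetical simplification (the `Policy` and `Alarm` classes and the event ID counter are not part of the product): disabling closes every alarm created from the policy, and alarms raised after the policy is enabled again receive new event IDs.

```python
import itertools
from dataclasses import dataclass, field
from typing import List

_event_ids = itertools.count(1)  # hypothetical source of unique event IDs


@dataclass
class Alarm:
    event_id: int
    status: str = "OPEN"


@dataclass
class Policy:
    name: str
    enabled: bool = True
    alarms: List[Alarm] = field(default_factory=list)

    def raise_alarm(self) -> Alarm:
        """Create an alarm for a threshold violation; only enabled policies do this."""
        if not self.enabled:
            raise RuntimeError(f"policy {self.name!r} is disabled")
        alarm = Alarm(next(_event_ids))
        self.alarms.append(alarm)
        return alarm

    def disable(self) -> None:
        """Disabling closes all alarms that were created from this policy."""
        self.enabled = False
        for alarm in self.alarms:
            alarm.status = "CLOSED"

    def enable(self) -> None:
        """Enabling again lets new alarms be raised, each with a new event ID."""
        self.enabled = True


policy = Policy("cpu-linux")
first = policy.raise_alarm()
policy.disable()
policy.enable()
second = policy.raise_alarm()
print(first.status, second.status, first.event_id != second.event_id)
# CLOSED OPEN True
```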

To delete an alarm policy

On the Configuration > Alarm Policies page, do one of the following:

  • Select one or more policies and click Delete.
  • From the Actions menu of a policy, select Delete, and click Yes.

If you specified in the policy post-trigger actions that the alarm must not be closed, such alarm events are automatically closed if they are not updated for longer than the period specified in the retention policy. All closed events are automatically deleted from the system according to the retention policy. Such an alarm is also closed automatically if you delete the policy or if the PATROL Agent associated with the alarm is deleted.
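The closing rules for alarms that are configured not to close can be expressed as a single decision function. This Python sketch is illustrative only: the function and parameter names are hypothetical, and the retention period is passed in as a parameter because its value comes from your tenant's retention policy, not from this topic.

```python
from datetime import datetime, timedelta


def should_auto_close(last_updated: datetime, now: datetime, retention: timedelta,
                      policy_deleted: bool = False, agent_deleted: bool = False) -> bool:
    """Decide whether a 'do not close' alarm is closed automatically.

    Deleting the policy or the associated PATROL Agent closes the alarm at once;
    otherwise it closes only after going un-updated for longer than the retention period.
    """
    if policy_deleted or agent_deleted:
        return True
    return now - last_updated > retention


retention = timedelta(days=30)  # hypothetical retention period
last = datetime(2024, 1, 1)
print(should_auto_close(last, datetime(2024, 2, 15), retention))  # True
print(should_auto_close(last, datetime(2024, 1, 20), retention))  # False
print(should_auto_close(last, datetime(2024, 1, 20), retention, policy_deleted=True))  # True
```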

For more information about the retention policy, see BMC Helix Operations Management service.

Important

Deleting a large number of policies is a maintenance activity and should be done in a controlled manner. Contact BMC Support if you want to delete a large number of policies at once.

To view the audit trail of alarm policies

As a tenant administrator, you can use the BMC Helix Audit dashboard in BMC Helix Dashboards to view the trail of all changes that were made to alarm policies.

Scenario

Tina is a tenant administrator and Sarah is a system administrator at Apex Global. Tina has left on a vacation and she won't be back at work for two more weeks. Sarah has taken up some of Tina's responsibilities during this time. Sarah is looking at some alarm policies in the system and she wants to know when Tina created them and when they were updated. Because Tina is on vacation, how can Sarah obtain this information?

Sarah can log in to BMC Helix Dashboards and use the BMC Helix Audit Dashboard to see a complete audit trail of all alarm policies.

For more information, see BMC Helix Audit Dashboard.

The following image displays the audit trail of alarm policies in the BMC Helix Portal Audit dashboard. Note that the selected resource type is Alarm Policy.

audit_trail_dashboard.png

Related topics

Alarm policy management endpoints in the REST API

 

BMC Helix Operations Management 26.2