Unsupported content: This version of the product has reached end of support. The documentation is available for your convenience.

Managing service monitoring rules


If a cloud administrator has enabled service monitoring policies for a service, you can create rules that initiate BMC Cloud Lifecycle Management actions when service performance breaches the thresholds defined in the rule's threshold clauses.

Threshold clauses

Each rule contains one or more threshold clauses, which include the following information:

  • A performance criteria threshold, such as memory utilization above 80%
  • The length of time for which the threshold must be continuously breached, measured in minutes
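
As an illustration only, a threshold clause can be modeled as a small data structure with a check over recent samples. This is a minimal sketch, not the product's implementation: the class name, field names, metric name, and the one-sample-per-minute assumption are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison symbols offered in the rule editor, mapped to Python operators.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


@dataclass(frozen=True)
class ThresholdClause:
    metric: str            # hypothetical metric name, e.g. "memory_utilization"
    comparator: str        # one of the keys in COMPARATORS
    threshold: float       # e.g. 80 for "above 80%"
    duration_minutes: int  # how long the breach must last without interruption

    def is_breached(self, samples_per_minute):
        """Return True if the most recent `duration_minutes` samples all breach.

        Assumes one sample per minute, oldest first (an assumption of this sketch).
        """
        if len(samples_per_minute) < self.duration_minutes:
            return False
        compare = COMPARATORS[self.comparator]
        recent = samples_per_minute[-self.duration_minutes:]
        return all(compare(value, self.threshold) for value in recent)


clause = ThresholdClause("memory_utilization", ">", 80, 5)
print(clause.is_breached([70, 85, 90, 88, 91, 95]))  # True: last five samples exceed 80
print(clause.is_breached([85, 90, 70, 88, 91, 95]))  # False: the 70 interrupts the breach
```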

Rule types

Each rule defines an action to take (either notification or remediation) when the threshold clauses are breached. A notification rule sends an email. A remediation rule makes a change to the service to help resolve a potential problem.

For example, you could create a notification rule that sends you an email alert whenever the memory utilization of a resource set is above 90% for 5 minutes. You could create a remediation rule that adds a server to a resource set whenever system CPU utilization is above 80% for 15 minutes. You can define both a notification action and a remediation action in the same rule.

Precedence

If you have defined multiple rules of the same type (notification or remediation), rules higher in the list take precedence over rules lower in the list. For example, you might create two rules to improve the performance of a server when CPU utilization is high. One rule might add CPUs to the server group when utilization is above 70% for 10 minutes. Another rule might add servers to the server group when utilization is above 90% for 10 minutes. If the second rule has a higher precedence and CPU utilization jumps from 50% to 95% for 10 minutes, the second rule is triggered instead of the first rule, even though the conditions for both rules were met. Similarly, if CPU utilization jumps from 50% to 85% for 10 minutes, the first rule is triggered instead of the second rule: although the second rule has a higher precedence, its conditions were not met. You could also create two notification rules that notify different people or groups of people based on different conditions, and give one notification rule a higher precedence.

You can adjust the precedence of your rules by dragging and dropping them in the list.
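
The CPU example above can be sketched as a first-match search over an ordered list. This is a hedged illustration of the described behavior, not the product's code; the `Rule` class, its fields, and the `select_rule` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    cpu_above: float   # threshold in percent
    for_minutes: int   # required duration of the breach


def conditions_met(rule, cpu_percent, minutes_sustained):
    return cpu_percent > rule.cpu_above and minutes_sustained >= rule.for_minutes


def select_rule(rules_in_precedence_order, cpu_percent, minutes_sustained):
    """Return the highest-precedence rule whose conditions are met, or None."""
    for rule in rules_in_precedence_order:
        if conditions_met(rule, cpu_percent, minutes_sustained):
            return rule
    return None


# List order is precedence order: the first entry wins when several match.
rules = [
    Rule("add servers", cpu_above=90, for_minutes=10),
    Rule("add CPUs", cpu_above=70, for_minutes=10),
]

print(select_rule(rules, 95, 10).name)  # add servers
print(select_rule(rules, 85, 10).name)  # add CPUs
print(select_rule(rules, 50, 10))       # None
```

Reordering the `rules` list in this sketch corresponds to dragging a rule to a different position in the list.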

Waiting periods

As part of the definition of a rule, you define a waiting period that prevents actions from being repeated for a defined length of time. This waiting period ensures that you allow enough time for the requested action to be completed without initiating other potentially conflicting changes. The default waiting period is 15 minutes.

After a remediation rule affecting a target server is triggered, other remediation rules that would affect the server are disabled until the waiting period has elapsed. For example, if you have two remediation rules with different threshold clauses, after one rule is triggered, both rules are disabled until the waiting period of the triggered rule has elapsed. If your defined waiting period is not long enough to allow an action to be completed, BMC Cloud Lifecycle Management attempts to keep other rules inactive beyond the defined waiting period, until the action has completed. A notification rule that is triggered while a remediation is in progress will begin as soon as the rule is triggered.
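
The basic effect of a waiting period on remediation rules can be sketched as a per-server timestamp check. This is a simplified illustration under stated assumptions: the `WaitingPeriod` class and its methods are hypothetical, and the sketch does not model the extension of the waiting period while an action is still running, nor the fact that notification rules are not held back.

```python
from datetime import datetime, timedelta


class WaitingPeriod:
    """Tracks, per target server, when remediation rules may fire again."""

    def __init__(self):
        self._blocked_until = {}

    def can_remediate(self, server, now):
        return now >= self._blocked_until.get(server, datetime.min)

    def record_remediation(self, server, now, waiting_minutes=15):
        # 15 minutes is the default waiting period named in this topic.
        self._blocked_until[server] = now + timedelta(minutes=waiting_minutes)


tracker = WaitingPeriod()
start = datetime(2024, 1, 1, 12, 0)
tracker.record_remediation("server-1", start)

print(tracker.can_remediate("server-1", start + timedelta(minutes=5)))   # False
print(tracker.can_remediate("server-1", start + timedelta(minutes=15)))  # True
print(tracker.can_remediate("server-2", start))                          # True
```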

Recommendations

To minimize the impact of your rules on system performance, create only those rules that you are certain you need.

If you need remediation and notification actions for a service, consider combining those actions in a single rule rather than creating two separate rules. For example, in a single rule, you might define a remediation action that adds a server when CPU utilization is high, and a notification action that alerts you that a server was added to your service instance.

Viewing service monitoring rules

From the service details page, click Monitoring Rules.

Creating a service monitoring notification rule

  1. On the Monitoring Rules page, click New Rule.
  2. In the Rule Name field, enter a name for the rule.
  3. In the WHEN section, define one or more conditions that must be met to trigger an action:
    1. Select a type of metric.
    2. Select a comparison operator (>, >=, <, <=, or =).
    3. Enter a number for the threshold that triggers an action.
    4. Enter the number of minutes for which the threshold must be continuously breached to trigger an action.
    5. (Optional) To add another condition, click Add Clause, specify whether the new condition is additional (AND) or optional (OR), and then repeat substeps 1 through 4.
  4. In the DO THIS section, select Send notification email.
  5. In the To field, enter one or more email addresses (separated by commas) for the primary recipients of the notification.
  6. (Optional) In the CC field, enter one or more email addresses (separated by commas) for the noncritical recipients of the notification.
  7. (Optional) In the BCC field, enter one or more email addresses (separated by commas) for any recipients whose email addresses should be hidden from other recipients.

    Note

    When you configure email notifications for receiving emails for SOI provisioning failures, the configuration is saved even if you do not specify email addresses in the To, CC, and BCC fields.

  8. In the Subject field, enter text to appear on the Subject line of the email.
  9. In the Body field, enter text to appear in the body of the email.
  10. In the Do not repeat action again for at least field, enter the minimum number of minutes that must pass after a notification is sent before the rule can send another notification.
  11. Click OK.
    The new rule appears on the Monitoring Rules page.
  12. If necessary, change the precedence of the rule.
    1. Click and drag the crosshair icon (next to the name of the rule) so that the rule is placed in the correct sequence.
    2. Repeat for each rule as needed.
    3. Click Apply Rule Order.
  13. When you are ready for the rule to go into effect, click the ON/OFF toggle to ON.
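
To show how the To, CC, BCC, Subject, and Body fields from the procedure above relate to an email message, the following sketch builds a message with Python's standard `email` library. It only illustrates the field mapping: the function names and the example addresses are hypothetical, and nothing here reflects how BMC Cloud Lifecycle Management actually sends mail.

```python
from email.message import EmailMessage


def split_addresses(field_value):
    """Split a comma-separated field into a list of trimmed addresses."""
    return [part.strip() for part in field_value.split(",") if part.strip()]


def build_notification(to, subject, body, cc="", bcc=""):
    message = EmailMessage()
    # Set a recipient header only when its field contains at least one address.
    for header, value in (("To", to), ("Cc", cc), ("Bcc", bcc)):
        addresses = split_addresses(value)
        if addresses:
            message[header] = ", ".join(addresses)
    message["Subject"] = subject
    message.set_content(body)
    return message


message = build_notification(
    to="ops@example.com, oncall@example.com",
    cc="lead@example.com",
    subject="High memory utilization",
    body="Memory utilization was above 90% for 5 minutes.",
)
print(message["To"])
print(message["Subject"])
```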

Creating a service monitoring remediation rule

Note

With Amazon Web Services, adding or removing CPUs or memory changes the instance type of your server. Because the available instance types are fixed, changing CPU or memory might result in unexpected changes. For example, if your server instance has two CPUs and 2 GB of memory, and the next available instance type with more memory has four CPUs and 4 GB of memory, requesting an additional 1 GB of memory actually increases the amount of memory by 2 GB, and also increases the number of CPUs from two to four. For more information about Amazon Web Services instances, see the Amazon Web Services documentation.
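
The arithmetic in this note can be worked through with a small sketch. The catalog below is hypothetical (the names and sizes are illustrative, not real Amazon Web Services instance types), and the selection logic is an assumption made for the example rather than documented product behavior.

```python
# Hypothetical catalog of fixed instance types: (name, cpus, memory_gb).
# These names and sizes are illustrative, not real Amazon Web Services types.
CATALOG = [
    ("small", 2, 2),
    ("large", 4, 4),
    ("xlarge", 8, 8),
]


def next_type_for_memory(current_memory_gb, extra_memory_gb):
    """Return the smallest catalog entry with at least the requested memory."""
    wanted = current_memory_gb + extra_memory_gb
    candidates = [entry for entry in CATALOG if entry[2] >= wanted]
    if not candidates:
        return None
    return min(candidates, key=lambda entry: entry[2])


name, cpus, memory_gb = next_type_for_memory(current_memory_gb=2, extra_memory_gb=1)
print(name, cpus, memory_gb)  # large 4 4
```

Requesting 1 GB more than the current 2 GB selects the 4 GB type, so memory grows by 2 GB and the CPU count doubles, which matches the example in the note.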

  1. On the Monitoring Rules page, click New Rule.
  2. In the Rule Name field, enter a name for the rule.
  3. In the WHEN section, define one or more conditions that must be met to trigger an action:
    1. Select a type of metric.
    2. Select a comparison operator (>, >=, <, <=, or =).
    3. Enter a number for the threshold that triggers an action.
    4. Enter the number of minutes for which the threshold must be continuously breached to trigger an action.
    5. (Optional) To add another condition, click Add Clause, specify whether the new condition is additional (AND) or optional (OR), and then repeat substeps 1 through 4.
  4. In the DO THIS section, select Perform an action.
  5. In the drop-down list, select the action you want to perform.
    New fields appear depending on the action you select. The additional fields and notes for each action are as follows:

    • Add Memory Without Server Restart
      Additional fields:
        • Amount of memory to add to each server at a time (in MB)
        • Maximum amount of memory allowed per server (in MB)
      Notes: This option is available only if it has been enabled by the cloud administrator.

    • Add CPUs Without Server Restart
      Additional fields:
        • Number of CPUs to add to each server at a time
        • Maximum number of CPUs allowed per server
      Notes: This option is available only if it has been enabled by the cloud administrator.

    • Add Memory With Server Restart
      Additional fields:
        • Amount of memory to add to each server at a time (in MB)
        • Maximum amount of memory allowed per server (in MB)

    • Remove Memory With Server Restart
      Additional fields:
        • Amount of memory to remove from each server at a time (in MB)
        • Minimum amount of memory required per server (in MB)

    • Add CPUs With Server Restart
      Additional fields:
        • Number of CPUs to add to each server at a time
        • Maximum number of CPUs allowed per server

    • Remove CPUs With Server Restart
      Additional fields:
        • Number of CPUs to remove from each server at a time
        • Minimum number of CPUs required per server

    • Add Servers
      Additional fields:
        • Number of servers to add at a time
        • Maximum number of servers allowed
        • Username
        • Password
        • Confirm Password
        • Include script
        • Type (Windows .bat file or Shell script)
        • Script
        • Input Parameters
      Notes:
        • For the Username and Password fields, enter authentication information for the server being added. If you are unsure about what to enter, contact your system administrator.
        • The number of servers refers to the servers in this resource set.
        • The Type, Script, and Input Parameters fields appear only if you select the Include script check box. Scripts are useful for post-provisioning actions. For example, if a Tomcat web server is overloaded, and you use a policy to add another, you could use a script to deploy a web application archive (WAR) file on the server before starting the server.
        • Ensure that you test and validate the syntax of any script you include in the action. If you include parameters in your script, provide the input values for those parameters in the Input Parameters field.

    • Remove Servers
      Additional fields:
        • Number of servers to remove at a time
        • Minimum number of servers required
        • Username
        • Password
        • Confirm Password
        • Include script
        • Type (Windows .bat file or Shell script)
        • Script
        • Input Parameters
      Notes:
        • For the Username and Password fields, enter authentication information for the server being removed. If you are unsure about what to enter, contact your system administrator.
        • The number of servers refers to the servers in this resource set.
        • The Type, Script, and Input Parameters fields appear only if you select the Include script check box.
        • Ensure that you test and validate the syntax of any script you include in the action. If you include parameters in your script, provide the input values for those parameters in the Input Parameters field.

    • Custom
      Additional fields: Defined by the cloud administrator
      Notes: Custom actions are available only if the cloud administrator defines and enables them.


  6. In the Do not repeat action again for at least field, enter the minimum number of minutes that must pass after the remediation action is triggered before the action can be repeated.

    Note

    If an action is triggered, and the waiting time is reached while the remediation action is still in progress, another remediation action is queued and will start after remediation has completed. To prevent this situation, ensure that the waiting time you specify is long enough for the remediation action to be completed.

  7. Click OK.
    The new rule appears on the Monitoring Rules page.
  8. If necessary, change the precedence of the rule.
    1. Click and drag the crosshair icon (next to the name of the rule) so that the rule is placed in the correct sequence.
    2. Repeat for each rule as needed.
    3. Click Apply Rule Order.
  9. When you are ready for the rule to go into effect, click the ON/OFF toggle to ON.
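
The add and remove actions in step 5 each pair a step size ("at a time") with a maximum or minimum. One plausible reading of that pairing is sketched below. The function name is hypothetical, and capping the result at the limit (rather than skipping the action altogether) is an assumption of this sketch, not documented behavior.

```python
def next_server_count(current, step, limit):
    """Apply one add (positive step) or remove (negative step) action.

    `limit` is the maximum for an add action or the minimum for a remove action.
    Assumption: the result is capped at the limit instead of the action being refused.
    """
    if step >= 0:
        return min(current + step, limit)
    return max(current + step, limit)


print(next_server_count(3, 2, 4))   # 4: adding 2 would exceed the maximum of 4
print(next_server_count(3, -2, 2))  # 2: removing 2 would drop below the minimum of 2
print(next_server_count(4, 2, 4))   # 4: already at the maximum
```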

Related topics

Requesting cloud services

Monitoring service performance