Managing service monitoring rules
If a cloud administrator has enabled service monitoring policies for a service, you can create rules that initiate BMC Cloud Lifecycle Management actions when service performance breaches defined threshold clauses.
Threshold clauses
Each rule contains one or more threshold clauses, which include the following information:
- A performance criteria threshold, such as memory utilization above 80%
- The duration for which the threshold must be continuously breached, measured in minutes
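The two parts of a threshold clause can be pictured with a small model. This is only an illustrative sketch, not the product's implementation: the class name `ThresholdClause`, the metric name, and the one-sample-per-minute assumption are all hypothetical.

```python
from dataclasses import dataclass
import operator

# Comparison symbols mapped to Python operators (mapping is illustrative).
OPS = {">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le, "=": operator.eq}


@dataclass
class ThresholdClause:
    metric: str         # hypothetical metric name, e.g. "memory_utilization"
    op: str             # one of the keys in OPS
    threshold: float    # e.g. 80 for 80%
    duration_min: int   # minutes of continuous breach required (assumed >= 1)

    def is_breached(self, samples):
        """Return True if the most recent `duration_min` samples all breach.

        `samples` is assumed to hold one reading per minute, oldest first.
        """
        if len(samples) < self.duration_min:
            return False
        recent = samples[-self.duration_min:]
        return all(OPS[self.op](value, self.threshold) for value in recent)


clause = ThresholdClause("memory_utilization", ">", 80, 3)
print(clause.is_breached([70, 85, 90, 95]))  # last 3 readings are all above 80
print(clause.is_breached([85, 90, 79, 95]))  # one reading dips below the threshold
```

In this sketch a single reading below the threshold resets the clause, which is one plausible reading of "continuously breached".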
Rule types
Each rule defines an action to take (either notification or remediation) when the threshold clauses are breached. A notification rule sends an email. A remediation rule makes a change to the service to help resolve a potential problem.
For example, you could create a notification rule that sends you an email alert whenever the memory utilization of a resource set is above 90% for 5 minutes. You could create a remediation rule that adds a server to a resource set whenever system CPU utilization is above 80% for 15 minutes. You can define both a notification action and a remediation action in the same rule.
Precedence
If you have defined multiple rules of the same type (notification or remediation), rules higher in the list take precedence over rules lower in the list. For example, you might create two rules to improve the performance of a server when CPU utilization is high. One rule might add CPUs to the server group when utilization is above 70% for 10 minutes. Another rule might add servers to the server group when utilization is above 90% for 10 minutes. If the second rule has a higher precedence and CPU utilization jumps from 50% to 95% for 10 minutes, the second rule is triggered instead of the first rule, even though the conditions for both rules were met. Similarly, if CPU utilization jumps from 50% to 85% for 10 minutes, the first rule is triggered instead of the second rule because, although the second rule has a higher precedence, its conditions were not met. You could also create two notification rules that notify different people or groups of people based on different conditions, and give one notification rule a higher precedence.
You can adjust the precedence of your rules by dragging and dropping them in the list.
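The precedence example above can be sketched as a first-match search through an ordered list. The `Rule` class, the rule names, and the `select_rule` function are hypothetical and exist only to illustrate the described behavior.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str            # hypothetical rule name
    threshold: float     # CPU utilization percentage that must be exceeded
    duration_min: int    # minutes the breach must last

    def conditions_met(self, cpu_pct, minutes_sustained):
        return cpu_pct > self.threshold and minutes_sustained >= self.duration_min


def select_rule(rules_in_precedence_order, cpu_pct, minutes_sustained):
    """Return the highest-precedence rule whose conditions are met, or None."""
    for rule in rules_in_precedence_order:
        if rule.conditions_met(cpu_pct, minutes_sustained):
            return rule
    return None


# The "add servers" rule is listed first, so it has the higher precedence.
rules = [Rule("add-servers", 90, 10), Rule("add-cpus", 70, 10)]

print(select_rule(rules, 95, 10).name)  # both rules match; the first one wins
print(select_rule(rules, 85, 10).name)  # only the lower-precedence rule matches
```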
Waiting periods
As part of the definition of a rule, you define a waiting period that prevents actions from being repeated for a defined length of time. This waiting period ensures that you allow enough time for the requested action to be completed without initiating other potentially conflicting changes. The default waiting period is 15 minutes.
After a remediation rule affecting a target server is triggered, other remediation rules that would affect the server are disabled until the waiting period has elapsed. For example, if you have two remediation rules with different threshold clauses, after one rule is triggered, both rules are disabled until the waiting period of the triggered rule has elapsed. If your defined waiting period is not long enough to allow an action to be completed, BMC Cloud Lifecycle Management attempts to keep other rules inactive beyond the defined waiting period, until the action has completed. Notification rules are not held back in this way: a notification rule that is triggered while a remediation is in progress runs immediately.
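The waiting-period behavior can be pictured as a shared gate for all remediation rules that target the same server. The sketch below is a simplified model under stated assumptions: the class `RemediationGate` is hypothetical, time is tracked in whole minutes, and the extension of the waiting period for actions that are still running is not modeled.

```python
class RemediationGate:
    """Blocks further remediation on one target until the waiting period ends."""

    DEFAULT_WAIT_MIN = 15  # default waiting period stated in this topic

    def __init__(self):
        self.blocked_until = 0  # minute at which remediation is allowed again

    def try_trigger(self, now_min, wait_min=DEFAULT_WAIT_MIN):
        """Return True and start a new waiting period if remediation may run."""
        if now_min < self.blocked_until:
            return False  # another remediation rule fired recently
        self.blocked_until = now_min + wait_min
        return True


def notification_may_run(gate, now_min):
    """Notifications are not held back by a remediation waiting period."""
    return True


gate = RemediationGate()
print(gate.try_trigger(0))             # first remediation runs
print(gate.try_trigger(10))            # blocked: still inside the 15-minute wait
print(notification_may_run(gate, 10))  # a notification can still be sent
print(gate.try_trigger(15))            # waiting period has elapsed
```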
Recommendations
To minimize the impact of your rules on system performance, create only those rules that you are certain you need.
If you need remediation and notification actions for a service, consider combining those actions in a single rule rather than creating two separate rules. For example, in a single rule, you might define a remediation action that adds a server when CPU utilization is high, and a notification action that alerts you that a server was added to your service instance.
Viewing service monitoring rules
From the service details page, click Monitoring Rules.
Creating a service monitoring notification rule
- On the Monitoring Rules page, click New Rule.
- In the Rule Name field, enter a name for the rule.
- In the WHEN section, define one or more conditions that must be met to trigger an action:
  a. Select a type of metric.
  b. Select a comparative symbol (>, >=, <, <=, or =).
  c. Enter a number for the threshold that triggers an action.
  d. Enter the number of minutes for which the threshold must be breached to trigger an action.
  e. (Optional) To add another condition, click Add Clause, specify whether the new condition is additional (AND) or optional (OR), and then repeat steps a through d.
- In the DO THIS section, select Send notification email.
- In the To field, enter one or more email addresses (separated by commas) for the primary recipients of the notification.
- (Optional) In the CC field, enter one or more email addresses (separated by commas) for the noncritical recipients of the notification.
- (Optional) In the BCC field, enter one or more email addresses (separated by commas) for any recipients whose email addresses should be hidden from other recipients.
- In the Subject field, enter text to appear on the Subject line of the email.
- In the Body field, enter text to appear in the body of the email.
- In the Do not repeat action again for at least field, enter the number of minutes to wait after a notification is sent before checking whether a new notification should be sent.
- Click OK.
  The new rule appears on the Monitoring Rules page.
- If necessary, change the precedence of the rule.
  - Click and drag the crosshair icon (next to the name of the rule) so that the rule is placed in the correct sequence.
  - Repeat for each rule as needed.
  - Click Apply Rule Order.
- When you are ready for the rule to go into effect, click the ON/OFF toggle to ON.
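When a rule has several clauses in its WHEN section, the AND and OR connectors determine whether the rule as a whole is triggered. This topic does not state how mixed AND/OR chains are grouped, so the sketch below simply assumes left-to-right evaluation with no operator precedence. The function name `combine` and the tuple layout are hypothetical.

```python
def combine(clause_results):
    """Combine clause outcomes left to right (an assumption, see above).

    `clause_results` is a non-empty list of (connector, breached) tuples.
    The connector of the first clause is ignored; later connectors must be
    "AND" or "OR".
    """
    result = clause_results[0][1]
    for connector, breached in clause_results[1:]:
        if connector == "AND":
            result = result and breached
        elif connector == "OR":
            result = result or breached
        else:
            raise ValueError(f"unknown connector: {connector!r}")
    return result


# CPU clause breached AND memory clause not breached: rule is not triggered.
print(combine([(None, True), ("AND", False)]))
# CPU clause not breached OR memory clause breached: rule is triggered.
print(combine([(None, False), ("OR", True)]))
```

If the product groups clauses differently (for example, by giving AND priority over OR), the outcome for mixed chains would differ, so verify the behavior with a test rule before you rely on it.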
Creating a service monitoring remediation rule
- On the Monitoring Rules page, click New Rule.
- In the Rule Name field, enter a name for the rule.
- In the WHEN section, define one or more conditions that must be met to trigger an action:
  a. Select a type of metric.
  b. Select a comparative symbol (>, >=, <, <=, or =).
  c. Enter a number for the threshold that triggers an action.
  d. Enter the number of minutes for which the threshold must be breached to trigger an action.
  e. (Optional) To add another condition, click Add Clause, specify whether the new condition is additional (AND) or optional (OR), and then repeat steps a through d.
- In the DO THIS section, select Perform an action.
- In the drop-down list, select the action you want to perform.
  New fields appear depending on the action that you select, as shown in the following table:

| Action | Additional fields | Notes |
|---|---|---|
| Add Memory Without Server Restart | Amount of memory to add to each server at a time (in MB); Maximum amount of memory allowed per server (in MB) | This option is available only if it has been enabled by the cloud administrator. |
| Add CPUs Without Server Restart | Number of CPUs to add to each server at a time; Maximum number of CPUs allowed per server | This option is available only if it has been enabled by the cloud administrator. |
| Add Memory With Server Restart | Amount of memory to add to each server at a time (in MB); Maximum amount of memory allowed per server (in MB) | |
| Remove Memory With Server Restart | Amount of memory to remove from each server at a time (in MB); Minimum amount of memory required per server (in MB) | |
| Add CPUs With Server Restart | Number of CPUs to add to each server at a time; Maximum number of CPUs allowed per server | |
| Remove CPUs With Server Restart | Number of CPUs to remove from each server at a time; Minimum number of CPUs required per server | |
| Add Servers | Number of servers to add at a time; Maximum number of servers allowed; Username; Password; Confirm Password; Include script; Type (Windows .bat file or Shell script); Script; Input Parameters | For the Username and Password fields, enter authentication information for the server being added. If you are unsure about what to enter, contact your system administrator. The number of servers refers to the servers in this resource set. The Type, Script, and Input Parameters fields appear only if you select the Include script check box. Scripts are useful for postprovisioning actions. For example, if a Tomcat Web server is overloaded, and you use a policy to add another, you could use a script to deploy a Web application archive (WAR) file on the server before starting the server. Ensure that you test and validate the syntax of any script you include in the action. If you include parameters in your script, provide the input values for those parameters in the Input Parameters field. |
| Remove Servers | Number of servers to remove at a time; Minimum number of servers required; Username; Password; Confirm Password; Include script; Type (Windows .bat file or Shell script); Script; Input Parameters | For the Username and Password fields, enter authentication information for the server being removed. If you are unsure about what to enter, contact your system administrator. The number of servers refers to the servers in this resource set. The Type, Script, and Input Parameters fields appear only if you select the Include script check box. Ensure that you test and validate the syntax of any script you include in the action. If you include parameters in your script, provide the input values for those parameters in the Input Parameters field. |
| Custom | Defined by the cloud administrator | Custom actions are available only if the cloud administrator defines and enables them. |
- In the Do not repeat action again for at least field, enter the number of minutes to wait after the action is performed before the action can be repeated.
- Click OK.
  The new rule appears on the Monitoring Rules page.
- If necessary, change the precedence of the rule.
  - Click and drag the crosshair icon (next to the name of the rule) so that the rule is placed in the correct sequence.
  - Repeat for each rule as needed.
  - Click Apply Rule Order.
- When you are ready for the rule to go into effect, click the ON/OFF toggle to ON.
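Most remediation actions pair an "at a time" amount with a maximum or minimum limit. The sketch below shows one way the two values could interact, using server counts as the example. It assumes that a change is clamped at the limit; the product might instead skip an action that would cross the limit. The function names are hypothetical.

```python
def scale_up(current, step, maximum):
    """Add `step` units without exceeding `maximum` (clamping is an assumption)."""
    if current >= maximum:
        return current  # already at or above the limit; nothing to add
    return min(current + step, maximum)


def scale_down(current, step, minimum):
    """Remove `step` units without dropping below `minimum`."""
    if current <= minimum:
        return current  # already at or below the limit; nothing to remove
    return max(current - step, minimum)


# Add 2 servers at a time, with at most 4 servers allowed.
print(scale_up(3, 2, 4))
# Remove 2 servers at a time, with at least 2 servers required.
print(scale_down(3, 2, 2))
```

Combined with the waiting period, a small "at a time" value and a sensible limit keep a sustained breach from resizing the service faster than each change can take effect.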