Health score computation by impact severity
When BMC Helix AIOps computes the health score of a service by impact severity, it uses the severity of the impact on each node of the service. The score of the node with the highest impact (lowest score) is considered the service's health score.
Service health score computation process
The following process is used to compute the health score of a service:
- Node health score is computed based on the following factors:
- Events impacting the node
- Event rules
- Health indicators
- Service health score is the lowest node score among all the nodes.
Because the health score of a service depends on the health score of its nodes, let's first look at how the health score of a node is calculated.
Node health score computation
The health score for a node is computed by using causal events that impact the node.
Node health score computation without any event rules or health indicator events
By default, the health score of a node is 100. Each event severity is assigned a score as listed in the following table:
| Event severity | Score | Reduction in health score |
|---|---|---|
| Critical | 10 | If the node is impacted by one critical event, its health score is reduced by 10. |
| Major | 8 | If the node is impacted by one major event, its health score is reduced by 8. |
| Minor | 6 | If the node is impacted by one minor event, its health score is reduced by 6. |
| Warning | 4 | If the node is impacted by one warning event, its health score is reduced by 4. |
The following examples illustrate how the health score of a node is computed when it is impacted by events:
- If the node is impacted by one critical and one major event, its health score = 100 - 10 - 8 = 82
- If the node is impacted by two warning events, its health score = 100 - (2*4) = 92
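The default computation above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the names `SEVERITY_SCORES` and `node_health_score` are hypothetical, and because the text does not say whether the score is clamped at 0, the sketch applies no lower bound.

```python
# Default severity scores, taken from the table above.
SEVERITY_SCORES = {"critical": 10, "major": 8, "minor": 6, "warning": 4}


def node_health_score(event_severities, base_score=100):
    """Subtract the score of every impacting event from the base score.

    Hypothetical helper: no lower bound is applied because the source
    does not describe one.
    """
    return base_score - sum(SEVERITY_SCORES[s] for s in event_severities)


# The two worked examples from the text.
print(node_health_score(["critical", "major"]))   # 82
print(node_health_score(["warning", "warning"]))  # 92
```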
Service designers can customize the values assigned to the severity score based on their organization's requirements. For more information, see Customizing health score and health status.
For an overview of the advanced service health score configuration options, watch the YouTube video about the advanced service health score configuration in BMC Helix AIOps.
Node health score computation with event rules
By default, the health score for an impacted service is computed based on the events generated on all the nodes (CIs) that are part of the service. However, as a service designer, you can define event rules to consider only specific events based on the impacted CIs (host), event severity, message, object, or object class. For example, if you have defined an event rule that considers only events with the major severity, only the events with the major severity are considered. The event rule you define for a service applies to all the nodes that are part of the service.
The following example illustrates how the health score of a node is computed when an event rule is defined to consider only the events with the Major severity. If a node is impacted by Major and Minor events, only the Major events are considered for the health score computation.
- If a node is impacted by three major events and two critical events, its health score = 100 - (3*8) = 76
- If a node is impacted by two minor events, its health score = 100 - 0 = 100
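The filtering effect of an event rule can be sketched as follows. This is an illustration under stated assumptions: the rule is reduced to a set of allowed severities (real event rules can also match on host, message, object, or object class), and the function and parameter names are hypothetical.

```python
# Default severity scores for regular events.
SEVERITY_SCORES = {"critical": 10, "major": 8, "minor": 6, "warning": 4}


def node_health_score_with_rule(event_severities, allowed_severities, base_score=100):
    """Count only the events whose severity the (simplified) rule allows."""
    considered = [s for s in event_severities if s in allowed_severities]
    return base_score - sum(SEVERITY_SCORES[s] for s in considered)


# Rule: consider only events with the Major severity.
rule = {"major"}
print(node_health_score_with_rule(["major"] * 3 + ["critical"] * 2, rule))  # 76
print(node_health_score_with_rule(["minor"] * 2, rule))                     # 100
```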
Node health score computation with health indicators
You can define one or more metrics associated with a service as health indicators that represent the overall health of the service. For example, if you are using synthetic transactions to measure the availability and response time of a web application, those availability and response time metrics are good candidates to be health indicators. When you define health indicators, you associate thresholds with them. When these thresholds are breached, the service health score reflects that the service is no longer completely healthy. For more information, see Adding health indicators.
The thresholds associated with service health indicators are also used for service predictions. For more information, see Predicting and proactively resolving service outages.
By default, an event that is generated due to a breach of a health indicator threshold (also called a health indicator event) is assigned a score as listed in the following table:
| Health indicator event severity | Score | Reduction in health score |
|---|---|---|
| Critical | 20 | If the node is impacted by one critical event, its health score is reduced by 20. |
| Major | 16 | If the node is impacted by one major event, its health score is reduced by 16. |
| Minor | 12 | If the node is impacted by one minor event, its health score is reduced by 12. |
| Warning | 8 | If the node is impacted by one warning event, its health score is reduced by 8. |
Service designers can customize the values assigned to the severity score based on their organization's requirements. For more information, see Customizing health score and health status.
The following examples illustrate how the health score of a node is computed when a health indicator (for example, Disk Space Used) is defined for a service. The node is impacted by health indicator events due to a breach of the Disk Space Used threshold.
- If the node is impacted by two Major severity health indicator events, its health score = 100 - (2*16) = 68
- If the node is impacted by three Critical severity health indicator events, its health score = 100 - (3*20) = 40
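The same subtraction applies to health indicator events, only with the larger default scores. A minimal sketch, with hypothetical names and no lower bound on the result because the text does not describe one:

```python
# Default scores for health indicator events, taken from the table above.
HEALTH_INDICATOR_SCORES = {"critical": 20, "major": 16, "minor": 12, "warning": 8}


def node_health_score_from_indicator_events(severities, base_score=100):
    """Subtract the score of every health indicator event from the base score."""
    return base_score - sum(HEALTH_INDICATOR_SCORES[s] for s in severities)


print(node_health_score_from_indicator_events(["major", "major"]))  # 68
print(node_health_score_from_indicator_events(["critical"] * 3))    # 40
```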
Node health score computation with both health indicators and event rules
If you have defined both health indicators and event rules for a service, events that are generated due to a threshold breach of these metrics and that match the criteria defined in the event rules are considered for computing the health score of its nodes. The following table describes how the health score of a node is computed when health indicators, event rules, both, or neither are defined for a service:
| Metrics defined as health indicators? | Event rules defined? | Events considered for health score computation | Example |
|---|---|---|---|
| Yes | No | Events generated for the health indicators and events generated for the CIs are considered. By default, an event generated for a health indicator reduces the health score by double the amount of an event generated for a CI. For more information, see Customizing health score and health status. | A service is associated with three metrics: Disk Space Used, CPU Utilization, and Memory Utilization, and you have defined Disk Space Used and CPU Utilization as health indicators. If a critical event is generated for CPU Utilization, the node health score is reduced by 20. If a critical event is generated for any CI, the node health score is reduced by 10. |
| Yes | Yes | Events that are generated due to a threshold breach of the health indicator metrics and that match the criteria defined in the event rules are considered. | If you have defined the CPU Utilization and Memory metrics as the health indicators and defined an event rule so that only events with the critical severity are considered, only the events with the critical severity for these metrics are considered. |
| No | Yes | Only the events that satisfy the event rules are considered. | If you have defined an event rule that considers only events with the warning severity, then only the events with the warning severity are considered. |
| No | No | All events are considered. | If you have not defined health indicators or event rules, events with all severities for all the CIs that are part of a service are considered. |
The following examples illustrate how the health score of a node is computed when an event rule is defined to consider only the events with Critical severity and a health indicator is defined for the service.
- If the node is impacted by two Major severity health indicator events and two other Critical events, its health score = 100 - (2*16) - (2*10) = 48
- If the node is impacted by one Major severity health indicator event and two other Minor events, its health score = 100 - (1*16) = 84
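The two worked examples above can be reproduced with the sketch below. It rests on an assumption read off those examples rather than a documented algorithm: health indicator events are always counted at their own scores, and the event rule filters only the remaining events. The table in this section can also be read more strictly, with the rule applied to health indicator events too, and this sketch does not model that reading. All names are hypothetical, and the event rule is simplified to a set of severities.

```python
# Default scores from the two tables in this section.
EVENT_SCORES = {"critical": 10, "major": 8, "minor": 6, "warning": 4}
HEALTH_INDICATOR_SCORES = {"critical": 20, "major": 16, "minor": 12, "warning": 8}


def node_health_score(events, rule_severities, base_score=100):
    """Compute a node score from (severity, is_health_indicator) pairs."""
    score = base_score
    for severity, is_health_indicator in events:
        if is_health_indicator:
            # Assumption from the worked examples: always counted.
            score -= HEALTH_INDICATOR_SCORES[severity]
        elif severity in rule_severities:
            # Other events count only if the (simplified) rule allows them.
            score -= EVENT_SCORES[severity]
    return score


rule = {"critical"}  # event rule: consider only Critical events

# Two Major health indicator events and two other Critical events:
# 100 - (2*16) - (2*10)
first = [("major", True)] * 2 + [("critical", False)] * 2
print(node_health_score(first, rule))

# One Major health indicator event and two other Minor events:
# 100 - (1*16)
second = [("major", True), ("minor", False), ("minor", False)]
print(node_health_score(second, rule))
```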
Service health score computation
A service contains multiple nodes of the same or different types or a single node, and one or more nodes can be impacted by events.
Computation when a service contains multiple nodes and multiple nodes are impacted
If multiple nodes are impacted, the service health score is the lowest node score among all the nodes of the service.
Computation when a service contains multiple nodes and only one node is impacted
If a service contains multiple nodes and only one node is impacted, the health score of the service is the node score of the impacted node. The node score depends on the severity of the event. For example, if a critical event has impacted the node, the node score and, therefore, the service health score is 90 (100 - 10).
Computation when a service contains only one node and the node is impacted
If a service contains only one node and the node is impacted, the service health score depends on the severity of the event that has impacted the node. For example, if a major event has impacted the node, the node score and, therefore, the service health score is 92 (100 - 8).
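All three cases in this section reduce to taking the lowest node score. A minimal sketch, assuming the node scores have already been computed. The function name and the node names in the example are hypothetical.

```python
def service_health_score(node_scores):
    """Return the lowest node score, which is the service health score.

    `node_scores` maps a node name to its already computed health score.
    """
    if not node_scores:
        raise ValueError("a service must contain at least one node")
    return min(node_scores.values())


# Multiple nodes, several impacted: the lowest score wins.
print(service_health_score({"web": 92, "db": 82, "cache": 100}))  # 82
# Multiple nodes, one impacted by a critical event (100 - 10).
print(service_health_score({"web": 100, "db": 90}))               # 90
# Single node impacted by a major event (100 - 8).
print(service_health_score({"web": 92}))                          # 92
```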
Impact propagation and service health score
By default, the impact on the child services is propagated to the parent service, and the health score of the parent service is determined by the health score of the child services. However, as a service designer, you can stop the impact propagation based on your organization's needs. For more information, see Customizing health score and health status.
The following examples illustrate how the health score of a parent service is computed if the impact on the child services is propagated to the parent service.
Balancing profiles and service health score
As a service designer, you can use a balancing profile to specify a threshold by selecting a certain number or percentage of CIs to make sure that the service remains healthy as long as these CIs are healthy. The health score is computed based on the events generated from the selected CIs in the balancing profile. If no balancing profiles are defined, all events for all CIs are considered while computing the health score. For more information, see Adding balancing profiles.
Influence of multi-service situations on health score computation based on impact severity
When multi-service situations are enabled, BMC Helix AIOps correlates events from multiple services into a single situation if they share common infrastructure. In such cases:
- If the Include external CI events option is enabled, the health score may be influenced by events on external CIs connected through a shared topology.
- An event on an external CI is considered only when at least one CI within the service is impacted.
- If no included CI is impacted, external CI events do not affect impact severity or the resulting health score.
Because the impact severity method considers the CI with the highest impact (lowest score), an event on an external CI may become the determining factor for the service health score if it has the highest severity among all considered events.
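The conditions above can be sketched as a small decision function. This is an illustration only: the function and parameter names are hypothetical, the scores are assumed to be precomputed per CI, and "impacted" is assumed to mean a score below the default of 100.

```python
def service_score_with_external_cis(internal_scores, external_scores, include_external):
    """Pick the lowest score among the CIs that are considered.

    External CI scores are considered only when the option is enabled and
    at least one CI within the service is impacted (assumed: score < 100).
    """
    considered = list(internal_scores)
    internal_impacted = any(score < 100 for score in internal_scores)
    if include_external and internal_impacted:
        considered.extend(external_scores)
    return min(considered, default=100)


# An internal CI is impacted, so the external CI becomes the deciding factor.
print(service_score_with_external_cis([92, 100], [90], include_external=True))   # 90
# No internal CI is impacted, so the external event is ignored.
print(service_score_with_external_cis([100, 100], [90], include_external=True))  # 100
# The option is disabled, so only internal CIs count.
print(service_score_with_external_cis([92], [90], include_external=False))       # 92
```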
For information about how to enable the Multi-service situations and Include external CI events options, see Configuring ML-based situations and Configuring global settings for service health.
