Health score computation by impact severity

When BMC Helix AIOps computes the health score of a service by impact severity, it derives the score from the severity of the impact on the service's nodes. The score of the node with the highest impact (lowest score) becomes the service's health score.

Service health score computation process

The following process is used to compute the health score of a service:

Node health score is computed based on the following factors:
- Events impacting the node
- Event rules
- Health indicators
Service health score is the lowest node score among all the nodes.

Because the health score of a service depends on the health score of its nodes, let's first look at how the health score of a node is calculated.

Node health score computation

The health score for a node is computed by using causal events that impact the node.

Node health score computation without any event rules or health indicator events

By default, the health score of a node is 100. Each event severity is assigned a score as listed in the following table:

Event Severity	Score	Reduction in health score
Critical	10	If the node is impacted by one critical event, its health score is reduced by 10.
Major	8	If the node is impacted by one major event, its health score is reduced by 8.
Minor	6	If the node is impacted by one minor event, its health score is reduced by 6.
Warning	4	If the node is impacted by one warning event, its health score is reduced by 4.

The following examples illustrate how the health score of a node is computed when it is impacted by events:

If the node is impacted by one critical and one major event, its health score = 100 - 10 - 8 = 82
If the node is impacted by two warning events, its health score = 100 - (2*4) = 92
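
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the default table, not product code: the function name `node_health_score` and the dictionary layout are assumptions. The source does not say whether the score is floored at 0, so the sketch does not clamp.

```python
# Default per-event reductions, copied from the table above.
DEFAULT_SEVERITY_SCORES = {"critical": 10, "major": 8, "minor": 6, "warning": 4}


def node_health_score(event_severities, severity_scores=DEFAULT_SEVERITY_SCORES, base=100):
    """Subtract the severity score of every impacting event from the base score.

    Hypothetical helper for illustration; severity names are matched case-insensitively.
    """
    return base - sum(severity_scores[s.lower()] for s in event_severities)


print(node_health_score(["Critical", "Major"]))   # 100 - 10 - 8 = 82
print(node_health_score(["Warning", "Warning"]))  # 100 - (2*4) = 92
```

Passing a different `severity_scores` dictionary mimics a service designer customizing the values.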

Service designers can customize the values assigned to the severity score based on their organization's requirements. For more information, see Customizing health score and health status.

For an overview of the advanced service health score configuration options, watch the YouTube video about the advanced service health score configuration in BMC Helix AIOps.

Node health score computation with event rules

By default, the health score for an impacted service is computed based on the events generated on all the nodes (CIs) that are part of the service. However, as a service designer, you can define event rules to consider only specific events based on the impacted CI (host), event severity, message, object, or object class. For example, if you define an event rule that considers only events with Major severity, only Major events contribute to the health score. The event rule you define for a service applies to all the nodes that are part of the service.

The following examples illustrate how the health score of a node is computed when an event rule is defined to consider only the events with Major severity. If a node is impacted by Major events and events of other severities, only the Major events are considered for the health score computation.

If a node is impacted by three major events and two critical events, its health score = 100 - (3*8) = 76
If a node is impacted by two minor events, its health score = 100 - 0 = 100
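
As a rough sketch of the filtering step, an event rule can be modeled as a predicate that decides which events count toward the score. The `Event` class, its fields, and the function names below are hypothetical and are not the product's rule format.

```python
from dataclasses import dataclass

# Default per-event reductions for events that are not health indicator events.
SEVERITY_SCORES = {"critical": 10, "major": 8, "minor": 6, "warning": 4}


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; real events carry more attributes."""
    severity: str
    host: str = ""
    message: str = ""


def major_only(event):
    """Example rule: consider only events with Major severity."""
    return event.severity == "major"


def node_score_with_rule(events, rule, base=100):
    """Score a node by using only the events that satisfy the rule."""
    considered = [e for e in events if rule(e)]
    return base - sum(SEVERITY_SCORES[e.severity] for e in considered)


three_major_two_critical = [Event("major")] * 3 + [Event("critical")] * 2
print(node_score_with_rule(three_major_two_critical, major_only))  # 100 - (3*8) = 76
print(node_score_with_rule([Event("minor")] * 2, major_only))      # 100 - 0 = 100
```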

Node health score computation with health indicators

You can define one or more metrics associated with a service as health indicators that represent the overall health of the service. For example, if you are using synthetic transactions to measure the availability and response time of a web application, those availability and response time metrics are good candidates to be health indicators. When you define health indicators, you associate thresholds with them. When these thresholds are breached, the service health score reflects that the service is no longer completely healthy. For more information, see Adding health indicators.

The thresholds associated with service health indicators are also used for service predictions. For more information, see Predicting and proactively resolving service outages.

Important

If no health indicators are defined for a service, all the metrics associated with the service for which alarm thresholds are defined have the potential to impact the health score. Any alarm generated for any CI that is part of the service affects the score. In this scenario, it is not necessary to define health indicators. However, not all metrics are of equal importance. Some metrics, such as those that represent performance and availability, are better indicators of service health than others. If you have metrics like these for a service, consider defining them as health indicators.

By default, an event that is generated when a health indicator threshold is breached (also called a health indicator event) is assigned a score as listed in the following table:

Health indicator event severity	Score	Reduction in health score
Critical	20	If the node is impacted by one critical health indicator event, its health score is reduced by 20.
Major	16	If the node is impacted by one major health indicator event, its health score is reduced by 16.
Minor	12	If the node is impacted by one minor health indicator event, its health score is reduced by 12.
Warning	8	If the node is impacted by one warning health indicator event, its health score is reduced by 8.

Service designers can customize the values assigned to the severity score based on their organization's requirements. For more information, see Customizing health score and health status.

The following examples illustrate how the health score of a node is computed when a health indicator, for example, Disk Space Used, is defined for a service. The node is impacted by health indicator events that are generated when the Disk Space Used threshold is breached.

If the node is impacted by two Major severity health indicator events, its health score = 100 - (2*16) = 68
If the node is impacted by three Critical severity health indicator events, its health score = 100 - (3*20) = 40
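
The same subtraction applies to health indicator events, only with the doubled default values. A minimal sketch, assuming a hypothetical helper that takes a count of events per severity:

```python
# Default reductions for health indicator events, copied from the table above.
HEALTH_INDICATOR_SCORES = {"critical": 20, "major": 16, "minor": 12, "warning": 8}


def node_score_from_indicator_events(counts_by_severity, base=100):
    """Hypothetical helper: counts_by_severity maps a severity name to an event count."""
    reduction = sum(
        HEALTH_INDICATOR_SCORES[severity] * count
        for severity, count in counts_by_severity.items()
    )
    return base - reduction


print(node_score_from_indicator_events({"major": 2}))     # 100 - (2*16) = 68
print(node_score_from_indicator_events({"critical": 3}))  # 100 - (3*20) = 40
```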

Node health score computation with both health indicators and event rules

If you have defined both health indicators and event rules for a service, events that are generated due to a threshold breach of these metrics and that match the criteria defined in the event rules are considered for computing the health score of its nodes. The following table describes how the health score of a node is computed when either health indicators or event rules, or both are defined for a service:

Metrics defined as health indicators?	Event rules defined?	Events considered for health score computation	Example
Yes	No	The following types of events are considered: (1) events generated for the metrics that are defined as health indicators, and (2) all other events generated for any CI that is associated with the service. By default, the health score reduction due to an event generated for a health indicator is double the reduction due to an event generated for a CI. For more information, see Customizing health score and health status.	A service is associated with three metrics: Disk Space Used, CPU Utilization, and Memory Utilization. You have defined Disk Space Used and CPU Utilization as health indicators. If a critical event is generated for CPU Utilization, the node health score is reduced by 20. If any other critical event is generated for a CI, the node health score is reduced by 10.
Yes	Yes	The following types of events are considered: (1) events that are generated on the health indicators due to associated policies, and (2) events generated on metrics other than health indicators that satisfy the defined rules.	If you have defined the CPU Utilization and Memory metrics as the health indicators and defined an event rule so that events with only the critical severity are considered, events with only the critical severity for these metrics are considered.
No	Yes	Only the events that satisfy the event rules are considered.	If you have defined an event rule that considers only events with the warning severity, then all events with the warning severity are considered.
No	No	All events are considered.	If you have not defined health indicators or event rules, events with all severities for all the CIs that are part of a service are considered.

The following examples illustrate how the health score of a node is computed when an event rule is defined to consider only the events with Critical severity and a health indicator is defined for the service.

If the node is impacted by two Major severity health indicator events and two other Critical events, its health score = 100 - (2*16) - (2*10) = 48
If the node is impacted by one Major severity health indicator event and two other Minor events, its health score = 100 - (1*16) = 84
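
The combined case can be sketched as follows. The sketch follows the arithmetic of the two examples above: health indicator events always count at their own scores, and the event rule filters only the remaining events. Representing an event as a `(severity, is_health_indicator)` tuple and the rule as a set of allowed severities is an assumption made for illustration.

```python
# Default reductions from the two tables in this topic.
EVENT_SCORES = {"critical": 10, "major": 8, "minor": 6, "warning": 4}
INDICATOR_SCORES = {"critical": 20, "major": 16, "minor": 12, "warning": 8}


def node_score(events, rule_severities, base=100):
    """events: iterable of (severity, is_health_indicator) tuples (hypothetical shape).

    rule_severities: severities allowed by the event rule for non-indicator events.
    """
    score = base
    for severity, is_health_indicator in events:
        if is_health_indicator:
            score -= INDICATOR_SCORES[severity]
        elif severity in rule_severities:
            score -= EVENT_SCORES[severity]
        # Any other event is ignored because it does not satisfy the rule.
    return score


critical_only = {"critical"}
print(node_score([("major", True)] * 2 + [("critical", False)] * 2, critical_only))
# 100 - (2*16) - (2*10) = 48
print(node_score([("major", True)] + [("minor", False)] * 2, critical_only))
# 100 - (1*16) = 84
```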

Service health score computation

A service can contain a single node or multiple nodes of the same or different types, and one or more of those nodes can be impacted by events.

Computation when a service contains multiple nodes and multiple nodes are impacted

The following examples illustrate how the service health score is computed if multiple nodes are impacted:

Example 1

A service contains ten nodes (virtual machines) and multiple nodes are impacted. The following table shows the health score of the nodes. For information about how the health score of a node is computed, see Node health score computation.

Node 1	Node 2	Node 3	Node 4	Node 5	Node 6	Node 7	Node 8	Node 9	Node 10
100	80	74	80	70	92	100	88	84	100

The score of the node with the highest impact (lowest score) is considered the health score of the service. Here, the health score of Node 5 is the lowest (has the highest impact). Therefore, the service health score is 70.

Example 2

A service contains five database nodes, three host nodes, ten virtual machine nodes, and eight other device type nodes, and multiple nodes are impacted.

The following table shows the health scores of these nodes. For information about how the health score of a node is computed, see Node health score computation.

Node kind	Node score sorted in ascending order
Database	80	84	90	100	100
Host	74	88	90
Virtual machine	58	66	66	66	72	78	88	90	100	100
Other node kinds	40	50	54	58	66	72	88	100

The lowest score among all the nodes is the health score of the service. Therefore, the service health score is 40.
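
Because the node kind plays no part in the result, the service score reduces to a single minimum over all node scores. A short sketch that uses the Example 2 values (the function name and dictionary layout are assumptions):

```python
# Node scores from Example 2, grouped by node kind.
node_scores = {
    "Database": [80, 84, 90, 100, 100],
    "Host": [74, 88, 90],
    "Virtual machine": [58, 66, 66, 66, 72, 78, 88, 90, 100, 100],
    "Other node kinds": [40, 50, 54, 58, 66, 72, 88, 100],
}


def service_health_score(scores_by_kind):
    """Return the lowest node score across every node kind."""
    return min(score for scores in scores_by_kind.values() for score in scores)


print(service_health_score(node_scores))  # 40
```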

Computation when a service contains multiple nodes and only one node is impacted

If a service contains multiple nodes and only one node is impacted, the health score of the service is the node score of the impacted node. The node score depends on the severity of the event. For example, if a critical event has impacted the node, the node score and, therefore, the service health score is 90 (100 - 10).

Computation when a service contains only one node and the node is impacted

If a service contains only one node and the node is impacted, the service health score depends on the severity of the event that has impacted the node. For example, if a major event has impacted the node, the node score and, therefore, the service health score is 92 (100 - 8).

Impact propagation and service health score

By default, the impact on the child services is propagated to the parent service, and the health score of the parent service is determined by the health scores of the child services. However, as a service designer, you can stop the impact propagation based on your organization's needs. For more information, see Customizing health score and health status.

The following examples illustrate how the health score of a parent service is computed if the impact on the child services is propagated to the parent service.

Example 3

The parent service has not been impacted by any event. The health scores of the impacted child services are 30, 40, and 50.

The health score of the parent service is the lowest health score from across the child services. Therefore, the health score of the parent service is 30.

Example 4

The parent service has been impacted by events that reduce its own health score to 20. The health scores of the impacted child services are 30, 40, and 50.

The health score of the parent service is the lowest score among its own score and the scores of its child services. Therefore, the health score of the parent service is 20.

Example 5

The parent service has been impacted by four Critical severity events. As a result, the health score of the parent service is 60 (100 - (4*10)). The health scores of the impacted child services are 30, 40, and 50.

The health score of the parent service is the lowest score among its own score and the scores of its child services. Therefore, the health score of the parent service is 30.
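
The propagation examples all follow one rule: take the minimum of the parent's own score and the child scores. A minimal sketch, in which the `propagate` flag and the function name are hypothetical, and the behavior when propagation is stopped (the parent keeps its own score) is an assumption rather than documented behavior:

```python
def parent_health_score(own_score, child_scores, propagate=True):
    """Combine a parent's own score with the scores of its child services.

    Assumption: when propagation is stopped, only the parent's own score is used.
    """
    if not propagate or not child_scores:
        return own_score
    return min(own_score, *child_scores)


children = [30, 40, 50]
print(parent_health_score(100, children))  # parent not impacted -> 30
print(parent_health_score(20, children))   # parent's own score is lowest -> 20
print(parent_health_score(60, children))   # a child score is lowest -> 30
```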

Important

Consider a service model with three nodes (Node A, Node B, and Node C) in which a circular relationship is created.
[Figure: cyclic service model]

When an event is generated for Node A, the impact is propagated to the remaining two nodes because of the service model's circular nature.

When Node A receives an event for the first time, the impacting child services count is 0 for Node A, 1 for Node B, and 2 for Node C.

The count differs for each node the first time because of the order in which the counts are calculated. When the event occurs at Node A, Node B and Node C are not directly impacted. Hence, the impacting child services count for Node A is 0.

Similarly, when the impacting child services count for Node B is calculated, Node C is not yet directly impacted. So, the impacting child services count for Node B is 1.

When the impacting child services count for Node C is calculated, Nodes A and B are already directly impacted. Hence, the impacting child services count for Node C is 2.

When an event occurs for the second time, the impacting child services count for all three nodes is 2 because all three nodes were already impacted during the first event cycle.

Balancing profiles and service health score

As a service designer, you can use a balancing profile to specify a threshold by selecting a certain number or percentage of CIs to make sure that the service remains healthy as long as these CIs are healthy. The health score is computed based on the events generated from the selected CIs in the balancing profile. If no balancing profiles are defined, all events for all CIs are considered while computing the health score. For more information, see Adding balancing profiles.

Influence of multi-service situations on health score computation based on impact severity

When multi-service situations are enabled, BMC Helix AIOps correlates events from multiple services into a single situation if they share common infrastructure. In such cases:

- If the Include external CI events option is enabled, the health score may be influenced by events on external CIs connected through a shared topology.
- An event on an external CI is considered only when at least one CI within the service is impacted.
- If no included CI is impacted, external CI events do not affect impact severity or the resulting health score.

Because the impact severity method considers the CI with the highest impact (lowest score), an event on an external CI may become the determining factor for the service health score if it has the highest severity among all considered events.
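
The gating conditions described above can be sketched as a small selection function. The function and parameter names are hypothetical, and events are reduced to plain severity strings for brevity. The sketch shows only which events are considered, not how they are scored.

```python
def events_considered(service_ci_events, external_ci_events, include_external_ci_events):
    """Return the events that may influence the service health score.

    External CI events count only when the option is enabled and at least
    one CI within the service is itself impacted.
    """
    if include_external_ci_events and service_ci_events:
        return list(service_ci_events) + list(external_ci_events)
    return list(service_ci_events)


print(events_considered(["minor"], ["critical"], True))   # ['minor', 'critical']
print(events_considered([], ["critical"], True))          # []
print(events_considered(["minor"], ["critical"], False))  # ['minor']
```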

For information about how to enable the Multi-service situations and Include external CI events options, see Configuring ML-based situations and Configuring global settings for service health.