Service health score, impact score, and metrics
The service health score and the service impact score are the two primary indicators of service health. Together, they give you a quick view of a service's condition so that you can take timely action.
Service health score
The service health score assesses the health of a service on a scale from 0 to 100: the higher the score, the healthier the service. The score is computed for the selected time range from the impacted events associated with each service entity and from the significance derived from the service topology.
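The documentation does not publish the scoring formula. The following Python sketch is a toy model only, meant to make the idea of combining event severity with topology significance concrete. The penalty values, the `significance` weights (0 to 1), and the names `EntityEvent` and `toy_health_score` are hypothetical; this is not how BMC Helix AIOps computes the score.

```python
from dataclasses import dataclass

# Hypothetical per-event penalties; NOT values used by BMC Helix AIOps.
SEVERITY_PENALTY = {"Critical": 40.0, "Major": 20.0, "Minor": 5.0}


@dataclass
class EntityEvent:
    entity: str    # service entity that raised the event
    severity: str  # "Critical", "Major", or "Minor"; other values add no penalty


def toy_health_score(events: list[EntityEvent], significance: dict[str, float]) -> float:
    """Toy model: start at 100, subtract each event's severity penalty scaled by
    the hypothetical topology significance of its entity (default 1.0), and clamp
    the result to the documented 0..100 range."""
    penalty = sum(
        SEVERITY_PENALTY.get(e.severity, 0.0) * significance.get(e.entity, 1.0)
        for e in events
    )
    return max(0.0, min(100.0, 100.0 - penalty))


events = [EntityEvent("db-01", "Major"), EntityEvent("web-02", "Minor")]
significance = {"db-01": 0.5, "web-02": 0.2}
print(toy_health_score(events, significance))  # 89.0
```

In this toy model, an event on a highly significant entity lowers the score more than the same event on a peripheral entity, which mirrors the role the documentation gives to service topology.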
The service health score is displayed on the service details page as shown in the following image:
Service health is represented by color-coded severity values, as shown in the following image:
If the service severity is Ok, the service is healthy. Any other severity value, such as Critical, Major, or Minor, indicates that the service is impacted.
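The healthy-versus-impacted rule can be expressed as a small Python sketch. The `Severity` enum and the `is_impacted` function are hypothetical names, and only the severity values mentioned on this page are modeled; the product may define others.

```python
from enum import Enum


class Severity(Enum):
    """Severity values named on this page; the product may define others."""
    OK = "Ok"
    MINOR = "Minor"
    MAJOR = "Major"
    CRITICAL = "Critical"


def is_impacted(severity: Severity) -> bool:
    """A service is healthy only when its severity is Ok; any other value means impacted."""
    return severity is not Severity.OK


for sev in Severity:
    print(f"{sev.value}: {'impacted' if is_impacted(sev) else 'healthy'}")
```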
Service health timeline
The health timeline on the service details page shows the service health score for the selected time range. The following annotated screenshot shows the service health timeline:
- Time range selector. Click the arrow to change the time range. You can select a relative time range of 3, 6, 12, or 24 hours. By default, the Last 3 hours time range is selected. Depending on the selected time range, the timeline is divided into equal-length time slots as shown in the following table:
  Time range    Length of each time slot
  3 hours       5 minutes
  6 hours       5 minutes
  12 hours      15 minutes
  24 hours      20 minutes
- Service health score for a specific time slot on the health timeline. Hover over a time slot to view the health score.
- Legends that mark incidents, events, and change requests on the health timeline. Hover over a legend to view the details of the event, incident, or change request. For more information, see:
The health timeline does not display INFO and OK events.
- Total incident count and mean time to resolve (MTTR) indicators for a reliable incident-response process
- Event noise reduction indicator for prioritized triage and remediation
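The slot lengths in the time range table determine how many slots the timeline shows. The following Python sketch derives the slot count from the table and maps a timestamp to its slot. The names `SLOT_LENGTH`, `slot_count`, and `slot_index` are hypothetical, and aligning slots to the start of the selected range is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta

# Slot lengths from the time range table (relative range in hours -> slot length).
SLOT_LENGTH = {
    3: timedelta(minutes=5),
    6: timedelta(minutes=5),
    12: timedelta(minutes=15),
    24: timedelta(minutes=20),
}


def slot_count(range_hours: int) -> int:
    """Number of equal-length slots for a relative time range."""
    return timedelta(hours=range_hours) // SLOT_LENGTH[range_hours]


def slot_index(ts: datetime, end: datetime, range_hours: int) -> int:
    """Index of the slot containing ts (0 = oldest slot).

    Assumes the range ends at `end` and slots are aligned to the range start.
    """
    start = end - timedelta(hours=range_hours)
    if not start <= ts < end:
        raise ValueError("timestamp is outside the selected time range")
    return (ts - start) // SLOT_LENGTH[range_hours]


for hours in sorted(SLOT_LENGTH):
    print(f"{hours} hours -> {slot_count(hours)} slots")
```

With these values, the 3-hour range has 36 slots, the 6-hour and 24-hour ranges have 72 slots each, and the 12-hour range has 48 slots.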
Service impact score
The service impact score indicates how much the service is impacted by issues on its entities. It is the complement of the health score, so the higher the impact score, the lower the service health.
Impact score = 100 - service health score
In BMC Helix AIOps, the service impact score is displayed on the service details page as shown in the following image:
In this example, the service health score is 90, so the impact score is 100 - 90 = 10.
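The impact score formula can be checked with a short Python function. The name `impact_score` and the range check are illustrative only and are not part of any BMC Helix AIOps API.

```python
def impact_score(health_score: float) -> float:
    """Impact score = 100 - service health score, where health is on a 0..100 scale."""
    if not 0 <= health_score <= 100:
        raise ValueError("health score must be between 0 and 100")
    return 100 - health_score


print(impact_score(90))  # 10
```

Because the two scores always sum to 100, a drop of N points in the health score is a rise of N points in the impact score.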
Metrics
Metrics are key performance indicators for your environment. For example, with a Linux monitoring solution and the CPU monitor type, you can monitor metrics such as the following:
- Utilization
- Load
- Idle time
- Context Switches
You can view the metrics for the top three events on the top three impacted nodes of a service. The following image shows an example metrics chart:
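The selection of nodes and events can be sketched in Python as follows. This page does not say how nodes or events are ranked, so the sketch assumes that nodes are ranked by a hypothetical per-node `impact` value and events by severity (Critical, then Major, then Minor), and it reads "top three events" as three events per node. `Node`, `Event`, and `metrics_to_chart` are hypothetical names, not BMC Helix AIOps APIs.

```python
from dataclasses import dataclass, field

# Assumed ranking of event severities (higher = more important).
SEVERITY_RANK = {"Critical": 3, "Major": 2, "Minor": 1}


@dataclass
class Event:
    name: str
    severity: str
    metrics: list[str] = field(default_factory=list)  # metrics to chart for this event


@dataclass
class Node:
    name: str
    impact: float  # hypothetical per-node impact value used for ranking
    events: list[Event] = field(default_factory=list)


def metrics_to_chart(nodes: list[Node], top_nodes: int = 3, top_events: int = 3) -> dict:
    """Map each of the top impacted nodes to {event name: metrics} for its top events."""
    chart = {}
    for node in sorted(nodes, key=lambda n: n.impact, reverse=True)[:top_nodes]:
        ranked = sorted(node.events, key=lambda e: SEVERITY_RANK.get(e.severity, 0), reverse=True)
        chart[node.name] = {e.name: e.metrics for e in ranked[:top_events]}
    return chart


nodes = [
    Node("app-01", 40, [Event("High CPU", "Critical", ["Utilization", "Load"])]),
    Node("app-02", 10, [Event("Low idle time", "Minor", ["Idle time"])]),
    Node("db-01", 70, [Event("Context switch spike", "Major", ["Context Switches"])]),
    Node("cache-01", 5, [Event("High load", "Minor", ["Load"])]),
]
print(metrics_to_chart(nodes))
```

Here `cache-01` is left out because it is the fourth most impacted node; changing `top_nodes` or `top_events` adjusts the cutoffs.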
Where to go from here
Performing ML-based root cause isolation of an impacted service