Total incident count and mean time to resolve (MTTR) indicators for a reliable incident-response process
Incidents
An incident is any event that is not part of the standard operation of a service and that causes an interruption or a reduction in the quality of that service.
The Overview page displays the total incidents for a selected time range as shown in the following example. In the example, there are 35 incidents in the last 24 hours:
Mean time to resolve (MTTR)
MTTR represents the average time taken to resolve a set of incidents. This metric includes the time spent on alerting and diagnosis before repair activities begin. MTTR is therefore an indicator of both the reliability and the availability of a system. Reliability refers to the probability that the service remains operational over its life cycle. Availability refers to the probability that the system is operational at any given point in time. The shorter the MTTR, the higher the reliability and availability of the system.
What is the source of incidents for MTTR computation?
To compute the MTTR value, BMC Helix AIOps considers INCIDENT_INFO events from BMC Helix Operations Management.
The Overview page displays the MTTR and its trend for a selected time range as shown in the following example. In the example, the average time taken to close 4 incidents in the last 7 days is 2 days and 3 hours:
MTTR computation
MTTR is computed as follows:
MTTR = Total time taken to close the incidents in the selected time range / Total number of incidents closed in the selected time range
Example
Total incidents closed in the last 7 days = 4
Time range selected is Last 7 days
Time taken to close these 4 incidents:
- Incident 1 was closed in 44 hours
- Incident 2 was closed in 100 hours
- Incident 3 was closed in 40 hours
- Incident 4 was closed in 20 hours
MTTR = (44 + 100 + 40 + 20) / 4 = 204 / 4 = 51 hours = 2d 3h
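The arithmetic in this example can be reproduced with a short Python sketch. It is only an illustration of the formula, not the product's implementation: the function names `mean_time_to_resolve` and `format_days_hours` are hypothetical, and the sketch assumes that the time taken to close each incident is already known in hours.

```python
from datetime import timedelta


def mean_time_to_resolve(durations):
    """Return the mean of a list of timedelta values, or None if the list is empty."""
    if not durations:
        return None  # no closed incidents in the range, so MTTR is undefined
    return sum(durations, timedelta()) / len(durations)


def format_days_hours(duration):
    """Format a timedelta as '<days>d <hours>h', dropping any partial hour."""
    total_hours = int(duration.total_seconds() // 3600)
    days, hours = divmod(total_hours, 24)
    return f"{days}d {hours}h"


# Hours taken to close the four incidents in the example (assumed input).
close_times = [timedelta(hours=h) for h in (44, 100, 40, 20)]

mttr = mean_time_to_resolve(close_times)
print(format_days_hours(mttr))  # prints "2d 3h"
```

Returning `None` for an empty list avoids a division by zero when no incidents were closed in the selected time range.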