Total incident count and mean time to resolve (MTTR) indicators for a reliable incident-response process
The Overview page in the BMC Helix AIOps console displays the total number of incidents for a selected time range, as shown in the following example. In the example, there are 292 incidents in the last 24 hours:
Mean time to resolve (MTTR)
MTTR represents the average time taken to resolve a set of incidents. This metric includes the time spent on alerting and diagnosis before repair activities begin. MTTR is therefore an indicator of both the reliability and the availability of a system. Reliability refers to the probability that the service will remain operational over its life cycle. Availability refers to the probability that a system will be operational at any point in time. The shorter the MTTR, the higher the reliability and availability of the system.
The Overview page displays the MTTR and its trend for a selected time range as shown in the following example. In the example, the average time taken to close 4 incidents in the last 24 hours is 4 hours and 33 minutes:
MTTR computation
The MTTR value is computed as follows:
MTTR = Total time taken to close the incidents in the selected time range / Total number of incidents closed in the selected time range
Example
Total incidents closed in the last 24 hours = 4
Time range selected is Last 24 hours
Time taken to close these 4 incidents:
- Incident 1 was closed in 5h 45 minutes (345 minutes)
- Incident 2 was closed in 3h 50 minutes (230 minutes)
- Incident 3 was closed in 6h 15 minutes (375 minutes)
- Incident 4 was closed in 2h 22 minutes (142 minutes)
MTTR = (345 + 230 + 375 + 142)/4 = 273 minutes = 4h 33 minutes
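The arithmetic in this example can be reproduced with a short script. The following is a minimal sketch in Python, not the product's implementation: the function names mean_time_to_resolve and format_hm and the closed_durations list are hypothetical, and the script assumes that the close time of each incident in the selected range is already known.

```python
from datetime import timedelta


def mean_time_to_resolve(durations):
    """Average the close times of the incidents closed in the selected range.

    `durations` is a list of timedelta values, one per closed incident
    (a hypothetical input shape chosen for this sketch).
    Returns None when no incidents were closed, to avoid dividing by zero.
    """
    if not durations:
        return None
    total = sum(durations, timedelta())  # total time taken to close the incidents
    return total / len(durations)        # divided by the number of incidents closed


def format_hm(delta):
    """Render a timedelta as hours and whole minutes, for example '4h 33min'."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}min"


# The four incidents from the example, closed in the last 24 hours.
closed_durations = [
    timedelta(hours=5, minutes=45),  # Incident 1: 345 minutes
    timedelta(hours=3, minutes=50),  # Incident 2: 230 minutes
    timedelta(hours=6, minutes=15),  # Incident 3: 375 minutes
    timedelta(hours=2, minutes=22),  # Incident 4: 142 minutes
]

mttr = mean_time_to_resolve(closed_durations)
print(format_hm(mttr))  # prints "4h 33min"
```

Returning None for an empty list is a design choice of this sketch; how the console presents MTTR when no incidents were closed in the selected range is not described here.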