Predicting and proactively resolving service outages
Predicting, or forecasting, service outages enables organizations to move from a reactive approach to a preventive one. With predictions, organizations can contain small issues before they become larger problems, prevent service outages, take timely action, and focus on key value drivers. Predictive features help organizations maximize IT performance and reduce the time and cost required to manage IT environments.
AI-based service-centric prediction consumes service health indicators and metrics from multicloud and hybrid IT environments. It can predict service outages and help you identify and fix issues before they occur. As a result, it helps reduce an organization's overall mean time to resolve (MTTR).
In addition to health‑indicator‑based predictions, BMC Helix AIOps can generate predictions automatically from KPIs that belong to configuration items (CIs) and are part of a service model.
The following video (2:16) shows a high-level overview of service failure prediction in BMC Helix AIOps:
Watch the YouTube video about Service Failure Prediction Overview in BMC Helix AIOps.
The Predictions page provides quick insights into forecasted service failure events. The prediction algorithm learns from past events to predict future failures. It uses historical data for the defined metric to calculate when a service failure event is likely to occur, and it identifies the threshold point at which the first impact will occur, along with the predicted severity.
As time progresses, the algorithm continues to learn and recalculates the status. Based on these recalculations, the availability status can change: the new status overrides the previous one at a preconfigured interval set within the algorithm.
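This page does not describe the internals of the prediction algorithm. As a rough illustration of the idea (learn a trend from historical metric values, then find the point at which that trend first crosses a threshold), the following Python sketch fits a simple least-squares linear trend. The linear model, the names `predict_breach`, `threshold`, and `horizon`, the use of minutes as the time unit, and the 30-minute severity rule are all hypothetical assumptions for illustration, not the method that BMC Helix AIOps uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical rule (not from BMC Helix AIOps): a breach expected within
# this many minutes is labeled "critical"; later breaches are "major".
CRITICAL_LEAD_MINUTES = 30.0


@dataclass
class Prediction:
    breach_time: Optional[float]  # minute at which the threshold is first crossed, or None
    severity: str                 # hypothetical severity label


def fit_linear_trend(times: Sequence[float], values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of value = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_breach(times: Sequence[float], values: Sequence[float],
                   threshold: float, now: float, horizon: float) -> Prediction:
    """Extrapolate the fitted trend and report when it first reaches `threshold`.

    Assumes higher metric values are worse (for example, response time).
    Returns no prediction if the trend is flat or falling, or if the
    breach lies more than `horizon` minutes after `now`.
    """
    slope, intercept = fit_linear_trend(times, values)
    if slope <= 0:
        return Prediction(None, "none")
    # Solve slope * t + intercept = threshold; clamp to `now` if already breached.
    breach = max((threshold - intercept) / slope, now)
    lead = breach - now
    if lead > horizon:
        return Prediction(None, "none")
    severity = "critical" if lead <= CRITICAL_LEAD_MINUTES else "major"
    return Prediction(breach, severity)


if __name__ == "__main__":
    # A metric rising by 2 units per minute, observed from minute 0 to minute 5.
    minutes = [0, 1, 2, 3, 4, 5]
    samples = [50, 52, 54, 56, 58, 60]
    print(predict_breach(minutes, samples, threshold=90, now=5, horizon=60))
    # Prediction(breach_time=20.0, severity='critical')
```

To loosely mirror the periodic recalculation described above, you would call `predict_breach` again at each interval with the most recent window of samples, so a trend that flattens replaces the earlier prediction with a "none" result. A production system would typically use a more robust forecasting model than a straight line.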
Scenarios
As an operator, site reliability engineer (SRE), or tenant administrator, you may need early visibility into potential service issues. The following scenarios show how prediction capabilities in BMC Helix AIOps help you identify future service risks and take proactive action to prevent service degradation.
- APEX Global Finance and its challenges
- APEX Global NMS and its challenges
- (Controlled availability customers only) APEX Global Retail and its challenges
APEX Global Finance and its challenges
Jordan is a tenant administrator with APEX Global Finance, a financial services provider. The organization provides API services to multiple business consumers. APEX Global has an SLA of 99.9%, and any outage of its API services is costly. Jordan believes that if he can get predictive information about an API outage at least 30 minutes before it happens, he might be able to prevent it, which in turn will help his organization meet the SLA.
Jordan already uses BMC Helix AIOps and its outage monitoring features. With the service prediction feature, he can now list the services that might degrade in the near future, which helps him maintain system health more effectively.
APEX Global NMS and its challenges
Scott is a tenant administrator with APEX Global NMS, a network management service provider. The organization aims to provide the best Wi-Fi experience at all its client locations. Any degradation or drop in Wi-Fi services decreases customer satisfaction and affects clients' businesses.
Scott uses BMC Helix Operations Management for his IT operations management tasks. Additionally, he can use the service-centric predictions feature in BMC Helix AIOps to predict and prevent Wi-Fi service issues in advance and deliver maximum customer satisfaction.
(Controlled availability customers only) APEX Global Retail and its challenges
APEX Global Retail runs a customer‑facing e‑commerce platform with strict performance SLAs, especially during peak shopping periods. Any service slowdown directly affects customer experience and revenue.
Susan is a tenant administrator at APEX Global, responsible for enabling predictive capabilities across services. After she enables KPI-based predictions at the tenant level in BMC Helix AIOps, predictions are automatically generated for the KPIs associated with services.
Susan notices a prediction indicating that the Response time KPI for a critical Order processing service is forecasted to breach its threshold during an upcoming peak period. This early insight allows operations teams to plan capacity improvements in advance and prevent service degradation, helping APEX Global meet its service‑level objectives without reactive troubleshooting.
The following table lists the actions you must perform, depending on your role, to enable and start monitoring service predictions in BMC Helix AIOps:
| Action | Reference |
|---|---|
| Enable the service-centric predictions feature: As a tenant administrator, on the Manage Product Features page, select the AIOps Service Predictions option. | |
| (Controlled availability customers only) Enable the KPI-based predictions feature: As a tenant administrator, from Configurations, select the Manage Service Predictions page. | Enabling KPI-based predictions |
| Monitor service predictions: As an operator or SRE, monitor service predictions on: | |