Monitoring service predictions


As an operator or SRE, you can monitor service outage predictions and take timely action to prevent outages before they occur.

The Service Predictions page lists the services that might degrade in the next 12 hours. The predictive analytics algorithm considers all the metrics of the service and their threshold limits. Predictions are calculated periodically by analyzing the historical data of the contributing health indicator metrics to predict a potential threshold breach in the next few hours. For example, CPU Utilization and Disk I/O are metrics that, if they breach their thresholds, slow down application response and impact services.
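
The idea of forecasting a threshold breach from historical data can be illustrated with a minimal sketch. It is not the algorithm that the product uses, which is not described here. It assumes a simple least-squares trend line, and the function name predict_breach, its parameters, and the sample values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def predict_breach(
    samples: Sequence[Tuple[float, float]],
    threshold: float,
    horizon_hours: float = 12.0,
) -> Optional[float]:
    """Estimate hours until a metric reaches its threshold.

    samples: (time_in_hours, value) pairs in time order (hypothetical format).
    Returns the hours from the last sample until the fitted trend line
    reaches the threshold, or None if no breach falls within the horizon.
    """
    n = len(samples)
    if n < 2:
        return None  # not enough history to fit a trend

    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp

    # Ordinary least-squares fit: value = slope * time + intercept
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t

    last_t = samples[-1][0]
    current = slope * last_t + intercept
    if current >= threshold:
        return 0.0  # the trend is already at or above the threshold
    if slope <= 0:
        return None  # flat or falling trend never reaches the threshold

    hours_to_breach = (threshold - current) / slope
    return hours_to_breach if hours_to_breach <= horizon_hours else None


# Assumed sample data: CPU utilization (%) rising 5 points per hour.
cpu = [(0.0, 50.0), (1.0, 55.0), (2.0, 60.0), (3.0, 65.0)]
print(predict_breach(cpu, threshold=90.0))  # 5.0
```

In this sketch, a steadily rising metric yields a predicted breach time, while a flat metric or a breach beyond the horizon yields no prediction.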

Use the information in this section to view, analyze, and understand the service failure prediction events.

Before you begin

Define health indicator metrics with threshold limits to monitor service performance. For more information about adding or editing health indicators for a service, see Building service models.

To monitor service prediction events

  1. Click Predictions to see the service prediction events. 
  2. For each prediction event, view the list of potentially impacted services and their corresponding health indicators, forecast of impact, and severity of prediction.
    By default, the prediction is shown for the next 12 hours.
    • Service: Name of the service that is predicted to be impacted.
    • Metric: Health indicator for which the threshold limit is predicted to be breached.
    • First Impact: Time at which the service impact is first predicted to occur.
    • Predicted Severity: Forecasted severity when the first impact occurs.
  3. Click the expander button adjacent to the service name to view the prediction graph.

    The prediction graph

    The prediction graph contains two sections divided by a vertical line. The left of the vertical line is the historical data graph, and the right of the vertical line is the prediction analysis graph based on that historical data. The dotted orange line indicates the threshold line. The circular red dot on the right of the vertical line is the point where the threshold breach is predicted to occur. At any point in time the algorithm considers the past 8 hours of data to forecast for the next 24 hours. For services with multiple metrics, each metric can have a predictive event. 

    AIOps_PredictionsScreen.png

  4. (Optional) To run automation actions or use the other optional UI elements, see the following information:

    • Run an automation action or create/request an automated correction for the anticipated service failure: From the Automations column, run an existing automation if one exists, or from the Actions column, click the action menu icon and select an option to create or request an automation. For more information, see Remediating events for services and situations.
    • Change the prediction period to shorten or extend it: By default, the list shows the services that are predicted to be impacted within the next 12 hours. From the list, select an option (3 Hours, 6 Hours, 9 Hours, 12 Hours, or 24 Hours) to shorten or extend the forecasting period.
    • Filter the list of potentially impacted services: Click Advanced Filters and specify suitable criteria to filter the list by service name, prediction severity, or both.
    • Hide or view columns according to your choice: Click the column selector menu icon and select or clear the column names that you want to view or hide.
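
The effect of the prediction period and the advanced filters on the list can be sketched as follows. This is an illustration only, not a product API. The Prediction class, the filter_predictions function, and the severity values are hypothetical, and the fields mirror the Service, Metric, First Impact, and Predicted Severity columns.

```python
from dataclasses import dataclass
from typing import List, Optional

# Prediction periods offered in the list, in hours.
ALLOWED_PERIODS_HOURS = (3, 6, 9, 12, 24)


@dataclass
class Prediction:
    """One row of the predictions list (hypothetical structure)."""
    service: str
    metric: str
    hours_to_first_impact: float
    predicted_severity: str


def filter_predictions(
    predictions: List[Prediction],
    period_hours: int = 12,
    service_name: Optional[str] = None,
    severity: Optional[str] = None,
) -> List[Prediction]:
    """Keep predictions whose first impact falls within the period and that
    match the optional service name and severity filters."""
    if period_hours not in ALLOWED_PERIODS_HOURS:
        raise ValueError(f"period_hours must be one of {ALLOWED_PERIODS_HOURS}")

    matches = []
    for p in predictions:
        if p.hours_to_first_impact > period_hours:
            continue  # first impact is outside the selected period
        if service_name is not None and service_name.lower() not in p.service.lower():
            continue  # service name filter (case-insensitive substring match)
        if severity is not None and p.predicted_severity.lower() != severity.lower():
            continue  # severity filter (case-insensitive exact match)
        matches.append(p)

    # Show the soonest predicted impact first.
    return sorted(matches, key=lambda p: p.hours_to_first_impact)


# Assumed sample rows for illustration.
rows = [
    Prediction("search", "Disk I/O", 10.0, "Major"),
    Prediction("checkout", "CPU Utilization", 2.5, "Critical"),
    Prediction("billing", "CPU Utilization", 20.0, "Critical"),
]
for p in filter_predictions(rows, period_hours=24, severity="Critical"):
    print(p.service, p.metric, p.hours_to_first_impact)
```

With the default 12-hour period, the billing row in this sample is excluded because its first impact is 20 hours away. Extending the period to 24 hours brings it into the list.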

The prediction event is classified as an Info type event in BMC Helix Operations Management. From the Others tab, on the Event Details page, you can check the Predicted Severity for the event. For more information about event details, see Viewing event details.

Viewing historical service predictions

BMC Helix AIOps provides a past predictions analysis view of all the services.

BMC Helix Dashboards permission

Requires a reporting viewer permission in BMC Helix Dashboards.

As an operator or SRE, use the Prediction Analysis dashboard in BMC Helix Dashboards to do the following:

  • View the prediction analysis. By default, the analysis is displayed for the last 15 days. The analysis can be viewed for up to 90 days.
  • Troubleshoot an issue
  • Analyze a prediction event in the recent past that is connected to a current issue
  • Get insights into past predictions that help in identifying the issues that impact a service the most
  • Set up a high alert for the most impactful events in advance
  • Minimize service degradation through proactive remediation
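
The default 15-day view and the 90-day limit can be expressed as a time range calculation. The following is a minimal sketch, not part of the product. The function name analysis_window and the one-day minimum are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DEFAULT_DAYS = 15  # default analysis period
MAX_DAYS = 90      # longest period that can be viewed


def analysis_window(
    days: int = DEFAULT_DAYS,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the analysis period ending at `now`.

    The lower bound of 1 day is an assumption made for this sketch.
    """
    if not 1 <= days <= MAX_DAYS:
        raise ValueError(f"days must be between 1 and {MAX_DAYS}")
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(days=days), end


start, end = analysis_window()
print((end - start).days)  # 15
```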

To view historical service predictions

  1. Click Predictions to see the service prediction events. 
  2. Click Past Predictions.
    AIOps_PastPredictions.png

    The Prediction Analysis dashboard opens in a new tab. By default, the prediction analysis is shown for the last 15 days.

    Prediction_Analysis.png
  3. Change the number of days to view and analyze the past prediction data of your choice.
    For more information, see Prediction Analysis dashboard.