Monitoring service predictions

As an operator, you can monitor service prediction events and take timely action to prevent service outages before they occur.

The Service Predictions page lists the services that might degrade in the next 12 hours. The predictive analytics algorithm considers all the metrics of a service and their threshold limits. Predictions are calculated periodically by analyzing the historical data of the contributing health indicator metrics to forecast a potential threshold breach in the next few hours. For example, when metrics such as CPU Utilization or Disk I/O (disk read and write data) breach their threshold limits, application response slows down and services are impacted.
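The product's forecasting algorithm is not described here, so the following is only a minimal sketch of the general idea: fit a trend to recent metric history and report when the extrapolated value first reaches a threshold. It assumes a simple linear trend and equally spaced samples. The function name `predict_first_breach` and its parameters are hypothetical and are not part of any BMC Helix API.

```python
from typing import Optional, Sequence


def predict_first_breach(
    history: Sequence[float],
    threshold: float,
    horizon_steps: int,
) -> Optional[int]:
    """Return how many steps ahead a fitted linear trend first reaches the
    threshold, or None if no breach is forecast within the horizon.

    Hypothetical helper for illustration only; the real algorithm may differ.
    history: equally spaced samples, oldest first.
    """
    n = len(history)
    if n < 2:
        return None  # not enough data to fit a trend

    # Ordinary least-squares fit of value = intercept + slope * index.
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Walk forward from the first future sample and report the first breach.
    for step in range(1, horizon_steps + 1):
        forecast = intercept + slope * (n - 1 + step)
        if forecast >= threshold:
            return step
    return None


# Hourly CPU utilization (%) for the past 8 hours, rising 5 points per hour.
cpu_history = [50, 55, 60, 65, 70, 75, 80, 85]
print(predict_first_breach(cpu_history, threshold=95, horizon_steps=24))  # prints 2
```

In this example, the trend reaches the 95% threshold two hours after the last sample, which corresponds to the first predicted impact. A flat or falling history returns `None` because no breach is forecast within the horizon.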

Use the information in this section to view, analyze, and understand the service failure prediction events.

Before you begin

Define various health indicator metrics with threshold limits to monitor the service performance. For more information on adding or editing health indicators to a service, see Modeling-services.

To monitor service prediction events

Click Predictions to see the service prediction events.
For each prediction event, view the list of potentially impacted services and their corresponding health indicators, forecast of impact, and severity of prediction.
By default, the prediction is shown for the next 12 hours. Use the following information to view and analyze the predictions:
- Service: Name of the service that is predicted to be impacted.
- Metric: Health indicator for which the threshold limit is predicted to be breached.
- First Impact: Time at which the service impact is first predicted to occur.
- Predicted Severity: Forecasted severity when the first impact occurs.
Click the expander button adjacent to the service name to view the prediction graph.
The prediction graph
The prediction graph contains two sections divided by a vertical line. The section to the left of the vertical line is the historical data graph, and the section to the right is the prediction analysis graph based on that historical data. The dotted orange line above the prediction graph indicates the threshold. The red dot to the right of the vertical line marks the point where the threshold breach is predicted to occur. At any point in time, the algorithm considers the past 8 hours of data to forecast the next 24 hours. For services with multiple metrics, each metric can have its own prediction event.
(Optional) For performing automation actions or working with the other optional UI elements, use the following information:
- Run an automation action, or create or request an automated correction for the anticipated service failure: In the Automations column, run an existing automation if one exists. Otherwise, in the Actions column, click the action menu and select an option to create or request an automation. For more information, see Remediating-events-for-services-and-situations.
- Change the prediction period: By default, the list shows the services predicted to be impacted in the next 12 hours. From the drop-down list, select an option (3 Hours, 6 Hours, 9 Hours, 12 Hours, or 24 Hours) to shorten or extend the forecasting period.
- Filter the list of potentially impacted services: Click Advanced Filters and specify suitable criteria to filter the list by service name, predicted severity, or both.
- Hide or view columns: Click the column selector menu, and select or clear the names of the columns that you want to view or hide.
View prediction event in BMC Helix Operations Management
The prediction event is classified as an Info type event in BMC Helix Operations Management. In the Event Details page > Others tab, you can check the Predicted Severity for the event. For more information on event details, see Viewing event details.
