Confidence levels of What-if simulation
When you simulate the resource growth, each deployment in the simulation results displays a Confidence level of the simulation for that deployment. For more information about a What-if simulation, see Using simulations to predict the growth of your resources.
The Confidence level for a deployment indicates how reliable the forecast for that deployment is.
The What-if simulation uses the 95th percentile of the hourly samples for the last year of data. By default, BMC Helix Continuous Optimization automatically uses an algorithm that achieves the best results in simulating the resource growth. The prediction for the selected forecasting period is based on the characteristics of the data. The number of samples in the past and the behavior of the time series determine the confidence of the algorithm.
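As an illustration of what a 95th percentile of hourly samples means, the following minimal sketch uses the nearest-rank method on one day of hypothetical data. The function name, the sample values, and the choice of nearest-rank are assumptions for illustration only; the exact percentile method and aggregation window that the product uses are not described here.

```python
import math


def percentile_95(samples):
    """Return the 95th percentile of samples (nearest-rank method, an assumption)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # 1-based nearest rank: smallest rank that covers at least 95% of the samples.
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical hourly CPU utilization (%) for one day: mostly 40%, with two spikes.
hourly_samples = [40.0] * 22 + [85.0, 95.0]
p95 = percentile_95(hourly_samples)
print(p95)
```

The percentile discards the single highest spike but still reflects sustained peaks, which is why it is a common basis for capacity forecasts.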
Computation of the confidence level
Each time the forecast algorithm is applied, BMC Helix Continuous Optimization computes the prediction error.
The Confidence is computed with a cross-validation procedure, in which the time series is split into two parts:
- Training set: Approximately the first 80% of the total time span of the time series. It is used to train the predictor.
- Validation set: The last 20% of the total time span of the time series. The predictions that the predictor produces for this interval are compared with the actual data by using performance metrics (cross-validation).
The final output prediction is then produced after the predictor is retrained on the entire time span of the time series.
Based on the cross-validation results, the prediction error is determined.
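The procedure above can be sketched as follows. This is a minimal illustration, not the product's algorithm: the predictor here is a simple least-squares linear trend, the error metric is the mean absolute error, and all function names and sample values are hypothetical.

```python
def fit_linear_trend(values):
    """Fit a least-squares line through (0, v0), (1, v1), ...; return (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def predict(model, start, count):
    """Predict `count` values starting at time index `start`."""
    slope, intercept = model
    return [intercept + slope * x for x in range(start, start + count)]


def cross_validate(series, train_fraction=0.8):
    """Train on the first 80%, compare predictions with the last 20%."""
    split = int(len(series) * train_fraction)
    if split == 0 or split == len(series):
        raise ValueError("series is too short to split")
    train, validation = series[:split], series[split:]
    model = fit_linear_trend(train)
    predicted = predict(model, split, len(validation))
    errors = [abs(p - a) for p, a in zip(predicted, validation)]
    # Mean absolute error, in the same unit as the data.
    return sum(errors) / len(errors)


# Hypothetical series with a steady upward trend.
series = [10.0 + 2.0 * t for t in range(20)]
validation_error = cross_validate(series)

# The final forecast is produced after retraining on the entire series.
final_model = fit_linear_trend(series)
forecast = predict(final_model, len(series), 5)
```

Because the validation set is always the most recent part of the series, the error estimates how well the predictor extrapolates forward in time.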
The confidence represents the average expected error of the computed forecast on future values. It is expressed in the same unit of measurement as the data. This confidence level is displayed in the Deployments table of the Simulation results page. Expert users can use the Confidence status to determine how much they can trust the simulated forecast.
The confidence depends on the following factors:
- Accuracy index: The expected percentage error of the prediction with respect to the range of the real data. A few additional checks are performed on the historical data to refine this index.
- Data length: The number of data points used in the forecast, both for training and for cross-validation.
- Regime change: A regime change is a point in time after which the data has substantially different statistical properties from the data that came before. For example, consider a series expressed as a percentage whose overall capacity (the denominator in the percentage calculation) increases or decreases. This generates a step in the forecast even if the absolute usage remains constant.
For example, consider a server that has a constant daily utilization of 60%. If the number of CPUs is doubled, the utilization percentage decreases by half, so the new regime is 30%.
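The arithmetic behind this example can be checked with a short sketch. The function name and the CPU counts are hypothetical; only the 60% and 30% figures come from the example above.

```python
def utilization_percent(used_cpus, total_cpus):
    """Utilization as a percentage of total capacity (the denominator)."""
    return 100.0 * used_cpus / total_cpus


used_cpus = 6  # absolute usage stays constant

before = utilization_percent(used_cpus, 10)  # original capacity: 10 CPUs
after = utilization_percent(used_cpus, 20)   # capacity doubled to 20 CPUs

# Daily series around the capacity change: the step is the regime change.
series = [before] * 5 + [after] * 5
print(before, after)
```

The absolute usage never changes, yet the percentage series steps from 60% to 30% at the moment the capacity doubles.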
The value of the Accuracy index generally determines the Confidence level. If the time series is short, or a Regime change occurred in the validation set, the forecast confidence can be low.
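One plausible reading of the Accuracy index is the mean absolute error expressed as a percentage of the range of the actual data. The sketch below follows that reading. It is an assumption, not the documented formula, and it does not model the additional checks that refine the index; the function name and sample values are hypothetical.

```python
def accuracy_index(actual, predicted):
    """Mean absolute error as a percentage of the range of the actual data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("the actual data has no range")
    mean_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * mean_abs_error / value_range


# Hypothetical validation data and the predictions for the same interval.
actual = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 65.0, 75.0]
index = accuracy_index(actual, predicted)  # error of 5 over a range of 60
```

Normalizing by the range makes errors comparable across metrics that have different units or scales.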
Confidence levels in the Deployments table
The Confidence column displays one of the following icons. Hover over the icon to view the message. The following table explains the confidence levels that are associated with a simulation on the Simulation results page. Use the information in the Explanation column to understand the cause of each level and the actions that you can take to improve it.
|Message|Explanation|
|---|---|
|The prediction is highly reliable.|A sizable amount of historical data is available, and a clear periodic behavior in the time series is detected. In such cases, the prediction error is less than 25%, resulting in a reliable prediction.|
|The prediction is moderately reliable.|The prediction error is between 25% and 50%.|
|The prediction is not reliable due to low number of samples.|The number of data points is too low and cannot be compared to the portion of data that is used to compute the cross-validation forecast. The prediction error is more than 50% and the number of data points is too low. This can lead to false or poor values. For the most reliable results, predict a future period that is shorter than the past "viewed" period.|
|The prediction is not reliable.| |
|The prediction is not reliable due to an error during the validation of the prediction.|This is a technical error. In this case, the forecast is generated, but the product could not determine whether the prediction is good or bad.|
|The prediction is not reliable due to a behavior change in the last data segment.|If the data in the validation set presents a regime change (increase or decrease), the prediction might not be reliable.|
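
The mapping from prediction error to message can be sketched as follows. The 25% and 50% thresholds come from the table. Everything else is an assumption for illustration: the order of the checks, the handling of the exact boundary values, the hypothetical `MIN_SAMPLES` threshold, and the use of the plain "not reliable" message when the error is above 50% with enough samples.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "The prediction is highly reliable."
    MODERATE = "The prediction is moderately reliable."
    LOW_SAMPLES = "The prediction is not reliable due to low number of samples."
    NOT_RELIABLE = "The prediction is not reliable."
    VALIDATION_ERROR = (
        "The prediction is not reliable due to an error during the "
        "validation of the prediction."
    )
    REGIME_CHANGE = (
        "The prediction is not reliable due to a behavior change in the "
        "last data segment."
    )


# Hypothetical threshold; the product's actual minimum is not documented here.
MIN_SAMPLES = 30


def classify(
    error_percent: Optional[float],
    sample_count: int,
    regime_change_in_validation: bool = False,
) -> Confidence:
    """Map cross-validation results to a confidence message (assumed precedence)."""
    if error_percent is None:
        # Validation failed, so the quality of the forecast is unknown.
        return Confidence.VALIDATION_ERROR
    if regime_change_in_validation:
        return Confidence.REGIME_CHANGE
    if sample_count < MIN_SAMPLES:
        return Confidence.LOW_SAMPLES
    if error_percent < 25:
        return Confidence.HIGH
    if error_percent <= 50:
        return Confidence.MODERATE
    return Confidence.NOT_RELIABLE


print(classify(12.0, 365).value)
```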