Time forecasting models - Evaluation metrics

This topic presents an overview of evaluation metrics used in BMC Capacity Optimization for time series forecasting, in particular, the Robust Root Mean Squared Error (RRMSE) and the Reliability Evaluator (RE).

These measures are useful to determine the reliability of a prediction, and enable you to decide whether or not to trust the forecast results.

For more information, refer to the following sections:

Robust Root Mean Squared Error

The Robust Root Mean Squared Error (RRMSE) index is computed with a cross-validation procedure in which the time series is split into two parts:

  • Training set: Approximately 80% of the total length. It is used to calculate the model.
  • Validation set: The remaining data. It is compared with the forecast produced by the model calculated on the training set.

As a comparison metric, a stable version of the Root Mean Squared Error (RMSE) is computed. Basic RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{E\left(e(t)^2\right)}$$

where e(t) is the difference between the real data and the prediction, and E( ) is the expected value (mean) operator. To make this performance measure robust, the squared error vector is filtered by removing all values above its 95th percentile.

The resulting metric, Robust RMSE (RRMSE), is therefore robust to both outliers and the number of samples.
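The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names `split_series` and `robust_rmse` are hypothetical, and the quantile is computed by linear interpolation between order statistics, which is an assumption because the exact quantile definition is not stated here.

```python
import math

def split_series(series, train_fraction=0.8):
    """Split a series into a training part and a validation part (hypothetical helper)."""
    n_train = int(len(series) * train_fraction)
    return series[:n_train], series[n_train:]

def robust_rmse(actual, forecast, quantile=0.95):
    """RMSE computed after dropping squared errors above the given quantile."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    squared = sorted((a - f) ** 2 for a, f in zip(actual, forecast))
    # Assumption: quantile by linear interpolation between order statistics.
    pos = quantile * (len(squared) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    cutoff = squared[lo] + (squared[hi] - squared[lo]) * (pos - lo)
    # Keep only the squared errors that do not exceed the cutoff.
    kept = [s for s in squared if s <= cutoff]
    return math.sqrt(sum(kept) / len(kept))

# Example: one extreme error among twenty samples is filtered out.
actual = [10.0] * 20
forecast = [9.0] * 19 + [110.0]
rrmse = robust_rmse(actual, forecast)
```

In this example the single large error would dominate a plain RMSE, while the filtered version reflects the typical error of the remaining samples.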

RRMSE represents the average expected error of the computed forecast on future values (note that it has the same unit of measure as the data) and is displayed in the Execution Summary table of a time forecasting model scenario. Expert users can use it to determine how much they can trust the returned forecast.

Reliability Evaluator

The Reliability Evaluator (RE) index is used to determine automatically whether or not a forecast is reliable. It depends mainly on three indicators:

  • Accuracy Index: This indicator uses the computed RRMSE (which is an absolute measure and therefore cannot be judged on its own) to obtain a relative measure of accuracy. To do this, RRMSE is divided by the robust version of the range of the validation set (after the removal of outliers). The Accuracy Index can be defined as the expected percentage error of the prediction with respect to the range of real data. A few additional checks are performed on the historical data to refine this index. For more information, see Additional Checks for Accuracy Index.
  • Regime change detection: This indicator is a flag that is activated if a regime change is detected in the validation set (or very close to it), an occurrence that may bring uncertainty to the accuracy evaluation.
  • Validation inaccuracy: This indicator is a flag that is activated if the validation set is not sufficiently long. For example, for hourly data, 10 days of samples are needed to perform an acceptable reliability analysis; for daily data, two-and-a-half months are required.
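As an illustration of the Accuracy Index, the following Python sketch divides an RRMSE value by a robust range of the validation set. The way outliers are removed from the range is not specified in this topic, so the sketch assumes the robust range is the distance between the 5th and 95th percentiles. The names `quantile` and `accuracy_index`, and both percentile defaults, are hypothetical.

```python
import math

def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed definition)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def accuracy_index(rrmse, validation, low_q=0.05, high_q=0.95):
    """RRMSE relative to a robust range of the validation set."""
    # Assumption: the robust range is the span between two percentiles.
    robust_range = quantile(validation, high_q) - quantile(validation, low_q)
    if robust_range <= 0:
        raise ValueError("validation set has no spread; the index is undefined")
    return rrmse / robust_range

# Example: an RRMSE of 9 on validation data spanning roughly 0 to 100.
validation = list(range(101))
index = accuracy_index(9.0, validation)
```

With this assumed range, an RRMSE of 9 against data spread over about 90 units gives an index of roughly 0.1, that is, an expected error of about 10% of the range.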

From Scenario options in the console, you can select the Reliability Evaluation mode, which can be either Tight or Loose. These modes set the discriminating thresholds for Accuracy Index values as follows:

  • Tight value: Sets 25% as the Warning Threshold, and 50% as the Poor Threshold
  • Loose value: Sets 35% as the Warning Threshold, and 70% as the Poor Threshold

The RE can assume three values (Good, Warning, or Poor), which depend on the indicators described above and the following rules:

  • If the Accuracy Index is:
    • Less than the Warning Threshold, the RE is Good
    • Between the Warning and Poor Thresholds, the RE is Warning
    • Greater than the Poor Threshold, the RE is Poor
  • If at least one of the flags is activated, the RE can only be Warning or Poor, according to the Accuracy Index.
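The rules above can be expressed as a small decision function. The following Python sketch is illustrative only. The names `THRESHOLDS` and `reliability` are hypothetical, an Accuracy Index exactly equal to a threshold is treated as falling in the Warning band (the boundary behavior is not specified here), and an active flag is assumed to prevent a Good result without otherwise changing the outcome.

```python
# Warning and Poor thresholds per Reliability Evaluation mode, as fractions.
THRESHOLDS = {"tight": (0.25, 0.50), "loose": (0.35, 0.70)}

def reliability(accuracy_index, mode="tight",
                regime_change=False, short_validation=False):
    """Return 'Good', 'Warning', or 'Poor' for an Accuracy Index value."""
    warning, poor = THRESHOLDS[mode]
    if accuracy_index > poor:
        return "Poor"
    # Assumption: any active flag rules out 'Good'.
    if accuracy_index >= warning or regime_change or short_validation:
        return "Warning"
    return "Good"

# Example: the same index is judged differently in the two modes.
tight_result = reliability(0.30, mode="tight")
loose_result = reliability(0.30, mode="loose")
```

An index of 0.30 exceeds the Tight Warning Threshold (25%) but not the Loose one (35%), so the two modes give different verdicts unless a flag is active.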

Additional Checks for Accuracy Index

The following built-in checks in the forecast process affect the Accuracy Index and the Reliability Indicator:

  • Small data: If the time series is entirely contained between values 0 and 1, the Accuracy Index value is reduced. This is done because the Accuracy Index value is biased by small values, which can lead to false or poor results.
  • Reduced variation: If the standard deviation of the training set is many times (at least 5) higher than that of the validation set, the Accuracy Index value is reduced. When the data presents this difference, the Accuracy Index would otherwise be calculated using a range that is not comparable to that of the portion of data used to compute the cross-validation forecast.
  • Slope and Mean difference: If the slope and the expected value of the validation set are different from those estimated by the cross-validation procedure, the reliability of the forecast is reduced. For more information, see Slope and Mean difference check.
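The first two checks are simple conditions on the data and can be sketched as follows. This Python fragment only detects the conditions. It does not show by how much the Accuracy Index is reduced, because that amount is not stated in this topic. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def is_small_data(series):
    """True if the whole time series lies between 0 and 1."""
    return all(0 <= x <= 1 for x in series)

def has_reduced_variation(training, validation, ratio=5.0):
    """True if the training spread is at least `ratio` times the validation spread."""
    # Assumption: population standard deviation; the product may use another estimator.
    train_sd = statistics.pstdev(training)
    valid_sd = statistics.pstdev(validation)
    return train_sd > 0 and train_sd >= ratio * valid_sd

# Example: a volatile training set followed by a nearly flat validation set.
training = [0, 10, 0, 10]
validation = [5, 6, 5, 6]
flagged = has_reduced_variation(training, validation)
```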

Slope and Mean difference check

The aim of the Slope and Mean difference check (SMdc), based on the reliability evaluation procedure, is to understand whether the slope and the level (mean) of the forecast obtained from the cross-validation procedure are comparable to the real data in the validation set.

Given that Xf and Xd are the values of a statistic (Mean or Slope) for the cross-validation forecast and the cross-validation data, respectively, the difference D is checked using the following formula:

[Formula image not recovered: Statistic Difference D, computed from Xf and Xd]

  • If D < Kwarn for both Slope and Mean, the RE is not updated
  • If Kwarn < D < Kpoor for both Slope and Mean, the Reliability Indicator is degraded (at least) to Warning status
  • If D > Kpoor for both Slope and Mean, the Reliability Indicator is degraded to Poor status

where:

  • Kwarn = 0.5 + Kn + Kvar
  • Kpoor = 1.5 + Kn + Kvar

This means that, for example, if the slope (or mean) difference is greater than 150% (plus or minus a corrective factor), the Reliability Indicator is set to Poor.

  • Kn is a corrective factor that is the same for mean and slope. It ranges from +0.125 to -0.125, decreasing linearly with the number of samples in the validation set.
  • Kvar is defined differently for the slope and mean checks:
    • For slope, Kvar is a decreasing function of the mean of the coefficients of determination of the validation set and forecast, ranging between +0.125 and -0.125.
    • For mean, Kvar is an increasing function of the mean of the coefficients of variation of the validation set and forecast, ranging between -0.125 and +0.125.
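To make the threshold arithmetic concrete, the following Python sketch computes Kwarn and Kpoor from given corrective factors and classifies a single statistic difference D. It is a sketch under stated assumptions. The names `smdc_thresholds` and `classify_difference` are hypothetical, D, Kn, and Kvar are taken as inputs because their exact formulas are not given in this topic, and a D exactly equal to a threshold is placed in the Warning band.

```python
def smdc_thresholds(k_n, k_var):
    """Return (Kwarn, Kpoor) for corrective factors in [-0.125, +0.125]."""
    for name, value in (("k_n", k_n), ("k_var", k_var)):
        if not -0.125 <= value <= 0.125:
            raise ValueError(f"{name} must lie between -0.125 and +0.125")
    return 0.5 + k_n + k_var, 1.5 + k_n + k_var

def classify_difference(d, k_n, k_var):
    """Classify one statistic difference D (Slope or Mean) against its thresholds."""
    k_warn, k_poor = smdc_thresholds(k_n, k_var)
    if d < k_warn:
        return "unchanged"
    if d > k_poor:
        return "Poor"
    return "Warning"

# Example: with both corrective factors at their maximum,
# the Poor threshold moves from 1.5 to 1.75.
k_warn, k_poor = smdc_thresholds(0.125, 0.125)
status = classify_difference(1.6, 0.125, 0.125)
```

Because the two corrective factors together can shift each threshold by at most 0.25 in either direction, Kwarn stays between 0.25 and 0.75 and Kpoor between 1.25 and 1.75.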