General guidance on Time Forecasting Models
The Time Forecasting Model (TFM) is simple to use on any time series metric, and its results are easy to interpret and explain. This ease of use makes it tempting to apply in any situation, but it is important to understand the situations for which it is actually suitable. (See Working-with-time-forecasting-models).
The TFM is a model of a single time series, not a model of an entire computer system or application. The model is based purely on statistical algorithms applied to the historical values of the series. Some of these algorithms fit a curve using regression analysis; others try to discover periodicity or regime changes and take them into account when predicting future values of the series.
A TFM is thus only a tool for extrapolating a time series based on the historical behavior of that one series. This type of model knows nothing about how different time series affect each other in a real system or application, or even what a particular metric means.
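To make the "purely statistical extrapolation of one series" idea concrete, here is a minimal sketch in Python. It is not the TFM's actual algorithm; it only illustrates the general approach of fitting a trend by regression, estimating a periodic pattern from the residuals, and projecting both forward using nothing but the historical values of that single series.

```python
import numpy as np

def forecast(values, period=7, horizon=14):
    """Extrapolate `values` (one observation per day) `horizon` days ahead."""
    t = np.arange(len(values))
    # Trend: ordinary least-squares fit of a straight line to the history.
    slope, intercept = np.polyfit(t, values, deg=1)
    trend = slope * t + intercept
    # Seasonality: average residual at each position within the period.
    residuals = values - trend
    seasonal = np.array([residuals[i::period].mean() for i in range(period)])
    # Forecast: extend the trend and repeat the seasonal pattern.
    future_t = np.arange(len(values), len(values) + horizon)
    return slope * future_t + intercept + seasonal[future_t % period]

# Example: 8 weeks of synthetic "logins per day" with growth and a weekly cycle.
history = 1000 + 5 * np.arange(56) + 200 * np.sin(2 * np.pi * np.arange(56) / 7)
print(forecast(history, period=7, horizon=14))
```

Note that the forecast uses only the history of the one series; any relationship to other metrics in the system is invisible to it.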
While a TFM is a general feature and can be applied to any time series, in practice it is not useful for all metrics, only for those whose past behavior by itself is likely to directly influence their future values. Deciding whether the TFM is useful for a given metric is more an art than a science, but the following kinds of time series are good candidates for modeling with a TFM (see the sketch after this list):
- Business driver metrics that measure business demand, for example, number of logins, number of tickets opened, etc.
- System metrics that can stand as a proxy for an application's demand, for example, CPU utilization of a system, disk I/O rate, network transfer rate, etc.
- System metrics that measure cumulative resource usage that is monotonically increasing, for example, disk space usage.
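The last case is the easiest to reason about: a monotonically increasing metric such as disk space usage is driven mostly by its own past growth, so a simple trend fit on its history is often enough to answer a useful question. The sketch below is hypothetical (the function name and parameters are made up for illustration), but it shows how such an extrapolation can estimate when a volume will fill up.

```python
import numpy as np

def days_until_full(usage_gb, capacity_gb):
    """Estimate days until `capacity_gb` is reached, given daily usage samples."""
    t = np.arange(len(usage_gb))
    # Fit a linear growth trend to the historical usage.
    slope, intercept = np.polyfit(t, usage_gb, deg=1)
    if slope <= 0:
        return None  # no growth trend; the extrapolation predicts no exhaustion
    full_at = (capacity_gb - intercept) / slope
    return max(0.0, full_at - (len(usage_gb) - 1))

# Example: 30 days of usage growing roughly 12 GB/day on a 2 TB volume.
usage = 800 + 12 * np.arange(30) + np.random.normal(0, 3, 30)
print(f"Estimated days until full: {days_until_full(usage, 2048):.1f}")
```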
Metrics that are unlikely to be useful to model with a TFM are those whose behavior is part of a complex system of interacting metrics. For example, CPU contention or memory contention metrics in a virtual machine are affected by multiple interacting factors: different workloads compete for resources while the hypervisor reallocates them. In these cases, metrics like "CPU ready time", "swapping", or "ballooning" are unlikely to be predictable from their own historical values alone.