A baseline is the expected normal operating range for a metric or attribute of a monitor.
The baseline is calculated by collecting the values for a monitor's attributes and performance metrics over a specified time period. Performance metrics are numeric data and can be analyzed statistically. Non-numeric data, which Infrastructure Management collects on demand in the form of diagnostics, aids in problem isolation.
Numeric performance data (metrics)
Metrics can be classified as follows:
- Statistical Distribution Types
All performance metrics in the system are described using metadata. The distribution type determines how the data is handled downstream, both when the data is condensed and when baselines are generated from the condensed data.
- Normal Distribution
A large sample of data values follows the typical bell-shaped statistical curve. The majority of metrics follow this distribution.
- Non-Normal Distribution
Metrics that do not follow the normal distribution are classified as non-normal. Typically, these metrics may have values that fall outside the normal range. Response time usually follows a non-normal distribution.
Raw data is the numeric data collected by the monitors. It is fed into the Analytical Engine as data streams and is constantly checked for deviations from the normal operating range.
Condensed data is the hourly summary of the raw data. Three values are retained for every hour. For non-normal data, the 90th percentile, median, and 10th percentile are taken as the high, mid, and low values so that an accurate baseline can be established from the condensed data. For normally distributed data, the maximum, average, and minimum are used as the three values.
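The hourly condensing step can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names `percentile` and `condense_hour` are hypothetical, and the nearest-rank percentile definition is an assumption, since the documentation does not state which percentile method is used.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct (assumed definition)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def condense_hour(values, is_normal):
    """Reduce one hour of raw samples to a (high, mid, low) triple.

    Normal metrics keep max/avg/min; non-normal metrics keep the
    90th percentile, the median, and the 10th percentile.
    """
    if not values:
        raise ValueError("values must not be empty")
    if is_normal:
        return (max(values), statistics.fmean(values), min(values))
    return (
        percentile(values, 90),
        statistics.median(values),
        percentile(values, 10),
    )
```

For a non-normal metric such as response time, a single extreme sample in the hour does not become the high value, because the 90th percentile is used instead of the maximum.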
A low baseline value (the 10th percentile of all the values for a given time period) and a high baseline value (the 90th percentile of all the values for a given time period) are established by taking a weighted average of these values over time. A higher weight is given to the latest data being factored into the baseline average. The accuracy of the baseline improves over time.
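One way to picture a weighted average that favors recent data is an exponentially weighted moving average. The sketch below is an assumption made for illustration: the documentation does not specify the weighting scheme, and the names `update_baseline` and `baseline_range` and the default `weight` of 0.3 are hypothetical.

```python
def update_baseline(previous, latest, weight=0.3):
    """Blend the latest value into the running baseline.

    `weight` is the share given to the latest value (assumed scheme);
    each older value's influence shrinks by (1 - weight) per update.
    """
    if not 0 < weight <= 1:
        raise ValueError("weight must be in (0, 1]")
    if previous is None:
        # First observation: the baseline starts at that value.
        return latest
    return weight * latest + (1 - weight) * previous


def baseline_range(lows, highs, weight=0.3):
    """Fold per-period low/high values into one (low, high) baseline."""
    low = high = None
    for period_low, period_high in zip(lows, highs):
        low = update_baseline(low, period_low, weight)
        high = update_baseline(high, period_high, weight)
    return low, high
```

For example, `baseline_range([10, 20], [100, 200], weight=0.5)` blends the second period equally with the first and yields `(15.0, 150.0)`.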
To track the anomalies in behavior for different attribute types, different patterns are required. For example, for attributes that change frequently, a pattern captured at hourly intervals may be best. Hourly interval ranges represent a smaller number of data points and have a tighter range, which is best suited for capturing frequent changes.
Infrastructure Management captures the following baseline patterns:
- Hourly baseline: Each hour of the day has a high and a low value that are tracked. This tracks the pattern for that metric on an hourly basis, and the pattern is repeated for each day. An hourly baseline is initialized after the monitor instance is created and 24 hours of data collection have occurred.
- Daily baseline: A high and a low value are derived from the moving average of each consecutive day. This range is taken from a larger number of data values and, consequently, is wider than the hourly range. A daily baseline is initialized after the monitor instance is created and 24 hours of data collection have occurred.
- Weekday Pattern: The baseline is calculated daily from Monday to Friday, and all these days share the same 24-hour baseline. A weekly baseline is initialized after the monitor instance is created and 168 hours of data collection have occurred.
- Weekend Pattern: The baseline is calculated separately for the weekend (Saturday and Sunday). These two days share the same 24-hour baseline.
- Seasonal baseline: The baseline is calculated separately for predetermined days when your business experiences out-of-the-ordinary workloads or other special behavior. If these days were calculated into the regular baseline, they would artificially raise or lower it, causing unnecessary abnormalities.
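The patterns above imply that each data point is compared against the baseline for its hour and its kind of day. The following sketch shows one way to derive such a lookup key from a timestamp. The function names, the key format, and the rule that a seasonal day takes precedence over the weekday or weekend pattern are assumptions for illustration only, not documented product behavior.

```python
from datetime import date, datetime


def day_pattern(ts, seasonal_days=frozenset()):
    """Classify a timestamp as 'seasonal', 'weekday', or 'weekend'.

    `seasonal_days` is a hypothetical set of predetermined dates;
    checking it first is an assumed precedence rule.
    """
    if ts.date() in seasonal_days:
        return "seasonal"
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return "weekend" if ts.weekday() >= 5 else "weekday"


def baseline_key(ts, seasonal_days=frozenset()):
    """Return a (pattern, hour) key selecting one 24-hour baseline slot."""
    return (day_pattern(ts, seasonal_days), ts.hour)


# Usage: a Saturday afternoon sample maps to the weekend 14:00 slot.
key = baseline_key(datetime(2024, 1, 6, 14, 30))
```

Keeping seasonal days in their own slot means their unusual values never enter the weekday or weekend averages.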