How metrics data is summarized


Summaries and Statistics

When time series data is loaded by a connector from a data source, it may have arbitrary time stamps and durations. The Capacity Optimization Data Warehouse consists of database tables and automated processes that transform this data into a form usable for analysis, modeling, and reporting. The data is normalized into a common set of meaningful intervals (hour, day, month), and some statistics are computed for each interval. The summarized data and statistics are used for analysis and modeling in the TrueSight Capacity Optimization Workspace, and they are exposed via a set of public database views in order to create data marts for reporting and viewing.

When data is originally loaded by a connector, it is stored in tables called "staging tables". Data warehousing threads wake up to process this data and write it into DETAIL tables, from where it is further summarized (reduced) into higher level tables like hourly, daily, and so on. We describe below how the TrueSight Capacity Optimization Data Warehouse (DWH) summarizes various types of metrics.

There are two types of summarization performed daily by the TrueSight Capacity Optimization data warehouse:

  • The standard data warehousing process, which pre-computes hourly, daily, and monthly summarizations for all time series metrics.
  • The ROLL table, which pre-computes certain statistics like percentiles for a rolling period for some selected time series metrics.

In addition, the administrator can specify additional custom statistics to be computed for specific time series metrics. See Custom Statistics.

The above summarization is performed only for time series metrics (sometimes referred to as "performance" metrics), not for configuration metrics, which are stored as strings whenever they change.

The standard data warehouse tables store pre-computed summaries at these levels: DETAIL, HOUR, DAY, and MONTH. Each type of table has an aging configuration that determines how long data is kept in the data warehouse. See Aging Configuration.

Average, Sum, Min, Max in the Standard Data Warehouse Tables

The DETAIL tables store metric data with the actual time stamps and durations used to load the data originally. These samples can be of varying duration, depending on how the connector found the values in its data source. For any given time series, each sample's duration ends at or before the time stamp of the next sample, i.e., holes are allowed, but no overlaps are allowed. Any holes in the DETAIL samples are carried over as one or more HOUR samples of duration less than 3,600 seconds. To access data in these tables, see the public view PV_SYS_DATA_DETAIL in Accessing-data-using-public-views.

The HOUR tables compute averages, maximums, and other summaries based on the DETAIL samples. Each sample in the hourly table corresponds to a period of one hour, on the hour. This means that each HOUR sample's time stamp is exactly at 00:00, 01:00, etc. To access data in the HOUR tables, see the public view PV_SYS_DATA_DH in Accessing-data-using-public-views.

The DAY tables compute the average of average DETAIL values, maximum of maximum DETAIL values, etc. Each sample in the DAY table represents a 24-hour calendar day in the time zone of the TrueSight  Capacity Optimization Application Server. The start time stamp of each day is thus fixed, and the duration is at most 86,400 seconds. Any holes from the DETAIL data are carried over in the form of shortened duration values. To access data in these tables, see the public view PV_SYS_DATA_D in Accessing-data-using-public-views.

The MONTH tables compute average of average DETAIL values, maximum of maximum DETAIL values, etc. Each MONTH sample represents a calendar month, but separate summaries are created for each hour of the day, for each day class defined in the calendar. To access data in these tables, see the public view PV_SYS_DATA_MDCH in Accessing-data-using-public-views. To understand day classes, see Classifying-dates-and-times-for-data-analysis.

The above statistics are available for constructing analyses in the TrueSight Capacity Optimization Workspace. The detail, hour, day levels are directly selectable in the Analysis Editor user interface. When you select "month" in the Analysis Editor user interface, you see only a single value per month, because the 24 hourly summaries are combined into one value. The same statistics are also accessible in the public views, for creating data marts for reporting and for constructing views.

Splitting DETAIL Samples for HOUR Summarization

If a DETAIL sample crosses an hour boundary, then it is split into multiple HOUR samples before summarizing. The splitting operation splits both the duration and the value. Thus, a single DETAIL sample at start time 01:50:00 and duration twenty minutes will be split into two equal-duration samples of ten minutes each, and they will contribute to the statistics of two consecutive hours, respectively. The value of the samples will be the same as the original if the metric is of a type unaffected by duration; otherwise the values of the split samples will be computed based on their durations. Metric types are described in Metrics.

The DETAIL table will still contain only the original, un-split sample, but the hourly, daily, and monthly tables will count the contributions from the resulting split samples.

The splitting operation can happen only when summarizing from DETAIL to HOUR. The time units beyond HOUR are all multiples of the hourly intervals.

Count and Sum of Squares in the Standard Data Warehouse Tables

At each Data Warehouse summarization period level (hour, day, month), the sample counts and the intermediate quantity sum of squares are computed and maintained. These two statistics are referred to as COUNTVALUE and SUMQUADVALUE in the public views. Both of these statistics are useful primarily for computing the population variance and standard deviation using the "mean of square minus square of mean" formula. These counts and sum of squares numbers are always referring to the detail-level samples, and they can be used to compute standard deviation or variance for any hour, day, or month.

The splitting operation affects both of these values at the hour, day, and month levels. Every split operation causes an increase in the COUNTVALUE statistic and a recomputation of the SUMQUADVALUE from the original DETAIL tables. Thus, any computed standard deviation or variance at the hour, day, or month levels is valid using the "effective" detail samples at that level, not exactly for the original detail samples.

Other counts are also maintained at each level for use in the Optimizer rules, which automatically compare the detail-level samples against thresholds or applies other rules, possibly raising alerts based on them. See Configuring-alerts-and-recommendations for a description of this processing.

Percentile and Other Statistics in Roll Tables

The "ROLL table" stores pre-computed statistics (percentiles and others) for some selected metrics for certain systems. The HOUR data is summarized in a metric-dependent way (sum, average, etc.), based on the type of metric. Only one row of statistics are kept for each metric-system combination. These statistics are not visible in the TrueSight Capacity Optimization Workspace. To access this data for constructing data marts, views, and reports, see public views containing rolling statistics. Metric types are described in Metrics.

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*