Statistical considerations
Several of the reports produced by BMC AMI Ops Monitor for CMF give standard deviations for various measures, such as device busy time, TSO response time, or CPU utilization. This section discusses how the standard deviation affects the statistical accuracy of data in reports.
It is important to understand the impact of the standard deviation because it is a factor to consider when using CMF report data to tune your system.
Standard deviation, the mean, and the mode
The average of a measurement in a CMF report is the mean value for that measurement. The standard deviation of a measurement in a CMF report is a value signifying the degree of variation that can occur from the mean for that measurement.
A small standard deviation, or small degree of variation, indicates that most of the extracted measurement values are close to the average or mean value. A large standard deviation, or large degree of variation, indicates that the measurement values are widely spread around the mean.
The following figure shows the relationship of standard deviations to the mean.
Relationship of large and small standard deviations to the mean
The standard deviation is particularly valuable when analyzing average TSO response time, where a high standard deviation can indicate irregular service to the end user.
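As an illustration of the point above, the following minimal sketch compares two sets of response times that share the same mean but differ in spread. The sample values are hypothetical (they are not CMF output), and the population form of the standard deviation from Python's statistics module is used only as an assumption for the example.

```python
from statistics import mean, pstdev

# Hypothetical response-time samples in seconds (illustrative values only).
steady = [1.9, 2.0, 2.1, 2.0, 2.0]
erratic = [0.2, 4.5, 0.5, 3.8, 1.0]

for name, samples in (("steady", steady), ("erratic", erratic)):
    # Both sets average about 2 seconds; only the spread differs.
    print(f"{name}: mean={mean(samples):.2f} std dev={pstdev(samples):.2f}")
```

Both sets report the same average response time, but the larger standard deviation of the second set is what would signal irregular service.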
A mode is generally used in reference to distribution graphs. Modes represent peaks in graphed values. A graph can have any number of modes. For a value to be graphed as a mode, the only requirement is that the preceding and following values are less than the mode value.
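The peak rule described above can be sketched as follows. The function name find_modes and the bucket counts are hypothetical and are used only for illustration.

```python
def find_modes(values):
    """Return the indexes of points that are greater than both neighbors."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Hypothetical distribution graph: number of samples per bucket.
counts = [2, 9, 4, 3, 7, 12, 5, 1]
print(find_modes(counts))  # indexes of the peaks
```

In this example the graph has two modes, at the buckets holding 9 and 12 samples.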
Calculation of standard deviation
The equation used to calculate the standard deviation is shown in the figure below.
where
- N is the number of samples
- Xi is the value of the variable for the ith sample
- i is the sample index
A record interval occurs when the CMF Extractor terminates data collection to write a record and start a new interval. This action is controlled by the INTERVAL parameter of the Extractor REPORT control statement. If there is only one sample, the standard deviation is zero.
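The following minimal sketch shows one way to compute a standard deviation from the N sample values Xi. It assumes the population form (dividing by N), which is consistent with a single sample yielding zero. The exact equation used by CMF is the one shown in the figure and may differ from this assumption.

```python
import math


def standard_deviation(samples):
    """Population standard deviation of the samples (assumed form)."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    if n == 1:
        return 0.0  # a single sample has no variation
    mean = sum(samples) / n
    # Average squared distance from the mean, then the square root.
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / n)


print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```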
Statistical accuracy
Because of the sampling technique used, accurate results are obtained when the number of samples is large, such as 10,000 samples. Therefore, when analyzing report data, consider not only the standard deviation of a measurement but also the sample count.
The sample counts produced are shown at the top of the report.
The measures reported by CMF are a percentage (P) of the total number of samples taken (N) for which the measured conditions were true.
Statistical measures (with errors that are normally distributed) are usually expressed as a percentage (P) plus or minus a confidence interval (E) with a confidence level of (C).
- The confidence interval is an estimate of the maximum error from the true value of P.
- The confidence level is the probability that the difference between P and the true value is less than (E).
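The relationship between P, N, E, and C can be approximated with the textbook normal approximation for a sampled proportion. This sketch is an assumption for illustration, not the documented derivation of the CMF figures, so its results may differ slightly from values read off a chart. The function name confidence_interval is hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(p_percent, n_samples, confidence=0.95):
    """Approximate E, in percentage points, for a percentage P measured over N samples."""
    p = p_percent / 100.0
    # Two-sided critical value of the standard normal distribution for level C.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_samples)


# E shrinks as the number of samples grows.
for n in (1000, 5000, 10000):
    print(n, round(confidence_interval(50, n), 2))
```

Under this approximation, quadrupling the number of samples halves the confidence interval, which is why the sample count matters as much as the reported value.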
To calculate the statistical error, see the following figure and locate the following data:
- Number of samples taken by the Extractor (the N-axis)
- Desired confidence level (one of the plotted diagonal lines)
The intersection point yields the uncorrected value for the confidence interval (E).
Confidence levels for P=50%
The true confidence interval is the product of the correction factor multiplied by the value of (E) determined above.
For example, suppose that a measure reported by the Analyzer is 10%, the desired confidence level is 95%, and the Extractor took 5000 samples. The uncorrected confidence interval is then plus or minus 1.5%. Because the correction factor for a 10% measure is 0.64, the corrected confidence interval is 0.64 x 1.5% = 0.96%.
In other words, the analyst can expect only 1 chance in 20 (95% confidence level) that the actual value (reported as 10%) was less than 9.04% or greater than 10.96%.
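The arithmetic of this example can be reproduced with a short sketch. The values 1.5 and 0.64 are the ones read from the figures in the example above. The function name corrected_interval is hypothetical.

```python
def corrected_interval(uncorrected_e, correction_factor):
    """True confidence interval: correction factor times the uncorrected E."""
    return uncorrected_e * correction_factor


reported = 10.0  # measure reported by the Analyzer, in percent
e = corrected_interval(1.5, 0.64)  # values read from the two figures
low, high = reported - e, reported + e
print(f"{reported}% +/- {e:.2f}% gives a range of {low:.2f}% to {high:.2f}%")
```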
Correction factors for confidence intervals