Detecting incidents
An incident is an episode characterized by abnormal performance, availability, or web traffic volume as compared to a baseline. Examples of incidents include:
- Exceptionally low traffic volume
- Poor performance
- Service level threshold (SLT) violations
- Frequent availability errors
- Periods of web application unavailability
The system prioritizes incidents by type of abnormality, duration, and the number of users affected.
To detect incidents, set up your system to monitor the appropriate parameters and take the appropriate measurements, as described in the following topics:
What does the system monitor for incidents?
The system can monitor Watchpoints, SLTs, and error conditions for incidents:
- Watchpoints — The system identifies performance, traffic volume, and some availability incidents by monitoring specified Watchpoints.
- Service Level Thresholds — To identify performance-related incidents, the system relies on page SLTs, which are rules that determine whether a page was delivered to your end users within an acceptable time. By default, the system detects an incident when some predefined percentage of requests to your web application violate their SLTs. The exact percentage is configurable in the incident settings.
- Error rules — The system can monitor error rules, so that when a certain error occurs, it is reported as an incident.
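The SLT-based check described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the `PageRequest` record, the `slt_seconds` mapping, and the `threshold_percent` parameter are hypothetical names standing in for the page SLTs and the configurable percentage in the incident settings.

```python
from dataclasses import dataclass


@dataclass
class PageRequest:
    """One observed page request (hypothetical record shape)."""
    page: str
    delivery_seconds: float


def slt_violation_rate(requests, slt_seconds):
    """Return the fraction of requests that took longer than their page's SLT.

    slt_seconds maps a page name to its acceptable delivery time in seconds.
    Requests for pages without an SLT are ignored.
    """
    measured = [r for r in requests if r.page in slt_seconds]
    if not measured:
        return 0.0
    violations = sum(
        1 for r in measured if r.delivery_seconds > slt_seconds[r.page]
    )
    return violations / len(measured)


def is_performance_incident(requests, slt_seconds, threshold_percent):
    """True when the share of SLT violations reaches the configured percentage."""
    return slt_violation_rate(requests, slt_seconds) * 100 >= threshold_percent


requests = [
    PageRequest("/checkout", 1.2),
    PageRequest("/checkout", 4.8),
    PageRequest("/checkout", 5.1),
    PageRequest("/home", 0.4),
]
slts = {"/checkout": 3.0, "/home": 2.0}

# Two of the four requests exceed their SLT, so the violation rate is 50%.
print(is_performance_incident(requests, slts, threshold_percent=25))  # True
```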
Measuring abnormality
Abnormality describes how exceptional or atypical an incident is.
The system measures abnormality based on a logarithmic scale. It establishes a baseline for normal trends in your web traffic, and then calculates the number of standard deviations from the baseline that your web traffic exhibits during a 30-minute period.
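As a rough illustration of the calculation, the sketch below computes how many standard deviations an observed value lies from a baseline. How the system actually builds its baseline is not described here, so the example assumes the baseline is simply a list of historical measurements for comparable 30-minute periods, and it uses the population standard deviation; both are assumptions, and `deviations_from_baseline` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def deviations_from_baseline(baseline_samples, observed):
    """Return how many standard deviations `observed` is from the baseline mean.

    baseline_samples: historical values for comparable 30-minute periods
    (assumed representation of the baseline).
    """
    mu = mean(baseline_samples)
    sigma = pstdev(baseline_samples)
    if sigma == 0:
        # A perfectly flat baseline: any difference is infinitely abnormal.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


# Baseline mean is 100 requests with a standard deviation of 10,
# so an observed value of 130 lies 3 standard deviations away.
print(deviations_from_baseline([90, 110], 130))
```

Taking the absolute difference means the same function covers both abnormally high and abnormally low traffic volume.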
The number of standard deviations away from the baseline is the incident's abnormality. Each type of incident has its own type of abnormality, as described in the following table:
Incident type | Type of abnormality |
---|---|
Availability | The system detected an abnormally high percentage of requests with errors for a particular Watchpoint. |
Performance | The system detected an abnormally high percentage of pages that violated their SLTs for a particular Watchpoint. |
Volume | The system detected an abnormally high or abnormally low amount of traffic on a particular Watchpoint. |
By default, the system represents abnormality as an index value from 0 to 4.
Abnormality rating based on multiples of the standard deviation
Number of standard deviations from baseline | Abnormality rating |
---|---|
Less than or equal to 1 | 0 |
1 – 2 | 1 |
2 – 4 | 2 |
4 – 16 | 3 |
Greater than or equal to 16 | 4 |
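The table can be expressed as a small lookup function. The table does not say which rating applies when a value falls exactly on a shared boundary (2 or 4), so this sketch assumes each range includes its upper bound, except that exactly 16 is rated 4 as the last row states. The function name `abnormality_rating` is hypothetical.

```python
def abnormality_rating(deviations):
    """Map a number of standard deviations to the 0-4 abnormality rating.

    Boundary handling at 2 and 4 is an assumption; the table leaves it open.
    """
    if deviations <= 1:
        return 0
    if deviations <= 2:
        return 1
    if deviations <= 4:
        return 2
    if deviations < 16:
        return 3
    return 4


for value in (0.5, 1.5, 3, 10, 16, 20):
    print(value, abnormality_rating(value))
```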
Measuring affected user sessions
For each type of incident, the system calculates the number of user sessions affected, that is, the number of distinct sessions detected for the Watchpoint during the period in which the incident occurred.
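Counting affected sessions amounts to counting distinct session identifiers within a time window. The following sketch assumes, purely for illustration, that traffic is available as `(watchpoint, session_id, timestamp)` tuples; that record shape and the `affected_sessions` helper are hypothetical.

```python
from datetime import datetime


def affected_sessions(hits, watchpoint, start, end):
    """Count distinct session IDs seen on a Watchpoint between start and end."""
    return len({
        session_id
        for wp, session_id, timestamp in hits
        if wp == watchpoint and start <= timestamp <= end
    })


hits = [
    ("Checkout", "s1", datetime(2024, 1, 1, 10, 5)),
    ("Checkout", "s1", datetime(2024, 1, 1, 10, 7)),   # same session, counted once
    ("Checkout", "s2", datetime(2024, 1, 1, 10, 20)),
    ("Checkout", "s3", datetime(2024, 1, 1, 11, 45)),  # outside the incident period
    ("Search", "s4", datetime(2024, 1, 1, 10, 10)),    # different Watchpoint
]

count = affected_sessions(
    hits,
    "Checkout",
    start=datetime(2024, 1, 1, 10, 0),
    end=datetime(2024, 1, 1, 10, 30),
)
print(count)  # 2
```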
Monitoring incidents
To monitor incidents that occur in your web traffic, you can:
- Watch the Incidents page of a Real User Analyzer component, which lists the incidents that occurred during the specified period
- Receive incident alerts via email or SNMP traps
- Monitor the incident-related dashlets on the Real User Analyzer dashboard
- Monitor Watchpoints for incidents
Related topics
Configuring incident-detection rules
Defining page SLTs to measure application performance compliance on the Analyzer