Identifying a CI's critical events and their sources
Even the most complete service model provides little value if there is no consistent flow of events into the model to maintain the real-time status of its CIs. Event associations provide the mechanism for a CI's real-time status to reflect the health of the actual resource that it represents.
To create the event associations for a CI, you must perform the following steps:
- Identify the event classes that will be associated with the CI.
- Establish a naming convention for the logical ID (a key value) so that the same identification string can be derived from each event class to be associated with the CI.
You can perform event analysis to achieve these goals.
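As a minimal sketch of the naming-convention step, the following Python shows one way that events of different classes could all yield the same logical ID string for a CI. The event class names (`HOST_DOWN`, `DISK_FULL`, `APP_ALERT`), the slot names (`hostname`, `fqdn`, `origin`), and the `server:<host>` ID format are assumptions made for illustration, not definitions taken from the product.

```python
# Illustrative only: event class names, slot names, and the ID format
# are assumptions made for this sketch, not product definitions.

def normalize_host(raw: str) -> str:
    """Lower-case the host name and drop any domain suffix."""
    return raw.strip().lower().split(".")[0]

# One extractor per event class; each returns the raw host value
# from whichever slot that class happens to carry it in.
HOST_EXTRACTORS = {
    "HOST_DOWN": lambda ev: ev["hostname"],
    "DISK_FULL": lambda ev: ev["fqdn"],
    "APP_ALERT": lambda ev: ev["origin"].split(":")[0],
}

def logical_id(event: dict, ci_type: str = "server"):
    """Return the logical ID for an event, or None if its class is not mapped."""
    extractor = HOST_EXTRACTORS.get(event["class"])
    if extractor is None:
        return None
    return f"{ci_type}:{normalize_host(extractor(event))}"

events = [
    {"class": "HOST_DOWN", "hostname": "WEB01"},
    {"class": "DISK_FULL", "fqdn": "web01.example.com"},
    {"class": "APP_ALERT", "origin": "Web01.example.com:8080"},
    {"class": "AUDIT_LOG", "user": "root"},
]
ids = [logical_id(ev) for ev in events]
print(ids)
```

The first three events resolve to the same string even though each class stores the host name in a different slot and format, while the unmapped class yields `None` and so would not be associated with any CI.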
Assuming that there is enough event data consistently available to understand the state of IT resources, perform the following actions:
- Analyze the event flow of each real IT resource or group of resources that are instrumented in the same way to identify the following events:
- Events that provide no value to the service model
Not all events received by a cell provide valuable information to the service model. Identify the events that are of no value and must be ignored, either by not sending them to a cell or by filtering them out when they reach the first cell.
- Events that provide valuable information about the service environment and must be retained by the cell, such as:
- Events that must be changed or adapted either at the source or in the event adapter that collects them to be usable by the model
- Events that must be enriched by the cell so that they contain the required information; events can be enriched by using Refine and New rules.
- Events that must be processed (by using a Regulate rule) so that only the appropriate occurrences reach the service model
- Events that must be combined through abstraction, correlation, or through New rule updates before entering the service model
This includes events that are best represented by a single higher-order event that captures their net effect, or by event pairs such as UP/DOWN.
- Missing events or events that cannot be processed
Some situations that you may want to include in your model are not traced by events in the real environment, or the events produced cannot be associated with the IT resource.
- For each significant event, determine whether the event must be associated only with a CI or whether it must also participate in the status computation.
For example, a cause event E1 is associated with the CI C1, and a consequence event E2 is associated with the CI C2. Although it might appear reasonable to elect E1 so that its severity value contributes to the status of C1, electing E2 might be of no use if a relationship propagates the impact of event E1 from CI C1 to CI C2.
- Consider how the monitoring tool, such as an agent, adapter, or script, reports the state of the service's IT resources.
- Does the monitoring tool send alerts only when something goes wrong?
If so, does it close the alerts automatically?
If the monitoring tool does not close alerts automatically, you may need to automate their closure through rules containing the appropriate cycle and conditions.
- Does the monitoring tool send status-type events, such as ok or not ok, at regular intervals?
- Does the monitoring tool handle CI availability with some form of heartbeat?
- When does the IT CI representing the resource transition from AVAILABLE to UNKNOWN or from AVAILABLE to UNAVAILABLE, and back again?
- What is the reliability of the event flow?
Even the most complete service model is of little value if a consistent flow of events into the model cannot be maintained. When creating new event propagation paths, consider whether each new path improves the event flow or, at the least, does not degrade it.
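The question above about monitoring tools that do not close their own alerts can be made concrete with a small sketch. In a cell this logic would be implemented with rules. The Python below only illustrates the idea of a periodic cycle plus a closing condition, and the event fields (`status`, `last_seen`) and the 15-minute timeout are assumptions made for this example.

```python
# Illustrative only: the event fields ("status", "last_seen") and the
# timeout value are assumptions for this sketch.
from datetime import datetime, timedelta

def close_stale_alerts(events, now, max_age=timedelta(minutes=15)):
    """Close open events that have not been refreshed within max_age.

    Returns the list of events closed on this pass.
    """
    closed = []
    for ev in events:
        if ev["status"] == "OPEN" and now - ev["last_seen"] > max_age:
            ev["status"] = "CLOSED"
            closed.append(ev)
    return closed

# One pass of the cycle over a small set of sample events.
now = datetime(2024, 1, 1, 12, 0)
events = [
    {"id": 1, "status": "OPEN", "last_seen": now - timedelta(minutes=30)},
    {"id": 2, "status": "OPEN", "last_seen": now - timedelta(minutes=5)},
    {"id": 3, "status": "CLOSED", "last_seen": now - timedelta(hours=2)},
]
closed_ids = [ev["id"] for ev in close_stale_alerts(events, now)]
print(closed_ids)
```

Only the open event that has gone unrefreshed for longer than the timeout is closed. Choosing the timeout is the real design decision: it should exceed the interval at which the monitoring tool repeats an alert for a problem that is still present.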