Automatic outage records
An outage represents the unavailability of a configuration item (CI) at a given time. When an outage occurs, the event management product can detect it and send alerts that can be recorded as outage records and related to an incident in Remedy IT Service Management (Remedy ITSM).
In Remedy ITSM, outage records are also referred to as unavailability records. Two types of outage records exist: planned unavailability and unplanned unavailability. A planned unavailability (outage) of a CI is usually governed by the change management process and often documents a scheduled maintenance window. Unplanned outages, by contrast, are not scheduled, and frequent unplanned outages lead to unstable environments.
Although both types of outages are key inputs in determining whether Service Level Agreements (SLAs) have been met, unplanned unavailability is the type that puts most SLAs at risk, and unplanned outages can also drive up infrastructure management costs. Outage records are also important data points to consider as part of the Problem Management process. For more information, see the BMC Asset Management online documentation.
Even when creating an unavailability record for each CI that goes down is an established part of a customer's Asset Management process, these records are often not created because of the effort required to document each CI outage manually.
The outage functionality automates visibility into unplanned outages: when the event management product detects an unplanned outage, it is reflected in Remedy ITSM as an unplanned unavailability record.
BMC Service Resolution enables you to configure policies in the event monitoring system to automatically generate an outage record as soon as a CI experiences unplanned downtime. For example, when an email server goes down because of low disk space, the event monitoring system detects this event and generates an alert, which results in an unplanned unavailability record for the email server CI in Remedy ITSM.
If an incident is also generated for this event, the outage is related to the incident. If the outage occurs during a planned outage window, the unplanned outage record is still created and related to the incident for record keeping.
The following diagrams illustrate an end-to-end information flow for creating and closing outage records:
Information flow for creating outage records
- An email server CI goes down and becomes unavailable.
- Based on the outage policy and incident policy configuration, the Infrastructure Management cell sends the CI unavailability notification to the integration gateway. The cell also sends the causal event for the outage and incident.
For information about creating an outage policy, see Managing outage policies.
- The integration gateway makes a web service request to the relevant web services to create an incident and an outage record. The incident and outage records are updated with the relevant details. A relationship between the incident and the outage record is established by using the Causal event ID in the Event form. For information about using the web service, see Using AST_CI_Unavailability_Interface web service.
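The relationship step above hinges on a single shared key: the causal event ID that the cell sends along with the outage and incident. The following minimal sketch models that matching in memory. The `Incident` and `OutageRecord` classes, their fields, and the `relate_by_causal_event` function are hypothetical simplifications for illustration; they are not the Remedy ITSM forms, the integration gateway, or the AST_CI_Unavailability_Interface web service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical, simplified records for illustration only; these are not the
# actual Remedy ITSM form definitions or field names.
@dataclass
class Incident:
    incident_id: str
    causal_event_id: str


@dataclass
class OutageRecord:
    unavailability_id: str
    ci_name: str
    causal_event_id: str
    related_incident_id: Optional[str] = None


def relate_by_causal_event(
    incidents: List[Incident], outages: List[OutageRecord]
) -> List[OutageRecord]:
    """Link each outage record to the incident that shares its causal event ID."""
    by_event: Dict[str, Incident] = {i.causal_event_id: i for i in incidents}
    for outage in outages:
        incident = by_event.get(outage.causal_event_id)
        if incident is not None:
            outage.related_incident_id = incident.incident_id
    return outages


# Example: the cell sent the same (hypothetical) causal event for both records.
incident = Incident(incident_id="INC-0001", causal_event_id="EVT-1001")
outage = OutageRecord("UNAV-0001", "email-server-01", "EVT-1001")
relate_by_causal_event([incident], [outage])
print(outage.related_incident_id)  # INC-0001
```

An outage record whose causal event has no matching incident is left unrelated, which mirrors the case where an outage policy fires but no incident policy does.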
Information flow for closing outage records
- The CI becomes available and the email server is back up and running.
- The Infrastructure Management cell sends the CI availability notification with the end time of the outage to the integration gateway. The cell also sends information about the closure of the causal event.
- The integration gateway then makes a web service request to the relevant web service to update the incident and the outage record.
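The closing flow above can be sketched as a lookup followed by an update: find the open outage record for the CI and causal event, then record the end time carried in the availability notification. In this minimal sketch, the class names, field names, and the "Open"/"Closed" status values are illustrative assumptions, not the Remedy ITSM schema. The incident update and the web service call itself are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


# Hypothetical, simplified model; field names and status values are
# illustrative assumptions, not the Remedy ITSM schema.
@dataclass
class OutageRecord:
    unavailability_id: str
    ci_name: str
    causal_event_id: str
    start_time: datetime
    end_time: Optional[datetime] = None
    status: str = "Open"


@dataclass
class AvailabilityNotification:
    ci_name: str
    causal_event_id: str
    end_time: datetime


def close_outage(
    outages: List[OutageRecord], note: AvailabilityNotification
) -> Optional[OutageRecord]:
    """Set the end time on the open outage record that matches the notification."""
    for outage in outages:
        if (
            outage.end_time is None
            and outage.ci_name == note.ci_name
            and outage.causal_event_id == note.causal_event_id
        ):
            if note.end_time < outage.start_time:
                raise ValueError("outage end time precedes its start time")
            outage.end_time = note.end_time
            outage.status = "Closed"
            return outage
    return None  # no open record matched; nothing to update


records = [
    OutageRecord("UNAV-0001", "email-server-01", "EVT-1001", datetime(2024, 1, 1, 13, 0))
]
closed = close_outage(
    records,
    AvailabilityNotification("email-server-01", "EVT-1001", datetime(2024, 1, 1, 15, 0)),
)
print(closed.end_time - closed.start_time)  # 2:00:00
```

Matching only records that are still open means a repeated availability notification for the same causal event updates nothing, rather than overwriting an end time that was already recorded.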
Scenarios and examples for calculation of unplanned outage duration
The following diagram illustrates the supported scenarios for calculating the duration of an unplanned outage. In this example, a planned outage is scheduled for a CI from 2:00 P.M. to 6:00 P.M. The following scenarios explain how the outage duration is calculated when an unplanned outage occurs on the same CI.
- Scenario 1: An unplanned outage on the same CI starts at 1:00 P.M. and ends at 3:00 P.M. The unplanned outage duration is 1 hour, calculated from 1:00 P.M. to 2:00 P.M.
- Scenario 2: An unplanned outage on the same CI starts at 1:00 P.M. and ends at 7:00 P.M. The unplanned outage duration is 2 hours, calculated from 1:00 P.M. to 2:00 P.M. and from 6:00 P.M. to 7:00 P.M.
- Scenario 3: An unplanned outage on the same CI starts at 3:00 P.M. and ends at 5:00 P.M. This period is not counted as unplanned outage time because it falls entirely within the planned outage window.
- Scenario 4: An unplanned outage on the same CI starts at 3:00 P.M. and ends at 7:00 P.M. The unplanned outage duration is 1 hour, calculated from 6:00 P.M. to 7:00 P.M.
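All four scenarios follow one rule: the unplanned outage duration is the total unplanned time minus the part that overlaps the planned outage window. The following minimal sketch reproduces the scenario results under the assumption of a single planned window. It is an illustration of the arithmetic, not BMC's implementation. The function name, the `at` helper, and the example date are hypothetical.

```python
from datetime import datetime, timedelta


def unplanned_duration(
    unplanned_start: datetime,
    unplanned_end: datetime,
    planned_start: datetime,
    planned_end: datetime,
) -> timedelta:
    """Unplanned outage time that falls outside a single planned outage window."""
    total = unplanned_end - unplanned_start
    # Overlap between the two intervals; clamp to zero when they do not intersect.
    overlap = min(unplanned_end, planned_end) - max(unplanned_start, planned_start)
    overlap = max(overlap, timedelta(0))
    return total - overlap


def at(hour: int) -> datetime:
    """Hypothetical helper: a time on an arbitrary example date."""
    return datetime(2024, 1, 1, hour)


PLANNED = (at(14), at(18))  # planned outage, 2:00 P.M. to 6:00 P.M.
SCENARIOS = {
    1: (at(13), at(15)),  # 1:00 P.M. to 3:00 P.M.
    2: (at(13), at(19)),  # 1:00 P.M. to 7:00 P.M.
    3: (at(15), at(17)),  # 3:00 P.M. to 5:00 P.M.
    4: (at(15), at(19)),  # 3:00 P.M. to 7:00 P.M.
}

for number, (start, end) in SCENARIOS.items():
    hours = unplanned_duration(start, end, *PLANNED) / timedelta(hours=1)
    print(f"Scenario {number}: {hours:g} hour(s)")
```

Running the loop prints 1, 2, 0, and 1 hour(s) for scenarios 1 through 4, matching the list above. An unplanned outage that does not touch the planned window at all keeps its full duration, because the clamped overlap is zero.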
Relationship between event, incident, and outage record
An incident record and an outage record are related to one another based on the causal event.
Information about outage records and relationships
Different users can view details about a CI outage by using the event management and incident management applications.
Event Management console operator
An Event Management console operator can view CI unavailability details in the Event Management console.
When the CI goes down
When the CI is back up
Service Desk agent
A Service Desk agent can view the relationships between an outage and a related incident on the Relationships tab of BMC Service Desk: Incident Management.
Causal CI details on the Relationships tab of Service Desk
Impacted CI details on the Relationships tab of Service Desk
Asset Manager
An Asset Manager can view outage record details, such as unavailability ID and status, on the Outage tab of the CI in BMC Asset Management.
Causal CI details on the Outage tab in BMC Asset Management
Impacted CI details on the Outage tab in BMC Asset Management