Event Orchestration for a Service Down event
This topic describes how the Event Orchestration run book manages the end-to-end lifecycle of a service down event.
Overview
In a data center, servers are monitored for certain recurring, high-volume events that can overwhelm the IT staff with repetitive troubleshooting and repair tasks. The Event Orchestration run book provides an automated approach for applying triage and performing remediation actions on the event types that the run book is configured to support.
For a service down event generated in TrueSight Infrastructure Management, the Event Orchestration run book enriches the event, performs triage, creates an incident (if not created already), and based on whether a remediation action is required, creates a change ticket in BMC Remedy ITSM.
For the service down event use case, TrueSight Infrastructure Management is used as the event management tool and BMC Remedy ITSM is used as the ITSM system.
Before you begin
Before the service down event is managed, you must ensure that the following tasks are complete:
Prerequisite tasks | See page |
---|---|
All the required products are installed, and up and running. | |
On a TrueSight Infrastructure Management server, the enumeration type for the service down event and the event enrichment rules are defined in the main cell. | |
On a TrueSight Infrastructure Management server, the main cell is configured to propagate events to the TSO gateway. | |
On a TrueSight Orchestration peer, adapters required for the run book are configured and enabled. | |
On a TrueSight Orchestration peer, modules required for the run book are configured and activated. | |
Event Orchestration for a Service Down event
The event lifecycle starts when an event is generated by a PATROL Agent, which is monitoring the target server, for any service that is in a stopped state.
- The PATROL Agent forwards the event to the TrueSight Infrastructure Management main cell.
You can also see the event in the TrueSight Presentation Server (TSPS), and you can check its status on the Logs & Notes tab.
When the event appears in the TSPS, its status reads: Event Type is: ServiceDown.
- In the main cell, the event is enriched with event-related data such as the service name, event category, event sub-category, and event type.
- From the main cell, the event is sent to the gateway, which is the TrueSight Operations Manager monitor adapter.
- The monitor adapter receives the event and converts it into the Common Event Model (CEM) format.
For the conversion to take place, ensure that the Use Common Event Model element is set to true in the TrueSight Operations Manager monitor adapter configuration.
- After the event is converted into the CEM format, it is sent for rules evaluation, and the BMC-SA-Event_Orchestration:Process Event workflow is triggered. The Process Event workflow consists of sub-processes that perform the triage and remediation actions for the service down event.
- The Extract Configuration Data workflow extracts the configuration information based on the event type.
- The BMC-SA-Event_Orchestration_Service_Down:Perform Triage workflow is invoked, which verifies whether the service is stopped on a target server.
The Perform Triage workflow is specific to the event type and is included in the event type module. For example, if you create a new module for a new event type, the Perform Triage workflow is invoked from the new module.
- If the service is stopped, the triage is successful, and if remediation is required, the Post-Triage Actions workflow creates an incident in the BMC Remedy ITSM application. If the incident already exists, it is updated instead.
- If the incident is created or updated successfully, and depending on whether you want to execute remediation actions, the Pre-Remediation Actions workflow is invoked, which creates a change and an associated task in BMC Remedy ITSM.
Configuration settings made in the BMC-SA-Event_Orchestration_Configuration module determine the change template used to create the change and associated task for the event. To skip creating the change and associated task, set the ChangeEnabled configuration item in the BMC-SA-Event_Orchestration_Service_Down module to false.
- If no change ticket is created, remediation starts immediately. If a change ticket is created, the remediation process awaits approval of the change.
After the change is approved through the approval process configured in your BMC Remedy ITSM environment, the Perform Remediation workflow is invoked, which starts the service on the target server.
The Perform Remediation workflow is specific to the event type and is included in the event type module. For example, if you create a new module for a new event type, the Perform Remediation workflow is invoked from the new module.
- After the remediation is complete, the Post-Remediation Actions workflow is invoked, which in turn invokes the BMC-SA-Event_Orchestration_Service_Down:Perform Validation workflow. After validation is complete, the statuses of the change, task, and incident are updated and the tickets are marked as closed.
- The service down event is now successfully managed: triage was applied, the remediation action (starting the service) was performed, and an incident, change, and task were created to track the event management lifecycle. At each step, the Logs & Notes tab in the TSPS for the corresponding event is updated with the status of the event orchestration process.
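The order of the steps above can be summarized in a short Python sketch. This is an illustration only, not the run book's implementation: the class, function, and parameter names, the incident ID, and the returned state strings are all hypothetical, and the branches for a failed triage, an unapproved change, and a failed validation are assumptions, because this topic describes only the successful path.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ServiceDownEvent:
    host: str
    service_name: str
    # Stands in for the Logs & Notes tab; one entry per orchestration step.
    notes: List[str] = field(default_factory=list)
    incident_id: Optional[str] = None


def orchestrate(
    event: ServiceDownEvent,
    service_is_stopped: Callable[[ServiceDownEvent], bool],
    start_service: Callable[[ServiceDownEvent], None],
    change_enabled: bool = True,
    change_approved: Callable[[], bool] = lambda: True,
) -> str:
    """Walk one service down event through triage, ticketing, and remediation."""
    # Triage: confirm that the service is really stopped on the target server.
    if not service_is_stopped(event):
        # Assumption: the source does not say what happens in this case.
        event.notes.append("Triage: service is running")
        return "triage_failed"
    event.notes.append("Triage: service is stopped")

    # Post-triage: create the incident, or update it if it already exists.
    if event.incident_id is None:
        event.incident_id = "INC-0001"  # hypothetical ID
        event.notes.append("Incident created")
    else:
        event.notes.append("Incident updated")

    # Pre-remediation: create a change and task unless ChangeEnabled is false.
    if change_enabled:
        event.notes.append("Change and task created")
        if not change_approved():
            event.notes.append("Remediation on hold: change not approved")
            return "awaiting_approval"
        event.notes.append("Change approved")

    # Remediation, then validation that the service is running again.
    start_service(event)
    event.notes.append("Remediation: service start requested")
    if service_is_stopped(event):
        # Assumption: the source does not describe a failed validation.
        event.notes.append("Validation failed: service still stopped")
        return "validation_failed"
    event.notes.append("Validation passed; tickets closed")
    return "closed"


# Usage with in-memory stand-ins for the target server.
state = {"stopped": True}
event = ServiceDownEvent(host="server01", service_name="Spooler")
result = orchestrate(
    event,
    service_is_stopped=lambda e: state["stopped"],
    start_service=lambda e: state.update(stopped=False),
    change_enabled=False,  # mirrors setting ChangeEnabled to false
)
print(result)  # closed
```

Passing the server checks in as callables keeps the sketch runnable without any product installed. In a real deployment these steps are performed by the run book workflows and BMC Remedy ITSM, not by local functions.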
Where to go from here
To add custom use cases to the Event Orchestration run book, see Adding a new use case to the Event Orchestration runbook.