Event Orchestration for a Service Down event


This topic describes how the Event Orchestration run book manages the end-to-end lifecycle of a service down event.

Overview

In a datacenter, servers are monitored for certain events. Recurring, high-volume events can overwhelm the IT staff with repetitive troubleshooting and repair tasks. The Event Orchestration run book provides an automated approach for applying triage and performing remediation actions on the events that it is configured to support.

For a service down event generated in TrueSight Infrastructure Management, the Event Orchestration run book enriches the event, performs triage, creates an incident (if not created already), and based on whether a remediation action is required, creates a change ticket in BMC Remedy ITSM. 

For the service down event use case, TrueSight Infrastructure Management is used as the event management tool and BMC Remedy ITSM is used as the ITSM system. 

Before you begin

Before the service down event is managed, you must ensure that the following tasks are complete:

Prerequisite tasks

All the required products are installed and running.

On a TrueSight Infrastructure Management server, an enumeration type for the service down event and the event enrichment rules are defined in the main cell.

On a TrueSight Infrastructure Management server, the main cell is configured to propagate events to the TSO gateway.

On a TrueSight Orchestration peer, adapters required for the run book are configured and enabled.

On a TrueSight Orchestration peer, modules required for the run book are configured and activated.

Event Orchestration for a Service Down event

The event lifecycle starts when an event is generated by a PATROL Agent, which is monitoring the target server, for any service that is in a stopped state. 

  1. PATROL Agent forwards the event to the TrueSight Infrastructure Management main cell.
    You can also see the event in the TrueSight Presentation Server (TSPS), where the Logs & Notes tab shows the status of the event.
    When the event appears in the TSPS, its status reads: Event Type is: ServiceDown.
  2. In the main cell, the event is enriched to contain event-related data such as the service name, event category, event sub-category, and event type.
  3. From the main cell, the event is sent to the gateway, which is the TrueSight Operations Manager monitor adapter.
  4. The monitor adapter receives the event and converts it into the Common Event Model (CEM) format.
    For the conversion to take place, ensure that the Use Common Event Model element is set to true in the TrueSight Operations Manager monitor adapter configuration.
  5. After the event is converted into the CEM format, it is sent for rules evaluation, and the BMC-SA-Event_Orchestration:Process Event workflow is triggered. The Process Event workflow consists of sub-processes, which perform the triage and remediation actions for the service down event.
    ProcessEventwf.png
    1. The Extract Configuration Data workflow extracts the configuration information based on the event type. 
    2. The BMC-SA-Event_Orchestration_Service_Down:Perform Triage workflow is invoked, which verifies whether the service is stopped on a target server. 
      The Perform Triage workflow is specific to the event type and is included in the event type module. For example, if you create a new module for a new event type, the Perform Triage workflow is invoked from the new module.  
    3. If the service is stopped, the triage is successful, and if remediation is required, the Post-Triage Actions workflow creates an incident in the BMC Remedy ITSM application. If the incident is already created, the incident is updated. 
    4. If the incident is created or updated successfully, and you have chosen to execute remediation actions, the Pre-Remediation Actions workflow is invoked, which creates a change and an associated task in BMC Remedy ITSM.
      Configuration settings in the BMC-SA-Event_Orchestration_Configuration module determine the change template that is used to create the change and associated task for the event. To skip creating the change and associated task, set the ChangeEnabled configuration item in the BMC-SA-Event_Orchestration_Service_Down module to false.

      EnableChange.png
    5. If no change ticket is created, remediation starts immediately. If a change ticket is created, the remediation process awaits approval of the change.
      Based on the approval process configured in your BMC Remedy ITSM environment, the change is approved, and the Perform Remediation workflow is invoked, which starts the service on the target server. 
      The Perform Remediation workflow is specific to the event type and is included in the event type module. For example, if you create a new module for a new event type, the Perform Remediation workflow is invoked from the new module.
    6. After the remediation is complete, the Post-Remediation Actions workflow is invoked, which invokes the BMC-SA-Event_Orchestration_Service_Down:Perform Validation workflow. After the validation process is completed, the statuses of the change, task, and incident are updated and the tickets are marked as closed.
  6. The Service Down event is now successfully managed: triage is applied, remediation actions are performed (which involves starting the service), and an incident, change, and task are created to track the event management lifecycle. At each step, the Logs & Notes tab in the TSPS for the corresponding event is updated with the status of the event orchestration process.
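
The sub-steps of the Process Event workflow can be read as a small decision flow. The following Python sketch is a hypothetical model of that flow, not TrueSight Orchestration code. The names Event, Config, process_event, and FakeServer, as well as the callables passed in, are illustrative assumptions. The outcomes for paths that this topic does not describe (triage finds the service running, remediation is not required, validation fails) are guesses and are labeled as such in the comments.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    host: str
    service_name: str
    event_type: str = "ServiceDown"
    # Stands in for the Logs & Notes tab of the event.
    notes: List[str] = field(default_factory=list)


@dataclass
class Config:
    remediation_required: bool = True
    change_enabled: bool = True  # models the ChangeEnabled configuration item


def process_event(
    event: Event,
    config: Config,
    is_service_stopped: Callable[[str, str], bool],
    start_service: Callable[[str, str], None],
    change_approved: Callable[[], bool],
) -> str:
    """Model sub-steps 2 to 6 of the Process Event workflow."""
    # Perform Triage: verify that the service is stopped on the target server.
    if not is_service_stopped(event.host, event.service_name):
        event.notes.append("Triage: service is running")
        return "no_action"  # assumption: outcome not described in the topic
    event.notes.append("Triage: service is stopped")
    if not config.remediation_required:
        return "triaged"  # assumption: outcome not described in the topic

    # Post-Triage Actions: create the incident, or update an existing one.
    event.notes.append("Incident created or updated")

    # Pre-Remediation Actions: create a change and task unless disabled.
    if config.change_enabled:
        event.notes.append("Change and task created")
        if not change_approved():
            event.notes.append("Awaiting change approval")
            return "awaiting_approval"

    # Perform Remediation: start the service on the target server.
    start_service(event.host, event.service_name)
    event.notes.append("Remediation: service started")

    # Post-Remediation Actions: validate, then close the tickets.
    if is_service_stopped(event.host, event.service_name):
        return "validation_failed"  # assumption: outcome not described
    event.notes.append("Validation passed; tickets closed")
    return "closed"


class FakeServer:
    """In-memory stand-in for a monitored target server."""

    def __init__(self) -> None:
        self.stopped = {"Spooler"}

    def is_stopped(self, host: str, service: str) -> bool:
        return service in self.stopped

    def start(self, host: str, service: str) -> None:
        self.stopped.discard(service)
```

In this model, calling process_event with Config(change_enabled=False) skips the change and task and goes straight to remediation, which mirrors setting ChangeEnabled to false. With the default Config, the function returns early until the change_approved callable reports approval.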

Where to go from here

To add custom use cases to the Event Orchestration run book, see Adding a new use case to the Event Orchestration run book.

 
