Triage and Remediation
The Triage and Remediation runbook solution automates triage and remediation actions for high-volume events in the data center. The solution uses BMC Atrium Orchestrator to link the BMC Remedy ITSM change and incident management applications with the event processing capability of Infrastructure Management. A trigger event sent from Infrastructure Management Performance Management is processed by a common BMC Atrium Orchestrator framework. The framework performs the following common processes:
- Extracts event information
- Extracts host connection details
- Enriches the event with PATROL annotation and configuration item data from the BMC Atrium Configuration Management Database (CMDB)
- Creates or updates incident, change, and task information in the BMC Remedy ITSM system when the workflow runs with the ITSM case set to true
The Triage and Remediation workflow initiates the triage and remediation actions that are unique to the workflow type, which is defined by the trigger event. The event is updated in the operator console with the incident, change, and task IDs, together with the status of the triage and remediation actions.
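The common processing steps described above can be pictured as a small pipeline. The following Python sketch is illustrative only: it is not BMC Atrium Orchestrator code, and every name in it (Event, extract_host_connection, enrich, process_event, the itsm_case flag, the dictionary standing in for the CMDB, and the placeholder incident ID format) is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Hypothetical stand-in for a trigger event; field names are illustrative."""
    event_id: str
    host: str
    workflow_type: str
    annotations: dict = field(default_factory=dict)
    incident_id: Optional[str] = None
    status: str = "NEW"


def extract_host_connection(event: Event) -> dict:
    # Assumption: connection details are derived from the event's host field.
    return {"host": event.host}


def enrich(event: Event, cmdb: dict) -> None:
    # Assumption: the CMDB is modelled as a plain dict keyed by host name.
    event.annotations["ci"] = cmdb.get(event.host, "unknown")


def process_event(event: Event, cmdb: dict, itsm_case: bool) -> Event:
    """Run the common steps, then mark where workflow-specific actions would go."""
    connection = extract_host_connection(event)
    enrich(event, cmdb)
    if itsm_case:
        # Placeholder ID; a real ITSM system would assign the incident number.
        event.incident_id = f"INC-{event.event_id}"
    # Workflow-specific triage and remediation would run here using `connection`.
    event.annotations["connection"] = connection
    event.status = "TRIAGED"
    return event
```

In this sketch, the incident is created only when itsm_case is true, which mirrors the conditional ITSM step in the list above. The updated event object carries the incident ID and status back to the caller, in the same way that the event is updated in the operator console.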
By using the standardized Triage and Remediation workflows, you can reduce event and incident management costs in the data center.
In the data center, regularly recurring, high-volume events can overwhelm the IT staff with repetitive troubleshooting and repair tasks. Two examples of such high-volume events are as follows:
- The disk space of a monitored host exceeds capacity
- A monitored host stops responding to requests
The prepackaged BMC Atrium Orchestrator workflows run the triage and remediation actions in the following situations:
- Operating system disk space reaches or exceeds its capacity
- A monitored host system is down (triage only)
- A VMware ESX server host fails to respond
- Specified servers and services fail to respond
- An Oracle database reaches or exceeds a specified tablespace limit
- A scheduled backup for a specified IBM Tivoli Storage Manager (TSM) server fails
- A PATROL Agent goes down because of an error
These workflows are triggered by events from Infrastructure Management. They are triggered either manually in context from a console or automatically by predefined remote action policies.
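One way to picture the prepackaged workflows is as a catalog keyed by workflow type, where each entry records whether the workflow stops at triage. The sketch below is a hypothetical model, not part of the product: the keys, the WorkflowSpec class, and the planned_actions function are invented for illustration, and the descriptions paraphrase the situations listed above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WorkflowSpec:
    situation: str
    triage_only: bool = False


# Hypothetical identifiers; only the host-down workflow is marked triage only,
# as stated in the list of situations.
CATALOG: Dict[str, WorkflowSpec] = {
    "disk_full": WorkflowSpec("Operating system disk space reaches or exceeds capacity"),
    "host_down": WorkflowSpec("Monitored host system is down", triage_only=True),
    "esx_unresponsive": WorkflowSpec("VMware ESX server host fails to respond"),
    "service_unresponsive": WorkflowSpec("Specified servers and services fail to respond"),
    "oracle_tablespace": WorkflowSpec("Oracle tablespace limit reached or exceeded"),
    "tsm_backup_failed": WorkflowSpec("Scheduled TSM server backup failed"),
    "patrol_agent_down": WorkflowSpec("PATROL Agent is down"),
}


def planned_actions(workflow_type: str) -> List[str]:
    """Return the phases that would run for the given workflow type."""
    spec = CATALOG[workflow_type]
    return ["triage"] if spec.triage_only else ["triage", "remediation"]
```

For example, planned_actions("host_down") yields only the triage phase, while planned_actions("disk_full") yields both triage and remediation.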
By using the prepackaged BMC Atrium Orchestrator Triage and Remediation workflows, you can significantly reduce, if not eliminate, the effort and time required by your IT staff to address issues that these high-volume events cause.
The Triage and Remediation framework
The Triage and Remediation solution is distinguished by its common framework, which consists of reusable subworkflows that can apply to any event type and action.
Event processing and workflow actions
The common workflows apply to either implementation, but not identically.
The framework can support different workflows, each with its own triage and remediation processes. The event, incident, and change processing for these workflows is managed through the framework.
Framework and workflow-specific processes
From a development perspective, the framework provides a "plug-and-play" capability for workflow creation. You implement only the triage and remediation specifics of your workflow; the framework supplies the common event, incident, and change processing on which many different workflows can be built.
You can customize the prepackaged workflows, and you can extend the workflow functionality by adding workflows specific to your environment. See Guidelines for customizing the workflows for a summary of the options for customizing and extending workflows.
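The "plug-and-play" idea can be sketched as a registry in which each workflow supplies only its own triage and remediation steps, and a shared runner handles the rest. This is a conceptual Python illustration under stated assumptions, not the BMC Atrium Orchestrator development model: the Workflow base class, the register decorator, the REGISTRY dictionary, the run function, and the returned strings are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class Workflow(ABC):
    """Hypothetical base class: the framework calls these hooks."""

    @abstractmethod
    def triage(self, event: dict) -> str:
        ...

    def remediate(self, event: dict) -> str:
        # Default for triage-only workflows.
        return "remediation:skipped"


REGISTRY: Dict[str, Type[Workflow]] = {}


def register(workflow_type: str) -> Callable[[Type[Workflow]], Type[Workflow]]:
    """Class decorator that plugs a workflow into the registry."""
    def decorator(cls: Type[Workflow]) -> Type[Workflow]:
        REGISTRY[workflow_type] = cls
        return cls
    return decorator


@register("disk_full")
class DiskFullWorkflow(Workflow):
    # Placeholder results; real actions would inspect and clean up the host.
    def triage(self, event: dict) -> str:
        return f"triage:disk_full:{event['host']}"

    def remediate(self, event: dict) -> str:
        return f"remediation:disk_full:{event['host']}"


@register("host_down")
class HostDownWorkflow(Workflow):
    def triage(self, event: dict) -> str:
        return f"triage:host_down:{event['host']}"


def run(event: dict) -> List[str]:
    """Shared runner: look up the workflow by type and execute both phases."""
    workflow = REGISTRY[event["workflow_type"]]()
    return [workflow.triage(event), workflow.remediate(event)]
```

Adding a workflow for a new event type then means writing one class and registering it; the runner itself does not change.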