Understanding Triage and Remediation
This topic provides an overview of the BMC ProactiveNet Performance Management Triage and Remediation Solution and the products involved.
BMC assumes that you are knowledgeable about and experienced with the products that constitute the BMC ProactiveNet Performance Management Triage and Remediation Solution.
The Triage and Remediation runbook solution automates the process for triaging or remediating high-volume events in the data center. In addition, the solution leverages BMC Atrium Orchestrator to link the BMC Remedy ITSM change and incident management applications with the BMC ProactiveNet Performance Management event processing capability.
A trigger event sent from BMC ProactiveNet Performance Management is processed by a common BMC Atrium Orchestrator framework. The framework performs the following common processes:
- Extracts event information
- Extracts host connection details
- Enriches the event with PATROL annotation and configuration item data from the BMC Atrium Configuration Management Database (CMDB)
- Creates or updates incident, change, and task information in the BMC Remedy ITSM system if the workflow is run with the ITSM case set equal to true
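The four common steps above can be pictured as a small pipeline. The following is a minimal sketch for illustration only, not the BMC Atrium Orchestrator implementation: the function name, the event field names, and the `cmdb_lookup` and `create_itsm_records` callables are all hypothetical stand-ins for the real adapters.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProcessedEvent:
    """Result of the common processing steps (hypothetical structure)."""
    event_info: dict
    host: dict
    enrichment: dict
    itsm_ids: Optional[dict]


def process_trigger_event(
    raw_event: dict,
    cmdb_lookup: Callable[[str], dict],
    create_itsm_records: Callable[[dict, dict], dict],
    itsm_case: bool = False,
) -> ProcessedEvent:
    """Run the common steps on one trigger event (all field names assumed)."""
    # 1. Extract event information.
    event_info = {"id": raw_event["id"], "message": raw_event["message"]}

    # 2. Extract host connection details.
    host = {
        "name": raw_event["host"],
        "connection": raw_event.get("connection", {}),
    }

    # 3. Enrich the event with annotation and configuration item data.
    enrichment = {
        "annotation": raw_event.get("annotation", ""),
        "ci": cmdb_lookup(host["name"]),
    }

    # 4. Create or update ITSM records only when the ITSM case is true.
    itsm_ids = create_itsm_records(event_info, enrichment) if itsm_case else None

    return ProcessedEvent(event_info, host, enrichment, itsm_ids)
```

In this sketch the ITSM step is skipped entirely unless `itsm_case` is true, which mirrors the condition described in the last list item.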
The Triage and Remediation workflow initiates the triage and remediation actions that are unique to the workflow type, which is defined by the trigger event. The event is updated in the BMC ProactiveNet Operations Console with the incident, change, and task IDs, together with the status of the triage and remediation actions.
By using the standardized Triage and Remediation workflows, you can reduce event and incident management costs in the data center.
Workflows
In the data center, regularly recurring, high-volume events can overwhelm the IT staff with repetitive troubleshooting and repair tasks. Two examples of such high-volume events are as follows:
- Disk usage on a monitored host reaches or exceeds capacity
- A monitored host stops responding to requests
The prepackaged BMC Atrium Orchestrator workflows run triage and remediation actions in the following situations:
- Operating system disk space reaches or exceeds its capacity
- A monitored host system is down (triage only)
- A VMware ESX server host fails to respond
- Specified servers and services fail to respond
- An Oracle database reaches or exceeds a specified tablespace limit
- A scheduled backup for a specified IBM Tivoli Storage Manager (TSM) server fails
- A PATROL Agent goes down because of an error
These workflows are triggered by events from BMC ProactiveNet, either manually in context from a console or automatically by predefined remote action policies.
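Because the trigger event defines the workflow type, the selection of actions can be thought of as a lookup. The sketch below is an illustration under assumptions, not product code: the workflow type keys and the `plan_actions` function are hypothetical names, and only the triage-only status of the host-down workflow is taken from the list above.

```python
# Hypothetical workflow type keys mapped to whether remediation is available.
# Per the list above, only the host-down workflow is triage only.
WORKFLOW_REMEDIATES = {
    "os_disk_space_full": True,
    "host_down": False,  # triage only
    "esx_host_unresponsive": True,
    "service_unresponsive": True,
    "oracle_tablespace_limit": True,
    "tsm_backup_failed": True,
    "patrol_agent_down": True,
}


def plan_actions(workflow_type: str) -> list:
    """Return the ordered actions for a workflow type (names are assumed)."""
    if workflow_type not in WORKFLOW_REMEDIATES:
        raise ValueError(f"No prepackaged workflow for type: {workflow_type!r}")
    actions = ["triage"]
    if WORKFLOW_REMEDIATES[workflow_type]:
        actions.append("remediate")
    return actions
```

A lookup table keeps the per-workflow differences in one place, so an unknown event type fails loudly instead of silently running no actions.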
By using the prepackaged BMC Atrium Orchestrator Triage and Remediation workflows, you can significantly reduce, if not eliminate, the effort and time required by your IT staff to address issues that these high-volume events cause.