Resolving problems using Triage and Remediation workflows
The Triage and Remediation for BMC ProactiveNet and BMC Event and Impact Management solution provides prepackaged workflows to help you manage certain system occurrences.
Overview of workflows
The following table lists possible system occurrences, the name of the workflow, and a link to the appropriate section in the BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide for more information on the workflow.
All Triage and Remediation workflows are derived from a common BMC Atrium Orchestrator framework. For information on the framework, see "Triage and Remediation framework" and "Framework processing" in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.
Problem | Related workflow | For more information |
---|---|---|
Operating system disk space reaches or exceeds its capacity | OS Disk Space Full | The OS Disk Space Full workflow is a triage and remediation workflow. The workflow is initiated by an incoming event from an identified host, the identifier being a key to the event definition and resultant workflow actions. The workflow analyzes specific slots in the event definition, slots that are mapped to XML elements in the Event_Management module for the identified host. |
Monitored host system is down (triage only) | Host Down | The Host Down workflow is a triage-only workflow. It makes use of a series of ping and traceroute commands to verify whether the host specified in the event is actually down. The workflow does not attempt to restart the host. |
VMware ESX server host fails to respond | ESX Host Not Responding | This triage and remediation workflow is designed to diagnose and remediate the state of a VMware ESX server and its connection with the vCenter server. |
Utility workflows for starting or restarting servers and services (include a validation phase) | Server Start | The Server Start workflow attempts to start the server through the Wake_on_Lan command. As a utility workflow it does not create an incident. The Server Start workflow can apply to virtual machines and VMware ESX server hosts provided they support the Wake_on_Lan command. See “Workflow: Server Start” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide. |
Database reaches or exceeds a specified tablespace limit | DB Tablespace Full | The DB Tablespace Full workflow is a triage and remediation workflow that is triggered by a “database tablespace full” event from BMC ProactiveNet. This workflow supports PATROL, Abnormality, and Alarm event types. |
Scheduled backup for a specified IBM Tivoli Storage Manager (TSM) server has failed | Failed Backup and Recovery | The Failed Backup and Recovery workflow is a triage and remediation workflow that is triggered by a BMC PATROL event indicating that a scheduled backup job specified on an IBM Tivoli Storage Manager (TSM) server has failed. The Failed Backup and Recovery workflow logs in to the specified server, checks the error log for messages indicating backup failure, and then schedules a restart of the backup job. |
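The Server Start workflow relies on Wake-on-LAN. The guide does not show how the workflow issues the command, so the following Python sketch only illustrates the general Wake-on-LAN technique: a "magic packet" of six 0xFF bytes followed by the target MAC address repeated 16 times, sent as a UDP broadcast. The function names, the broadcast address, and the port are assumptions chosen for illustration and are not part of the BMC solution.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the 6-byte MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A host wakes only if its network adapter and firmware have Wake-on-LAN enabled, which is why the workflow applies only to machines that support the command.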
How the workflows get executed
An event notification triggers each workflow. The event can originate from BMC PATROL, from BMC Event and Impact Management, or from any source whose events comply with the slot mapping standards that BMC Atrium Orchestrator supports. The event, in turn, is generated when a specified threshold is breached for the monitored component.
You can also launch the workflows manually from the Events Console of the BMC ProactiveNet Operations Console. To automate the workflow launch, define a remote action policy in the BMC ProactiveNet Administration Console.
The parameters for each workflow are defined in the ao_actions.mrl file. The chief difference between manual and automated launches is that you can specify your own input parameters when you launch a workflow manually, whereas automated workflows use default values. For example, an automated workflow automatically creates incidents and change requests. However, you can change the default values that automated workflows use by editing the %MCELL_HOME%\etc\cellName\kb\bin\ao_actions.mrl file. (See “Configure BMC ProactiveNet for the Triage and Remediation solution” in the BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide for the steps to recompile the knowledge base and restart the ProactiveNet server cell.)
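The relationship between default values and manually supplied input parameters can be pictured with a short Python sketch. It is a conceptual illustration only: the parameter names below are hypothetical, the real parameters are defined in ao_actions.mrl, and the sketch does not reproduce MRL syntax or the product's actual logic.

```python
# Hypothetical parameter names; the real ones are defined in ao_actions.mrl.
DEFAULT_PARAMETERS = {
    "create_incident": True,
    "create_change_request": True,
}


def resolve_parameters(manual_inputs=None):
    """Return the parameters a workflow launch would use.

    An automated launch passes no manual inputs and gets the defaults.
    A manual launch may override any default with its own value.
    """
    parameters = dict(DEFAULT_PARAMETERS)  # copy so the defaults stay untouched
    if manual_inputs:
        parameters.update(manual_inputs)
    return parameters


automated = resolve_parameters()
manual = resolve_parameters({"create_change_request": False})
```

In this picture, editing ao_actions.mrl corresponds to changing the defaults themselves, which affects every automated launch.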
What to do next
The following topics provide additional information for working with Triage and Remediation for BMC ProactiveNet and BMC Event and Impact Management solution workflows:
- Defining a remote action policy to launch a workflow
- Launching a workflow on demand from the Events View
- Verifying BMC ProactiveNet server and BMC Atrium Orchestrator communication
You can also create your own triage and remediation workflows using BMC Atrium Orchestrator. For instructions and guidelines, see “Guidelines for customizing the workflows” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide, which also includes a scenario for adding a new workflow.