Resolving problems using Triage and Remediation workflows


The Triage and Remediation for BMC ProactiveNet and BMC Event and Impact Management solution provides prepackaged workflows to help you manage certain system occurrences.

Overview of workflows

The following table lists possible system occurrences, the name of the workflow, and a link to the appropriate section in the BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide for more information on the workflow. 

All Triage and Remediation workflows are derived from a common BMC Atrium Orchestrator framework. For information on the framework, see "Triage and Remediation framework" and "Framework processing" in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.

Problem

Related workflow

For more information

Operating system disk space reaches or exceeds its capacity

OS Disk Space Full

The OS Disk Space Full workflow is a triage and remediation workflow. It is initiated by an incoming event from an identified host; the host identifier is a key to the event definition and to the resulting workflow actions. The workflow analyzes specific slots in the event definition that are mapped to XML elements in the Event_Management module for the identified host.

See “Workflow description: OS Disk Space Full” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.

Monitored host system is down (triage only)

Host Down

The Host Down workflow is a triage-only workflow. It makes use of a series of ping and traceroute commands to verify whether the host specified in the event is actually down. The workflow does not attempt to restart the host.

See “Workflow description: Host Down” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.

VMware ESX server host fails to respond

ESX Host Not Responding

This triage and remediation workflow is designed to diagnose and remediate the state of a VMware ESX server and its connection with the vCenter server.

See “Workflow: ESX Host Not Responding” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.

Utility workflows for starting or restarting servers and services (include a validation phase)

Server Start
Server Restart
Service Restart

The Server Start workflow attempts to start the server through the Wake_on_Lan command. As a utility workflow, it does not create an incident. The Server Start workflow can apply to virtual machines and VMware ESX server hosts, provided they support the Wake_on_Lan command. See “Workflow: Server Start” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.
The Server Restart workflow is a triage, remediation, and validation utility workflow that is designed to restart a server host system. See “Workflow: Server Restart” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.
Like the Server Restart workflow, the Service Restart workflow is a triage, remediation, and validation utility workflow. It is designed to restart a service, such as Oracle, Telnet, or BMC Atrium Orchestrator. As a utility workflow, it does not create an incident. See “Workflow: Service Start” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.

Database reaches or exceeds a specified tablespace limit

DB Tablespace Full

The DB Tablespace Full workflow is a triage and remediation workflow that is triggered by a “database tablespace full” event from BMC ProactiveNet. This workflow supports PATROL, Abnormality, and Alarm event types.

See “Workflow: DB Tablespace Full” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.

Scheduled backup for a specified IBM Tivoli Storage Manager (TSM) server has failed

Failed Backup and Recovery

The Failed Backup and Recovery workflow is a triage and remediation workflow that is triggered by a BMC PATROL event indicating that a scheduled backup job specified on an IBM Tivoli Storage Manager (TSM) server has failed. The Failed Backup and Recovery workflow logs into the specified server, checks the error log for messages indicating backup failure, and then schedules a restart of the backup job.

See “Workflow: Failed Backup and Recovery” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide.
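
The triage idea behind the Host Down workflow described in the table (repeat a reachability check before concluding that a host is down) can be sketched in Python. This is an illustration only, not the BMC Atrium Orchestrator implementation: the function names, the number of attempts, and the use of `ping -c 1` (the Linux and macOS form; Windows uses `-n 1`) are assumptions, and the traceroute step is omitted.

```python
import subprocess
from typing import Callable, Sequence


def run_probe(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung command counts as a failed probe.
        return False
    return result.returncode == 0


def host_appears_down(
    host: str,
    attempts: int = 3,  # hypothetical default, not taken from the product
    probe: Callable[[Sequence[str]], bool] = run_probe,
) -> bool:
    """Return True only if every ping attempt fails."""
    for _ in range(attempts):
        # Assumed command form: one echo request per attempt.
        if probe(["ping", "-c", "1", host]):
            return False  # one successful reply is enough to call the host up
    return True
```

Passing the probe as a parameter keeps the decision logic separate from the network call, so the logic can be exercised without reaching a real host.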


How the workflows get executed

An event notification triggers each workflow. The source can be a BMC PATROL event, a BMC Event and Impact Management event, or any event that complies with the slot mapping standards that BMC Atrium Orchestrator supports. The event, in turn, is generated when a specified threshold is breached for an attribute of the monitored component.
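
The chain described above (a threshold breach produces an event, and the event selects a workflow) can be pictured with a small Python sketch. It is not product code: the `Event` fields, the problem keys, and the "reaches or exceeds" comparison are assumptions made for illustration. Only the workflow names are taken from the table in this topic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from a problem type to a workflow name.
# The keys are invented for this sketch; the values are the workflow names listed above.
WORKFLOW_BY_PROBLEM = {
    "disk_space_full": "OS Disk Space Full",
    "host_down": "Host Down",
    "tablespace_full": "DB Tablespace Full",
}


@dataclass
class Event:
    host: str
    problem: str
    value: float
    threshold: float


def make_event(host: str, problem: str, value: float, threshold: float) -> Optional[Event]:
    """Create an event only when the value reaches or exceeds the threshold (assumed rule)."""
    if value >= threshold:
        return Event(host, problem, value, threshold)
    return None


def select_workflow(event: Event) -> Optional[str]:
    """Return the workflow name for the event, or None if no workflow is mapped."""
    return WORKFLOW_BY_PROBLEM.get(event.problem)
```

For example, a disk usage reading of 97 against a threshold of 95 would yield an event that maps to the OS Disk Space Full workflow, while a reading of 80 would produce no event at all.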

You can also launch the workflows manually from the Events Console of the BMC ProactiveNet Operations Console. Alternatively, from the BMC ProactiveNet Administration Console, you can define a remote action policy to automate the workflow launch.

The parameters for each workflow are defined in the ao_actions.mrl file. The chief difference between manual and automated launches is that you can specify your own input parameters when you launch a workflow manually, whereas automated workflows use default values. For example, an automated workflow automatically creates incidents and change requests. You can, however, change the default values that automated workflows use by editing the %MCELL_HOME%\etc\cellName\kb\bin\ao_actions.mrl file. (See “Configure BMC ProActiveNet for the Triage and Remediation solution” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide for the steps to recompile the knowledge base and restart the ProactiveNet server cell.)
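
The relationship between default values and manual inputs can be sketched as follows. This Python fragment is only a conceptual illustration: the parameter names are hypothetical and do not reproduce the contents or syntax of ao_actions.mrl. The defaults are set to `True` because, as noted above, an automated workflow creates incidents and change requests.

```python
from typing import Dict, Optional

# Hypothetical parameter names; the real definitions live in ao_actions.mrl.
AUTOMATED_DEFAULTS: Dict[str, bool] = {
    "create_incident": True,
    "create_change_request": True,
}


def resolve_parameters(manual_inputs: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Start from the defaults and apply any values supplied for a manual launch."""
    params = dict(AUTOMATED_DEFAULTS)  # copy so the defaults are never modified
    if manual_inputs:
        params.update(manual_inputs)
    return params
```

Calling `resolve_parameters()` with no arguments models an automated launch, while `resolve_parameters({"create_incident": False})` models a manual launch that overrides one default and keeps the other.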

What to do next

The following topics provide additional information for working with Triage and Remediation for BMC ProactiveNet and BMC Event and Impact Management solution workflows:

You can also create your own triage and remediation workflows using BMC Atrium Orchestrator. For instructions and guidelines, see “Guidelines for customizing the workflows” in BMC ProactiveNet Performance Management Triage and Remediation Solution Getting Started Guide, which also includes a scenario for adding a new workflow.

 
