Using Triage and Remediation Solution
This topic provides suggested uses for the solution, descriptions of the workflows, configuration tips, and guidelines for customizing the existing workflows.
Overview of workflows
The Triage and Remediation Solution and the BMC Event and Impact Management solution provide prepackaged workflows to help you manage the following system occurrences:
- Operating system disk space reaches or exceeds its capacity
- Monitored host system is down (triage only)
- VMware ESX server host fails to respond
- Servers or services need to be started or restarted (utility workflows that include a validation phase)
- Database reaches or exceeds a specified tablespace limit
- Scheduled backup for a specified IBM Tivoli Storage Manager (TSM) server has failed
- PATROL Agent goes down due to any errors
An event notification triggers each workflow. The source of this event can be BMC PATROL, BMC Event and Impact Management, or any event source that complies with the slot mapping standards that BMC Atrium Orchestrator supports. The event, in turn, is triggered when a specified threshold attribute is breached for the monitored component.
How to launch workflows from Infrastructure Management
You can launch a manual, on-demand workflow from the Events Console of the operator console. You can also define a remote action policy to automate the workflow launch from the administration console.
The parameters for each workflow are defined in the ao_actions.mrl file. The chief difference between manual and automated workflows is that you can specify your own input parameters for manual workflows, whereas automated workflows use default values. For example, an automated workflow automatically creates incidents and change requests. However, you can change the default values for automated workflows in the installationDirectory\pw\server\etc\cellName\kb\bin\ao_actions.mrl file. (See Configuring Infrastructure Management for the Triage and Remediation Solution for the steps to recompile the knowledge base and restart the server cell.)
You have the option of making most of the workflows context sensitive to PATROL events by uncommenting and commenting code in the ao_actions.mrl file. (See ProactiveNet server configuration for examples.)
Defining a remote action policy to launch a workflow
In defining a policy, you first define the event selector and then define the policy from the administration console.
Perform the following steps to define the event selection criteria:
- In the Tree view, open the By Selector folder and highlight the selector you added to the remote action policy to open the Selector panel.
- Highlight this selector in the selector list of the Selector panel.
- Click the Update Event Selector icon in the toolbar to enable the edit function.
- In the Event Selector Criteria list of the Selector panel, highlight the selector and click Edit to open the Edit Criteria dialog box.
- In the Edit Criteria dialog box, specify the slots and values for events that you want the selector to match.
For example, you can specify the matching criteria in the event message slot, such as $EV.msg contains 'unreachable'.
- Click OK.
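To make the matching criterion concrete, the following Python sketch mimics what a criterion such as $EV.msg contains 'unreachable' selects. It is only an illustration of the matching idea, not the cell's implementation or MRL syntax. The dictionary representation of an event, the sample events, and the msg_contains helper are assumptions made for this example, and the sketch matches case-sensitively, which may differ from the product's behavior.

```python
def msg_contains(event: dict, needle: str) -> bool:
    """Return True when the event's msg slot contains the given substring."""
    # An event without a msg slot is treated as having an empty message.
    return needle in event.get("msg", "")


# Hypothetical events, modeled as plain dictionaries of slot names to values.
events = [
    {"mc_host": "host-a", "msg": "Host host-a is unreachable"},
    {"mc_host": "host-b", "msg": "Disk usage at 91 percent"},
]

# Only events whose message mentions 'unreachable' would reach the policy.
matched = [event for event in events if msg_contains(event, "unreachable")]
```

In this sketch, only the first event is selected, so a remote action policy bound to the selector would launch its workflow for host-a only.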
Perform the following steps to define the remote action policy:
- In the Administration Console, click the Event Management Policies tab.
- In the tree view under My Production, open the server cell entry.
- Under the Policy Type folder, select Remote Action Policy.
- Click the Add Event Policy icon in the tool bar.
- In the Selector Chooser dialog box, choose the selector to which this policy and the designated workflow apply. Then, click OK.
- In the Remote Action Policy tab, enter the policy name (required) and a description (optional).
- Designate whether the timeframes are enabled. If enabled, indicate whether policy activation timeframes are always active (default value), or select the option to define the schedule of your timeframes.
- In the Action name list, select the automatic workflow action to apply to this policy.
List of automatic workflow actions
- Click OK.
The event selection criteria are applied to the remote action policy.
Launching a workflow on demand from the Events Console
By default, each workflow applies to all events. From the operator console, perform the following steps to select and launch a workflow from the Events Console:
- Identify the appropriate event for the workflow, and click the Tools icon to display the menu.
- Choose Remote Actions > Atrium Orchestrator Actions to display a list of workflows.
List of on-demand workflow actions
- Select the workflow. An Execute Action dialog box opens containing input parameters that are specific to your workflow selection. The following figure shows the Execute Action dialog box for the OS Disk Space Full workflow:
Execute Action dialog box: OS Disk Space Full workflow
For example, you can choose to create an incident ticket and a change request as part of the workflow.
- Make your selections in the Execute Action dialog box, and click Execute to launch the workflow.
An Action Results icon and a Related Events icon are displayed in the event row. An information event is returned indicating the action and the target host.
- Verify the event notes in the Details pane. The event notes describe the stages of the workflow's execution and indicate whether the workflow has been launched successfully. See the following figure:
Event notes in the Details pane
To check the results of the workflow action
Click the Action Results icon to display the Event Remote Action Results dialog box. You can view output, errors, and details associated with the workflow action. An exit code of 0 indicates successful execution; otherwise, the exit code defaults to -1.
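If you collect action results in a script, the exit code rule can be expressed as a small check. The following Python sketch is a minimal illustration; the describe_exit_code function and its return strings are assumptions made for this example and are not part of the product.

```python
def describe_exit_code(exit_code: int) -> str:
    """Translate a workflow action exit code into a short status string."""
    if exit_code == 0:
        # 0 is the only value that indicates successful execution.
        return "success"
    if exit_code == -1:
        # -1 is the default exit code for an unsuccessful execution.
        return "failure (default exit code)"
    # Any other nonzero value is also treated as unsuccessful.
    return "failure"
```

For example, describe_exit_code(0) returns "success", and any nonzero value is reported as a failure.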
To view related events
Click the Related Events icon.
Related Events option
The Event List window opens. In this window, you can view, filter, and perform actions on consequent events that are related to the workflow launch.
Event List window
Verifying that the BMC TrueSight Infrastructure Management Server and the BMC Atrium Orchestrator server can communicate
On the BMC TrueSight Infrastructure Management Server side, when you run a BMC Atrium Orchestrator workflow on an event, the system populates the mc_operations field with a message. If you do not see a message in the Details window of the Console, check the Infrastructure Management Server cell trace file under the installationDirectory\pw\server\log\cellName\trace directory. When the workflow runs, it updates the mc_notes field in the Details window with status information and displays an Action Results icon next to the event. An action exit code of 0 indicates a successful execution. A nonzero action exit code indicates an unsuccessful execution.
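The cell-specific locations in this topic follow one pattern under the installation directory. The following Python sketch builds the ao_actions.mrl path and the trace directory path from the two placeholders, installationDirectory and cellName. The cell_paths function, the sample installation directory, and the sample cell name are hypothetical values chosen for illustration; substitute the values from your own environment.

```python
from pathlib import PureWindowsPath


def cell_paths(installation_directory: str, cell_name: str) -> dict:
    """Build the cell-specific paths described in this topic."""
    server = PureWindowsPath(installation_directory, "pw", "server")
    return {
        # Workflow parameter definitions for the cell.
        "ao_actions": server / "etc" / cell_name / "kb" / "bin" / "ao_actions.mrl",
        # Directory that holds the cell trace file.
        "trace_dir": server / "log" / cell_name / "trace",
    }


# Hypothetical installation directory and cell name.
paths = cell_paths(r"C:\BMC\TrueSight", "pncell_host1")
```

PureWindowsPath is used so that the paths are assembled with Windows separators regardless of the operating system on which the script runs.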
To obtain more debugging information about the BMC TrueSight Infrastructure Management Server cell, run the following command:
On the BMC Atrium Orchestrator server side, check the grid.log file under the C:\Program Files\BMC\AO\tomcat\logs directory to verify that the workflow action has been received.
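Searching the grid.log file by hand can be slow on a busy grid. The following Python sketch scans a log file for lines that contain a given text, such as a workflow name. It assumes that the workflow name appears literally in the log entry, which this topic does not confirm; the format of grid.log is not described here, so treat the search text as something to adapt. The find_log_lines function is a hypothetical helper written for this example.

```python
def find_log_lines(log_path: str, search_text: str) -> list:
    """Return every line of the log file that contains search_text."""
    matches = []
    # errors="replace" keeps the scan going if the log has unexpected bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if search_text in line:
                matches.append(line.rstrip("\n"))
    return matches
```

For example, calling find_log_lines(r"C:\Program Files\BMC\AO\tomcat\logs\grid.log", "OS Disk Space Full") on the BMC Atrium Orchestrator server returns the matching lines, or an empty list if the text does not appear in the log.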