Using the Triage and Remediation solution
Overview of workflows
The Triage and Remediation solution and BMC Event and Impact Management solution provide prepackaged workflows to help you manage the following system occurrences:
- Operating system disk space reaches or exceeds its capacity
- Monitored host system is down (triage only)
- VMware ESX server host fails to respond
- Utility workflows for starting or restarting servers and services (include a validation phase)
- Database reaches or exceeds a specified tablespace limit
- Scheduled backup for a specified IBM Tivoli Storage Manager (TSM) server has failed
- PATROL Agent goes down because of an error
An event notification triggers each workflow. The source of this event can be a BMC PATROL Agent, a BMC Event and Impact Management event, or any event that complies with the slot mapping standards that BMC Atrium Orchestrator supports. The event, in turn, is triggered when a specified threshold attribute is breached for the particular monitored component.
Defining a remote action policy to launch a workflow
When defining a policy, you first define the event selector and then define the policy from the administrator console.
Perform the following steps to define the event selection criteria:
- In the Tree view of the administrator console, open the By Selector folder and highlight the selector you added to the remote action policy to open the Selector panel.
- Highlight this selector in the selector list of the Selector panel.
- Click the Update Event Selector icon in the toolbar to enable the edit function.
- In the Event Selector Criteria list of the Selector panel, highlight the selector and click Edit to open the Edit Criteria dialog box.
- In the Edit Criteria dialog box, specify the slots and values for events that you want the selector to match.
For example, you can specify the matching criteria in the event message slot, such as $EV.msg contains 'unreachable'.
- Click OK.
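To make the matching behavior of a criterion such as $EV.msg contains 'unreachable' more concrete, the following Python sketch imitates it outside the product. It is an illustration only: the selector_matches function, the event dictionaries, and the sample messages are hypothetical, and the sketch assumes that a plain, case-sensitive substring test is a fair stand-in for the selector's contains operator. The cell itself evaluates the real selector.

```python
def selector_matches(event: dict, slot: str, substring: str) -> bool:
    """Return True when the named slot of the event contains the substring.

    Assumption: a case-sensitive substring test stands in for the
    selector's 'contains' operator.
    """
    value = event.get(slot, "")
    return substring in str(value)


# Hypothetical events; only the msg slot is modelled here.
events = [
    {"msg": "Host db01 is unreachable"},
    {"msg": "Disk usage at 91%"},
]

# Keep only the events that the example criterion would select.
matched = [e for e in events if selector_matches(e, "msg", "unreachable")]
print(matched)
```

Only the first sample event satisfies the criterion, so only that event would go on to trigger the policy's workflow action.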
Perform the following steps to define the remote action policy:
- In the administrator console, click the Event Management Policies tab.
- In the tree view under My Production, open the server cell entry.
- Under the Policy Type folder, select Remote Action Policy.
- Click the Add Event Policy icon in the toolbar.
- In the Selector Chooser dialog box, choose the selector to which this policy and the designated workflow apply. Then, click OK.
- In the Remote Action Policy tab, enter the policy name (required) and a description (optional).
- Designate whether the timeframes are enabled.
If enabled, indicate whether policy activation timeframes are always active (default value), or select the option to define the schedule of your timeframes.
- In the Action Name list, select the automatic workflow action to apply to this policy.
List of automatic workflow actions
- Click OK.
The event selection criteria are applied to the remote action policy.
To check the results of the workflow action
Click the Action Results icon to display the Event Remote Action Results dialog box. You can view the output, errors, and details associated with the workflow action. An exit code of 0 indicates successful execution. Otherwise, the exit code defaults to -1.
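If you post-process action results in a script, the exit code rule can be expressed in a few lines. The following Python sketch is a minimal illustration, not product code; the describe_exit_code function name and the returned labels are hypothetical.

```python
def describe_exit_code(exit_code: int) -> str:
    """Classify a workflow action exit code.

    Per the rule above, 0 means success; any other value
    (the default is -1) is treated as a failure.
    """
    return "success" if exit_code == 0 else "failure"


# Example: the two values that the dialog box is expected to show.
for code in (0, -1):
    print(code, describe_exit_code(code))
```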
To view related events
Click the Related Events icon.
Related Events option
The Event List window opens. In this window, you can view, filter, and perform actions on consequent events that are related to the workflow launch.
Event List window
Verifying that the Infrastructure Management Server and the Atrium Orchestrator server can communicate
On the Infrastructure Management Server side, when you run an Atrium Orchestrator workflow on an event, the system populates the mc_operations field with a message. If you do not see a message in the Details window of the console, check the Infrastructure Management Server cell trace file under the installationDirectory\pw\server\log\cellName\trace directory. When the workflow runs, it updates the mc_notes field in the Details window with status information and displays an Action Result icon next to the event. An action exit code of 0 indicates successful execution. A nonzero action exit code indicates unsuccessful execution.
To obtain more debugging information about the Infrastructure Management Server cell, run the following command:
On the Atrium Orchestrator server side, check the grid.log file under the C:\Program Files\BMC\AO\tomcat\logs directory to verify whether the workflow action has been received.
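Searching the log manually can be slow on a busy grid. The following Python sketch shows one way to list the lines of grid.log that mention a given string. It is a convenience illustration only: the find_log_lines function and the search term MyWorkflow are hypothetical, the log format is not assumed beyond being line-oriented text, and the path is the default location named above, which may differ in your installation.

```python
from pathlib import Path


def find_log_lines(log_path: str, needle: str) -> list[str]:
    """Return every line of the log file that contains the search string.

    Returns an empty list when the file does not exist, so the caller
    can distinguish "no match" only by also checking the path.
    """
    path = Path(log_path)
    if not path.is_file():
        return []
    # errors="replace" keeps the scan going if the log has odd bytes.
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]


# Default location from the text above; MyWorkflow is a placeholder name.
GRID_LOG = r"C:\Program Files\BMC\AO\tomcat\logs\grid.log"
for line in find_log_lines(GRID_LOG, "MyWorkflow"):
    print(line)
```

An empty result means either that the file was not found at that path or that no line mentions the search string, so confirm the path before concluding that the workflow action was not received.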