Workflow - Agent Restart

Supported versions

The Atrium Orchestrator integration is available in version 11.0.00.02 and later.

At a minimum, this integration requires BMC Atrium Orchestrator Platform 7.9 to be installed. Also, BMC Triage and Remediation Solution run book version 20.17.01 or earlier must be installed. This run book is deprecated from TrueSight Orchestration Content 20.18.01 onwards.

For more information, see the following links:

  • Installing TrueSight Orchestration Platform
  • Installing the BMC Atrium Orchestrator Content

 

When the PATROL Agent stops responding, this triage and remediation workflow fixes the corrupt history file, backs up the configuration file, and restarts the PATROL Agent.

 Agent Restart configuration

The module configuration for BMC Atrium Orchestrator content that is received from the Grid Manager contains the module-specific properties. This information is used to create an instance of a module, and also when servicing a request. 

Agent_Restart group configuration items

Configuration item | Description
AO_Host | The IP address or host name of the BMC Atrium Orchestrator computer.
WF_Detailed_Logging_Flag | Enables detailed logging of this specific workflow in the operator console. The valid values are Use default, True, and False. The Use default value applies the true or false value specified in the Detailed Logging File item under the Runbook_Defaults configuration folder, which applies to all workflows. You can override this value by specifying its opposite value in the WF_Detailed_Logging_Flag item.
Validation_Pause_Count_Minutes | Pause time, in minutes, after which the status of the event is checked before the task is completed.
Triage -> PatrolAgent -> PatrolAgent_Windows_Logpath | 
Triage -> PatrolAgent -> Linux_Logpath | The path in which the PATROL Agent error log is located on a Linux operating system.
Triage -> PatrolAgent -> SunOS_Logpath | The path in which the PATROL Agent error log is located on a Solaris machine.
Triage -> PatrolAgent -> AIX_Logpath | The path in which the PATROL Agent error log is located on an AIX machine.
Remediation -> PatrolAgent -> PatrolAgent_Remediation_Tasks | An XML file with the list of remediation tasks to be performed for the PATROL Agent if change creation is not enabled. <tasks_mapping> <task name="Remediate History File corruption">Run_Fix_Hist</task> <task name="Remediate Configuration File corruption">Backup_Config_File</task> <task name="Restart the Patrol Agent service">Start_Agent</task> </tasks_mapping>
Remediation -> PatrolAgent -> Run_Fix_Hist -> Windows_Default | Command to fix the PATROL Agent history file on a Windows machine.
Remediation -> PatrolAgent -> Run_Fix_Hist -> AIX | Command to fix the PATROL Agent history file on an AIX machine. (fix_hist utility)
Remediation -> PatrolAgent -> Run_Fix_Hist -> Linux | Command to fix the PATROL Agent history file on a Linux machine. (fix_hist utility)
Remediation -> PatrolAgent -> Run_Fix_Hist -> SunOS | Command to fix the PATROL Agent history file on a Solaris machine. (fix_hist utility)
Remediation -> PatrolAgent -> Backup_Config_File -> Backup_Config_File_Flag | Flag that indicates whether the PATROL Agent configuration file must be backed up if the file is corrupt. (True or False)
Remediation -> PatrolAgent -> Backup_Config_File -> Windows_Config_Path | 
Remediation -> PatrolAgent -> Backup_Config_File -> Linux_Config_Path | Path in which the PATROL Agent configuration files are located on a Linux machine. /opt/bmc/PatrolAgent/Patrol3/config/
Remediation -> PatrolAgent -> Backup_Config_File -> SunOS_Config_Path | Path in which the PATROL Agent configuration files are located on a Solaris machine. /opt/bmc/PatrolAgent/Patrol3/config/
Remediation -> PatrolAgent -> Backup_Config_File -> AIX_Config_Path | Path in which the PATROL Agent configuration files are located on an AIX machine. /opt/bmc/PatrolAgent/Patrol3/config/
Remediation -> PatrolAgent -> Start_Agent -> Windows_Default | The command to start the PATROL Agent on a Windows machine. PatrolAgent -p
Remediation -> PatrolAgent -> Start_Agent -> Linux | The command to start the PATROL Agent on a Linux machine. /opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p
Remediation -> PatrolAgent -> Start_Agent -> AIX | The command to start the PATROL Agent on an AIX machine. /opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p
Remediation -> PatrolAgent -> Start_Agent -> SunOS | The command to start the PATROL Agent on a Solaris machine. /opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p
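
The PatrolAgent_Remediation_Tasks item holds a small XML document that maps each remediation task name to a configuration group (Run_Fix_Hist, Backup_Config_File, Start_Agent). As an illustration only, the following Python sketch reads that mapping in document order. The function name parse_tasks_mapping is hypothetical and is not part of the product; the XML is copied from the item description.

```python
import xml.etree.ElementTree as ET

# Value of PatrolAgent_Remediation_Tasks, copied from the configuration item.
TASKS_XML = """
<tasks_mapping>
  <task name="Remediate History File corruption">Run_Fix_Hist</task>
  <task name="Remediate Configuration File corruption">Backup_Config_File</task>
  <task name="Restart the Patrol Agent service">Start_Agent</task>
</tasks_mapping>
"""


def parse_tasks_mapping(xml_text):
    """Return (task name, configuration group) pairs in document order.

    Hypothetical helper for illustration; not part of the run book.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        (task.get("name"), (task.text or "").strip())
        for task in root.findall("task")
    ]
```

Calling parse_tasks_mapping(TASKS_XML) yields three pairs, each ending in the name of the group whose per-platform commands appear in the table.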

You can find the action definitions for the Agent Restart workflow in the installationDirectory\pw\server\etc\cellName\kb\bin\ao_actions.mrl file. 


The following extract from the ao_actions.mrl file depicts the action definitions of this workflow for both the manual, on-demand launch and the automatic launch via a remote action policy:

#Entry for manual action
action 'Atrium Orchestrator Actions'.'Triage and Remediate Agent Restart':
{
['Administrator', 'Full Access','BPPM Administrators','Cloud Administrators', 'Data Collection Administrator','BPPM Monitoring Administrators', 'Event Administrator','BPPM Model Administrators', 'Event Operator','BPPM Operators', 'Data Collection Operator', 'Event Operator Supervisor','BPPM Supervisors', 'Data Collection Supervisor']
}
[
'Create Change Request':MC_TRUEFALSE($CREATECHANGERQUEST),
'Change Request Type':MC_CHANGEREQUESTTYPE($CHANGEREQUESTTYPE),
'Create/Update Incident' : MC_TRUEFALSE($CREATEINCIDENT),
'Remediate' : MC_TRUEFALSE($REMEDIATE)
]
:EVENT($EV) where [ $EV.status != 'CLOSED' AND $EV.status != 'BLACKOUT']
{
action_requestor($UID,$PWD);
opadd($EV, "Triage and Remediate Agent Restart manual", $UID);
admin_execute(BEMGW,$EV,"Atrium_Orchestrator_Agent_Restart_Workflow",[$CREATECHANGERQUEST, $CHANGEREQUESTTYPE, $CREATEINCIDENT, $REMEDIATE,$UID],YES);
}
END
#Entry for automatic action
action 'Atrium Orchestrator Actions-Automatic'.'Triage and Remediate Agent Restart':
{
['Administrator', 'Full Access','BPPM Administrators','Cloud Administrators', 'Data Collection Administrator','BPPM Monitoring Administrators', 'Event Administrator','BPPM Model Administrators']
}
:EVENT($EV) where [ $EV.status != 'CLOSED' AND $EV.status != 'BLACKOUT']
{
opadd($EV, "Triage and Remediate Agent Restart Auto", "BMC Impact Manager");
admin_execute(BEMGW,$EV,"Atrium_Orchestrator_Agent_Restart_Workflow",["true","normal","true","true","BMC Impact Manager"],YES);
}
END

Note

Add this extract to the ao_actions.mrl file to run the Agent Restart workflow against Infrastructure Management versions 8.5 and 8.6.
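
Both action entries pass five positional arguments to admin_execute. The manual entry passes the variables bound in the Execute Actions dialog box, and the automatic entry passes fixed literals. As a reading aid only, the following Python sketch lines up the literals from the automatic entry with the variable names from the manual entry. The variable names keep the spelling used in the extract; the helper label_arguments is hypothetical.

```python
# Variable names in the order the manual entry passes them to admin_execute
# (spelling kept exactly as in the extract).
ARGUMENT_SLOTS = [
    "CREATECHANGERQUEST",
    "CHANGEREQUESTTYPE",
    "CREATEINCIDENT",
    "REMEDIATE",
    "UID",
]

# Literal values hard-coded in the automatic entry.
AUTOMATIC_VALUES = ["true", "normal", "true", "true", "BMC Impact Manager"]


def label_arguments(values):
    """Pair each positional value with its slot name (illustrative helper)."""
    if len(values) != len(ARGUMENT_SLOTS):
        raise ValueError("expected %d values, got %d" % (len(ARGUMENT_SLOTS), len(values)))
    return dict(zip(ARGUMENT_SLOTS, values))
```

Read this way, the automatic entry requests a change request of type normal, creates or updates the incident, remediates, and records BMC Impact Manager as the requesting user.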

Triage processing

 Event Initiated Agent Restart workflow

When an Agent monitored by Infrastructure Management stops responding, an event is generated. Infrastructure Management is configured to invoke BMC Atrium Orchestrator workflows both manually and automatically, with options to create an incident and a change request.

BMC Event Adapter receives the events from Infrastructure Management and the Process Incoming Event workflow is triggered. 

This workflow checks if the incoming event is for Agent Restart and directs the call to the Event Initiated Agent Restart workflow. 

The AO server host tries to ping the target host system where the Agent is running. If the ping fails, the workflow updates the event notes and, if incident creation and updating is enabled, creates or updates the incident. The process then exits.

If the ping is successful, triage is performed on the Agent. The Triage Agent Restart workflow is called to perform triage. The event will be updated with the triage information. 

If change creation is enabled and the triage detects a failure in the Agent, a change and a task are created. After the change is approved, remediation is performed. If change creation is not enabled, remediation is performed after triage is successful. In this case, no change or task is created. The event is updated with remediation data.

The Remedy Monitor adapter is used to receive the alerts after the change is approved and tasks are assigned. 

After the tasks are assigned, the Process ITSM Task workflow is triggered. If the ITSM task is for Agent Restart, the Remediate Agent Restart workflow remediates the Agent.
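
The branching described in this section can be summarized in a short Python sketch. It is a simplified model, not the workflow's implementation. The function name and step labels are hypothetical, the Remediate input is not modeled, and the case where triage finds no failure is not described in the text, so the sketch assumes that nothing further happens there.

```python
def plan_event_initiated_restart(ping_ok, failure_detected, create_change, create_incident):
    """Return the ordered steps described for the Event Initiated Agent Restart workflow.

    Simplified, hypothetical model of the documented branching.
    """
    if not ping_ok:
        # Ping failed: update the event notes, optionally touch the incident, exit.
        steps = ["update event notes"]
        if create_incident:
            steps.append("create or update incident")
        steps.append("exit")
        return steps

    steps = ["run Triage Agent Restart workflow", "update event with triage information"]
    if not failure_detected:
        # Assumption: the text does not cover this case, so the sketch stops here.
        return steps

    if create_change:
        # Remediation waits for the change to be approved.
        steps += ["create change and task", "wait for change approval"]
    steps += ["run Remediate Agent Restart workflow", "update event with remediation data"]
    return steps
```

For example, with change creation disabled and a failure detected, the plan goes from triage directly to remediation without creating a change or a task.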

 Triage Agent Restart workflow

This workflow extracts the service name from the incoming event and checks whether the Agent on the target computer is still not responding, as reported in the event. 

If the Agent is not responding, appropriate log files are read to check for any errors in the log. The PATROL Agent logs are verified for configuration error messages. 

If configuration errors are found, then only the remediation task for backing up the configuration file is run.

The result of the triage and the error messages are updated in the event and then returned to the Event Initiated Agent Restart workflow.

Remediation processing

 Remediate Agent Restart workflow

The Remediate Agent Restart workflow can be invoked either after the BMC ITSM task is approved or immediately after triage if change creation is not enabled. 

If the input contains a task ID, task details are retrieved from BMC ITSM and appropriate remediation action is performed. Tasks and events are updated with the remediation results. 

If the input does not contain a task ID, the service name is extracted from the input event, and remediation tasks and commands are extracted from the module configuration. After the remediation commands are run, the event is updated with remediation information. 
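
The configuration items for this workflow are written as arrow-separated paths, for example Remediation -> PatrolAgent -> Start_Agent -> Linux. As a sketch only, and assuming the module configuration can be viewed as nested groups, the following Python example resolves such a path against a dictionary. The MODULE_CONFIG structure and the lookup function are hypothetical; the command values are the ones listed for the Start_Agent items.

```python
# Hypothetical nested view of part of the module configuration. The command
# values are the ones listed for the Start_Agent configuration items.
MODULE_CONFIG = {
    "Remediation": {
        "PatrolAgent": {
            "Start_Agent": {
                "Windows_Default": "PatrolAgent -p",
                "Linux": "/opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p",
                "AIX": "/opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p",
                "SunOS": "/opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p",
            }
        }
    }
}


def lookup(config, item_path):
    """Resolve an item written as 'A -> B -> C' against a nested dictionary."""
    node = config
    # Spacing around the arrows varies, so strip each path segment.
    for key in (part.strip() for part in item_path.split("->")):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"No configuration item {key!r} in path {item_path!r}")
        node = node[key]
    return node
```

Because each segment is stripped, the lookup accepts paths with or without spaces around the arrows.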

The following three remediation actions are performed in the given sequence when these tasks are run from BMC ITSM after the change is approved:

  1. First, the fix_hist utility is run to fix the corrupt history file, and then the task is closed.
  2. After the fix_hist task is closed, the configuration file is backed up, and that task is closed.
  3. Finally, the third task is run and the Agent is started. This task is closed only after the Infrastructure Management event on which the Agent Restart workflow is run gets closed.
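
The ordering and closing rules in this list can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the workflow's implementation: run_task and event_is_closed are hypothetical callbacks supplied by the caller, and the sketch checks the event status once instead of pausing and rechecking.

```python
# Task order taken from the list above; names match the configuration groups.
TASK_ORDER = ["Run_Fix_Hist", "Backup_Config_File", "Start_Agent"]


def run_remediation_tasks(run_task, event_is_closed):
    """Run the three tasks in order and report whether each one is closed.

    run_task(name) performs one task; event_is_closed() reports whether the
    Infrastructure Management event is closed. Both are hypothetical callbacks.
    """
    status = {}
    for name in TASK_ORDER:
        run_task(name)
        if name == "Start_Agent":
            # The last task closes only after the originating event is closed.
            status[name] = "closed" if event_is_closed() else "open"
        else:
            # The first two tasks are closed as soon as they have run.
            status[name] = "closed"
    return status
```

If the event is still open when the check runs, the first two tasks are reported as closed and the Start_Agent task remains open.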

Launching the workflow

Launch the Agent Restart workflow by selecting an appropriate Agent down event in the Events Console of the operator console. You can also define a remote action policy for the Agent down event to launch the Agent Restart workflow automatically. 

From the Events Console of the operator console, select the event, and choose the Tools > Remote Actions > Atrium Orchestrator Actions > Triage and Remediate Agent Restart workflow entry. Then, fill in the Execute Actions dialog box. You can refer to the following table to determine which input value to select for the Agent Restart workflow. 

Input parameter | Description
Create Change Request | Boolean. True/false indicator that shows whether you want to create a change request in BMC Remedy Change Management System. If you choose false, the Change Request Type parameter is ignored.
Change Request Type | String. Specifies the type of change request (normal/pre-approved).
Create/Update Incident | Boolean. True/false indicator that shows whether you want to create an incident in the incident management system. If an incident already exists, the workflow updates the existing incident information.
Remediate | Boolean. True/false indicator that shows whether you want to proceed with the remediation action.
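
The relationship between the input parameters can be illustrated with a short Python sketch. The function and key names are hypothetical and the sketch is not product code. It only models the documented rule that Change Request Type is ignored when Create Change Request is false, and it treats normal and pre-approved as the only valid types, as listed in the table.

```python
# Change request types listed in the input parameter table.
VALID_CHANGE_TYPES = {"normal", "pre-approved"}


def normalize_inputs(create_change, change_type, create_incident, remediate):
    """Return the effective inputs for a launch (hypothetical helper)."""
    if create_change:
        if change_type not in VALID_CHANGE_TYPES:
            raise ValueError(f"Unsupported change request type: {change_type!r}")
    else:
        # Change Request Type is ignored when no change request is created.
        change_type = None
    return {
        "create_change_request": create_change,
        "change_request_type": change_type,
        "create_update_incident": create_incident,
        "remediate": remediate,
    }
```

For example, normalize_inputs(False, "normal", True, True) drops the change request type because no change request is created.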

From the administration console, define a remote action policy under the Event Management Policies tab, following the procedure in Creating remote actions on the administrator console. Complete the policy definition by choosing the appropriate automatic workflow action for Agent Restart.
