Unsupported content: This version of the documentation is no longer supported. However, the documentation is available for your convenience.

Workflow - Agent Restart


The Agent Restart workflow is a triage and remediation workflow. When the PATROL Agent stops responding, this workflow fixes the corrupt history file, backs up the configuration file, and restarts the PATROL Agent.

 Agent Restart configuration

The module configuration for BMC Atrium Orchestrator content that is received from the Grid Manager contains the module-specific properties. This information is used to create an instance of a module, and also when servicing a request.

Agent_Restart group configuration items

Configuration Item

Description

AO_Host

The IP address or host name of the BMC Atrium Orchestrator computer

WF_Detailed_Logging_Flag

Enables detailed logging of this specific workflow in the operator console. The following are the valid values:

  • Use default
  • True
  • False

The Use default value applies the True or False value specified in the Detailed Logging File item under the Runbook_Defaults configuration folder, which applies to all workflows. To override that value for this workflow, specify the opposite value in the WF_Detailed_Logging_Flag item.

Validation_Pause_Count_Minutes

The time, in minutes, that the workflow pauses before it checks the status of the event and completes the task.

Triage -> PatrolAgent -> PatrolAgent_Windows_Logpath

The path in which the PATROL Agent error log is located on a Windows machine.

Triage -> PatrolAgent -> Linux_Logpath

The path in which the PATROL Agent error log is located on a Linux operating system.

Triage -> PatrolAgent -> SunOS_Logpath

The path in which the PATROL Agent error log is located on a Solaris machine.

Triage -> PatrolAgent -> AIX_Logpath

The path in which the PATROL Agent error log is located on an AIX machine.

Remediation -> PatrolAgent -> PatrolAgent_Remediation_Tasks

An XML file with the list of remediation tasks to be performed for the PATROL Agent if change creation is not enabled:
<tasks_mapping>
  <task name="Remediate History File corruption">Run_Fix_Hist</task>
  <task name="Remediate Configuration File corruption">Backup_Config_File</task>
  <task name="Restart the Patrol Agent service">Start_Agent</task>
</tasks_mapping>

Remediation -> PatrolAgent -> Run_Fix_Hist -> Windows_Default

Command to fix PATROL Agent history file on a Windows machine.

Remediation -> PatrolAgent -> Run_Fix_Hist -> AIX

Command to fix the PATROL Agent history file on an AIX machine (fix_hist utility).

Remediation -> PatrolAgent -> Run_Fix_Hist -> Linux

Command to fix the PATROL Agent history file on a Linux machine (fix_hist utility).

Remediation -> PatrolAgent -> Run_Fix_Hist -> SunOS

Command to fix the PATROL Agent history file on a Solaris machine (fix_hist utility).

Remediation -> PatrolAgent -> Backup_Config_File -> Backup_Config_File_Flag

Flag that indicates whether the PATROL Agent configuration file must be backed up if the file is corrupt (True or False).

Remediation -> PatrolAgent -> Backup_Config_File -> Windows_Config_Path

Path in which the PATROL Agent configuration files are located on a Windows machine.

Remediation -> PatrolAgent -> Backup_Config_File -> Linux_Config_Path

Path in which the PATROL Agent configuration files are located on a Linux machine. /opt/bmc/PatrolAgent/Patrol3/config/

Remediation -> PatrolAgent -> Backup_Config_File -> SunOS_Config_Path

Path in which the PATROL Agent configuration files are located on a Solaris machine. /opt/bmc/PatrolAgent/Patrol3/config/

Remediation -> PatrolAgent -> Backup_Config_File -> AIX_Config_Path

Path in which the PATROL Agent configuration files are located on an AIX machine. /opt/bmc/PatrolAgent/Patrol3/config/

Remediation -> PatrolAgent -> Start_Agent -> Windows_Default

The command to start the PATROL Agent on a Windows machine. PatrolAgent -p

Remediation -> PatrolAgent -> Start_Agent -> Linux

The command to start the PATROL Agent on a Linux machine. /opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p

Remediation -> PatrolAgent -> Start_Agent -> AIX

The command to start the PATROL Agent on an AIX machine. /opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p

Remediation -> PatrolAgent -> Start_Agent -> SunOS

The command to start the PATROL Agent on a Solaris machine. /opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p
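As an illustration of how these configuration items relate to each other, the following is a minimal Python sketch, not the module's actual implementation. The dictionary layout, the function names, and the rule that maps a platform name to a configuration item are assumptions made for this example; only the task mapping XML, the logging flag values, and the Start_Agent commands are taken from the table above.

```python
import xml.etree.ElementTree as ET

# Task mapping XML as listed for PatrolAgent_Remediation_Tasks.
TASKS_XML = """
<tasks_mapping>
  <task name="Remediate History File corruption">Run_Fix_Hist</task>
  <task name="Remediate Configuration File corruption">Backup_Config_File</task>
  <task name="Restart the Patrol Agent service">Start_Agent</task>
</tasks_mapping>
"""

# Hypothetical in-memory view of the Start_Agent configuration items.
START_AGENT = {
    "Windows_Default": "PatrolAgent -p",
    "Linux": "/opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p",
    "AIX": "/opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p",
    "SunOS": "/opt/bmc/PatrolAgent/Patrol3/PatrolAgent -p",
}


def parse_tasks(xml_text):
    """Return (display name, task key) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    return [(t.get("name"), (t.text or "").strip()) for t in root.findall("task")]


def item_for_platform(platform):
    """Map an OS name to a configuration item name (assumed mapping)."""
    return "Windows_Default" if platform == "Windows" else platform


def start_command(platform):
    """Look up the agent start command for the given platform."""
    return START_AGENT[item_for_platform(platform)]


def detailed_logging(wf_flag, runbook_default):
    """Resolve WF_Detailed_Logging_Flag against the Runbook_Defaults value."""
    if wf_flag == "Use default":
        return runbook_default
    return wf_flag == "True"
```

The task keys returned by parse_tasks (Run_Fix_Hist, Backup_Config_File, Start_Agent) match the names of the configuration folders under Remediation -> PatrolAgent, which is why a simple lookup by key is enough in this sketch.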

You can find the action definitions for the Agent Restart workflow in the installationDirectory\pw\server\etc\cellName\kb\bin\ao_actions.mrl file. 

The following extract from the ao_actions.mrl file depicts the action definitions of this workflow for both the manual, on-demand launch and the automatic launch via a remote action policy:

#Entry for manual action
action 'Atrium Orchestrator Actions'.'Triage and Remediate Agent Restart':
{
['Administrator', 'Full Access','BPPM Administrators','Cloud Administrators', 'Data Collection Administrator','BPPM Monitoring Administrators', 'Event Administrator','BPPM Model Administrators', 'Event Operator','BPPM Operators', 'Data Collection Operator', 'Event Operator Supervisor','BPPM Supervisors', 'Data Collection Supervisor']
}
[
'Create Change Request':MC_TRUEFALSE($CREATECHANGERQUEST),
'Change Request Type':MC_CHANGEREQUESTTYPE($CHANGEREQUESTTYPE),
'Create/Update Incident' : MC_TRUEFALSE($CREATEINCIDENT),
'Remediate' : MC_TRUEFALSE($REMEDIATE)
]
:EVENT($EV) where [ $EV.status != 'CLOSED' AND $EV.status != 'BLACKOUT']
{
action_requestor($UID,$PWD);
opadd($EV, "Triage and Remediate Agent Restart manual", $UID);
admin_execute(BEMGW,$EV,"Atrium_Orchestrator_Agent_Restart_Workflow",[$CREATECHANGERQUEST, $CHANGEREQUESTTYPE, $CREATEINCIDENT, $REMEDIATE,$UID],YES);
}
END
#Entry for automatic action
action 'Atrium Orchestrator Actions-Automatic'.'Triage and Remediate Agent Restart':
{
['Administrator', 'Full Access','BPPM Administrators','Cloud Administrators', 'Data Collection Administrator','BPPM Monitoring Administrators', 'Event Administrator','BPPM Model Administrators']
}
:EVENT($EV) where [ $EV.status != 'CLOSED' AND $EV.status != 'BLACKOUT']
{
opadd($EV, "Triage and Remediate Agent Restart Auto", "BMC Impact Manager");
admin_execute(BEMGW,$EV,"Atrium_Orchestrator_Agent_Restart_Workflow",["true","normal","true","true","BMC Impact Manager"],YES);
}
END

Note

Add this extract to the ao_actions.mrl file to run the Agent Restart workflow against Infrastructure Management versions 8.5 and 8.6.

Triage processing

 Event Initiated Agent Restart workflow

When an Agent monitored by Infrastructure Management stops responding, an event is generated. Infrastructure Management is configured to invoke BMC Atrium Orchestrator workflows both manually and automatically, with options to create an incident and a change request.

BMC Event Adapter receives the events from Infrastructure Management and the Process Incoming Event workflow is triggered.

This workflow checks if the incoming event is for Agent Restart and directs the call to the Event Initiated Agent Restart workflow.

The AO server host tries to ping the target host system where the Agent is running. If the ping fails, the workflow updates the event notes. If incident creation and updating is enabled, the incident is created or updated. The process then exits.

If the ping is successful, triage is performed on the Agent. The Triage Agent Restart workflow is called to perform triage, and the event is updated with the triage information.

If change creation is enabled and the triage detects a failure in the Agent, a change request and a task are created. After the change is approved, remediation is performed. If change creation is not enabled, remediation is performed after triage is successful. In this case, the change request and task are not created. The event is updated with remediation data.

The Remedy Monitor adapter is used to receive the alerts after the change is approved and tasks are assigned.

After the tasks are assigned, the Process ITSM Task workflow is triggered. If the ITSM task is for Agent Restart, the Remediate Agent Restart workflow remediates the Agent.
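The decision flow described in this section can be summarized in a short Python sketch. It is illustrative only: the function name, the parameter names, and the step labels are hypothetical, and the sketch assumes that remediation is attempted only when triage detects a failure in the Agent, which the text above states explicitly only for the change creation case.

```python
def handle_agent_event(ping_ok, agent_failed, create_change, create_incident):
    """Return the ordered list of steps the flow would take (labels are hypothetical)."""
    steps = []

    # Ping failure: update the event notes, optionally handle the incident, then exit.
    if not ping_ok:
        steps.append("update_event_notes")
        if create_incident:
            steps.append("create_or_update_incident")
        return steps

    # Ping succeeded: run triage and record the result on the event.
    steps.append("triage")
    steps.append("update_event_with_triage")

    # Assumption: nothing is remediated when triage finds no failure.
    if not agent_failed:
        return steps

    if create_change:
        # Remediation waits until the change is approved and tasks are assigned.
        steps += ["create_change_and_task", "wait_for_approval", "remediate"]
    else:
        # No change request or task is created; remediate right after triage.
        steps.append("remediate")

    steps.append("update_event_with_remediation")
    return steps
```

Returning step labels instead of performing real actions keeps the sketch self-contained; in the product, each label corresponds to a workflow or adapter call.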

 Triage Agent Restart workflow

This workflow extracts the service name from the incoming event and checks whether the Agent on the target computer is still not responding, as reported in the event.

If the Agent is not responding, the appropriate log files are read to check for errors. The PATROL Agent logs are checked for configuration error messages.

If configuration errors are found, only the remediation task that backs up the configuration file is run.

The result of the triage and the error messages are updated in the event and then returned to the Event Initiated Agent Restart workflow.
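The triage logic above might be sketched as follows. This is a hedged illustration, not the workflow's implementation: the regular expression is a placeholder because this document does not list the actual PATROL Agent log messages, and falling back to the full task list when no configuration error is found is an assumption.

```python
import re

# Hypothetical pattern; the real configuration error messages are not given here.
CONFIG_ERROR = re.compile(r"config", re.IGNORECASE)

# Task keys from the PatrolAgent_Remediation_Tasks configuration item.
ALL_TASKS = ["Run_Fix_Hist", "Backup_Config_File", "Start_Agent"]


def triage(agent_responding, log_lines):
    """Return the triage result for an Agent and the lines of its error log."""
    if agent_responding:
        # The Agent recovered since the event was raised; nothing to remediate.
        return {"failed": False, "errors": [], "tasks": []}

    errors = [line.strip() for line in log_lines if CONFIG_ERROR.search(line)]
    if errors:
        # Configuration errors found: only the configuration backup task is run.
        tasks = ["Backup_Config_File"]
    else:
        # Assumption: without configuration errors, all configured tasks apply.
        tasks = list(ALL_TASKS)
    return {"failed": True, "errors": errors, "tasks": tasks}
```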

Remediation processing

 Remediate Agent Restart workflow

The Remediate Agent Restart workflow can be invoked either after the BMC ITSM task is approved or immediately after triage if change creation is not enabled.

If the input contains a task ID, task details are retrieved from BMC ITSM and appropriate remediation action is performed. Tasks and events are updated with the remediation results.

If the input does not contain a task ID, the service name is extracted from the input event, and remediation tasks and commands are extracted from the module configuration. After the remediation commands are run, the event is updated with remediation information.

The following three remediation actions are performed in the given sequence when the tasks are run from BMC ITSM after the change is approved:

  1. First, the fix_hist utility is run to fix the corrupt history file, and then the task is closed.
  2. After the fix_hist task is closed, the configuration file is backed up and that task is closed.
  3. Finally, the third task is run and the Agent is started. This task is closed only after the Infrastructure Management event on which the Agent Restart workflow is run gets closed.
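The ordering and closing rules in the list above can be expressed as a small Python sketch. The function and its callback parameters are hypothetical; the sketch only models the sequence and the rule that the agent start task stays open until the event is closed.

```python
def run_remediation(tasks, run, event_closed):
    """Run tasks in order and return the resulting status of each task.

    tasks        -- ordered task keys, for example from the task mapping XML
    run          -- callable that executes one task (hypothetical hook)
    event_closed -- callable that reports whether the originating event is closed
    """
    status = {}
    for task in tasks:
        run(task)
        if task == "Start_Agent" and not event_closed():
            # The agent start task is closed only after the event is closed.
            status[task] = "open"
        else:
            status[task] = "closed"
    return status
```

For example, calling run_remediation with the three task keys and an event_closed callback that returns False leaves Start_Agent open while the first two tasks are closed.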

Launching the workflow

Launch the Agent Restart workflow by selecting an appropriate Agent down event in the Events Console of the operator console. You can also define a remote action policy for the Agent down event to launch the Agent Restart workflow automatically.

From the Events Console of the operator console, select the event, and choose Tools > Remote Actions > Atrium Orchestrator Actions > Triage and Remediate Agent Restart. Then, fill in the Execute Actions dialog box. Refer to the following table to determine which input value to select for the Agent Restart workflow.

Input parameter

Description

Create Change Request

Boolean. True/false indicator that shows whether you want to create a change request in BMC Remedy Change Management System. If you choose false, the Change Request Type parameter is ignored.

Change Request Type

String. Specifies the type of change request (normal or pre-approved).

Create/Update Incident

Boolean. True/false indicator that shows whether you want to create an incident in the incident management system. If an incident already exists, the workflow updates the existing incident information.

Remediate

Boolean. True/false indicator that shows whether you want to proceed with the remediation action.
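To show how these inputs line up with the argument list in the action definitions, here is a minimal Python sketch. The function name and the validation are assumptions for illustration. The argument order follows the admin_execute calls in the ao_actions.mrl extract, and the lowercase "true" and "false" strings are modeled on the automatic action; the exact values that the manual action passes are not shown in this document.

```python
VALID_CHANGE_TYPES = {"normal", "pre-approved"}


def build_workflow_args(create_change, change_type, create_incident, remediate, requester):
    """Build the argument list in the order used by the action definitions."""
    # The change request type matters only when a change request is created.
    if create_change and change_type not in VALID_CHANGE_TYPES:
        raise ValueError("change_type must be 'normal' or 'pre-approved'")

    def flag(value):
        return "true" if value else "false"

    return [flag(create_change), change_type, flag(create_incident), flag(remediate), requester]
```

With all options enabled, a normal change, and "BMC Impact Manager" as the requester, the result matches the literal list used by the automatic action.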

From the administration console, define a remote action policy under the Event Management Policies tab, following the procedure in Creating remote actions. Complete the policy definition by choosing the appropriate automatic workflow action for Agent Restart.

 
