Workflow - Service Restart

This triage, remediation, and validation utility workflow restarts a service, such as Oracle, Telnet, or BMC Atrium Orchestrator, and can be launched from any event except a closed or blackout event. As a utility workflow, it does not create an incident.

Configuration guidelines

  • To enable this workflow, you must configure the AutoPilot Credentials Store and the Service_Restart configuration module on the BMC Atrium Orchestrator server side.
  • Ensure that the host systems on which the services reside are added to the AutoPilot Credentials Store. 

The Service_Restart configuration module contains the following definitions: 

Group/Item: Description

Service_Actions_Aliasing: Aliases that point to the different services that the Service_Restart workflow addresses

You can provide simple aliases for service names.

This group also contains special commands or options for stopping, starting, and restarting services.

If an alias is not defined, BMC Atrium Orchestrator tries to stop, start, or restart the service name sent by Infrastructure Management with default commands.

For information about how to define domains and host names in the AutoPilot Credentials Store, see AutoPilot Credentials Store. The service stop/start commands vary according to the type of service and the operating system of the designated host system. Some examples for different services are provided below: 

Service: BMC Atrium Orchestrator 

Windows:
Start: net start "RBA-CDP"
Stop: net stop "RBA-CDP"
Restart: c:/restart CDP.bat
Validation_String: RBA-CDP

UNIX:
Start: /opt/bmc/ao/cdp/bin/server.sh start
Stop: /opt/bmc/ao/cdp/bin/server.sh stop
Validation_String: bao

Service: Oracle 

Windows:
Start: net start oracle
Stop: net stop oracle
Validation_String: oracle

UNIX:
Start: /opt/oracle/bin/ora start
Stop: ps -eaf | grep oracle | grep -v grep | awk '{print $2}' | xargs kill -9

AO_Host: BMC AO host where the Configuration Distribution Peer (CDP) server resides. The service stop/start commands are launched from the AO host system.

WF_Detailed_Logging_Flag: Enables detailed logging of this specific workflow in the Infrastructure Management Performance Manager Operations Console. The valid values are use default, true, and false. The use default value applies the true or false value specified in the Detailed Logging File item under the Runbook_Defaults configuration folder that applies to all workflows. You can override this value by specifying its opposite value in the WF_Detailed_Logging_Flag item.

Validation_Pause_Count_Minutes: Time in minutes before the validation process begins to verify that the service has restarted or started.

Because the restart and start processes take time, it is necessary to have this delay before the validation process begins. This is a variable time that is dependent on the type of server, the operating system, network configuration, traffic, and so forth. 

The default value for service restart is 5 minutes.

The Service Restart workflow uses the validation string to retrieve the process ID of the service during the triage phase. The workflow then compares the process ID retrieved during the validation phase with the process ID retrieved during triage to determine if remediation is successful. 
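The PID lookup described above can be sketched in Python. This is only an illustration, assuming `ps -eaf`-style output where the second column is the PID; the function name `pids_matching` is hypothetical and is not part of the product.

```python
from typing import List


def pids_matching(ps_output: str, validation_string: str) -> List[int]:
    """Return PIDs of `ps -eaf`-style lines that contain validation_string.

    Hypothetical helper: mirrors a `ps | grep <string> | grep -v grep`
    pipeline, not the workflow's actual implementation.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # The second column of `ps -eaf` is the PID; skip the header
        # row and any line that is too short to parse.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        # Ignore the grep process itself, as `grep -v grep` would.
        if validation_string in line and "grep" not in line:
            pids.append(int(fields[1]))
    return pids
```

An empty result would correspond to the case in which no process matches the validation string.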


In the BMC TrueSight Infrastructure Management Server, additional enumeration values have been added to the ao_actions.baroc file to accommodate the different actions of the Service Restart workflow:

ENUMERATION MC_SERVICEACTIONS
0 restart
1 start
2 stop
END
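For scripts that need to refer to these actions, the enumeration can be mirrored in Python. This is a sketch only; the class name `ServiceAction` and the helper `parse_action` are hypothetical, and the numeric values are copied from the MC_SERVICEACTIONS extract above.

```python
from enum import IntEnum


class ServiceAction(IntEnum):
    """Hypothetical mirror of the MC_SERVICEACTIONS enumeration."""
    RESTART = 0
    START = 1
    STOP = 2


def parse_action(value: str) -> ServiceAction:
    # Accepts the lowercase labels used in the enumeration; an unknown
    # label raises KeyError.
    return ServiceAction[value.strip().upper()]
```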



You can find the action definitions for the Service Restart workflow in the installationDirectory\pw\server\etc\cellName\kb\bin\ao_actions.mrl file.

An extract from the ao_actions.mrl file depicts the action definitions of this workflow for the manual, on-demand launch.

#Service Restart Utility Workflow
action 'Atrium Orchestrator Actions'.'Utility - Service Restart':
{ 
['Administrator', 'Full Access', 'Data Collection Administrator', 'Event Administrator', 'Event Operator', 'Data Collection Operator', 'Event Operator Supervisor', 'Data Collection Supervisor'] 
} 
[
'Create Change Request':MC_TRUEFALSE($CREATECHANGERQUEST),
'Change Request Type':MC_CHANGEREQUESTTYPE($CHANGEREQUESTTYPE),
'Server Name' : STRING($HOST),
'Service Name' : STRING($SERVICE),
'Service Action' : MC_SERVICEACTIONS($SERVICE_ACTION)
]
:EVENT($EV) where [ $EV.status != 'CLOSED' AND $EV.status != 'BLACKOUT']
{ 
action_requestor($UID,$PWD); 
opadd($EV, "Triage and Remediate Service Restart", $UID); 
admin_execute(BEMGW,$EV,"Atrium_Orchestrator_Service_Restart_Workflow",[$CREATECHANGERQUEST,$CHANGEREQUESTTYPE,"false","true",$UID,$HOST,$SERVICE,$SERVICE_ACTION],YES); 
} 
END 

Launching the workflow

Launch a utility workflow from any event in Infrastructure Management except for blackout and closed state events. You can launch this workflow only as a manual, on-demand workflow from the Operations Console.

From the event view of the Operations Console, select the event, and choose the Tools > Remote Actions/Diagnostics > Atrium Orchestrator Actions > Utility - Service Restart workflow entry. Then, complete the Execute Actions dialog box. See the following table to determine which input value to select for the Service Restart workflow.

Input parameter: Description

Create Change Request: Boolean. True/false indicator that shows whether you want to create a change request in BMC Remedy Change Management System. If you choose false, the Change Request Type parameter is ignored.

Change Request Type: String. Specifies the type of change request (normal/preapproved).

Server Name: Fully qualified host name or IP address of the target server. If you do not specify a server name, then the mc_host value of the event is used to populate this field.

Service Name: Name of the service that you are accessing.

Service Action: The type of action to perform on the service. You can choose start, stop, or restart from the list. The default value is restart.
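The defaulting rules for these inputs can be sketched as follows. This is an illustration under assumptions: the event is modeled as a plain dictionary with `status` and `mc_host` keys, and the names `RestartRequest` and `build_request` are hypothetical rather than part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartRequest:
    create_change_request: bool
    change_request_type: Optional[str]
    server_name: str
    service_name: str
    service_action: str


def build_request(event: dict, service_name: str, server_name: str = "",
                  service_action: str = "restart",
                  create_change_request: bool = False,
                  change_request_type: Optional[str] = None) -> RestartRequest:
    # The workflow cannot be launched from closed or blackout events.
    if event.get("status") in ("CLOSED", "BLACKOUT"):
        raise ValueError("workflow cannot be launched from this event")
    if service_action not in ("start", "stop", "restart"):
        raise ValueError("unknown service action: " + service_action)
    # An empty Server Name falls back to the event's mc_host value.
    host = server_name.strip() or event["mc_host"]
    # Change Request Type is ignored when no change request is created.
    cr_type = change_request_type if create_change_request else None
    return RestartRequest(create_change_request, cr_type, host,
                          service_name, service_action)
```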

Common framework: event processing

You can launch the Utility Service Restart workflow by selecting any event other than a blackout or closed event and then choosing the corresponding Atrium Orchestrator Action. The target server host is represented by the mc_host slot in the event mapping table that the workflow interprets.

After extracting the configuration data from the event, the common framework determines the logging level for the workflow, sets the level at normal or detailed, and updates the event information in the Notes dialog box of the Operations Console accordingly.
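How the logging level might be resolved from the WF_Detailed_Logging_Flag item and the Runbook_Defaults value can be sketched as below. The function name `detailed_logging_enabled` is hypothetical, and the sketch assumes both settings arrive as strings.

```python
def detailed_logging_enabled(wf_flag: str, runbook_default: str) -> bool:
    """Resolve the effective logging level for one workflow.

    wf_flag is the workflow-level value ("use default", "true", or
    "false"); runbook_default is the value that applies to all workflows.
    """
    flag = wf_flag.strip().lower()
    if flag == "use default":
        # Defer to the setting shared by all workflows.
        flag = runbook_default.strip().lower()
    if flag not in ("true", "false"):
        raise ValueError("expected 'true' or 'false', got: " + flag)
    return flag == "true"
```

A return value of True corresponds to detailed logging and False to normal logging.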

The workflow next extracts the service aliases by which it connects with the specified service.

Remediation processing

The Utility Service Restart triage begins with a series of ping attempts launched from the BMC AO host system to the target host where the service resides. If the ping is successful, indicating that the host system is up, the workflow next checks whether the service name is available. If the ping is unsuccessful, the workflow exits and updates the event note.

If the service name is available, the workflow extracts the configuration information from the host system. The workflow next invokes the validation string against the service to determine the status of the service. If the validation is successful, the workflow retrieves the process ID (PID) of the service and determines which operation to perform for remediation.
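The triage sequence above can be outlined in Python. This is a rough sketch, not the workflow's implementation: `ping` and `find_pid` are hypothetical callables supplied by the caller, the number of ping attempts is an assumed value, and "service name is available" is interpreted here as "a service name was supplied".

```python
from typing import Callable, Optional


def triage(host: str, service: str,
           ping: Callable[[str], bool],
           find_pid: Callable[[str, str], Optional[int]],
           ping_attempts: int = 3) -> dict:
    # Try the host several times; any() stops at the first success.
    if not any(ping(host) for _ in range(ping_attempts)):
        return {"status": "host_unreachable", "pid": None}
    # Assumption: an empty service name means triage cannot continue.
    if not service:
        return {"status": "service_name_missing", "pid": None}
    # Run the validation lookup and record the PID for later comparison.
    pid = find_pid(host, service)
    status = "running" if pid is not None else "not_running"
    return {"status": status, "pid": pid}
```

The recorded PID is what the validation phase later compares against.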

During remediation, the workflow again extracts service aliases for the service restart action. It checks for which action (start, stop, or restart) to use by verifying the process ID of the service. If the process ID is not available, meaning that the service is not available, the workflow checks whether it has the appropriate command available for the required action – start, stop, or restart.

If the appropriate command is available, the workflow processes the command, updates the remediation status, and updates the event information.

If the command is unavailable, the workflow tries to start or stop the service by using default OS commands, such as net start|stop $SERVICE on Windows or /etc/init.d/$SERVICE start|stop on UNIX systems. The workflow then proceeds to update the remediation status and the event information.
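The choice between an aliased command and a default OS command can be sketched as follows. The alias table is modeled as a nested dictionary, and the function names are hypothetical. Composing a restart from stop followed by start when no restart command is defined is an assumption made for this sketch, not documented behavior.

```python
from typing import Dict, List


def default_command(service: str, action: str, os_family: str) -> str:
    # Default OS commands for start and stop.
    if os_family == "windows":
        return "net {} {}".format(action, service)
    return "/etc/init.d/{} {}".format(service, action)


def select_commands(service: str, action: str, os_family: str,
                    aliases: Dict[str, Dict[str, str]]) -> List[str]:
    entry = aliases.get(service, {})
    if action in entry:
        # A command is defined for this service and action.
        return [entry[action]]
    if action == "restart":
        # Assumption: restart falls back to stop followed by start.
        return (select_commands(service, "stop", os_family, aliases)
                + select_commands(service, "start", os_family, aliases))
    return [default_command(service, action, os_family)]
```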

Validation processing

If the remediation is successful, the workflow continues to the validation phase. The workflow again extracts the configuration data and updates the event notes. The workflow then pauses for the time assigned to the Validation_Pause_Count_Minutes item in the configuration module, which gives the service restart process time to complete.

After the pause count expires, the workflow invokes the validation command. If the validation command is successful for a stop action, the workflow updates the event notes accordingly. If the validation command is successful for a start or restart action, the workflow extracts the process ID of the service and compares it with the process ID extracted earlier during the triage phase. If the process ID has changed, then the start or restart operation is successful. The workflow updates the event notes accordingly.
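The pause and PID comparison can be sketched in Python. The names below are hypothetical, `find_pid` stands in for whatever runs the validation command, and treating a stop as successful when no PID is found is an assumption. The 5-minute default matches the Validation_Pause_Count_Minutes default noted earlier.

```python
import time
from typing import Callable, Optional


def wait_then_validate(action: str, triage_pid: Optional[int],
                       find_pid: Callable[[], Optional[int]],
                       pause_minutes: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    # Give the service time to finish stopping or starting.
    sleep(pause_minutes * 60)
    current_pid = find_pid()
    if action == "stop":
        # Assumption: a stop succeeded if no process is found.
        return current_pid is None
    # For start or restart, a new PID that differs from the one seen
    # during triage indicates that the service came up again.
    return current_pid is not None and current_pid != triage_pid
```

Passing a custom `sleep` callable makes the pause easy to skip when trying the sketch out.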
