Workflow - Host Down

Note

Atrium Orchestrator integration is available in version 11.0.00.02 and later.

This triage-only workflow uses a series of ping and traceroute commands to verify whether the host specified in the event is actually down. The workflow does not attempt to restart the host.

Host_Down Configuration

The Triage Host_Down configuration module requires that you enter the IP address, host name, or fully qualified host name of the following devices so that the 360-degree verification process can work:

Host down configuration parameters

| Device | Description |
|---|---|
| Ping_Host_1, Ping_Host_2, Ping_Host_3 | Devices from which the ping command is triggered when a host down event is received. If the ping command fails, the ping host from where the ping failed launches a traceroute and collects the traceroute information. Check whether you can log on to the ping host system from the peer host system. |
| Ping_Router | Default router that launches the ping command when the traceroute does not return any router IPs in the path to the host identified in the host down event |
| Router_pingCommand | Ping command that is appropriate for the routers in your environment |
| WF_Detailed_Logging_Flag | Enables detailed logging of this specific workflow in the Infrastructure Management Performance Manager Operations Console. The valid values are use default, true, and false. The use default value applies the true or false value specified in the Detailed Logging File item under the Runbook_Defaults configuration folder that applies to all workflows. You can override this value by specifying its opposite value in the WF_Detailed_Logging_Flag item. |
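
The way the WF_Detailed_Logging_Flag value combines with the Runbook_Defaults setting can be illustrated with a short sketch. This is only an illustration of the rule described in the table, not the product's implementation. The function name `resolve_detailed_logging` and its parameters are hypothetical.

```python
def resolve_detailed_logging(workflow_flag: str, runbook_default: bool) -> bool:
    """Return the effective detailed-logging setting for one workflow.

    workflow_flag   -- value of WF_Detailed_Logging_Flag: "use default", "true", or "false"
    runbook_default -- value of the Detailed Logging item under Runbook_Defaults
    (Both parameter names are hypothetical and used only for illustration.)
    """
    value = workflow_flag.strip().lower()
    if value == "use default":
        # Inherit the setting that applies to all workflows.
        return runbook_default
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"unsupported WF_Detailed_Logging_Flag value: {workflow_flag!r}")
```

For example, with a Runbook_Defaults value of true, setting the workflow flag to false turns off detailed logging for this workflow only.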

The ping hosts and the ping router are treated as jump hosts: intermediate devices that are configured to provide a gateway into a secure SSH environment. These jump hosts must be on different networks, and they must accept and return pings. By using the jump hosts, you can access or try to access the target host (the suspected down host).


Define the host names, login credentials, invocation mechanisms, and other credential data for these devices under the proper domain in the Datacenter grouping of the AutoPilot Credentials Store. In addition, enter the credential data of the routers in the network topology that a traceroute command could return in its list of router IPs.

Triage processing

The Host Down triage uses 360-degree verification: it pings the target host from three different directions, launches a traceroute from the BMC Atrium Orchestrator CDP server, and pings the target host from the closest router.

The triage begins after the workflow uses the credentials from the Credential Store to extract the connection details for the devices, as shown in the following figure:

Host down overview 

 

The Host Down triage begins with a triangular ping originating from three user-defined host systems contained in the Triage and Remediation configuration module. These three hosts ping the host that is possibly down (the target host). If any one of the ping attempts fails, the host from which the ping failed launches a traceroute command to trace the network route that the ping command took toward its destination. See the following figure:

Host down 360 service validation 

 

The workflow collects and examines the output of the traceroute command to retrieve the list of IP addresses discovered during the trace along with the IP address of the target host. The workflow sifts through this information to distinguish intermediate connectivity issues from a host down condition. 
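
As a rough illustration of this extraction step, the following sketch pulls the hop IP addresses out of Windows tracert-style output. It is not the workflow's actual parser. The function name `hop_ips` is hypothetical, and the sketch assumes IPv4 addresses and that each hop is on a single, unwrapped line.

```python
import re

# Dotted-quad IPv4 pattern (does not validate that each octet is <= 255).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hop_ips(trace_output: str) -> list[str]:
    """Return the IP address reported on each hop line, in hop order."""
    hops = []
    for line in trace_output.splitlines():
        fields = line.split()
        # Hop lines start with the hop number; header and footer lines do not.
        if not fields or not fields[0].isdigit():
            continue
        matches = IPV4.findall(line)
        if matches:
            # The address is the last dotted quad on the line,
            # with or without a preceding host name.
            hops.append(matches[-1])
        # Hops that timed out carry no address and are skipped.
    return hops
```

The resulting list can then be compared against the IP address of the target host.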

In processing the traceroute information, the host down workflow first examines the list of traceroute IP addresses to see whether they include the IP address of the target or down host. If the target or down host IP is included in the list, the workflow selects the penultimate entry in the list as the closest router.

U:\>tracert engwin2k3aovm1.qa.bladelogic.com
Tracing route to engwin2k3aovm1.qa.bladelogic.com [10.20.88.236 (this is the target host)]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms rtr4.bmc.com [137.72.243.253]
2 <1 ms <1 ms <1 ms 137.72.252.1
3 1 ms 1 ms 1 ms rtr-lan-verzion-mpls-ind-pun.bmc.com [137.72.252.118]
4 5 ms 4 ms 4 ms 199.220.49.9
5 204 ms 202 ms 204 ms 68.136.44.134
6 204 ms 205 ms 203 ms sw-core-usa-lex-01.bmc.com [10.20.10.2]
7 202 ms 204 ms 203 ms sw-rd-core-usa-lex-01-ge61.bmc.com [10.20.254.6] (the closest router)
8 203 ms 205 ms 203 ms 10.20.88.236 (target host)
Trace complete.


If the target host IP is not included, the workflow selects the last entry in the list of router IP addresses as the closest router.

U:\>tracert engwin2k3aovm2.qa.bladelogic.com
Tracing route to engwin2k3aovm2.qa.bladelogic.com [10.20.88.43 (this is the target host)]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms rtr4.bmc.com [137.72.243.253]
2 <1 ms <1 ms <1 ms 137.72.252.1
3 <1 ms 1 ms 1 ms rtr-lan-verzion-mpls-ind-pun.bmc.com [137.72.252.118]
4 4 ms 4 ms 7 ms 199.220.49.9
5 202 ms 202 ms 202 ms 68.136.44.134
6 207 ms 203 ms 205 ms sw-core-usa-lex-01.bmc.com [10.20.10.2]
7 203 ms 206 ms 203 ms sw-rd-core-usa-lex-01-ge61.bmc.com [10.20.254.6] (selected as the closest router because the trace did not reach the target host)
Trace complete.

Note

When the traceroute does not return a router IP in its output list, the workflow uses the Ping_Router defined in the Host Down configuration module.
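
The three selection rules (target host present in the list, target host absent, and no router IP returned) can be summarized in a minimal sketch. The function name `select_closest_router` and its parameters are hypothetical. The sketch assumes that, when the target host appears in the trace, it is the final entry, so the penultimate entry is the last router before it.

```python
from typing import Optional, Sequence


def select_closest_router(
    hop_ips: Sequence[str],
    target_ip: str,
    ping_router: Optional[str] = None,
) -> Optional[str]:
    """Pick the router from which the target host is pinged.

    hop_ips     -- IP addresses from the traceroute, in hop order
    target_ip   -- IP address of the suspected down host
    ping_router -- fallback router (Ping_Router in the configuration module)
    """
    # Drop the target itself so that only router addresses remain.
    routers = [ip for ip in hop_ips if ip != target_ip]
    if routers:
        # Target reached: this is the penultimate hop.
        # Target not reached: this is the last hop in the list.
        return routers[-1]
    # The traceroute returned no router IP: fall back to Ping_Router.
    return ping_router
```

With the hop lists from the two traces above, both calls return 10.20.254.6: in the first trace it is the penultimate entry, and in the second it is the last entry.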

The workflow then logs on to the selected router by using the domain credentials supplied by the Credential Module. From the closest router, the workflow pings the target host. If the ping is successful, the host is not down. If the ping fails, the triage shows that the host is down, and an incident is created.

Note

An incident is created only if the triage shows that the host is down.
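
The overall decision flow described in this section can be sketched as follows. This is a simplified model, not the workflow itself: the function name and the three callables passed in (`ping_from`, `trace_from`, `pick_router`) are hypothetical stand-ins for the remote ping, the remote traceroute, and the closest-router selection. Two behaviors are assumptions that the text does not state explicitly: the host is treated as up when all ping hosts reach it, and the trace from the first failing ping host is the one that is used.

```python
from typing import Callable, Optional, Sequence


def triage_host_down(
    target: str,
    ping_hosts: Sequence[str],
    ping_from: Callable[[str, str], bool],
    trace_from: Callable[[str, str], Sequence[str]],
    pick_router: Callable[[Sequence[str], str], Optional[str]],
) -> bool:
    """Return True if the triage concludes that the target host is down."""
    # Step 1: triangular ping from the configured ping hosts.
    failed = [host for host in ping_hosts if not ping_from(host, target)]
    if not failed:
        # Assumption: if every ping succeeds, the host is considered up.
        return False

    # Step 2: traceroute from a host whose ping failed
    # (assumption: the first failing host is used).
    hops = trace_from(failed[0], target)

    # Step 3: choose the closest router and ping the target from it.
    router = pick_router(hops, target)
    if router is None:
        # No router is available to confirm the failure from.
        return True
    # Ping succeeds: connectivity issue, host not down. Ping fails: host down.
    return not ping_from(router, target)
```

A caller would create the incident only when this function returns True, which matches the rule in the note above.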

The following figure shows an example Host Down incident created when a PATROL agent was determined to be unreachable. 

Example Host Down incident 

 

The corresponding event that triggered the Host Down workflow is updated with the detailed information, which you can view in the operator console through the Notes dialog box in the Details pane. The following figure shows an example Notes dialog box. 

Example Notes dialog box for Host Down event 
