Workflow - Host Down
This triage-only workflow uses a series of ping and traceroute commands to verify whether the host specified in the event is actually down. The workflow does not attempt to restart the host.
The Triage Host_Down configuration module requires the IP address, host name, or fully qualified host name of each of the following devices so that the 360-degree verification process can work:
Host down configuration parameters
|Parameter|Description|
|Ping_Host_1|Devices from which the ping command is triggered when a host down event is received. If the ping command fails, the ping host on which the ping failed launches a traceroute and collects the traceroute information. Verify that you can log on to the ping host system from the peer host system.|
|Ping_Router|Default router that launches the ping command when the traceroute does not return any router IPs in the path to the host identified in the host down event.|
|Router_pingCommand|Ping command that is appropriate for the routers in your environment.|
The ping hosts and the ping router act as jump hosts, that is, intermediate devices that are configured to provide a gateway into a secure SSH environment. These jump hosts must be on different networks. Through the jump hosts, you can access, or try to access, the target host (the host suspected to be down). The jump hosts accept and return pings.
Define the host names, login credentials, invocation mechanisms, and other credential data for these devices under the proper domain in the Datacenter grouping of the AutoPilot Credentials Store. In addition, enter the credential data for every router in the network topology that a traceroute command could return in its list of router IPs.
The Host Down triage uses 360-degree verification: it pings the target host from three different directions, launches a traceroute from the BMC Atrium Orchestrator CDP server, and pings the target host from the closest router.
The triage begins after the workflow uses the credentials from the Credential Store to extract the connection details for the devices, as shown in the following figure:
Host down overview
The Host Down triage begins with a triangular ping originating from three user-defined host systems that are defined in the Triage and Remediation configuration module. These three hosts ping the host that is possibly down (the target host). If any one of the ping attempts fails, the host on which the ping failed launches a traceroute command to trace the network route toward the target host. See the following figure:
Host down 360 service validation
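The triangular ping step can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the function name `triangular_ping` and the two callables `ping_from` and `traceroute_from` are hypothetical stand-ins for whatever mechanism runs the commands on the jump hosts.

```python
from typing import Callable, Dict, List


def triangular_ping(
    ping_hosts: List[str],
    target: str,
    ping_from: Callable[[str, str], bool],
    traceroute_from: Callable[[str, str], str],
) -> Dict[str, str]:
    """Ping the target from every ping host.

    For each ping host whose ping fails, run a traceroute from that
    same host and keep its raw output, keyed by the ping host name.
    Both callables are hypothetical: ping_from(host, target) returns
    True on success, traceroute_from(host, target) returns text.
    """
    traces: Dict[str, str] = {}
    for host in ping_hosts:
        if not ping_from(host, target):
            # Only the host on which the ping failed launches a traceroute.
            traces[host] = traceroute_from(host, target)
    return traces
```

An empty result means that every ping host reached the target, so no traceroute output needs to be examined.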
The workflow collects and examines the output of the traceroute command to retrieve the list of IP addresses discovered during the trace along with the IP address of the target host. The workflow sifts through this information to distinguish intermediate connectivity issues from a host down condition.
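As an illustration of how the hop addresses might be pulled out of traceroute output, the following sketch assumes Windows `tracert`-style text, in which each hop line starts with a hop number and ends with either a bare IP address or a host name followed by the address in brackets. The function name `hop_ips` and the regular expressions are assumptions made for this example; the workflow's own parsing is not documented here.

```python
import re
from typing import List

# A hop line begins with optional spaces, the hop number, then whitespace.
HOP_LINE = re.compile(r"^\s*\d+\s+")
# Loose IPv4 pattern; it does not validate that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hop_ips(trace_output: str) -> List[str]:
    """Return the IP address of each responding hop, in hop order."""
    ips: List[str] = []
    for line in trace_output.splitlines():
        if not HOP_LINE.match(line):
            continue  # skip the header, blank lines, and "Trace complete."
        found = IPV4.findall(line)
        if found:
            # The address is the last IPv4-looking token on the line.
            ips.append(found[-1])
        # Hops that timed out carry no address and are skipped.
    return ips
```

Matching only lines that start with a hop number keeps the target address in the "Tracing route to" header from being counted as a hop.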
In processing the traceroute information, the host down workflow first examines the list of traceroute IP addresses to see whether they include the IP address of the target or down host. If the target or down host IP is included in the list, the workflow selects the penultimate entry in the list as the closest router.
U:\>tracert engwin2k3aovm1.qa.bladelogic.com
Tracing route to engwin2k3aovm1.qa.bladelogic.com [10.20.88.236 (this is the target host)]
over a maximum of 30 hops:
  1    <1 ms    <1 ms    <1 ms  rtr4.bmc.com [184.108.40.206]
  2    <1 ms    <1 ms    <1 ms  220.127.116.11
  3     1 ms     1 ms     1 ms  rtr-lan-verzion-mpls-ind-pun.bmc.com [137.72.252.118]
  4     5 ms     4 ms     4 ms  18.104.22.168
  5   204 ms   202 ms   204 ms  22.214.171.124
  6   204 ms   205 ms   203 ms  sw-core-usa-lex-01.bmc.com [10.20.10.2]
  7   202 ms   204 ms   203 ms  sw-rd-core-usa-lex-01-ge61.bmc.com [10.20.254.6] (the closest router)
  8   203 ms   205 ms   203 ms  10.20.88.236 (target host)
Trace complete.
If the target host IP is not included, the workflow selects the last entry in the list of router IP addresses as the closest router.
U:\>tracert engwin2k3aovm2.qa.bladelogic.com
Tracing route to engwin2k3aovm2.qa.bladelogic.com [10.20.88.43 (this is the target host)]
over a maximum of 30 hops:
  1    <1 ms    <1 ms    <1 ms  rtr4.bmc.com [126.96.36.199]
  2    <1 ms    <1 ms    <1 ms  188.8.131.52
  3    <1 ms     1 ms     1 ms  rtr-lan-verzion-mpls-ind-pun.bmc.com [137.72.252.118]
  4     4 ms     4 ms     7 ms  184.108.40.206
  5   202 ms   202 ms   202 ms  220.127.116.11
  6   207 ms   203 ms   205 ms  sw-core-usa-lex-01.bmc.com [10.20.10.2]
  7   203 ms   206 ms   203 ms  sw-rd-core-usa-lex-01-ge61.bmc.com [10.20.254.6] (the selected router, as the target host is not pinged)
Trace complete.
When the traceroute does not return a router IP in its output list, the workflow uses the Ping_Router defined in the Host Down configuration module.
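The three selection rules described above can be summarized in a short sketch. The function name `select_closest_router` and its parameters are hypothetical. The sketch also assumes that "penultimate entry" means the hop immediately before the target, which is the penultimate entry whenever the target is the last hop in the list.

```python
from typing import List


def select_closest_router(hops: List[str], target_ip: str, ping_router: str) -> str:
    """Pick the router from which the final ping is attempted.

    hops        -- hop IP addresses from the traceroute, in order
    target_ip   -- IP address of the suspected down host
    ping_router -- fallback router (the Ping_Router parameter)
    """
    if target_ip in hops:
        position = hops.index(target_ip)
        if position >= 1:
            # Target was reached: use the hop just before it.
            return hops[position - 1]
        # The target is the only or first hop, so no router precedes it.
        return ping_router
    if hops:
        # Target was not reached: use the last router that responded.
        return hops[-1]
    # The traceroute returned no router IPs at all.
    return ping_router
```

With the hop lists from the two traces above, both cases select 10.20.254.6: in the first trace it is the hop before the target, and in the second it is the last hop that responded.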
The workflow then logs on to the selected router by using the domain credentials supplied by the Credential Module. From the closest router, the workflow pings the target host. If the ping is successful, the host is not down. If the ping fails, the triage concludes that the host is down.
An incident is created only if the triage shows that the host is down.
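The final decision can be expressed as a small sketch. The names `triage_verdict`, `ping_from`, and `create_incident` are hypothetical placeholders for this example and do not correspond to documented workflow or product calls.

```python
from typing import Callable


def triage_verdict(
    closest_router: str,
    target: str,
    ping_from: Callable[[str, str], bool],
    create_incident: Callable[[str], None],
) -> bool:
    """Return True if the triage concludes that the target host is down.

    ping_from(router, target) is a hypothetical callable that returns
    True when the ping from the router succeeds. create_incident(target)
    is a hypothetical hook that is called only for a confirmed host down.
    """
    if ping_from(closest_router, target):
        # The closest router can reach the host, so the host is not down.
        return False
    # The last check failed as well: the host is treated as down.
    create_incident(target)
    return True
```

Keeping incident creation behind the last ping means that an intermediate connectivity problem between a ping host and the target does not raise a Host Down incident by itself.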
The following figure shows an example Host Down incident created when a PATROL agent was determined to be unreachable.
Example Host Down incident
The corresponding event that triggered the Host Down workflow is updated with the detailed information, which you can view in the TrueSight console through the Notes dialog box.