Troubleshooting pods and containers in TrueSight Network Automation
This topic describes how to troubleshoot pod and container issues in TrueSight Network Automation.
Job fails to configure or unconfigure a container
The main problem that requires troubleshooting in Pod and Container Management (PCM) testing is the failure of the job to configure or unconfigure a container. To troubleshoot job failures, go to the job list page in the TrueSight Network Automation GUI, select the job that failed, inspect the actions within it that failed, and examine the contents of the ad hoc template used by each Deploy to Active action.
Jobs to configure or unconfigure containers are system-generated jobs, not user-generated jobs, so you might need to change your job list filter in order to see them listed. The name of the container being configured or unconfigured is embedded in the job's Description dynamic field. You can enable Description as a filterable field for jobs to make these jobs easier to find. The view page for a container also presents the IDs of the jobs last used to configure or unconfigure the container.
If the job to configure a container fails, TrueSight Network Automation launches a job to automatically unconfigure the container. If the job to unconfigure a container is successful, the container is automatically deleted, which frees its resources back to the pod and to Internet Protocol Address Management (IPAM). If the job to unconfigure a container fails, the container is not deleted. You must manually delete the container from the Container listing page and manually unconfigure the devices which TrueSight Network Automation failed to unconfigure.
Container creation fails with a CapacityExceededException
One type of failure possible when creating a container on a pod is a CapacityExceededException caused by an address pool that is too small. Each address pool reserved by a container from an address range in the pod must be large enough to allow the container to acquire all the addresses that the container blueprint requires from it. The number of addresses that can be acquired from an address pool is two less than the size of the pool as defined by its mask, because the first and last addresses within the pool cannot be used: the all-zeros address identifies the subnet itself and the all-ones address is the subnet broadcast address, and both are reserved. For example, if you define an address range with a pool mask of 255.255.255.248, each pool has a size of eight, which means that a given pool can supply a maximum of six addresses to a container before being exhausted.
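The arithmetic above can be sketched in a short shell snippet; the /29 prefix length below corresponds to the 255.255.255.248 mask in the example:

```shell
# Usable addresses per pool: a mask with prefix length P defines pools of
# size 2^(32-P); the all-zeros and all-ones addresses are reserved, so the
# number of acquirable addresses is the pool size minus two.
prefix=29                          # 255.255.255.248 in the example above
size=$(( 1 << (32 - prefix) ))     # pool size
usable=$(( size - 2 ))             # addresses a container can acquire
echo "pool size $size, usable addresses $usable"
```

Size your address ranges so that each pool's usable count meets or exceeds the number of addresses the container blueprint acquires from that pool.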
Using simulated mode to test a container blueprint
When you are first testing out a new container blueprint, it is helpful to use simulated mode, so that you do not have to worry about device state and the correctness of your template commands initially while you work the kinks out of your substitution parameters. To use simulated mode, perform the following steps:
- Set the simulateConnection property to true in the global.properties.imported file.
- Restart TrueSight Network Automation.
- In the BCAN_DATA\devices directory, create *.running, *.startup, and *.image text files for each device in your pod, where the base name of each file is the address of the device.
The *.running and *.startup files hold the contents of the running and startup configurations for a device, and the *.image file holds information about the OS image present on the device. For the purposes of PCM testing, each of these files can simply contain a line that says dummy.
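The device files can be created with a short shell loop; the device addresses and the DEVICES_DIR value below are placeholders, so substitute your pod's actual device addresses and your BCAN_DATA\devices directory:

```shell
# Create *.running, *.startup, and *.image files containing the line "dummy"
# for each simulated device. DEVICES_DIR and the addresses are placeholders;
# point DEVICES_DIR at your BCAN_DATA\devices directory.
DEVICES_DIR=./devices
mkdir -p "$DEVICES_DIR"
for addr in 10.0.0.1 10.0.0.2; do
    for ext in running startup image; do
        echo "dummy" > "$DEVICES_DIR/$addr.$ext"
    done
done
```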
If your simulated pod uses FWSM and ACE fault pairs, you can also control the values returned by the simulated Inspect fault host custom action through the inspectFaultHost.properties file, as described in the comments of that file:
#The following properties are read from inspectFaultHost.properties to make the
#Inspect fault host custom action return specified property values for an FWSM or
#ACE device.
#
#1. firewall.host1
#2. firewall.host2
#
#The above 2 properties take the FWSM devices that participate in a pair. host1 /
#host2 can each accept multiple device address names, comma separated, e.g.
#FirewallA1,FirewallA2. This is the case in the large gold container, where
#firewall host pairs are assigned in round-robin fashion.
#
#3. loadbalancer.host1
#4. loadbalancer.host2
#
#The above 2 properties take the ACE devices that participate in a pair. host1 /
#host2 can each accept multiple device address names, comma separated, e.g.
#LoadBalancerHost1,LoadBalancerHost2. This is the case in the large gold container,
#where load balancer host pairs are assigned in round-robin fashion.
#
#5. firewall.host1.adminFailoverGroup
#
#The above property controls the failover group returned by FWSM host1 (1/2/null).
#The combination of the failover group and the community1 / community2 active flag
#values determines the adminActiveFlag property of the firewall host.
#
#6. firewall.host1.faultCommunity1ActiveFlag
#
#The above property controls the faultCommunity1ActiveFlag returned for FWSM host1
#by the simulated custom action, e.g. Active or Standby.
#
#7. firewall.host1.faultCommunity2ActiveFlag
#
#The above property controls the faultCommunity2ActiveFlag returned for FWSM host1
#by the simulated custom action, e.g. Active or Standby.
#
#8. firewall.host2.adminFailoverGroup
#
#The above property controls the failover group returned by FWSM host2 (1/2/null).
#The combination of the failover group and the community1 / community2 active flag
#values determines the adminActiveFlag property of the firewall host.
#
#9. firewall.host2.faultCommunity1ActiveFlag
#
#The above property controls the faultCommunity1ActiveFlag returned for FWSM host2
#by the simulated custom action, e.g. Active or Standby.
#
#10. firewall.host2.faultCommunity2ActiveFlag
#
#The above property controls the faultCommunity2ActiveFlag returned for FWSM host2
#by the simulated custom action, e.g. Active or Standby.
#
#11. loadbalancer.host1.adminActiveFlag
#
#The above property controls the adminActiveFlag returned for ACE host1 by the
#simulated custom action, e.g. true or false.
#
#12. loadbalancer.host2.adminActiveFlag
#
#The above property controls the adminActiveFlag returned for ACE host2 by the
#simulated custom action, e.g. true or false.
#
#Suppose the pod has the ACE fault pair LoadBalancerHost1 and LoadBalancerHost2
#and the FWSM fault pair FirewallHostA1 and FirewallHostA2, and we want to make
#the pair nodes look like this:
# FirewallHostA1 = Admin
# FirewallHostA1 = fault community 1 active
# FirewallHostA2 = fault community 2 active
# LoadBalancerHost1 = Admin
#The 12 properties would then look like this:
firewall.host1=FirewallHostA1
firewall.host2=FirewallHostA2
loadbalancer.host1=LoadBalancerHost1
loadbalancer.host2=LoadBalancerHost2
firewall.host1.adminFailoverGroup=1
firewall.host1.faultCommunity1ActiveFlag=Active
firewall.host1.faultCommunity2ActiveFlag=Standby
firewall.host2.adminFailoverGroup=
firewall.host2.faultCommunity1ActiveFlag=Standby
firewall.host2.faultCommunity2ActiveFlag=Active
loadbalancer.host1.adminActiveFlag=true
loadbalancer.host2.adminActiveFlag=false
When you are ready to move on to testing container creation on real devices, one recommended way to test the connectivity of network paths within a container is to connect to each device in the container and verify that you can ping the VLAN interface address of each of the other devices that are supposed to be connected to it within the container.
Remember to use the view pages to inspect the state of your pod and container after you create them, to make sure that resources are in the state you expect. You can also inspect the UI of the IPAM system to confirm that its resources are in the expected state.
Troubleshooting templates used during container configuration and unconfiguration
When you are troubleshooting or developing templates to be used during container configuration and unconfiguration, set the vdcCommitContainerActions property to false in the global.properties file. This property is set to true by default. Setting it to false prevents TrueSight Network Automation from committing any configuration changes to the devices involved, so any configuration changes you make can be rolled back by restarting the devices.
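For example, a minimal global.properties fragment for template development looks like this (revert the value when you are finished):

```properties
# global.properties -- prevent TrueSight Network Automation from committing
# container configuration changes while developing or troubleshooting templates.
# Set this back to true (the default) when template development is complete.
vdcCommitContainerActions=false
```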
Troubleshooting "Unable to reach CMDB. Please verify the credentials" error received during pod configuration
When creating a pod in TrueSight Network Automation, you might receive an "Unable to reach CMDB. Please verify the credentials" error. If this occurs, perform the following steps in order until you reach an action that resolves the problem:
1. Check whether the Site has been created in the BMC.CORE:BMC_PhysicalLocation form on the BMC Remedy AR System server. The error message can be displayed if the record does not exist in this form. To create the record, follow the steps described in Creating a physical location for a pod.
2. If the Site has been created in the PhysicalLocation form and the problem persists, verify that Tomcat is running properly:
   - Enter the URL http://AtriumWebServicesServer:portNumber in a browser. If Tomcat is running properly, an Apache screen is displayed.
   - If the Apache screen is not displayed, restart Tomcat. If the problem still persists, check for errors in the Tomcat logs, typically located in the AtriumWebServiceDirectory\BMC Software\Atrium Web Registry\shared\tomcat\logs directory.
3. If Tomcat is working, verify that the Atrium Web Services are working properly:
   - Enter the Web Services URL http://AtriumWebServicesServer:portNumber/cmdbws/server/cmdbws in a browser. The result should be "Please enable REST support in WEB-INF/conf/axis2.xml and WEB-INF/web.xml".
   - If nothing is returned, the issue lies with the Atrium Web Services. Checking the Atrium Web Services logs (typically located in the AtriumWebServiceDirectory\BMC Software\Atrium Web Registry\Logs folder) will help to narrow down the issue.
   - If the page loads fine, verify the settings in the TrueSight Network Automation console: log on to TrueSight Network Automation and, from the Admin tab > System Admin > System Parameters > External Integrations, verify the settings for "Enable CMDB Integration".
4. Verify the settings for the BMC Atrium CMDB integration on the TrueSight Network Automation server:
   - "Enable CMDB Integration" should be selected under the External Integration section. For BMC Cloud Lifecycle Management use cases, the "Enable CMDB Integration" parameter is used; the "Web Services Registry Integration" parameter should be kept disabled.
   - Verify that the "Web Service Endpoint URL" is correct. The format is http://AtriumWebServicesServer:portNumber/cmdbws/server/cmdbws.
   - Enter the user name and password. Ensure that you can log in to the BMC Remedy AR System server with the same user name and password. BMC recommends that you use "Demo" credentials for this step.
5. If the credentials are correct and the error is still thrown, verify the cmdbws.properties file, typically located in the AtriumWebServiceDirectory\BMC Software\Atrium Web Registry\wsc\cmdbws folder:
   - Verify that the hostname in the file is the hostname of the BMC Remedy AR System server. The hostname should not be localhost, unless the Atrium Web Services Registry component is installed on the BMC Remedy AR System server.
   - If the hostname must be changed, restart the Tomcat service on the server after changing it in the file. If the Atrium Web Service is running on the bundled Tomcat, you can use the Tomcat startup and shutdown scripts in the AtriumWebServiceDirectory/shared/tomcat/bin folder to stop and start the Tomcat services.
6. If the BMC Remedy AR System server hostname is correct in the cmdbws.properties file, check whether the webapps folder (typically located at AtriumWebServiceDirectory\BMC Software\Atrium Web Registry\wsc\webapps) contains two folders, named atriumws7604 and cmdbws. These folders should each contain three subfolders named axis2-web, META-INF, and WEB-INF. If these folders are missing, copy them from an available working environment.
7. In a Linux environment, make sure that the following environment variable is set before starting the BMC Atrium Web Services. (These commands can also be set in the bmcAtriumSetenv.sh file; after editing the file, restart the Tomcat services.)
   ATRIUMCORE_HOME=/opt/bmc/AtriumWebRegistry
   export ATRIUMCORE_HOME
8. Sometimes there can be communication issues between the TrueSight Network Automation Web services and the TrueSight Network Automation database. If the issue persists even after following the above steps, restart the TrueSight Network Automation Web services.
   - In Windows environments, the TrueSight Network Automation Web services are listed as BCA-Networks Web Server under services.msc and can be started or stopped from there.
   - In Linux environments, use the following commands to stop and start the TrueSight Network Automation Web services:
     service enatomcat stop
     service enatomcat start
     The enatomcat service is listed in the /etc/init.d directory.
9. If the problem persists after completing this procedure, contact BMC Support.
Note
Versions 8.2.03 and 8.3 have better error logging capabilities for this issue. If upgrading TrueSight Network Automation is required, first review the Base product versions and compatibility for upgrade to version 4.5 or version 4.6 page.
Gathering PCM-specific information for BMC Support
If you need to gather detailed information about PCM behavior so that BMC Support staff can troubleshoot issues, you can increase the logging level for the com.bmc.bcan.engine.network.pcm package in the logging.properties file and restart TrueSight Network Automation. This generates detailed diagnostics in the BCA-Networks.log.* log files while you execute subsequent PCM operations, which you can then send to BMC Support for analysis.