Troubleshooting upgrade and pipeline failure issues


The BMC Helix Service Management installation consists of various pipelines, such as HELIX_PLATFORM_DEPLOY, HELIX_NONPLATFORM_DEPLOY, and HELIX_SMARTAPPS_DEPLOY. Use the information in this topic to troubleshoot pipeline failures and upgrade issues.

To troubleshoot failure due to incorrect parameter values

If a pipeline fails due to incorrect parameter values, perform the following steps:

  1. On the Jenkins UI, navigate to the pipeline where the failure occurred.
  2. In the pipeline where the failure occurred, select the latest build in the Build History pane, and click Console Output.
  3. On the Console Output page, check the logs to find the reason for the failure.
  4. Navigate to the HELIX_ONPREM_DEPLOYMENT pipeline.
  5. From the Build History pane, select the latest job, and click Rebuild.
  6. Specify the correct parameter values.
    For example, if the HELIX_PLATFORM_DEPLOY pipeline fails due to an incorrect FTS_ELASTICSEARCH_PORT value, specify the correct value for FTS_ELASTICSEARCH_PORT. To confirm that the port is reachable before you rebuild, see the connectivity check after this procedure.
  7. Select the HELIX_GENERATE_CONFIG pipeline.
    Make sure that you do not select any other pipeline.
  8. Click Rebuild.
  9. Rerun the failed pipeline:
    1. Navigate to the pipeline where the failure occurred.
    2. In the failed pipeline, in the Build History pane, select the latest build, and click Rebuild.
    3. On the Rebuild page, select the ReRun check box, and click Rebuild.
      The pipeline resumes from the point where it stopped.
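
If a parameter value is hard to verify from the logs alone, you can run a quick connectivity check from the Jenkins server before rebuilding. The following is a minimal sketch, not a BMC-provided command; it assumes that the nc (netcat) utility is available on the Jenkins server and uses <Elasticsearch host> as a placeholder for your Elasticsearch host:

  # Confirm that the endpoint used for FTS_ELASTICSEARCH_PORT is reachable
  nc -zv <Elasticsearch host> <FTS_ELASTICSEARCH_PORT>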

To troubleshoot failure due to environment issues

If a pipeline fails due to environment issues, such as the Kubernetes cluster not being reachable, a PVC not being mounted successfully, or insufficient resources to start a container, perform the following steps:

  1. From your Jenkins server, run the following command to check for issues in the BMC Helix Innovation Suite namespace:

    kubectl get events -n <Innovation Suite namespace>
  2. Run the following command to identify PVC-related issues in the BMC Helix Platform namespace:

    kubectl get events -n <BMC Helix Platform namespace>
  3. Troubleshoot and fix the issue. For common diagnostic checks, see the example commands after this procedure.
  4. Rerun the failed pipeline:
    1. On the Jenkins UI, navigate to the pipeline where the failure occurred.
    2. In the failed pipeline, in the Build History pane, select the latest build, and click Rebuild.
    3. On the Rebuild page, select the ReRun check box, and click Rebuild.
      The pipeline resumes from the point where it stopped.
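
To troubleshoot and fix such issues (step 3), the following commands are a minimal sketch of common checks; <podname> is a placeholder for a pod reported in the events, and kubectl top requires the Kubernetes metrics server:

  # Review the status of the pods in the BMC Helix Innovation Suite namespace
  kubectl get pods -n <Innovation Suite namespace>

  # Describe a failing pod to see scheduling, image-pull, and volume-mount errors
  kubectl describe pod <podname> -n <Innovation Suite namespace>

  # Check that all persistent volume claims are bound
  kubectl get pvc -n <BMC Helix Platform namespace>

  # Check node resource usage if containers cannot start due to insufficient resources
  kubectl top nodes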


To troubleshoot BMC Helix Service Management upgrade issues

Use the following steps to collect information and logs to troubleshoot BMC Helix Service Management upgrade issues:

Important

  • Make sure that your AR Server license is valid before retrying an upgrade.
  • Make sure that the platform-admin-ext service has an EXTERNAL-IP value configured; see the example check after this note.
  • Verify that your environment meets the sizing guidelines for the BMC Helix Service Management version that you want to upgrade to.
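  For example, to check that the platform-admin-ext service has an EXTERNAL-IP value, you can run a command similar to the following. This is a minimal check, and it assumes that the service runs in the BMC Helix Innovation Suite namespace; adjust the namespace if your deployment differs:

    kubectl get svc platform-admin-ext -n <Innovation Suite namespace>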
  1. Monitor the HELIX_FULL_STACK_UPGRADE pipeline console logs and the BMC Helix Innovation Suite namespace.
  2. If the pipeline logs are stuck on a particular pod, check the relevant pod container logs by using the following command:

    kubectl logs -f <podname> -c <containername> -n <Innovation Suite namespace>

    Important

    The platformupgrade-0 pod can take up to an hour to start. You can check the container logs. Alternatively, you can open a shell in the pod, navigate to the ARSystem/db directory, and view the relevant D2P log by using the following command:

    tail -f ard2pdeploymentactivity.log
  3. Check the activities on the Kubernetes cluster by using the following command:

    kubectl get events -n <Innovation Suite namespace> --field-selector type=Warning
  4. Check the platformupgrade-0 pod logs:
    1. After the platformupgrade-0 pod is up and running, log in to the pod.
    2. Run the following command to open a shell in the pod and check the logs:

      kubectl -n <Innovation Suite namespace> exec -it <podname> -c <containername> -- bash
    3. In the ARSystem/db directory, check the logs with the most recent timestamp, as shown in the example after this procedure.
  5. Monitor the progress of BMC Helix Service Management upgrade by using the following command:

    helm ls -n <Innovation Suite namespace>

    Upgrading BMC Helix Service Management takes 5 or more hours. However, after about an hour, you might already see the updated version for some applications.

  6. In the AR System Deployment Management Console, go to Manage Package, and check the package status.
  7. If the HELIX_FULL_STACK_UPGRADE pipeline fails due to a resource requirement issue or a known configuration error, rebuild the last failed build, and click RERUN.
    The RERUN option restarts the BMC Helix Service Management upgrade from where it last failed.
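
For step 4, the following is a minimal sketch of how you might locate the most recent logs from inside the platformupgrade-0 pod; it assumes that your shell is in the parent directory of ARSystem/db:

  # List the most recently modified log files
  ls -lt ARSystem/db | head

  # Follow the D2P deployment activity log
  tail -f ARSystem/db/ard2pdeploymentactivity.log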

To revert BMC Helix Service Management to the source version

If you created a database backup before upgrading BMC Helix Service Management, and the upgrade fails in a way that cannot be recovered, perform the following steps to revert BMC Helix Service Management to the source version:

  1. Run the following command to list the Helm releases and their versions:

    helm -n <Innovation Suite namespace> ls
  2. If the releases already show the version that you are trying to upgrade to, run the following command to list the statefulsets and deployments:

    kubectl -n <Innovation Suite namespace> get sts,deploy

    The output lists the statefulsets and deployments in the namespace. A READY value such as 1/1 or 2/2 indicates the number of ready pods out of the desired number in an active statefulset or deployment.

  3. Scale down the pods by using the following commands:

    kubectl scale --all --replicas=0 --namespace=<Innovation Suite namespace> deployment
    kubectl scale --all --replicas=0 --namespace=<Innovation Suite namespace> sts
  4. Restore the database to the database backup that you took before the upgrade.
    For example, for a PostgreSQL database, perform the following steps in the pgAdmin tool:
    1. Force drop the is_db database.
    2. Create a new database with the same name, is_db.
    3. Restore the database backup that you took before the upgrade.
      It takes about 15 minutes to restore the database.
  5. List the Helm releases by running the following command:

    helm -n <Innovation Suite namespace> ls
  6. Get the revision history for the relevant release by running the following command:

    helm -n <Innovation Suite namespace> history platform-fts-<Innovation Suite namespace>
  7. Revert to the previous working revision (the revision whose date corresponds to the last known working version) by running the following command:

    helm -n <Innovation Suite namespace> rollback platform-fts-<Innovation Suite namespace> <revision>

    Repeat this step for each release until all the releases are reverted to the previous version.

    Some releases such as the Support Assistant tool may not change.

    The pods automatically scale up after you revert the Helm releases.

  8. If the pods do not scale up, run the following commands, changing the number of replicas according to your deployment size:

    kubectl -n <Innovation Suite namespace> scale deploy atriumwebsvc catalog-itsm-plugin dwp-tomcat-deployment-master midtier-int midtier-user smartit-master staledata virtualchatplugin --replicas=<count>

    kubectl -n <Innovation Suite namespace> scale sts openfire platform-fts --replicas=<count>

    Example:

    kubectl -n <Innovation Suite namespace> scale deploy atriumwebsvc catalog-itsm-plugin dwp-tomcat-deployment-master midtier-int midtier-user smartit-master staledata virtualchatplugin --replicas=1

    kubectl -n <Innovation Suite namespace> scale sts openfire platform-fts --replicas=1
  9. Make sure that all pods are running; see the verification commands after this procedure.
  10. Confirm that the AR Server license is valid before retrying the upgrade.
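
To verify the rollback before you retry the upgrade, you can run checks similar to the following; these reuse the commands already shown in this procedure:

  # Confirm that all pods in the namespace are running
  kubectl -n <Innovation Suite namespace> get pods

  # Confirm that the releases show the pre-upgrade versions again
  helm -n <Innovation Suite namespace> ls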


 
