Troubleshooting


Use this section to identify and resolve issues that you might encounter with BMC AMI Platform.

Issue and solution
Issue: No response is returned when the code exceeds 400 lines.

This issue occurs because the engine was built without specifying the max_num_tokens parameter, which defaults to a lower value. To resolve it, rebuild the engine using the following command, with the parameter added:

trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
             --gemm_plugin float16 \
             --output_dir ${ENGINE_PATH} \
             --max_num_tokens 32768

Here, 32768 is the context length for Mixtral. This value applies when you use the BYOLLM feature with Mixtral deployed on the Triton server.
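Before you rebuild, it can help to confirm which value the existing engine was built with. The following is a minimal sketch, assuming that the engine directory contains a config.json file that records max_num_tokens among its build settings (verify this for your TensorRT-LLM version). The function name engine_max_num_tokens is hypothetical.

```shell
#!/bin/bash
# Print the max_num_tokens value recorded in an engine's config.json.
# Assumption: the file contains an entry such as  "max_num_tokens": 8192
engine_max_num_tokens() {
  local config="$1/config.json"
  [ -f "$config" ] || { echo "config.json not found in $1" >&2; return 1; }
  # Keep the first "max_num_tokens" entry, then strip everything but the number.
  grep -o '"max_num_tokens"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$config" \
    | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Usage (ENGINE_PATH is the same variable used in the build command):
#   engine_max_num_tokens "${ENGINE_PATH}"
```

If the printed value is lower than the context length you need, rebuild the engine with the command shown above.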

Issue: The load-embeddings job is in a failed state.

Check the job status:

kubectl get jobs -n bmcami-prod-amiai-services

Name              Status   Completions
load-embeddings   Failed   0/1

Run the following commands to restart the job:

kubectl delete job load-embeddings -n bmcami-prod-amiai-services
helm upgrade amiai-services /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00/helm_charts/07-helm-amiai-chart/ --namespace bmcami-prod-amiai-services --reuse-values
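
The check-then-delete part of this procedure can be sketched as a small script. This is an illustration only, assuming that a failed Job carries a status condition of type Failed (standard Kubernetes behavior). The function names job_state and delete_job_if_failed are hypothetical, and the helm upgrade that recreates the job remains a separate manual step.

```shell
#!/bin/bash
# Report whether a Kubernetes Job has failed, based on its status conditions.
# Prints "failed" or "not-failed".
job_state() {
  local job="$1" ns="$2" failed
  # A failed Job has a condition of type "Failed" whose status is "True".
  failed=$(kubectl get job "$job" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  if [ "$failed" = "True" ]; then echo "failed"; else echo "not-failed"; fi
}

# Delete the Job only when it has failed.
delete_job_if_failed() {
  local job="$1" ns="$2"
  if [ "$(job_state "$job" "$ns")" = "failed" ]; then
    kubectl delete job "$job" -n "$ns"
  else
    echo "Job $job is not in a failed state; nothing deleted."
  fi
}

# Usage:
#   delete_job_if_failed load-embeddings bmcami-prod-amiai-services
```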

Issue: The following error is displayed in BMC AMI Chat Assistant:
The Assistant service is currently unavailable. Try again. If the problem persists, contact your BMC AMI Platform administrator.
  1. Click Platform manager > BMC AMI AI Manager > BMC AMI Assistant chat settings.
  2. On the BMC AMI Assistant chat settings page, check whether a model is assigned to AMI Assistant. If no model is assigned, assign the recommended model and retry AMI Assistant. If a model is assigned, go to the AI service health page and verify that all services and models are running.

      To check AI service health:

  • Click Platform manager > BMC AMI AI Manager > AI service health.
    If any services are down, bring them back up and confirm that they are running.

Issue: “READONLY You can't write against a read only replica”.

You might encounter the following error message on the Uptrace UI during login:

READONLY You can't write against a read only replica.
script: 6922f2a31daf24cb6eb66399c4474600e84a5c09, on @user_script:43.

 

To resolve this issue:

  1. Log in to the setup machine environment.
  2. Run the following command to list the Redis pods:
    kubectl get pod -A | grep redis
  3. Restart all Redis-related pods by using the following command:
    kubectl delete po redis-master-0 redis-replica-0 redis-replica-1 redis-sentinel-0 redis-sentinel-1 redis-sentinel-2 -n bmcami-prod-data-service
  4. Wait a few minutes for the pods to restart.
  5. Verify the fix by logging in to the Uptrace service again.
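
Instead of waiting for a fixed time after the restart, you can poll the pods until they report Ready. The following is a minimal sketch that uses kubectl wait. The function name wait_for_pods and the 300s timeout in the usage line are illustrative assumptions. If a pod has not been recreated yet, kubectl wait reports it as not found, in which case rerun the function after a few seconds.

```shell
#!/bin/bash
# Wait until every named pod reports the Ready condition, or stop at the
# first pod that does not become Ready within the timeout.
# Usage: wait_for_pods <namespace> <timeout> <pod>...
wait_for_pods() {
  local ns="$1" timeout="$2" pod
  shift 2
  for pod in "$@"; do
    if ! kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout="$timeout"; then
      echo "Pod $pod did not become Ready within $timeout" >&2
      return 1
    fi
  done
  echo "All pods are Ready"
}

# Usage:
#   wait_for_pods bmcami-prod-data-service 300s \
#     redis-master-0 redis-replica-0 redis-replica-1 \
#     redis-sentinel-0 redis-sentinel-1 redis-sentinel-2
```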
Issue: The CES instance isn't launching from BMC AMI Platform.
Make sure that your CES host is running and is accessible via HTTPS. Verify the host connectivity by clicking Test connection.
Issue: You can't add a CES instance.
  • You can add CES instances to BMC AMI Platform only if you are using CES version 20.15.03 or later. Earlier versions aren't supported because they don't use the ADAPT user interface.
  • Confirm that the host name is correct, HTTPS is used, and the port (default 48443) is open. Also verify that the CES role is assigned to your user account.
  • If your CES instance is using HTTPS with an untrusted certificate, you can't add the CES instance directly.
    Workaround: Open the CES instance URL in your browser. When you encounter the security warning, click Continue to bypass it and allow the CES user interface to load completely. After the CES UI is loaded, you can proceed to add the CES instance successfully.
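
The host, port, and certificate checks above can also be run from a command line. The following sketch uses curl and its documented exit codes (0 for a completed request, 60 for a certificate that cannot be verified). The function name check_ces_https, the root path /, and the 10-second timeout are assumptions for illustration. A result of reachable means only that an HTTPS response was received, not that the CES user interface loaded correctly.

```shell
#!/bin/bash
# Classify the result of an HTTPS request to a CES host.
# Prints one of: reachable, untrusted-certificate, unreachable.
check_ces_https() {
  local host="$1" port="${2:-48443}" rc
  # -s: quiet, -o /dev/null: discard the body, --max-time: overall time limit.
  curl -s -o /dev/null --max-time 10 "https://${host}:${port}/"
  rc=$?
  case "$rc" in
    0)  echo "reachable" ;;
    60) echo "untrusted-certificate" ;;  # certificate could not be verified
    *)  echo "unreachable" ;;            # DNS, connection, timeout, or other error
  esac
}

# Usage (ces.example.com is a placeholder host name):
#   check_ces_https ces.example.com 48443
```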
Issue: Authentication fails when you're adding a CES instance.
Confirm that the CES credentials are correct. For CES versions earlier than 23.04.06, you must enter credentials each time.
Issue: The CES version is not displayed during the setup.
Make sure that the CES instance is running and accessible. The version number appears only after successful authentication.
Issue: The CES instance is displayed as unavailable.
Click Test connection to check the host status. If required, restart the CES host.
Issue: An HTTPS requirement error has occurred.
CES must use HTTPS for integration with BMC AMI Platform. Update the CES configuration to enable HTTPS.
Issue: Credentials are not saved for future access.

Upgrade to CES version 23.04.06 or 24.05.01 to enable credential storage in the BMC AMI Platform database.

BMC AMI Platform natively supports CES version 23.04.06 and later modification levels within the 23.04 release, as well as 24.05.01 and later. When you add a CES instance that uses one of these versions, the credentials you enter are securely stored in the database and automatically reused for future access.

Important

If you are using CES version 24.01.xx, 24.02.xx, 24.03.xx, or 24.04.xx, you can access CES via BMC AMI Platform, but you must enter your credentials each time you access CES from the platform.

BMC AMI Platform also supports CES version 20.15.03 or later, but you must enter your credentials each time you access CES from BMC AMI Platform. 
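
The version rules above can be summarized in a small helper. This is a sketch only, based solely on the rules stated in this section: it assumes version strings of the form VV.RR.MM with numeric components, and the function name ces_credential_mode and its three output labels are hypothetical.

```shell
#!/bin/bash
# Classify a CES version string of the form VV.RR.MM.
# Prints: unsupported, credentials-each-time, or credentials-stored.
ces_credential_mode() {
  local v r m n
  IFS=. read -r v r m <<< "$1"
  # Force base 10 so that components such as "08" are not read as octal.
  v=$((10#$v)); r=$((10#$r)); m=$((10#$m))
  n=$(( v * 10000 + r * 100 + m ))
  if (( n < 201503 )); then
    echo "unsupported"                 # earlier than 20.15.03
  elif (( v == 23 && r == 4 && m >= 6 )) || (( n >= 240501 )); then
    echo "credentials-stored"          # 23.04.06+ within 23.04, or 24.05.01+
  else
    echo "credentials-each-time"       # every other supported version
  fi
}

# Usage:
#   ces_credential_mode 24.03.00
```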

The deployment completed, but the container images could not be pulled because they do not exist in the repository.

Remove the deployment by using the following teardown script and run the deployment again:

Important

Before you run the teardown script, replace /mnt/nfs in the script with the path to your own NFS folder.

#!/bin/bash

# Kubernetes namespace cleanup script
# This script cleans up Helm releases and all resources in specified namespaces

# Define namespaces
namespaces="bmcami-prod-user-management bmcami-prod-amiai-services bmcami-prod-data-service bmcami-prod-observability"

echo "===== Deleting Helm releases (if any) ====="
for ns in $namespaces; do
  echo "Namespace: $ns"
  releases=$(helm list -n "$ns" --short)
  if [ -n "$releases" ]; then
    echo "$releases" | xargs -r -I{} helm uninstall {} -n "$ns"
  else
    echo "No Helm releases found in namespace $ns"
  fi
done

echo "===== Deleting all resources in those namespaces ====="
for ns in $namespaces; do
  echo "Cleaning namespace: $ns"
  kubectl delete all --all -n "$ns" --ignore-not-found=true
  kubectl delete pvc --all -n "$ns" --ignore-not-found=true
  kubectl delete secret --all -n "$ns" --ignore-not-found=true
  kubectl delete configmap --all -n "$ns" --ignore-not-found=true
  kubectl delete rolebinding,role,serviceaccount,networkpolicy,ingress -n "$ns" --all --ignore-not-found=true
done

echo "===== Deleting the namespaces ====="
kubectl get namespace --no-headers -o custom-columns=:metadata.name | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" | xargs -r kubectl delete namespace

echo "===== Deleting related PVs ====="
kubectl get pv --no-headers -o custom-columns=:metadata.name | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" | xargs -r kubectl delete pv

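echo "===== Cleaning up NFS storage ====="
# Set NFS_DIR to your own NFS folder before you run this script.
NFS_DIR="/mnt/nfs"
if [ -d "$NFS_DIR" ]; then
  echo "Deleting all contents in $NFS_DIR/"
  # ${NFS_DIR:?} stops the script if the variable is empty, so rm never expands to /*.
  if sudo rm -rf "${NFS_DIR:?}"/*; then
    echo "NFS storage cleaned successfully"
  else
    echo "Failed to clean NFS storage"
  fi
else
  echo "NFS directory $NFS_DIR/ does not exist"
fi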

echo "===== Verifying cleanup ====="
for ns in $namespaces; do
  echo "Checking $ns..."
  if kubectl get namespace "$ns" &>/dev/null; then
    echo "Namespace $ns still exists"
    kubectl get all -n "$ns" 2>/dev/null || echo "No resources found in namespace $ns"
  else
    echo "Namespace $ns fully removed."
  fi
done

remaining_pvs=$(kubectl get pv --no-headers -o custom-columns=:metadata.name 2>/dev/null | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" || true)
if [ -n "$remaining_pvs" ]; then
  echo "Remaining PVs:"
  echo "$remaining_pvs"
else
  echo "No PVs remaining."
fi

echo "===== Cleanup completed ====="


rm -rf /mnt/nfs
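
Because the teardown is destructive, it may be worth confirming first that the pods are in fact stuck on image pulls. The following is a minimal sketch, assuming the default kubectl get pods column layout in which STATUS is the third column. The function name pods_with_image_pull_errors is hypothetical.

```shell
#!/bin/bash
# List pods in a namespace whose status shows an image pull error.
# Prints one pod name per line; prints nothing if no such pods exist.
pods_with_image_pull_errors() {
  local ns="$1"
  # The pattern also matches init-container states such as Init:ErrImagePull.
  kubectl get pods -n "$ns" --no-headers 2>/dev/null \
    | awk '$3 ~ /(ImagePullBackOff|ErrImagePull)$/ { print $1 }'
}

# Usage: check each namespace that the teardown script would remove.
#   for ns in bmcami-prod-user-management bmcami-prod-amiai-services \
#             bmcami-prod-data-service bmcami-prod-observability; do
#     pods_with_image_pull_errors "$ns"
#   done
```

If the function prints no pod names for any namespace, the deployment problem probably has a different cause, and the teardown might not be the right fix.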
 

 

 


BMC AMI Platform 2.0