# Troubleshooting

Use this section to identify and resolve common issues with BMC AMI Platform.
| Issue | Solution |
|---|---|
| Unable to get a response when the code exceeds 400 lines | The engine was built without the max_num_tokens parameter, which defaults to a lower value. Rebuild the engine with the parameter added:<br>`trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} --gemm_plugin float16 --output_dir ${ENGINE_PATH} --max_num_tokens 32768`<br>Here, 32768 is the context length for Mixtral, which is deployed on the Triton server through the BYOLLM feature. |
| The load-embeddings job is in a failed state. Check the job status with `kubectl get jobs -n bmcami-prod-amiai-services`. | Restart the job by running the following commands:<br>`kubectl delete job load-embeddings -n bmcami-prod-amiai-services`<br>`helm upgrade amiai-services /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00/helm_charts/07-helm-amiai-chart/ --namespace bmcami-prod-amiai-services --reuse-values` |
| BMC AMI Chat Assistant displays the following error: "The Assistant service is currently unavailable. Try again. If the problem persists, contact your BMC AMI Platform administrator." | Check the health of the AI services. |
| The Uptrace UI displays the following error during login: "READONLY You can't write against a read only replica." | |
| The CES instance isn't launching from BMC AMI Platform. | Make sure that your CES host is running and is accessible via HTTPS. Verify the host connectivity by clicking Test connection. |
| You can't add a CES instance. | |
| Authentication fails when you're adding a CES instance. | Confirm that the CES credentials are correct. For CES versions earlier than 23.04.06, you must enter credentials each time. |
| The CES version is not displayed during the setup. | Make sure that the CES instance is running and accessible. The version number appears only after successful authentication. |
| The CES instance is displayed as unavailable. | Click Test connection to check the host status. If required, restart the CES host. |
| An HTTPS requirement error has occurred. | CES must use HTTPS for integration with BMC AMI Platform. Update the CES configuration to enable HTTPS. |
| Credentials are not saved for future access. | Upgrade to CES version 23.04.06 or 24.05.01 to enable credential storage in the BMC AMI Platform database. BMC AMI Platform natively supports CES versions 23.04.06 and later modification levels within the 23.04 release, as well as 24.05.01 and later. When you add a CES instance by using these versions, the credentials you enter are securely stored in the database and automatically reused for future access. BMC AMI Platform also supports CES version 20.15.03 or later, but you must enter your credentials each time you access CES from BMC AMI Platform. |
| The deployment completed, but the container images could not be pulled because they do not exist in the repository. | Remove the deployment by running the following teardown script, and then run the deployment again. |

Teardown script:

```bash
#!/bin/bash
# Kubernetes namespace cleanup script
# This script cleans up Helm releases and all resources in the specified namespaces

# Define namespaces
namespaces="bmcami-prod-user-management bmcami-prod-amiai-services bmcami-prod-data-service bmcami-prod-observability"

echo "===== Deleting Helm releases (if any) ====="
for ns in $namespaces; do
  echo "Namespace: $ns"
  releases=$(helm list -n "$ns" --short)
  if [ -n "$releases" ]; then
    echo "$releases" | xargs -r -I{} helm uninstall {} -n "$ns"
  else
    echo "No Helm releases found in namespace $ns"
  fi
done

echo "===== Deleting all resources in those namespaces ====="
for ns in $namespaces; do
  echo "Cleaning namespace: $ns"
  kubectl delete all --all -n "$ns" --ignore-not-found=true
  kubectl delete pvc --all -n "$ns" --ignore-not-found=true
  kubectl delete secret --all -n "$ns" --ignore-not-found=true
  kubectl delete configmap --all -n "$ns" --ignore-not-found=true
  kubectl delete rolebinding,role,serviceaccount,networkpolicy,ingress -n "$ns" --all --ignore-not-found=true
done

echo "===== Deleting the namespaces ====="
kubectl get namespace --no-headers -o custom-columns=:metadata.name \
  | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" \
  | xargs -r kubectl delete namespace

echo "===== Deleting related PVs ====="
kubectl get pv --no-headers -o custom-columns=:metadata.name \
  | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" \
  | xargs -r kubectl delete pv

echo "===== Cleaning up NFS storage ====="
if [ -d "/mnt/nfs" ]; then
  echo "Deleting all contents in /mnt/nfs/"
  if sudo rm -rf /mnt/nfs/*; then
    echo "NFS storage cleaned successfully"
  else
    echo "Failed to clean NFS storage"
  fi
else
  echo "NFS directory /mnt/nfs/ does not exist"
fi

echo "===== Verifying cleanup ====="
for ns in $namespaces; do
  echo "Checking $ns..."
  if kubectl get namespace "$ns" &>/dev/null; then
    echo "Namespace $ns still exists"
    kubectl get all -n "$ns" 2>/dev/null || echo "No resources found in namespace $ns"
  else
    echo "Namespace $ns fully removed."
  fi
done

remaining_pvs=$(kubectl get pv --no-headers -o custom-columns=:metadata.name 2>/dev/null \
  | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" || true)
if [ -n "$remaining_pvs" ]; then
  echo "Remaining PVs:"
  echo "$remaining_pvs"
else
  echo "No PVs remaining."
fi

echo "===== Cleanup completed ====="
rm -rf /mnt/nfs
```
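The Chat Assistant row above tells you to check AI services health but the source does not give a specific command. A minimal sketch of such a check, assuming kubectl access to the cluster and the bmcami-prod-amiai-services namespace used elsewhere in this section:

```shell
# Hedged sketch: inspect AI service health in the amiai-services namespace.
# Assumes kubectl is installed and configured for the cluster.
NS="bmcami-prod-amiai-services"

if ! command -v kubectl >/dev/null 2>&1; then
  echo "kubectl not available; run this from a host with cluster access"
else
  # Pod status at a glance: look for CrashLoopBackOff, Pending, or low READY counts
  kubectl get pods -n "$NS"

  # Recent events, newest last, for quick diagnosis
  kubectl get events -n "$NS" --sort-by=.lastTimestamp | tail -n 20
fi
```

If a pod is unhealthy, `kubectl logs <pod-name> -n "$NS"` on the failing pod is usually the next step.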
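Because the teardown script selects namespaces and PVs by regular expression before deleting them, it can help to dry-run the pattern first. A small sketch; the sample namespace list here is hypothetical, and in practice you would pipe in the real `kubectl get namespace` output:

```shell
# Dry-run the namespace pattern used by the teardown script above.
pattern='bmcami-prod-(user-management|amiai-services|data-service|observability)'

# Hypothetical sample list; in practice use:
#   kubectl get namespace --no-headers -o custom-columns=:metadata.name
sample_namespaces="bmcami-prod-user-management
bmcami-prod-amiai-services
kube-system
default
bmcami-prod-observability"

# Only the bmcami-prod-* namespaces match; system namespaces are left alone
matched=$(printf '%s\n' "$sample_namespaces" | grep -E "$pattern")
echo "Would delete:"
echo "$matched"
```

Swapping in the real namespace listing shows exactly what the cleanup would remove before any destructive command runs.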