Troubleshooting


This section helps you identify and resolve issues that you might encounter.

Issue  Solution
Unable to get a response when the code exceeds 400 lines

This issue occurs because the engine was built without specifying the max_num_tokens parameter, which defaults to a lower value. To resolve it, rebuild the engine using the following command, with the parameter added:

trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
             --gemm_plugin float16 \
             --output_dir ${ENGINE_PATH} \
             --max_num_tokens 32768

Here, 32768 is the context length for Mixtral. This value applies when you use the BYOLLM feature with Mixtral deployed on the Triton server.
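
After the rebuild, it can help to confirm which value the engine was actually built with. The following is a minimal sketch, not an official procedure. It assumes that trtllm-build wrote a config.json file into ${ENGINE_PATH} and that the file contains a "max_num_tokens" entry; the file name, its layout, and the function name engine_max_num_tokens are assumptions that might not hold for your TensorRT-LLM version.

```shell
# Print the max_num_tokens value recorded in an engine directory.
# Assumption: <engine_dir>/config.json exists and contains an entry of the
# form "max_num_tokens": <number>. Adjust the path if your layout differs.
engine_max_num_tokens() {
  config="$1/config.json"
  if [ ! -f "$config" ]; then
    echo "config.json not found in $1" >&2
    return 1
  fi
  # Take the first matching entry, then keep only the trailing digits.
  grep -o '"max_num_tokens"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$config" |
    head -n 1 | grep -o '[0-9][0-9]*$'
}
```

For example, engine_max_num_tokens "${ENGINE_PATH}" should print 32768 after a rebuild with the command above, provided the assumptions hold.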

The load-embeddings job is in a failed state.

Check the job status:

kubectl get jobs -n bmcami-prod-amiai-services

Name             Status  Completions
load-embeddings  Failed  0/1

Run the following commands to restart the job.

kubectl delete job load-embeddings -n bmcami-prod-amiai-services
helm upgrade amiai-services /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00/helm_charts/07-helm-amiai-chart/ --namespace bmcami-prod-amiai-services --reuse-values
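
The check and restart above can be wrapped in two small helper functions. This is a sketch under stated assumptions: the function names and the 600-second timeout are illustrative choices, and the job is assumed to report the standard Kubernetes Failed and Complete conditions.

```shell
AMIAI_NS="bmcami-prod-amiai-services"

# Succeed (exit status 0) if the named job reports a Failed condition.
job_failed() {
  status=$(kubectl get job "$1" -n "$AMIAI_NS" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  [ "$status" = "True" ]
}

# Block until the re-created job completes. The timeout is an assumption;
# raise it if loading the embeddings takes longer in your environment.
wait_for_job() {
  kubectl wait --for=condition=complete "job/$1" -n "$AMIAI_NS" --timeout=600s
}
```

A possible flow is to run job_failed load-embeddings first, apply the delete and helm upgrade commands only if it succeeds, and then run wait_for_job load-embeddings to confirm that the new job finishes.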
When you get the following error in BMC AMI Chat Assistant:
The Assistant service is currently unavailable. Try again. If the problem persists, contact your BMC AMI Platform administrator. 
  1. Click Platform manager > BMC AMI AI Manager > BMC AMI Assistant chat settings.
  2. On the BMC AMI Assistant chat settings page, check whether a model is assigned to AMI Assistant. If no model is assigned, assign the recommended model and retry AMI Assistant. If a model is assigned, go to the AI service health page and verify that all services and models are running.

      To check AI services health:

  • Click Platform manager > BMC AMI AI Manager > AI service health.
    If any services are down, bring them back up and confirm that they are running.

Issue: “READONLY You can't write against a read only replica”.

You might encounter the following error message on the Uptrace UI during login:

READONLY You can't write against a read only replica.
script: 6922f2a31daf24cb6eb66399c4474600e84a5c09, on @user_script:43.

 

  1. Log in to the setup machine environment.
  2. Run the following command to list Redis pods:
    kubectl get pod -A | grep redis
  3. Restart all pods related to Redis using the command:
    kubectl delete po redis-master-0 redis-replica-0 redis-replica-1 redis-sentinel-0 redis-sentinel-1 redis-sentinel-2 -n bmcami-prod-data-service
  4. Wait a few minutes for the pods to restart.
  5. Verify the fix by logging back into the Uptrace service.
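
Instead of waiting for a fixed time, you can poll the pods and then check which replication role the master pod reports. The sketch below is illustrative only. It assumes that redis-cli is available inside the Redis containers and that no password is required for the INFO command; if authentication is enabled, the role check needs the appropriate credentials. The function names and the 300-second timeout are assumptions.

```shell
DATA_NS="bmcami-prod-data-service"

# Block until every pod passed as an argument reports Ready.
wait_for_pods_ready() {
  for pod in "$@"; do
    kubectl wait --for=condition=Ready "pod/$pod" -n "$DATA_NS" --timeout=300s ||
      return 1
  done
}

# Print the replication role ("master" or "slave") that a Redis pod reports.
redis_role() {
  kubectl exec "$1" -n "$DATA_NS" -- redis-cli info replication |
    tr -d '\r' | sed -n 's/^role://p'
}
```

For example, run wait_for_pods_ready redis-master-0 redis-replica-0 redis-replica-1 and then redis_role redis-master-0. If the second command prints slave, writes to that pod are still rejected and the READONLY error is likely to persist.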
The CES instance isn't launching from BMC AMI Platform.
Make sure that your CES host is running and is accessible via HTTPS. Verify the host connectivity by clicking Test connection.
You can't add a CES instance.
  • You can add CES instances to BMC AMI Platform only if you are using CES version 20.15.03 or later. Earlier versions aren't supported because they don't use the ADAPT user interface.
  • Confirm that the host name is correct, HTTPS is used, and the port (default 48443) is open. Also verify that the CES role is assigned to your user account.
  • If your CES instance is using HTTPS with an untrusted certificate, you can't add the CES instance directly.
    Workaround: Open the CES instance URL in your browser. When you encounter the security warning, click Continue to bypass it and allow the CES user interface to load completely. After the CES UI is loaded, you can proceed to add the CES instance successfully.
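
The host name, HTTPS, and port checks above can also be run from a command line. This is a rough sketch: the function names, the root path (/), and the 10-second timeout are assumptions, and only the default port 48443 comes from this section.

```shell
# Build the CES URL for a host. The port defaults to 48443.
ces_url() {
  printf 'https://%s:%s/\n' "$1" "${2:-48443}"
}

# Print the HTTP status code returned by the CES host, or 000 if the host
# cannot be reached. Certificate validation stays enabled, so an untrusted
# certificate is reported as a failure here as well.
ces_http_status() {
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "$(ces_url "$1" "${2:-48443}")"
}
```

A result of 000 from ces_http_status suggests that the host name is wrong, the port is closed, or the certificate is not trusted. Any other status code shows that the host answered over HTTPS.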
Authentication fails when you're adding a CES instance.
Confirm that CES credentials are correct. For CES versions earlier than 23.04.06, you must enter credentials each time.

The CES version is not displayed during the setup.
Make sure that the CES instance is running and accessible. The version number appears only after successful authentication.

The CES instance is displayed as unavailable.
Click Test connection to check host status. If required, restart the CES host.

An HTTPS requirement error has occurred.
CES must use HTTPS for integration with BMC AMI Platform. Update CES configuration to enable HTTPS.

Credentials not saved for future access.

Upgrade to CES version 23.04.06 or 24.05.01 to enable credential storage in the BMC AMI platform database.

BMC AMI Platform natively supports CES version 23.04.06 and later modification levels within the 23.04 release, as well as 24.05.01 and later. When you add a CES instance that uses one of these versions, the credentials you enter are securely stored in the database and automatically reused for future access.

Important

If you are using CES version 24.01.xx, 24.02.xx, 24.03.xx, or 24.04.xx, you can access CES via BMC AMI Platform, but you must enter your credentials each time you access CES from the platform.

BMC AMI Platform also supports CES version 20.15.03 or later, but you must enter your credentials each time you access CES from BMC AMI Platform. 
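
The version rules above can be summarized as a small lookup. The sketch below only restates those rules and is not part of the product. The function name ces_credential_mode and the result labels (stored, prompt, unsupported, invalid) are hypothetical, and the function expects a fully numeric version in the form NN.NN.NN.

```shell
# Classify a CES version according to the rules described above:
#   stored      - credentials are saved and reused
#   prompt      - supported, but credentials are requested on every access
#   unsupported - earlier than 20.15.03
#   invalid     - not in NN.NN.NN form
ces_credential_mode() {
  printf '%s\n' "$1" | awk -F. '
    $0 !~ /^[0-9]+\.[0-9]+\.[0-9]+$/ { print "invalid"; exit }
    {
      major = $1 + 0; minor = $2 + 0; mod = $3 + 0
      n = major * 10000 + minor * 100 + mod
      if (n < 201503) print "unsupported"
      else if (major == 23 && minor == 4 && mod >= 6) print "stored"
      else if (n >= 240501) print "stored"
      else print "prompt"
    }'
}
```

For example, ces_credential_mode 24.03.02 prints prompt, and ces_credential_mode 23.04.06 prints stored.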

The deployment completed, but the container images could not be pulled because they do not exist in the repository.

Remove the deployment by using the following teardown script and run the deployment again:

Important

Make sure to change the NFS folder to your own in the teardown script.

#!/bin/bash

# Kubernetes namespace cleanup script
# This script cleans up Helm releases and all resources in specified namespaces

# Define namespaces
namespaces="bmcami-prod-user-management bmcami-prod-amiai-services bmcami-prod-data-service bmcami-prod-observability"

echo "===== Deleting Helm releases (if any) ====="
for ns in $namespaces; do
  echo "Namespace: $ns"
  releases=$(helm list -n "$ns" --short)
  if [ -n "$releases" ]; then
    echo "$releases" | xargs -r -I{} helm uninstall {} -n "$ns"
  else
    echo "No Helm releases found in namespace $ns"
  fi
done

echo "===== Deleting all resources in those namespaces ====="
for ns in $namespaces; do
  echo "Cleaning namespace: $ns"
  kubectl delete all --all -n "$ns" --ignore-not-found=true
  kubectl delete pvc --all -n "$ns" --ignore-not-found=true
  kubectl delete secret --all -n "$ns" --ignore-not-found=true
  kubectl delete configmap --all -n "$ns" --ignore-not-found=true
  kubectl delete rolebinding,role,serviceaccount,networkpolicy,ingress -n "$ns" --all --ignore-not-found=true
done

echo "===== Deleting the namespaces ====="
kubectl get namespace --no-headers -o custom-columns=:metadata.name | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" | xargs -r kubectl delete namespace

echo "===== Deleting related PVs ====="
kubectl get pv --no-headers -o custom-columns=:metadata.name | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" | xargs -r kubectl delete pv

echo "===== Cleaning up NFS storage ====="
if [ -d "/mnt/nfs" ]; then
  echo "Deleting all contents in /mnt/nfs/"
  if sudo rm -rf /mnt/nfs/*; then
    echo "NFS storage cleaned successfully"
  else
    echo "Failed to clean NFS storage"
  fi
else
  echo "NFS directory /mnt/nfs/ does not exist"
fi

echo "===== Verifying cleanup ====="
for ns in $namespaces; do
  echo "Checking $ns..."
  if kubectl get namespace "$ns" &>/dev/null; then
    echo "Namespace $ns still exists"
    kubectl get all -n "$ns" 2>/dev/null || echo "No resources found in namespace $ns"
  else
    echo "Namespace $ns fully removed."
  fi
done

remaining_pvs=$(kubectl get pv --no-headers -o custom-columns=:metadata.name 2>/dev/null | grep -E "bmcami-prod-(user-management|amiai-services|data-service|observability)" || true)
if [ -n "$remaining_pvs" ]; then
  echo "Remaining PVs:"
  echo "$remaining_pvs"
else
  echo "No PVs remaining."
fi

echo "===== Cleanup completed ====="


rm -rf /mnt/nfs
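
The teardown script hard-codes /mnt/nfs. One way to adapt it to your own NFS folder is to keep the path in a single variable and to guard the delete against an empty or root path. The following is a sketch rather than a drop-in replacement: NFS_DIR and clean_nfs_dir are hypothetical names, the -mindepth and -delete options require GNU or BSD find, and you might need to run the function with elevated privileges, as the original script does with sudo.

```shell
# NFS_DIR is a hypothetical variable name. Point it at your own NFS folder.
NFS_DIR="${NFS_DIR:-/mnt/nfs}"

# Delete everything inside a directory, including hidden files, but keep the
# directory itself. Refuses an empty path and "/" to avoid accidents.
clean_nfs_dir() {
  dir="${1:-}"
  case "$dir" in
    ""|"/") echo "Refusing to clean '$dir'" >&2; return 1 ;;
  esac
  if [ ! -d "$dir" ]; then
    echo "NFS directory $dir does not exist" >&2
    return 1
  fi
  find "$dir" -mindepth 1 -delete
}
```

With this in place, the NFS cleanup step reduces to clean_nfs_dir "$NFS_DIR".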
 

BMC AMI AI knowledge Hub—Common errors

This section helps you identify and resolve common errors in the BMC AMI AI knowledge hub service.

You can use the error code in the product message to navigate directly to the matching troubleshooting entry.

How to find the asset ID

The system often requires the asset ID when you report an error or when an administrator searches logs.

  • From API responses—Open the developer tools in the UI, navigate to the Network tab, and locate the upload, asset list, or asset details API. The response body includes asset_id.
  • From logs—The system might log the asset ID when an error occurs. Search the logs by file name or by the approximate upload time to locate the relevant entries.
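
The log search described above can be sketched with grep. The log file location depends on your deployment, so the sketch takes it as an argument. The function names are hypothetical, and the error code pattern is inferred from the codes listed in this section.

```shell
# Print the log lines (with line numbers) that mention a file name.
#   $1 - path to the log file (deployment-specific; an assumption)
#   $2 - name of the uploaded file
asset_log_entries() {
  grep -n -F -- "$2" "$1"
}

# List the distinct AAPKNW error codes that appear on those lines.
asset_error_codes() {
  grep -F -- "$2" "$1" | grep -o 'AAPKNW[0-9][0-9][0-9]E' | sort -u
}
```

The matching lines are a reasonable place to look for the asset ID when the API response is no longer available.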

Quick lookup

 Error code  Issue  Area
 AAPKNW001E  Unsupported file type  Upload
 AAPKNW002E  File too large  Upload
 AAPKNW003E  Asset deleted  Publish
 AAPKNW007E  File no longer exists  Publish
 AAPKNW009E  Asset not found  General
 AAPKNW011E  Operation in progress  General
 AAPKNW021E  Service unavailable  Platform
 AAPKNW025E  Empty document  Publish
 AAPKNW026E  Encrypted document  Publish
 AAPKNW027E  Database update failed  Publish / Unpublish
 AAPKNW028E  No valid text  Publish
 AAPKNW029E  Upload failed  Upload
 AAPKNW030E  Embedding service issue  Embedding
 AAPKNW031E  Embedding service unavailable  Embedding
 AAPKNW032E  OCR service issue  OCR
 AAPKNW033E  OCR service unavailable  OCR
 AAPKNW034E  Link expired  Download link
 AAPKNW035E  Empty file  Upload
 AAPKNW036E  Unrecognized file type  Upload
 AAPKNW037E  Unexpected publish error  Publish
 AAPKNW038E  Document not supported  Publish
 AAPKNW040E  Unpublish failed  Unpublish
 AAPKNW041E  Delete failed  Delete
 AAPKNW042E  Cancel publish failed  Cancel
 AAPKNW043E  AI visibility update failed  AI visibility

Troubleshooting entries

Issue  Solution


AAPKNW001E—Unsupported file type

End user 
Use a supported format: TXT, PDF, DOCX, or PPTX.

Administrator  
The system supports only TXT, PDF, DOCX, and PPTX files.


AAPKNW002E—File too large

End user 
Reduce the file size below 20 MB. Compress images, remove unused content, or split the document.

Administrator 
The system enforces a maximum file size of 20 MB.


AAPKNW003E—Asset deleted

End user 
Upload the file again if you need to publish it.

Administrator  
Verify whether another process or user deleted the asset.


AAPKNW007E—File no longer exists

End user 
Upload the file again.

Administrator  
Verify that the file still exists in storage and was not removed.


AAPKNW009E—Asset not found

End user 
Refresh the page and try again.

Administrator  
Verify database connectivity and review logs using the asset ID.


AAPKNW011E—Operation in progress

End user 
Wait for the current operation to complete, then refresh and try again.

Administrator  
Verify that the asset is not locked by another workflow operation.


AAPKNW021E—Service unavailable

End user 
Wait a few minutes and try again.

Administrator  
Verify that all required services are running and reachable.


AAPKNW025E—Empty document

End user 
Use a document that contains readable text.

Administrator  
Verify that text extraction succeeded.


AAPKNW026E—Encrypted document

End user 
Remove password protection or encryption and upload the file again.

Administrator  
The system cannot process encrypted files.


AAPKNW027E—Database update failed

End user 
Try again later.

Administrator  
Verify the vector database. Check connectivity, resources, and logs.


AAPKNW028E—No valid text

End user 
Use a document that contains readable text.

Administrator  
Content validation failed. No system changes are required.


AAPKNW029E—Upload failed

End user 
Try uploading the file again. If the issue continues, check your network connection or try a different file.

Administrator  
Verify network connectivity, storage availability, and application logs. Search logs using the asset ID or file name.


AAPKNW030E—Embedding service issue

End user 
Try again later.

Administrator  
Verify that the embedding service is running and reachable. Check logs for errors, timeouts, or rate limits.


AAPKNW031E—Embedding service unavailable

End user 
Try again later.

Administrator  
Verify that the embedding service is running and reachable. Check logs for errors, timeouts, or rate limits.


AAPKNW032E—OCR service issue

End user 
Try again later or use a smaller file.

Administrator  
Verify the OCR service configuration and logs.

Important

Large or image-heavy PDFs might require more memory or compression.


AAPKNW033E—OCR service unavailable

End user 
Try again later or use a smaller file.

Administrator  
Verify the OCR service configuration and logs.

Important

Large or image-heavy PDFs might require more memory or compression.


AAPKNW034E—Link expired

End user 
Request the link again.

Administrator  
Adjust the link expiration configuration if needed.


AAPKNW035E—Empty file

End user 
Add content to the file and try again.

Administrator  
The system rejects files with zero bytes.


AAPKNW036E—Unrecognized file type

End user 
Save or export the file as TXT, PDF, DOCX, or PPTX, and try again.

Administrator  
Verify that the file is not corrupted and that the file extension matches the content.


AAPKNW037E—Unexpected publish error

End user 
Try again. If the issue continues, contact your administrator.

Administrator  
Search logs using the asset ID. Verify all dependent services such as embedding, OCR, vector database, and storage.


AAPKNW038E—Document not supported

End user 
Use a supported file format.

Administrator  
Verify file format and review document-processing logs.


AAPKNW040E—Unpublish failed

End user 
Try again. If the issue continues, contact your administrator.

Administrator  
Search logs using the asset ID. Verify database, storage, and workflow services.


AAPKNW041E—Delete failed

End user 
Try again. If the issue continues, contact your administrator.

Administrator  
Search logs using the asset ID. Verify database, storage, and workflow services.


AAPKNW042E—Cancel publish failed

End user 
Try again. If the issue continues, contact your administrator.

Administrator  
Search logs using the asset ID. Verify database, storage, and workflow services.


AAPKNW043E—AI visibility update failed

End user 
Try updating the visibility again.

Administrator  
Verify vector database and access control configuration.
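
Several of the upload errors above (AAPKNW001E, AAPKNW002E, and AAPKNW035E) can be anticipated before a file is uploaded. The following sketch checks a local file against the documented limits. It is not the validation that the service performs: the function name, the order of the checks, and the interpretation of 20 MB as 20 x 1024 x 1024 bytes are assumptions.

```shell
# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_BYTES=$((20 * 1024 * 1024))

# Print the error code that an upload would likely trigger, or "OK".
precheck_upload() {
  file="$1"
  # Compare the extension case-insensitively.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    txt|pdf|docx|pptx) ;;
    *) echo "AAPKNW001E"; return 1 ;;
  esac
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -eq 0 ]; then
    echo "AAPKNW035E"; return 1
  fi
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "AAPKNW002E"; return 1
  fi
  echo "OK"
}
```

The check looks only at the file extension and size. It cannot detect encrypted documents, documents without readable text, or a file whose content does not match its extension.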

BMC AMI AI OCR service—Troubleshooting

This topic helps you identify and resolve issues in the BMC AMI AI OCR service.

You can use the error code in the API response to locate the matching troubleshooting entry.

Error codes quick lookup

Error code  Message  HTTP status
AAPOCR001E  The OCR operation failed.  500
AAPOCR002E  The input file was not found.  404

Troubleshooting entries

AAPOCR002E
  Issue: The input file was not found.
  End user: Verify that the file name matches the uploaded file and includes the .pdf extension.
  Administrator: Verify that the file exists in the input directory. Check the DATA_DIR_PATH and volume mounts.

AAPOCR001E
  Issue: The OCR operation failed.
  End user: Try a different file or retry the request.
  Administrator: Check service logs using the file name and timestamp. Review the stderr output for OCR errors.

HTTP 504
  Issue: The request timed out.
  End user: Try a smaller or shorter document.
  Administrator: Review OCR_SERVICE_TIMEOUT_S and increase it if needed. Check the logs for OCR Process Timed Out.

HTTP 422
  Issue: The request is invalid.
  End user: Verify that the request includes a valid file_name and a boolean force_ocr value.
  Administrator: Verify API input validation and request payload structure.

HTTP 200 (ocr: false)
  Issue: OCR was skipped because text exists.
  End user: No action is required unless OCR is needed. To force OCR, set force_ocr to true.
  Administrator: This behavior is expected.

HTTP 200 (backend failure)
  Issue: The OCR backend failed.
  End user: Contact your administrator.
  Administrator: Check logs for exit code 7 and stderr output. Investigate OCR engine errors.

Validation messages

Field  Message
file_name  The file name is required.
force_ocr  The force_ocr value must be a boolean.
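
To avoid an HTTP 422 response, a client can apply the same two rules before it sends a request. The sketch below only builds the request body, because this section does not name the endpoint path. The function name ocr_request_body is hypothetical, and the sketch assumes that the body is a JSON object with exactly these two fields.

```shell
# Print a JSON request body, or print a validation message to stderr and fail.
#   $1 - file_name (required)
#   $2 - force_ocr (must be the literal true or false)
ocr_request_body() {
  file_name="${1:-}"
  force_ocr="${2:-}"
  if [ -z "$file_name" ]; then
    echo "The file name is required." >&2
    return 1
  fi
  case "$force_ocr" in
    true|false) ;;
    *) echo "The force_ocr value must be a boolean." >&2; return 1 ;;
  esac
  # Assumption: the file name contains no double quotes or backslashes,
  # so it can be placed in the JSON string without escaping.
  printf '{"file_name": "%s", "force_ocr": %s}\n' "$file_name" "$force_ocr"
}
```

For example, ocr_request_body report.pdf true prints {"file_name": "report.pdf", "force_ocr": true}.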

Log-based troubleshooting

Log message  Action
OCR Process Timed Out  Increase OCR_SERVICE_TIMEOUT_S and review the document size and system resources.
exit code 6  The file already contains text. No action is required.
exit code 7  Review stderr output and investigate OCR engine errors.
exit code 2 (DigitalSignatureError)  The PDF contains digital signatures. Review stderr output and retry with a supported file.
exit code N  Review stderr output and OCR engine documentation.
File not found  Verify the DATA_DIR_PATH, file location, and permissions.

Resource and performance issues

Issue  Action
The OCR service runs out of memory  Increase the memory limit in the deployment configuration or use smaller PDF files.

Administrator checklist

Check  How to verify
The service is running  Call GET /health and verify that it returns 200.
The log file is available  Verify that {LOG_DIR_PATH}/{LOG_FILE_NAME}.log exists.
The input directory exists  Verify DATA_DIR_PATH and list its files.
The output directory is writable  Verify the OCR_WRITE_PATH permissions.
The timeout configuration is correct  Review the OCR_SERVICE_TIMEOUT_S configuration.
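
The checklist can be partly automated. The sketch below assumes that the variables named in the checklist are set in the shell where you run it, for example inside the OCR service container. OCR_BASE_URL and both function names are hypothetical, and the writability test applies to the user who runs the script, which might differ from the service user.

```shell
# Report each failed file-system or configuration check on its own line.
# Prints nothing and returns 0 when every check passes.
ocr_checklist() {
  rc=0
  [ -f "${LOG_DIR_PATH:-}/${LOG_FILE_NAME:-}.log" ] || { echo "log file missing"; rc=1; }
  [ -d "${DATA_DIR_PATH:-}" ] || { echo "input directory missing"; rc=1; }
  [ -w "${OCR_WRITE_PATH:-}" ] || { echo "output directory not writable"; rc=1; }
  [ -n "${OCR_SERVICE_TIMEOUT_S:-}" ] || { echo "OCR_SERVICE_TIMEOUT_S is not set"; rc=1; }
  return "$rc"
}

# Print the HTTP status code of GET /health (expected: 200), or 000 if the
# service cannot be reached. OCR_BASE_URL is a hypothetical variable that
# holds the scheme, host, and port of the OCR service.
ocr_health_status() {
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "${OCR_BASE_URL}/health"
}
```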

Quick log search

Issue  Search for
Timeout  OCR Process Timed Out
File not found  not found
OCR failure  AAPOCR001E
Backend failure  exit code 7
Already processed  exit code 6
Error details  Stderr:
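
All of the search terms above can be combined into a single grep call. In this sketch, the function name is hypothetical and the log file path is passed as an argument, because its location depends on your LOG_DIR_PATH and LOG_FILE_NAME settings.

```shell
# Print every log line (with its line number) that matches one of the
# search terms from the quick log search table.
ocr_log_search() {
  grep -n -E 'OCR Process Timed Out|not found|AAPOCR001E|exit code [67]|Stderr:' "$1"
}
```

For example, ocr_log_search "${LOG_DIR_PATH}/${LOG_FILE_NAME}.log" lists the candidate lines in one pass. Pipe the result through a second grep with the file name to narrow it to one document.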

Support

If the issue persists, collect the following information:

  • The error code and HTTP status
  • The file name and timestamp
  • Relevant log entries
  • Environment details

Contact your BMC AMI Platform administrator or support team with this information.

 


BMC AMI Platform 2.2