Deploying LLM


You can deploy the BMC-provided LLM by using one of the following methods:

  • Using the LLM library (UI)—This is the recommended method. For more information, see LLM library.
  • Manual deployment—We recommend this method only if automated deployment isn't possible. Manual deployment involves multiple manual steps.

If you have an existing LLM that you want to integrate into BMC AMI Platform, then use the following method:

  • Bring your own LLM (BYOLLM)—Integrate a large language model (LLM) hosted in your environment into BMC AMI Platform, whether self-hosted through vLLM or provided through the OpenAI service. For more information, see Bring your own LLM.
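
    An LLM integrated through BYOLLM is expected to expose an OpenAI-compatible API. As a minimal reachability check, assuming a self-hosted vLLM server listening on its default port 8000 (the host name is a placeholder), you can list the models that the endpoint serves:

    curl http://<llm-host>:8000/v1/models

    An endpoint provided through the OpenAI service additionally requires an Authorization header that carries your API key.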

Before you deploy the LLM, you must configure kubectl access and the environment for the RKE2 Kubernetes cluster.

To configure kubectl access and environment for RKE2 Kubernetes cluster

Ensure that you are logged in to the Kubernetes master node as the root user, and then run the following commands in sequence:

# Make kubectl and the RKE2 kubeconfig available to login shells
cat <<'EOF' >/etc/profile.d/kubectl.sh
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin
EOF
chmod 755 /etc/profile.d/kubectl.sh

# Restrict access to the RKE2 kubeconfig file
chown root:root /etc/rancher/rke2/rke2.yaml
chmod 600 /etc/rancher/rke2/rke2.yaml

# Allow SSH sessions to read a per-user environment file
if grep -q '^PermitUserEnvironment' /etc/ssh/sshd_config; then
  sed -i 's/^PermitUserEnvironment.*/PermitUserEnvironment yes/' /etc/ssh/sshd_config
else
  echo 'PermitUserEnvironment yes' >> /etc/ssh/sshd_config
fi

# Provide KUBECONFIG and PATH to non-interactive SSH commands run as root
mkdir -p /root/.ssh
cat <<'EOF' >/root/.ssh/environment
KUBECONFIG=/etc/rancher/rke2/rke2.yaml
PATH=/var/lib/rancher/rke2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
EOF
chown root:root /root/.ssh/environment
chmod 600 /root/.ssh/environment

# Restart the SSH service so that the changes take effect
systemctl restart ssh
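
The profile script takes effect in new login shells only. To use kubectl in your current shell on the master node without logging in again, you can source the script first:

source /etc/profile.d/kubectl.sh
kubectl get nodes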

To verify the configuration

To verify that the configuration is successful, run the following command:

ssh -i <pem-path> root@<host> "kubectl get nodes"

Deploying LLM manually

The platform supports three LLMs that can be deployed independently:

  • Llama 3.1
  • Mixtral
  • Granite 3.1 

You can also deploy a second LLM service on BMC AMI Platform. For more information, see To add a second LLM deployment.

To deploy the Llama model on Kubernetes

  1. Log in to the primary manager node of Kubernetes and access /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00.
  2. Verify that the scripts/llama.sh file is present.
  3. Run the llama.sh file and provide the following input:
    • GPU node: Select the GPU node from the list.
    • CPU KVCache space: Specify the memory allocated for the LLM KV cache. Higher values allow more parallel requests. For more information, see CPU KVCache value.
    • Number of GPUs: Run the following command on the GPU node to find the value:

      nvidia-smi --query-gpu=name --format=csv,noheader | wc -l

To verify the deployment

Verify that the service and pods are running successfully under the namespace by using the following command:

kubectl get pods --namespace bmcami-prod-amiai-services

The output is similar to the following example:
NAME                           READY   STATUS      RESTARTS      AGE
assistant-c6dd4bb6b-4xbkc      1/1     Running     0             22h
discovery-7c68bcd776-chnz6     1/1     Running     1 (22h ago)   22h
docs-expert-5957c5d845-cn2hg   1/1     Running     0             22h
download-embeddings-qfpgl      0/1     Completed   0             22h
download-expert-model-nw9km    0/1     Completed   0             22h
download-llama-model-582bh     0/1     Completed   0             4h53m
gateway-775f4476d9-h6jk6       1/1     Running     0             5h10m
llama-gpu-6f8c675c4b-j7vhm     1/1     Running     0             4h53m
load-embeddings-4pg6r          1/1     Running     0             22h
platform-75c4997dc5-fk8fq      1/1     Running     0             22h
security-65c8c568db-gqsks      1/1     Running     0             22h
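
To also list the services in the namespace (an optional, supplementary check), you can run the following command:

kubectl get svc --namespace bmcami-prod-amiai-services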

To deploy the Mixtral model on Kubernetes

  1. Log in to the primary manager node of Kubernetes and access /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00.
  2. Verify that the scripts/mixtral.sh file is present.
  3. Run the mixtral.sh file and provide the following input:
    • GPU node: Select the GPU node from the list.
    • CPU KVCache space: Specify the memory allocated for the LLM KV cache. Higher values allow more parallel requests. For more information, see CPU KVCache value.
    • Number of GPUs: Run the following command on the GPU node to find the value:

      nvidia-smi --query-gpu=name --format=csv,noheader | wc -l

To verify the deployment

Verify that the service and pods are running successfully under the namespace by using the following command:

kubectl get po --namespace bmcami-prod-amiai-services
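
Model pods can take a while to become ready while images and model weights download. To watch the pods until they reach the Running state, you can add the --watch flag:

kubectl get po --namespace bmcami-prod-amiai-services --watch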

To deploy the Granite model on Kubernetes

  1. Log in to the primary manager node of Kubernetes and access /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00.
  2. Verify that the scripts/granite.sh file is present.
  3. Run the granite.sh file and provide the following input:
    • GPU node: Select the GPU node from the list.
    • CPU KVCache space: Specify the memory allocated for the LLM KV cache. Higher values allow more parallel requests. For more information, see CPU KVCache value.
    • Number of GPUs: Run the following command on the GPU node to find the value:

      nvidia-smi --query-gpu=name --format=csv,noheader | wc -l

To verify the deployment

Verify that the service and pods are running successfully under the namespace by using the following command:

kubectl get po --namespace bmcami-prod-amiai-services
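
If a pod remains in a Pending or CrashLoopBackOff state, you can describe it to see its scheduling and image-pull events (use the pod name reported by the previous command):

kubectl describe pod <pod-name> --namespace bmcami-prod-amiai-services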

To add a second LLM deployment

The following instructions describe how to deploy a second LLM service on BMC AMI Platform.

Before you begin

To add a new node to the cluster

  1. Add your node to the Kubernetes cluster.
    Important

    You must set up the cluster, including adding nodes. Make sure you add all required nodes to the cluster according to the deployment prerequisites.

  2. Mount the existing NFS share on the newly created node (see the sketch after this list).
  3. Verify the node exists in the Kubernetes cluster by running the following command: 
    kubectl get nodes
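
The following is a minimal sketch of step 2, assuming the node mounts the same NFS export that the existing cluster nodes use; the server, export path, and mount point are placeholders and must match your environment:

# Placeholder values; use the same export and mount point as the existing nodes
mkdir -p <mount-point>
mount -t nfs <nfs-server>:<export-path> <mount-point>

# Optionally, persist the mount across reboots
echo '<nfs-server>:<export-path> <mount-point> nfs defaults 0 0' >> /etc/fstab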

To add a label to the node

Label the node using the following command:

kubectl label nodes <new-node-name> gpu=true
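
To confirm that the label was applied, you can list the nodes that carry it:

kubectl get nodes -l gpu=true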

To run the script file to deploy the model

Navigate to the Helm chart directory and run the appropriate script:

cd <extracted Zip file path>/scripts/

./llama.sh      # For Llama 3.1
./granite.sh    # For Granite 3.1
./mixtral.sh    # For Mixtral
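
After the script completes, you can confirm that the new model pod was scheduled on the labeled node; the -o wide output includes a NODE column:

kubectl get pods --namespace bmcami-prod-amiai-services -o wide
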
Important

BMC AMI Assistant and the integrations are governed by BMC’s AI Terms of Use.

 

Where to go from here

After you deploy the LLM, you can proceed to the following topics:

 
