Deploying LLM
You can deploy the BMC-provided LLM by using one of the following methods:
- Using the LLM library (UI)—This is the recommended method. For more information, see LLM library.
- Manual deployment—Use this method only if deployment through the LLM library isn't possible. Manual deployment involves multiple manual steps.
If you have an existing LLM that you want to integrate into BMC AMI Platform, then use the following method:
- Bring your own LLM (BYOLLM): Integrate a large language model (LLM) hosted in your environment into BMC AMI Platform, whether self-hosted via vLLM or provided through the OpenAI service. For more information, see Bring your own LLM.
Before deploying an LLM, you must configure kubectl access and the environment for the RKE2 Kubernetes cluster.
To configure kubectl access and the environment for the RKE2 Kubernetes cluster
Ensure that you are logged in to the Kubernetes master node as the root user and run the following commands in sequence. The shell profile (~/.bashrc) and SSH environment file (~/.ssh/environment) used as targets are assumed defaults; adjust them to match your environment.
# Append the kubectl environment variables to the shell profile (assumed target: ~/.bashrc)
cat <<'EOF' >> ~/.bashrc
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin
EOF
# Allow SSH sessions to read per-user environment files (assumed check on the existing setting)
if grep -q '^PermitUserEnvironment' /etc/ssh/sshd_config; then
  sed -i 's/^PermitUserEnvironment.*/PermitUserEnvironment yes/' /etc/ssh/sshd_config
else
  echo 'PermitUserEnvironment yes' >> /etc/ssh/sshd_config
fi
# Make the same variables available to non-interactive SSH sessions (assumed target: ~/.ssh/environment)
cat <<'EOF' > ~/.ssh/environment
KUBECONFIG=/etc/rancher/rke2/rke2.yaml
PATH=/var/lib/rancher/rke2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
EOF
To verify the configuration
To verify the configuration is successful, run the following command.
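A minimal sketch of such a check, assuming the goal is to confirm that kubectl can reach the RKE2 cluster from a new shell session:
# List the cluster nodes; output depends on your node names
kubectl get nodes
If the command returns the cluster nodes in a Ready state, the kubeconfig and PATH settings are in effect for new sessions.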
Deploying LLM manually
The platform supports three LLMs that can be deployed independently:
- Llama 3.1
- Mixtral
- Granite 3.1
You can also deploy a second LLM service on BMC AMI Platform. For more information, see To add a second LLM deployment.
To deploy the Llama model on Kubernetes
- Log in to the primary manager node of Kubernetes and access /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00.
- Verify that the scripts/llama.sh file is present.
- Run the llama.sh file and provide the following input (an example invocation follows this list):
- GPU node: Select the GPU from the list.
- CPU KVCache space: Specify the memory allocated for the LLM KV cache. Higher values allow more parallel requests. For more information, see CPU KVCache value.
- Number of GPUs: You can find this value by running the following command on the GPU node:
nvidia-smi --query-gpu=name --format=csv,noheader | wc -l
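For example, an invocation might look like the following; the directory layout matches the steps above, and the prompts are paraphrased rather than quoted from the script:
# Run the deployment script from the extracted platform directory
cd /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00
# The script prompts for the GPU node, CPU KVCache space, and number of GPUs
./scripts/llama.sh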
To verify the deployment
Verify that the services and pods are running successfully under the bmcami-prod-amiai-services namespace by using the following commands:
kubectl get pods --namespace bmcami-prod-amiai-services
NAME READY STATUS RESTARTS AGE
assistant-c6dd4bb6b-4xbkc 1/1 Running 0 22h
discovery-7c68bcd776-chnz6 1/1 Running 1 (22h ago) 22h
docs-expert-5957c5d845-cn2hg 1/1 Running 0 22h
download-embeddings-qfpgl 0/1 Completed 0 22h
download-expert-model-nw9km 0/1 Completed 0 22h
download-llama-model-582bh 0/1 Completed 0 4h53m
gateway-775f4476d9-h6jk6 1/1 Running 0 5h10m
llama-gpu-6f8c675c4b-j7vhm 1/1 Running 0 4h53m
load-embeddings-4pg6r 1/1 Running 0 22h
platform-75c4997dc5-fk8fq 1/1 Running 0 22h
security-65c8c568db-gqsks 1/1 Running 0 22h
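You can also list the services; this assumes the Llama service is created under the same bmcami-prod-amiai-services namespace:
kubectl get services --namespace bmcami-prod-amiai-services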
To deploy the Mixtral model on Kubernetes
- Log in to the primary manager node of Kubernetes and access /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00.
- Verify that the scripts/mixtral.sh file is present.
- Run the mixtral.sh file and provide the following input:
- GPU node: Select the GPU from the list.
- CPU KVCache space: Specify the memory allocated for the LLM KV cache. Higher values allow more parallel requests. For more information, see CPU KVCache value.
- Number of GPUs: You can find this value by running the following command on the GPU node:
nvidia-smi --query-gpu=name --format=csv,noheader | wc -l
To verify the deployment
Verify that the service and pod are running successfully under the namespace by using the following command:
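A minimal check, assuming the Mixtral pod and service are created in the same bmcami-prod-amiai-services namespace as the other platform services:
# Assumed namespace; adjust if your Mixtral deployment uses a different one
kubectl get pods --namespace bmcami-prod-amiai-services
kubectl get services --namespace bmcami-prod-amiai-services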
To deploy the Granite model on Kubernetes
- Log in to the primary manager node of Kubernetes and access /<extracted_dir>/BMC-AMI-PLATFORM-2.0.00.
- Verify that the scripts/granite.sh file is present.
- Run the granite.sh file and provide the following input:
- GPU node: Select the GPU from the list.
- CPU KVCache space: Specify the memory allocated for the LLM KV cache. Higher values allow more parallel requests. For more information, see CPU KVCache value.
- Number of GPUs: You can find this value by running the following command on the GPU node:
nvidia-smi --query-gpu=name --format=csv,noheader | wc -l
To verify the deployment
Verify that the service and pod are running successfully under the namespace by using the following command:
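A minimal check, assuming the Granite pod and service are created in the same bmcami-prod-amiai-services namespace as the other platform services:
# Assumed namespace; adjust if your Granite deployment uses a different one
kubectl get pods --namespace bmcami-prod-amiai-services
kubectl get services --namespace bmcami-prod-amiai-services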
To add a second LLM deployment
The following instructions describe how to deploy a second LLM service on BMC AMI Platform.
Before you begin
- Make sure your machine meets the minimum system requirements for LLM deployment. For more information, see LLM GPU.
- Make sure the NVIDIA GPU operator is installed. For more information, see To manually install the NVIDIA GPU operator.
To add a new node to the cluster
- Add your node to the Kubernetes cluster (see the sketch after this list).
- Mount the existing NFS on the newly created node.
- Verify the node exists in the Kubernetes cluster by running the following command: kubectl get nodes
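A minimal sketch of the first two items, assuming the new node joins the cluster as an RKE2 agent; the values in angle brackets are placeholders for your environment:
# On the new node (assumes the rke2-agent binary is already installed)
mkdir -p /etc/rancher/rke2
cat <<'EOF' > /etc/rancher/rke2/config.yaml
server: https://<primary-manager-node>:9345
token: <cluster-join-token>
EOF
systemctl enable --now rke2-agent.service

# Mount the existing NFS share on the new node (server, export, and mount point are placeholders)
mount -t nfs <nfs-server>:<nfs-export-path> <nfs-mount-point>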
To add a label to the node
Label the node using the following command:
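The specific label key and value depend on your deployment; the generic form of the command, with placeholder values, is:
# <node-name>, <label-key>, and <label-value> are placeholders for your environment
kubectl label node <node-name> <label-key>=<label-value>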
To run the script file to deploy the model
Navigate to the Helm chart directory and run the appropriate script:
./llama.sh # For Llama 3.1
./granite.sh # For Granite 3.1
./mixtral.sh # For Mixtral
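After the script completes, a follow-up check is to confirm that the new model pod is scheduled on the labeled node; the namespace is assumed to match the other platform services:
# The NODE column shows where each pod is scheduled
kubectl get pods -o wide --namespace bmcami-prod-amiai-services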
Where to go from here
After you deploy the LLM, you can proceed to the following topics: