Installing dependencies


This topic describes how to install the following dependencies on Red Hat Enterprise Linux (RHEL) and Ubuntu.

Important

You must install the RKE2 Kubernetes cluster before you start this process.

For more information, see System requirements.

  • Python
  • Ansible
  • Docker engine
  • Helm
  • NVIDIA GPU operator

You must install these dependencies on the master node as prerequisites for the BMC AMI Platform product.

To install zip and unzip packages on RHEL

  1. Install the zip and unzip packages by using the following command:

    sudo yum install zip unzip

To install zip and unzip packages on Ubuntu

  1. Install the zip and unzip packages by using the following command:

    sudo apt install zip unzip

To install Python and Ansible on RHEL

  1. Install development tools and required dependencies by using the following command:
    sudo dnf groupinstall -y "Development Tools"
    sudo dnf install -y gcc openssl-devel bzip2-devel libffi-devel zlib-devel
  2. Install Python 3.12 and its associated package manager by using the following command:
    sudo dnf install -y python3.12 python3.12-pip
  3. Confirm the version by using the following command:
    python3.12 --version
  4. Install Ansible Core using DNF by using the following command:
    sudo dnf install -y ansible-core
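
After step 4, you can sanity-check that the reported versions meet your minimums. The following is a minimal sketch; `version_ge` is a hypothetical helper (not part of the product tooling) that compares dotted version strings with `sort -V`:

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example with literal values; on a real host, substitute the versions
# reported by `python3.12 --version` and `ansible --version`.
if version_ge "3.12.4" "3.12"; then
  echo "Python version OK"
fi
if version_ge "2.14.2" "2.14"; then
  echo "Ansible version OK"
fi
```

Because `sort -V` understands multi-digit components, this comparison handles versions such as 3.12.10 correctly, which plain string comparison does not.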

To install Python and Ansible on Ubuntu

  1. Update the system and install the required libraries by using the following command:

    sudo apt-get update && sudo apt-get install -y --no-install-recommends \
    build-essential wget curl libssl-dev zlib1g-dev libncurses-dev libbz2-dev \
    libreadline-dev libsqlite3-dev libffi-dev liblzma-dev xz-utils ca-certificates
  2. Install Python 3.12 by using the following command:

    sudo apt install -y python3.12
  3. Verify installation by using the following command:

    python3 --version
  4. Install Ansible by using the following command:

    sudo apt-get install -y ansible
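
If you rerun these steps on a partially prepared host, a small guard keeps the process idempotent. The `ensure` function below is a hypothetical sketch that only prints the command it would run; in a real script you would execute the install command instead:

```shell
# Hypothetical guard: skip installation when the command already exists.
ensure() {
  cmd="$1"; shift
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd already present"
  else
    # In a real script, replace this echo with: sudo "$@"
    echo "would run: $*"
  fi
}

ensure sh apt-get install -y dash          # sh always exists, so this is skipped
ensure no-such-cmd apt-get install -y ansible
```

This pattern makes reruns safe on hosts where some dependencies are already installed.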

To install Docker engine on RHEL

  1. Install the Docker repository and packages and then enable the service as follows:
    1. Set up the Docker CE repository by using the following command:
      sudo dnf -y install yum-utils
      sudo yum-config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
    2. Install the Docker engine by using the following command:
      sudo dnf -y install docker-ce docker-ce-cli containerd.io
    3. Enable and start the Docker service by using the following command:

      sudo systemctl enable --now docker
    4. Verify installation by using the following command:

      docker --version
      docker info
    5. To run Docker commands without sudo, run the following command:
      sudo usermod -aG docker $USER

To apply the new group, you must log out and then log in again.
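
You can check whether the current session has already picked up the group before logging out. This sketch uses only the standard `id`, `tr`, and `grep` utilities:

```shell
# Report whether the current session already has the docker group.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  echo "docker group active in this session"
else
  echo "log out and back in, or start a subshell with: newgrp docker"
fi
```

`id -nG` lists the groups of the current process, so it reflects the running session rather than the group database, which is exactly what matters here.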

To install Docker engine on Ubuntu

Perform the following steps to add Docker’s official GPG key and repository, install packages, and start the service:

  1. Update and install prerequisites by using the following command:
    sudo apt-get update
    sudo apt-get install -y ca-certificates curl gnupg
  2. Add Docker’s official GPG key by using the following command:
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
      | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
  3. Set up the repository (uses your Ubuntu codename automatically) by using the following command:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
      https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $UBUNTU_CODENAME) stable" \
      | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  4. Install Docker engine by using the following command:
    sudo apt-get update
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
  5. Enable and start the Docker service by using the following command:
    sudo systemctl enable --now docker
  6. Verify installation by using the following command:
    docker --version
    docker info
  7. To run Docker commands without sudo, run the following command: 
    sudo usermod -aG docker $USER

To apply the new group, you must log out and then log in again.
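
Step 3 derives the repository suite from /etc/os-release. The snippet below demonstrates that lookup against a sample file; the sample contents are illustrative, not read from your host:

```shell
# Demonstrate the UBUNTU_CODENAME lookup from step 3 on a sample os-release file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
NAME="Ubuntu"
VERSION_ID="24.04"
UBUNTU_CODENAME=noble
EOF

codename=$(. "$sample"; echo "$UBUNTU_CODENAME")
echo "repo suite: $codename stable"   # prints: repo suite: noble stable
rm -f "$sample"
```

Sourcing the file in a subshell, as step 3 does, keeps the variables from /etc/os-release out of your current environment.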

To install Helm on RHEL

  1. Add Helm repo by using the following command:

    sudo tee /etc/yum.repos.d/helm.repo >/dev/null <<'EOF'
    [helm]
    name=Helm
    baseurl=https://baltocdn.com/helm/stable/rpm
    enabled=1
    gpgcheck=1
    gpgkey=https://baltocdn.com/helm/signing.asc
    EOF
  2. Install and verify by using the following command:

    sudo dnf install -y helm
    helm version

To install Helm on Ubuntu

  1. Add Helm repo by using the following command:
    sudo apt-get update
    sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
    curl -fsSL https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg >/dev/null
    echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" \
    | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list >/dev/null
  2. Install and verify by using the following command:

    sudo apt-get update
    sudo apt-get install -y helm
    helm version

If you encounter issues on AWS machines, use the following command:

sudo snap install helm --classic
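
On either distribution, `helm version --short` prints a string such as `v3.14.0+g3fc9f4b`. The following sketch strips that down to the bare version number; the sample string is assumed, not read from your host:

```shell
# Assumed sample of `helm version --short` output; on a real host use:
#   sample=$(helm version --short)
sample='v3.14.0+g3fc9f4b'

ver=${sample%%+*}    # drop the +commit suffix
ver=${ver#v}         # drop the leading v
echo "Helm $ver detected"   # prints: Helm 3.14.0 detected
```

Both trims use POSIX parameter expansion, so no external tools are needed.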

To manually install the NVIDIA GPU operator

The BMC AMI Platform deployment script automatically installs the NVIDIA GPU operator. However, if you prefer to install the GPU operator manually, or need to reinstall it, perform the following steps:

Step 1: Create HelmChart manifest

Create a file named gpu-operator-helmchart.yaml:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator
  createNamespace: true
  valuesContent: |-
    toolkit:
      env:
      - name: CONTAINERD_SOCKET
        value: /run/k3s/containerd/containerd.sock

Step 2: Apply the manifest

  1. Apply the HelmChart manifest:
    kubectl apply -f gpu-operator-helmchart.yaml
  2. Verify that HelmChart was created:
    kubectl get helmchart -n kube-system

Expected Output:

NAME           CHART            REPO                               VERSION    JOBNAME
gpu-operator   gpu-operator     https://helm.ngc.nvidia.com/nvidia            helm-install-gpu-operator

Step 3: Monitor deployment

The RKE2 HelmChart controller automatically deploys the GPU operator:

  1. Watch HelmChart controller logs:
    kubectl logs -n kube-system -l app=helm-controller -f
  2. Check if the GPU operator namespace was created:
    kubectl get namespace gpu-operator
  3. Monitor the GPU operator pods that were created:
    kubectl get pods -n gpu-operator -w

Step 4: Wait for the pods to be ready

The GPU operator deploys multiple components. Wait for all pods to reach the Running state (validator pods finish in the Completed state):

  1. Check all GPU operator pods:
    kubectl get pods -n gpu-operator
  2. Wait for all pods to be ready (might take 5-10 minutes):
    kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=gpu-operator -n gpu-operator --timeout=600s
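
`kubectl wait` covers the common case; for readiness checks it cannot express, a small polling loop does the same job. The `retry` helper below is a generic, hypothetical sketch, and the kubectl command in the comment is only an example predicate:

```shell
# Hypothetical retry helper: run a check up to $1 times, one second apart.
retry() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i+1))
    sleep 1
  done
  return 1
}

# Example predicate; in practice this could be something like:
#   retry 60 sh -c 'kubectl get namespace gpu-operator >/dev/null 2>&1'
retry 3 true && echo "check passed"   # prints: check passed
```

The helper returns the success of the check, so it composes with `&&` and `if` just like any other command.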

To verify HelmChart resource

  1. Check HelmChart status:
    kubectl get helmchart gpu-operator -n kube-system -o yaml
  2. Look for status conditions:
    kubectl get helmchart gpu-operator -n kube-system -o jsonpath='{.status.jobName}'

To verify GPU operator pods

  1. List all pods in the gpu-operator namespace:
    kubectl get pods -n gpu-operator
  2. Check specific components:

    kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset
    kubectl get pods -n gpu-operator -l app=nvidia-container-toolkit-daemonset
    kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset

    An example of the expected output follows:

    NAME                                                  READY   STATUS      RESTARTS   AGE
    gpu-feature-discovery-xxxxx                           1/1     Running     0          5m
    gpu-operator-xxxxx                                    1/1     Running     0          6m
    nvidia-container-toolkit-daemonset-xxxxx              1/1     Running     0          5m
    nvidia-cuda-validator-xxxxx                           0/1     Completed   0          3m
    nvidia-dcgm-exporter-xxxxx                            1/1     Running     0          5m
    nvidia-device-plugin-daemonset-xxxxx                  1/1     Running     0          5m
    nvidia-driver-daemonset-xxxxx                         1/1     Running     0          5m
    nvidia-operator-validator-xxxxx                       1/1     Running     0          5m

To verify the availability of the GPU resources

  1. Verify that the GPUs are visible to Kubernetes:
    kubectl get nodes -o json | jq '.items[].status.capacity | select(."nvidia.com/gpu" != null)'
  2. Describe nodes with GPUs:
    kubectl describe nodes -l gpu=true | grep -A 10 "Capacity:"
  3. Check allocatable GPU resources:
    kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.allocatable."nvidia\.com/gpu"
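
The custom-columns command in step 3 prints one row per node, with `<none>` for nodes that expose no GPUs. The sketch below totals the GPU count from such output; the sample rows and node names are made up for illustration:

```shell
# Assumed sample of the custom-columns output; on a real cluster, pipe the
# kubectl command from step 3 into the awk filter instead.
sample='NAME      GPU
node-a    2
node-b    <none>
node-c    1'

total=$(printf '%s\n' "$sample" \
  | awk 'NR > 1 && $2 ~ /^[0-9]+$/ { s += $2 } END { print s + 0 }')
echo "total GPUs: $total"   # prints: total GPUs: 3
```

The awk filter skips the header row and any `<none>` entries, so only numeric GPU counts are summed.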

Where to go from here

After you install the dependencies, proceed to Deploying BMC AMI Platform.

 


BMC AMI Platform 2.0