Installing dependencies


This topic describes how to install the following dependencies on Red Hat Enterprise Linux (RHEL) and Ubuntu.

Important

You must install the RKE2 Kubernetes cluster before you start this process.

For more information, see System requirements.

  • Python
  • Ansible
  • Docker engine
  • Helm
  • NVIDIA GPU operator

You must install these dependencies on the master node as prerequisites for the BMC AMI Platform product.

To install zip and unzip packages on RHEL

  1. Install the zip and unzip packages by using the following command:

    sudo yum install zip unzip

To install zip and unzip packages on Ubuntu

  1. Install the zip and unzip packages by using the following command:

    sudo apt install zip unzip

To install Python and Ansible on RHEL

  1. Install development tools and required dependencies by using the following command:
    sudo dnf groupinstall -y "Development Tools"
    sudo dnf install -y gcc openssl-devel bzip2-devel libffi-devel zlib-devel
  2. Install Python 3.12 and its associated package manager by using the following command:
    sudo dnf install -y python3.12 python3.12-pip
  3. Confirm the version by using the following command:
    python3.12 --version
  4. Install Ansible Core using DNF by using the following command:
    sudo dnf install -y ansible-core
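
After step 4, you can sanity-check that the reported versions meet your minimums. The following is a minimal sketch; `version_ge` is a hypothetical helper (not part of the product tooling) that compares dotted version strings with `sort -V`:

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example with literal values; on a real host, substitute the versions
# reported by `python3.12 --version` and `ansible --version`.
if version_ge "3.12.4" "3.12"; then
  echo "Python version OK"
fi
if version_ge "2.14.2" "2.14"; then
  echo "Ansible version OK"
fi
```

Because `sort -V` understands multi-digit components, this comparison handles versions such as 3.12.10 correctly, which plain string comparison does not.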

To install Python and Ansible on Ubuntu

  1. Update the system and install the required libraries by using the following command:

    sudo apt-get update && sudo apt-get install -y --no-install-recommends \
    build-essential wget curl libssl-dev zlib1g-dev libncurses-dev libbz2-dev \
    libreadline-dev libsqlite3-dev libffi-dev liblzma-dev xz-utils ca-certificates
  2. Install Python 3.12 by using the following command:

    sudo apt install -y python3.12
  3. Verify installation by using the following command:

    python3 --version
  4. Install Ansible by using the following command:

    sudo apt-get install -y ansible
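
If you rerun these steps on a partially prepared host, a small guard keeps the process idempotent. The `ensure` function below is a hypothetical sketch that only prints the command it would run; in a real script you would execute the install command instead:

```shell
# Hypothetical guard: skip installation when the command already exists.
ensure() {
  cmd="$1"; shift
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd already present"
  else
    # In a real script, replace this echo with: sudo "$@"
    echo "would run: $*"
  fi
}

ensure sh apt-get install -y dash          # sh always exists, so this is skipped
ensure no-such-cmd apt-get install -y ansible
```

This pattern makes reruns safe on hosts where some dependencies are already installed.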

To install Docker engine on RHEL

  1. Install the Docker repository and packages and then enable the service as follows:
    1. Set up the Docker CE repository by using the following command:
      sudo dnf -y install yum-utils
      sudo yum-config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
    2. Install the Docker engine by using the following command:
      sudo dnf -y install docker-ce docker-ce-cli containerd.io
    3. Enable and start the Docker service by using the following command:

      sudo systemctl enable --now docker
    4. Verify installation by using the following command:

      docker --version
      docker info
    5. To run Docker commands without sudo, run the following command:
      sudo usermod -aG docker $USER

To apply the new group, you must log out and then log in again.
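
You can check whether the current session has already picked up the group before logging out. This sketch uses only the standard `id`, `tr`, and `grep` utilities:

```shell
# Report whether the current session already has the docker group.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  echo "docker group active in this session"
else
  echo "log out and back in, or start a subshell with: newgrp docker"
fi
```

`id -nG` lists the groups of the current process, so it reflects the running session rather than the group database, which is exactly what matters here.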

To install Docker engine on Ubuntu

Perform the following steps to add Docker’s official GPG key and repository, install packages, and start the service:

  1. Update and install prerequisites by using the following command:
    sudo apt-get update
    sudo apt-get install -y ca-certificates curl gnupg
  2. Add Docker’s official GPG key by using the following command:
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
      | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
  3. Set up the repository (uses your Ubuntu codename automatically) by using the following command:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
      https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $UBUNTU_CODENAME) stable" \
      | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  4. Install Docker engine by using the following command:
    sudo apt-get update
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
  5. Enable and start the Docker service by using the following command:
    sudo systemctl enable --now docker
  6. Verify installation by using the following command:
    docker --version
    docker info
  7. To run Docker commands without sudo, run the following command: 
    sudo usermod -aG docker $USER

To apply the new group, you must log out and then log in again.
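
Step 3 derives the repository suite from /etc/os-release. The snippet below demonstrates that lookup against a sample file; the sample contents are illustrative, not read from your host:

```shell
# Demonstrate the UBUNTU_CODENAME lookup from step 3 on a sample os-release file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
NAME="Ubuntu"
VERSION_ID="24.04"
UBUNTU_CODENAME=noble
EOF

codename=$(. "$sample"; echo "$UBUNTU_CODENAME")
echo "repo suite: $codename stable"   # prints: repo suite: noble stable
rm -f "$sample"
```

Sourcing the file in a subshell, as step 3 does, keeps the variables from /etc/os-release out of your current environment.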

To install Helm on RHEL

  1. Add Helm repo by using the following command:

    sudo tee /etc/yum.repos.d/helm.repo >/dev/null <<'EOF'
    [helm]
    name=Helm
    baseurl=https://baltocdn.com/helm/stable/rpm
    enabled=1
    gpgcheck=1
    gpgkey=https://baltocdn.com/helm/signing.asc
    EOF
  2. Install and verify by using the following command:

    sudo dnf install -y helm
    helm version

To install Helm on Ubuntu

  1. Add Helm repo by using the following command:
    sudo apt-get update
    sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
    curl -fsSL https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg >/dev/null
    echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" \
    | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list >/dev/null
  2. Install and verify by using the following command:

    sudo apt-get update
    sudo apt-get install -y helm
    helm version

If you encounter issues on AWS machines, use the following command:

sudo snap install helm --classic
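
On either distribution, `helm version --short` prints a string such as `v3.14.0+g3fc9f4b`. The following sketch strips that down to the bare version number; the sample string is assumed, not read from your host:

```shell
# Assumed sample of `helm version --short` output; on a real host use:
#   sample=$(helm version --short)
sample='v3.14.0+g3fc9f4b'

ver=${sample%%+*}    # drop the +commit suffix
ver=${ver#v}         # drop the leading v
echo "Helm $ver detected"   # prints: Helm 3.14.0 detected
```

Both trims use POSIX parameter expansion, so no external tools are needed.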

To manually install the NVIDIA GPU operator

The BMC AMI Platform deployment script automatically installs the NVIDIA GPU operator. However, if you prefer to install the GPU operator manually, or need to reinstall it, perform the following steps:

Step 1: Create HelmChart manifest

Create a file named gpu-operator-helmchart.yaml:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator
  createNamespace: true
  valuesContent: |-
    toolkit:
      env:
      - name: CONTAINERD_SOCKET
        value: /run/k3s/containerd/containerd.sock

Step 2: Apply the manifest

  1. Apply the HelmChart manifest:
    kubectl apply -f gpu-operator-helmchart.yaml
  2. Verify that HelmChart was created:
    kubectl get helmchart -n kube-system

Expected Output:

NAME           CHART            REPO                               VERSION    JOBNAME
gpu-operator   gpu-operator     https://helm.ngc.nvidia.com/nvidia            helm-install-gpu-operator

Step 3: Monitor deployment

The RKE2 HelmChart controller automatically deploys the GPU operator:

  1. Watch HelmChart controller logs:
    kubectl logs -n kube-system -l app=helm-controller -f
  2. Check if the GPU operator namespace was created:
    kubectl get namespace gpu-operator
  3. Monitor the GPU operator pods that were created:
    kubectl get pods -n gpu-operator -w

Step 4: Wait for the pods to be ready

The GPU operator deploys multiple components. Wait for all pods to reach the Running state (validator pods finish in the Completed state):

  1. Check all GPU operator pods:
    kubectl get pods -n gpu-operator
  2. Wait for all pods to be ready (might take 5-10 minutes):
    kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=gpu-operator -n gpu-operator --timeout=600s
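
`kubectl wait` covers the common case; for readiness checks it cannot express, a small polling loop does the same job. The `retry` helper below is a generic, hypothetical sketch, and the kubectl command in the comment is only an example predicate:

```shell
# Hypothetical retry helper: run a check up to $1 times, one second apart.
retry() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i+1))
    sleep 1
  done
  return 1
}

# Example predicate; in practice this could be something like:
#   retry 60 sh -c 'kubectl get namespace gpu-operator >/dev/null 2>&1'
retry 3 true && echo "check passed"   # prints: check passed
```

The helper returns the success of the check, so it composes with `&&` and `if` just like any other command.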

To verify HelmChart resource

  1. Check HelmChart status:
    kubectl get helmchart gpu-operator -n kube-system -o yaml
  2. Look for status conditions:
    kubectl get helmchart gpu-operator -n kube-system -o jsonpath='{.status.jobName}'

To verify GPU operator pods

  1. List all pods in the gpu-operator namespace:
    kubectl get pods -n gpu-operator
  2. Check specific components:

    kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset
    kubectl get pods -n gpu-operator -l app=nvidia-container-toolkit-daemonset
    kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset

    An example of the expected output follows:

    NAME                                                  READY   STATUS      RESTARTS   AGE
    gpu-feature-discovery-xxxxx                           1/1     Running     0          5m
    gpu-operator-xxxxx                                    1/1     Running     0          6m
    nvidia-container-toolkit-daemonset-xxxxx              1/1     Running     0          5m
    nvidia-cuda-validator-xxxxx                           0/1     Completed   0          3m
    nvidia-dcgm-exporter-xxxxx                            1/1     Running     0          5m
    nvidia-device-plugin-daemonset-xxxxx                  1/1     Running     0          5m
    nvidia-driver-daemonset-xxxxx                         1/1     Running     0          5m
    nvidia-operator-validator-xxxxx                       1/1     Running     0          5m

To verify the availability of the GPU resources

  1. Verify that the GPUs are visible to Kubernetes:
    kubectl get nodes -o json | jq '.items[].status.capacity | select(."nvidia.com/gpu" != null)'
  2. Describe nodes with GPUs:
    kubectl describe nodes -l gpu=true | grep -A 10 "Capacity:"
  3. Check allocatable GPU resources:
    kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.allocatable."nvidia\.com/gpu"
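
The custom-columns command in step 3 prints one row per node, with `<none>` for nodes that expose no GPUs. The sketch below totals the GPU count from such output; the sample rows and node names are made up for illustration:

```shell
# Assumed sample of the custom-columns output; on a real cluster, pipe the
# kubectl command from step 3 into the awk filter instead.
sample='NAME      GPU
node-a    2
node-b    <none>
node-c    1'

total=$(printf '%s\n' "$sample" \
  | awk 'NR > 1 && $2 ~ /^[0-9]+$/ { s += $2 } END { print s + 0 }')
echo "total GPUs: $total"   # prints: total GPUs: 3
```

The awk filter skips the header row and any `<none>` entries, so only numeric GPU counts are summed.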

Where to go from here

After you install the dependencies, proceed to Deploying BMC AMI Platform.

 


BMC AMI Platform 2.0