Installing dependencies
This topic describes how to install the following dependencies on Red Hat Enterprise Linux (RHEL) and Ubuntu.
- Python
- Ansible
- Docker engine
- Helm
- NVIDIA GPU operator
You must install these dependencies on the master node before you deploy the BMC AMI Platform product.
- To install zip and unzip packages on RHEL
- To install zip and unzip packages on Ubuntu
- To install Python and Ansible on RHEL
- To install Python and Ansible on Ubuntu
- To install Docker engine on RHEL
- To install Docker engine on Ubuntu
- To install Helm on RHEL
- To install Helm on Ubuntu
- To manually install the NVIDIA GPU operator
- Where to go from here
To install zip and unzip packages on RHEL
- Install the zip and unzip packages by using the following command:
sudo yum install zip unzip
To install zip and unzip packages on Ubuntu
- Install the zip and unzip packages by using the following command:
sudo apt install zip unzip
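The installation can be spot-checked with a short helper. This is a minimal sketch, not part of the product tooling; `check_tools` is a hypothetical name:

```shell
# Hypothetical helper: report whether each named tool is on the PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check_tools zip unzip
```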
To install Python and Ansible on RHEL
- Install development tools and required dependencies by using the following commands:
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y gcc openssl-devel bzip2-devel libffi-devel zlib-devel
- Install Python 3.12 and its associated package manager by using the following command:
sudo dnf install -y python3.12 python3.12-pip
- Confirm the version by using the following command:
python3 --version
- Install Ansible Core by using the following command:
sudo dnf install -y ansible-core
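To script the version confirmation rather than reading it by eye, a minimal sketch (assuming `python3 --version` prints `Python X.Y.Z` and GNU `sort -V` is available; `version_ok` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed if the reported version meets the minimum.
# $1 = required MAJOR.MINOR, $2 = output of `python3 --version`
version_ok() {
  required="$1"
  actual=$(printf '%s\n' "$2" | awk '{print $2}' | cut -d. -f1,2)
  # version-sort the two values; the minimum must sort first
  [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n1)" = "$required" ]
}

version_ok 3.12 "Python 3.12.4" && echo "meets minimum"
```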
To install Python and Ansible on Ubuntu
- Update the system and install the required libraries by using the following command:
sudo apt-get update && sudo apt-get install -y --no-install-recommends \
build-essential wget curl libssl-dev zlib1g-dev libncurses-dev libbz2-dev \
libreadline-dev libsqlite3-dev libffi-dev liblzma-dev xz-utils ca-certificates
- Install Python 3.12 by using the following command:
sudo apt install -y python3.12
- Verify the installation by using the following command:
python3 --version
- Install Ansible by using the following command:
sudo apt-get install -y ansible
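The Ansible version can be extracted for scripted checks. This sketch assumes the first line of `ansible --version` has the modern `ansible [core X.Y.Z]` form; `ansible_core_version` is a hypothetical helper name:

```shell
# Hypothetical helper: pull the core version out of `ansible --version` output.
ansible_core_version() {
  printf '%s\n' "$1" | sed -n 's/^ansible \[core \([0-9.]*\)\].*/\1/p'
}

ansible_core_version "ansible [core 2.16.3]"
```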
To install Docker engine on RHEL
Perform the following steps to set up the Docker repository, install the packages, and enable the service:
- Set up the Docker CE repository by using the following commands:
sudo dnf -y install yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
- Install the Docker engine by using the following command:
sudo dnf -y install docker-ce docker-ce-cli containerd.io
- Enable and start the Docker service by using the following command:
sudo systemctl enable --now docker
- Verify the installation by using the following commands:
docker --version
docker info
- To run Docker commands without sudo, run the following command:
sudo usermod -aG docker $USER
To apply the new group, you must log out and then log in again.
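After logging back in, group membership can be confirmed with a short sketch. `in_group` is a hypothetical helper; it accepts an explicit group list for testing and defaults to `id -nG` for the live session:

```shell
# Hypothetical helper: succeed if the group appears in the group list.
# $1 = group name, $2 = optional space-separated group list (default: id -nG)
in_group() {
  group="$1"
  list="${2:-$(id -nG)}"
  # printf repeats the format per word, yielding one group per line
  printf '%s\n' $list | grep -qx "$group"
}

if in_group docker; then
  echo "docker group active; docker commands work without sudo"
else
  echo "log out and log in again to pick up the docker group"
fi
```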
To install Docker engine on Ubuntu
Perform the following steps to add Docker’s official GPG key and repository, install the packages, and start the service:
- Update the package index and install prerequisites by using the following commands:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
- Add Docker’s official GPG key by using the following commands:
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
- Set up the repository (your Ubuntu codename is detected automatically) by using the following command:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $UBUNTU_CODENAME) stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Install the Docker engine by using the following commands:
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
- Enable and start the Docker service by using the following command:
sudo systemctl enable --now docker
- Verify the installation by using the following commands:
docker --version
docker info
- To run Docker commands without sudo, run the following command:
sudo usermod -aG docker $USER
To apply the new group, you must log out and then log in again.
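If the repository setup step goes wrong, the generated entry in `/etc/apt/sources.list.d/docker.list` is the first place to look. This sketch validates a line against the shape the commands above produce (the keyring path is the one used above; adjust if yours differs); `valid_docker_source` is a hypothetical helper name:

```shell
# Hypothetical helper: check that a docker.list line matches the expected shape.
valid_docker_source() {
  printf '%s' "$1" | grep -Eq '^deb \[arch=[a-z0-9]+ signed-by=/etc/apt/keyrings/docker\.gpg\] +https://download\.docker\.com/linux/ubuntu +[a-z]+ stable$'
}

line="deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable"
valid_docker_source "$line" && echo "repo entry looks correct"
```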
To install Helm on RHEL
- Add the Helm repository by using the following command:
sudo tee /etc/yum.repos.d/helm.repo >/dev/null <<'EOF'
[helm]
name=Helm
baseurl=https://baltocdn.com/helm/stable/rpm
enabled=1
gpgcheck=1
gpgkey=https://baltocdn.com/helm/signing.asc
EOF
- Install Helm and verify the installation by using the following commands:
sudo dnf install -y helm
helm version
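Before running the install, the repo definition can be sanity-checked for the keys dnf needs. This is a sketch; `repo_has_keys` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if every required key is defined.
# $1 = contents of a .repo file
repo_has_keys() {
  for key in name baseurl enabled gpgcheck gpgkey; do
    printf '%s\n' "$1" | grep -q "^$key=" || return 1
  done
}

repo_file="/etc/yum.repos.d/helm.repo"
if [ -r "$repo_file" ] && repo_has_keys "$(cat "$repo_file")"; then
  echo "helm repo definition is complete"
fi
```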
To install Helm on Ubuntu
- Add the Helm repository by using the following commands:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg >/dev/null
echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" \
| sudo tee /etc/apt/sources.list.d/helm-stable-debian.list >/dev/null
- Install Helm and verify the installation by using the following commands:
sudo apt-get update
sudo apt-get install -y helm
helm version
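For scripted checks, the bare version number can be extracted from the output. This sketch assumes `helm version` prints a `version.BuildInfo{Version:"vX.Y.Z", ...}` line; `helm_semver` is a hypothetical helper name:

```shell
# Hypothetical helper: extract X.Y.Z from a helm version.BuildInfo line.
helm_semver() {
  printf '%s\n' "$1" | sed -n 's/.*Version:"v\([0-9.]*\)".*/\1/p'
}

helm_semver 'version.BuildInfo{Version:"v3.14.0", GitCommit:"abc123", GoVersion:"go1.21.5"}'
```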
If you encounter issues on AWS machines, use the following command:
To manually install the NVIDIA GPU operator
The BMC AMI Platform deployment script automatically installs the NVIDIA GPU operator. However, if you prefer to install the GPU operator manually or need to reinstall it, follow these steps:
Step 1: Create HelmChart manifest
Create a file named gpu-operator-helmchart.yaml:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gpu-operator
  namespace: kube-system
spec:
  repo: https://helm.ngc.nvidia.com/nvidia
  chart: gpu-operator
  targetNamespace: gpu-operator
  createNamespace: true
  valuesContent: |-
    toolkit:
      env:
        - name: CONTAINERD_SOCKET
          value: /run/k3s/containerd/containerd.sock
Step 2: Apply the manifest
- Apply the HelmChart manifest by using the following command:
kubectl apply -f gpu-operator-helmchart.yaml
- Verify that the HelmChart was created by using the following command:
kubectl get helmchart -n kube-system
Step 3: Monitor deployment
The RKE2 HelmChart controller will automatically deploy the GPU operator:
- Watch the HelmChart controller logs by using the following command:
kubectl logs -n kube-system -l app=helm-controller -f
- Check whether the gpu-operator namespace was created by using the following command:
kubectl get namespace gpu-operator
- Monitor the GPU operator pods by using the following command:
kubectl get pods -n gpu-operator -w
Step 4: Wait for the pods to be ready
GPU operator deploys multiple components. Wait for all pods to be in Running state:
- Check all GPU operator pods by using the following command:
kubectl get pods -n gpu-operator
- Wait for all pods to be ready (might take 5-10 minutes) by using the following command:
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=gpu-operator -n gpu-operator --timeout=600s
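The readiness check can also be scripted over `kubectl get pods --no-headers` style output, counting Completed pods (such as the cuda validator) as done. This is a sketch; `all_pods_ready` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if every pod line reports
# Running or Completed in the STATUS column (column 3).
# $1 = lines of "NAME READY STATUS RESTARTS AGE"
all_pods_ready() {
  ! printf '%s\n' "$1" | awk 'NF {print $3}' | grep -vqE '^(Running|Completed)$'
}

pods="gpu-operator-abc 1/1 Running 0 6m
nvidia-cuda-validator-xyz 0/1 Completed 0 3m"
all_pods_ready "$pods" && echo "all pods ready"
```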
To verify HelmChart resource
- Check the HelmChart status by using the following command:
kubectl get helmchart gpu-operator -n kube-system -o yaml
- Look for status conditions by using the following command:
kubectl get helmchart gpu-operator -n kube-system -o jsonpath='{.status.jobName}'
To verify GPU operator pods
- List all pods in the gpu-operator namespace by using the following command:
kubectl get pods -n gpu-operator
- Check specific components by using the following commands:
kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset
kubectl get pods -n gpu-operator -l app=nvidia-container-toolkit-daemonset
kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset
An example of the expected output follows:
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-xxxxx 1/1 Running 0 5m
gpu-operator-xxxxx 1/1 Running 0 6m
nvidia-container-toolkit-daemonset-xxxxx 1/1 Running 0 5m
nvidia-cuda-validator-xxxxx 0/1 Completed 0 3m
nvidia-dcgm-exporter-xxxxx 1/1 Running 0 5m
nvidia-device-plugin-daemonset-xxxxx 1/1 Running 0 5m
nvidia-driver-daemonset-xxxxx 1/1 Running 0 5m
nvidia-operator-validator-xxxxx 1/1 Running 0 5m
To verify the availability of the GPU resources
- Verify that the GPUs are visible to Kubernetes by using the following command:
kubectl get nodes -o json | jq '.items[].status.capacity | select(."nvidia.com/gpu" != null)'
- Describe the nodes with GPUs by using the following command:
kubectl describe nodes -l gpu=true | grep -A 10 "Capacity:"
- Check the allocatable GPU resources by using the following command:
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.allocatable."nvidia\.com/gpu"
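The custom-columns output can be totaled to confirm the cluster exposes the GPU count you expect (nodes without GPUs report `<none>` in the GPU column). This is a sketch; `total_gpus` is a hypothetical helper name:

```shell
# Hypothetical helper: sum numeric values in the GPU column, skipping the header.
# $1 = "NAME GPU" lines from kubectl custom-columns output
total_gpus() {
  printf '%s\n' "$1" | awk 'NR > 1 && $2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

nodes="NAME GPU
worker-1 2
worker-2 <none>"
total_gpus "$nodes"
```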
Where to go from here
After you install the dependencies, proceed to Deploying BMC AMI Platform.