Troubleshooting


This topic lists common installation issues and their solutions.

Issue

Solution

On the Azure virtual machine, NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

Install the GPU Driver extension on the virtual machine. For instructions, see https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux

While running the playbook, you get the following error: 

fatal: [localhost]: FAILED! => {"changed": false, "msg": "Detected no loaded images. Archive potentially corrupt?", "stdout": "", "stdout_lines": []}

Increase the available disk space, and then rerun the playbook.

While running the playbook, you get the following error: 

fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error creating container: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=6)

Increase the available disk space, and then rerun the playbook.
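Before you resize the disk, it can help to confirm how much space is actually free. The following is a minimal sketch and is not part of the product: free_kb is a hypothetical helper, and /var/lib/docker is assumed to be the Docker data directory (the usual Linux default; your installation might use a different location).

```shell
# free_kb: hypothetical helper that prints the free space, in KiB,
# on the filesystem that holds the given path.
free_kb() {
  # -P forces the portable one-line-per-filesystem format;
  # column 4 of that format is the available space.
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

echo "Free on /: $(free_kb /) KiB"

# Assumption: Docker stores its images under /var/lib/docker.
if [ -d /var/lib/docker ]; then
  echo "Free on /var/lib/docker: $(free_kb /var/lib/docker) KiB"
fi
```

If the reported value is small compared to the size of the image archives being loaded, increasing the disk is the likely fix.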

NVIDIA-SMI failed because it couldn't communicate with the NVIDIA driver. The NVIDIA graphics driver is not compatible.

The issue occurred for one of the following reasons:

  • The machine is not GPU-enabled.
  • The NVIDIA GPU drivers are not correctly installed.

Install the required GPU drivers and retry.
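To tell a missing driver apart from an installed driver that is not responding, a quick check such as the following might help. This is a sketch under stated assumptions: check_nvidia_driver is a hypothetical helper, and it relies only on the nvidia-smi exit status, not on any specific output text.

```shell
# check_nvidia_driver: hypothetical helper that reports whether
# nvidia-smi exists and whether it can talk to the driver.
check_nvidia_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; the NVIDIA driver is probably not installed"
    return 1
  fi
  # nvidia-smi exits with a non-zero status when it cannot reach the driver.
  if nvidia-smi >/dev/null 2>&1; then
    echo "NVIDIA driver is loaded and responding"
  else
    echo "nvidia-smi is installed but cannot reach the driver"
    return 2
  fi
}

check_nvidia_driver || echo "Install the required GPU drivers and retry."
```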

While starting a container, you get the following error:

7261c90b8771a4e83b036a0820598e630261fb2ddcc0ecd50d89529fa29b6ad6: 500 Server Error for http+docker://localhost/v1.46/containers/7261c90b8771a4e83b036a0820598e630261fb2ddcc0ecd50d89529fa29b6ad6/start: Internal Server Error (\"could not select device driver \"nvidia\" with capabilities: [[gpu]]\

The issue occurred for one of the following reasons:

  • The machine is not GPU-enabled.
  • The GPU drivers are not correctly installed.
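To check the first reason, you can look for NVIDIA hardware on the PCI bus. The sketch below assumes that the lspci utility is available on the machine; has_nvidia_gpu is a hypothetical helper name, and matching on the string "nvidia" is a heuristic rather than a definitive test.

```shell
# has_nvidia_gpu: hypothetical helper; succeeds if lspci lists an NVIDIA device.
has_nvidia_gpu() {
  if ! command -v lspci >/dev/null 2>&1; then
    echo "lspci is not available; cannot check for GPU hardware" >&2
    return 2
  fi
  # Case-insensitive match on the vendor name in the device list.
  lspci | grep -qi 'nvidia'
}

if has_nvidia_gpu; then
  echo "NVIDIA GPU detected; check the GPU driver installation next"
else
  echo "No NVIDIA GPU detected (or lspci is unavailable)"
fi
```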

You get a No such file or directory error when running the BMC-AMI-AI-Platform.sh, BMC-AMI-AI-Llama.sh, or BMC-AMI-AI-Mixtral.sh file.

  1. Verify file presence:
    Run the following command to ensure the script exists in the specified directory: 

    ls -l

    This command lists the files in the current directory. Make sure that the script you want to run (BMC-AMI-AI-Platform.sh, BMC-AMI-AI-Llama.sh, or BMC-AMI-AI-Mixtral.sh) is listed.

  2. Fix line-ending issues:
    If the file is present but the error persists, the file might have incorrect line endings (such as Windows-style carriage returns). Run the command for the script that you want to fix: 

    sed -i -e 's/\r$//' BMC-AMI-AI-Platform.sh
    sed -i -e 's/\r$//' BMC-AMI-AI-Llama.sh
    sed -i -e 's/\r$//' BMC-AMI-AI-Mixtral.sh
  3. Retry execution by running the command for your script: 

    ./BMC-AMI-AI-Platform.sh
    ./BMC-AMI-AI-Llama.sh
    ./BMC-AMI-AI-Mixtral.sh
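The steps above can be combined into one small check. This is a sketch, not a supported tool: has_crlf and fix_script are hypothetical helper names, and the sed call is the same GNU sed command shown in step 2.

```shell
# has_crlf: hypothetical helper; succeeds if any line ends with a carriage return.
has_crlf() {
  cr=$(printf '\r')
  grep -q "${cr}\$" "$1"
}

# fix_script: hypothetical helper that verifies the file exists and
# strips Windows-style line endings when they are present.
fix_script() {
  if [ ! -f "$1" ]; then
    echo "$1: not found in $(pwd)"
    return 1
  fi
  if has_crlf "$1"; then
    sed -i -e 's/\r$//' "$1"   # same command as in step 2 (GNU sed)
    echo "$1: removed Windows-style line endings"
  fi
}

fix_script BMC-AMI-AI-Platform.sh || echo "Check the directory and the file name."
```

A script with carriage returns fails with No such file or directory because the kernel looks for an interpreter whose name ends in the carriage return (for example, /bin/bash followed by \r), which does not exist.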

Permission is denied when attempting to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.47/containers/": dial unix /var/run/docker.sock: connect: permission denied.

Prefix the docker command with sudo. For example, run sudo docker ps instead of docker ps.

Error during Terraform Installation

After installing Terraform, running the terraform command results in a command not found error.

Possible Cause: The Terraform binary is not added to your system’s PATH environment variable.

  1. Verify Installation Path: Check if Terraform was installed correctly by navigating to the installation directory (default might be C:\Program Files\Terraform on Windows).
  2. Update PATH Environment Variable:
    1. On Windows:
      • Click Control Panel > System > Advanced system settings > Environment Variables.
      • Find the PATH variable and edit it. Add the path where terraform.exe is located. For example, C:\Program Files\Terraform.
    2. On Linux/macOS:
      • Open the .bashrc or .zshrc file and add the following line, replacing /path/to/terraform with the directory that contains the terraform binary: 

        export PATH="$PATH:/path/to/terraform"

        To reload the shell, run the following command: 

        source ~/.bashrc or source ~/.zshrc
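To avoid adding the same directory twice, you can check PATH before you extend it. The following is a minimal sketch: path_contains is a hypothetical helper, and /path/to/terraform is the same placeholder used in the step above, not a real location.

```shell
# path_contains: hypothetical helper; succeeds if the directory is already in PATH.
path_contains() {
  # Wrapping PATH in colons lets one pattern match the first,
  # middle, and last entries alike.
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

TERRAFORM_DIR="/path/to/terraform"   # placeholder; use your actual directory

if ! path_contains "$TERRAFORM_DIR"; then
  export PATH="$PATH:$TERRAFORM_DIR"
fi

command -v terraform >/dev/null 2>&1 ||
  echo "terraform is still not on PATH; check the directory"
```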

Error during AWS CLI Installation

After you install AWS CLI, running the following command returns command not found:

aws --version

Possible Cause: The AWS CLI installation path is not added to the PATH environment variable, or the installation was incomplete.

  1. Verify Installation Path:

      • On Windows, by default, AWS CLI is installed in C:\Program Files\Amazon\AWSCLI. Make sure this directory exists.
      • On macOS or Linux, verify that the CLI is installed by checking /usr/local/bin/aws or by using: 

        which aws
  2. Update the PATH Variable:

      • On Windows:

          • Add C:\Program Files\Amazon\AWSCLI\bin to the PATH in the Environment Variables.
      • On Linux/macOS:

          • Add the following line to your .bashrc or .zshrc file: 

            export PATH="$PATH:/usr/local/bin"

            To reload the shell, run the following command: 

            source ~/.bashrc or source ~/.zshrc

"Unable to locate credentials" error when using AWS CLI

After installing AWS CLI and running commands, you receive an error such as Unable to locate credentials.

Possible Cause: AWS CLI is not configured with your credentials, or the credentials are invalid.

Configure AWS CLI

  1. Run the following command to configure your AWS credentials: 

    aws configure
  2. When prompted, enter the following values:
    • AWS Access Key ID
    • AWS Secret Access Key
    • Default region name
    • Default output format
  3. Check the AWS Credentials File as follows:
    • Make sure your credentials are saved in the correct location.
      • On Windows: C:\Users\USERNAME\.aws\credentials
      • On Linux/macOS: ~/.aws/credentials
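On Linux or macOS, the credentials file check in step 3 can be sketched as follows. has_credentials is a hypothetical helper; it only confirms that both key entries are present and never prints their values, so it cannot tell you whether the keys are valid.

```shell
# has_credentials: hypothetical helper; succeeds if the file exists and
# contains both key entries that aws configure writes.
has_credentials() {
  [ -f "$1" ] &&
    grep -q '^[[:space:]]*aws_access_key_id[[:space:]]*=' "$1" &&
    grep -q '^[[:space:]]*aws_secret_access_key[[:space:]]*=' "$1"
}

CRED_FILE="$HOME/.aws/credentials"

if has_credentials "$CRED_FILE"; then
  echo "Credentials file looks complete: $CRED_FILE"
else
  echo "Credentials are missing or incomplete; run aws configure"
fi
```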

"Error loading certificate" error when running OpenSSL commands

Running OpenSSL commands to extract or verify certificates results in the error Error loading certificate.

Possible Cause: The certificate file is corrupt, missing, or not properly formatted.

  1. Verify Certificate Format as follows:
    • Make sure the certificate file is in the correct PEM format and starts with -----BEGIN CERTIFICATE----- and ends with -----END CERTIFICATE-----.
  2. Re-download or Recreate the Certificate as follows:
    • If the file is corrupt, re-download it from the certificate authority or recreate the certificate by running the necessary OpenSSL commands again.
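The format check in step 1 can be sketched as a small script. looks_like_pem is a hypothetical helper and cert.pem is a hypothetical file name; the helper assumes a file that holds a single certificate, so a bundle with extra content before or after the markers would need a different check.

```shell
BEGIN_MARK='-----BEGIN CERTIFICATE-----'
END_MARK='-----END CERTIFICATE-----'

# looks_like_pem: hypothetical helper; succeeds if the first and last
# lines of the file are the PEM certificate markers.
looks_like_pem() {
  [ -f "$1" ] || return 1
  # tr removes carriage returns so Windows line endings do not hide the markers.
  first=$(head -n 1 "$1" | tr -d '\r')
  last=$(tail -n 1 "$1" | tr -d '\r')
  [ "$first" = "$BEGIN_MARK" ] && [ "$last" = "$END_MARK" ]
}

CERT_FILE="cert.pem"   # hypothetical file name

if looks_like_pem "$CERT_FILE"; then
  # Ask OpenSSL to parse the file; -noout suppresses the encoded certificate.
  openssl x509 -in "$CERT_FILE" -noout -subject -enddate
else
  echo "$CERT_FILE is missing or is not a PEM certificate"
fi
```

If the markers are present but the openssl command still fails, the content between the markers is likely corrupt, and re-downloading or recreating the certificate is the next step.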

 
