GPU not detected inside container (NVML "Driver/library version mismatch" error) #929

@jaimeehh

Hello,

My host machine detects the GPU correctly, but inside the Docker container nvidia-smi fails with the following error:

Failed to initialize NVML: Driver/library version mismatch

I have tried multiple configurations and different CUDA base images, but I can't resolve this issue. I believe the problem is related to library version conflicts between the host and the container.
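
For anyone trying to reproduce this, here is a minimal sketch of how the two versions involved could be compared. It assumes the error means the loaded kernel module and the libnvidia-ml.so that nvidia-smi loads are from different driver releases, that the module version is readable from /sys/module/nvidia/version, and that the libraries live under /usr/lib/x86_64-linux-gnu. The function names are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Hypothetical helpers: compare the loaded NVIDIA kernel module version with
# the version encoded in libnvidia-ml.so.<version> file names.

# Print the loaded kernel module version. The default path exists only while
# the nvidia module is loaded; another file can be passed as $1.
kernel_module_version() {
    cat "${1:-/sys/module/nvidia/version}" 2>/dev/null || true
}

# Print the driver version from a name such as libnvidia-ml.so.470.256.02.
# Prints nothing for the short symlink name libnvidia-ml.so.1.
lib_version() {
    basename "$1" | sed -n 's/^libnvidia-ml\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p'
}

# $1 = kernel module version, $2 = library version.
compare_versions() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Report every versioned libnvidia-ml file in a directory
# (assumed default: /usr/lib/x86_64-linux-gnu).
check_nvml_libs() {
    kver=$(kernel_module_version)
    for lib in "${1:-/usr/lib/x86_64-linux-gnu}"/libnvidia-ml.so.*; do
        lver=$(lib_version "$lib")
        [ -n "$lver" ] || continue
        echo "$lib: $(compare_versions "$kver" "$lver") (module: ${kver:-?}, library: $lver)"
    done
}
```

Calling check_nvml_libs once on the host and once inside the container should show whether the two environments see the same library version.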

Host System (Outside Docker)

  • OS: Ubuntu 22
  • GPU: NVIDIA GeForce GTX TITAN Black (GK110B, Compute Capability 3.5)
  • Driver: 470.256.02
  • CUDA Version (from nvidia-smi): 11.4 (the highest CUDA version nvidia-smi reports for this driver)
  • Docker Version: 27.3.1
  • NVIDIA Container Toolkit Installed: Yes, version 1.17.4-1

Container Configuration

  • Base Image Used: (I have tried multiple)
    • nvidia/cuda:10.2-runtime-ubuntu18.04
    • nvidia/cuda:10.2-base-ubuntu18.04
    • nvidia/cuda:10.2-runtime
  • Container OS: Ubuntu 18.04
  • CUDA Version inside container: 10.2
  • NVIDIA Container Toolkit Installed: Yes
  • Run command:
    sudo docker run --gpus all -it --name my_container -v /home/user/my_project:/workspace my_niftypet_runtime

Debugging Attempts

  1. Verified NVIDIA Container Toolkit is installed on the host

    dpkg -l | grep nvidia-container

    Output:

    ii  libnvidia-container-tools                  1.17.4-1
    ii  libnvidia-container1:amd64                 1.17.4-1
    ii  nvidia-container-toolkit                   1.17.4-1
    ii  nvidia-container-toolkit-base              1.17.4-1
    
  2. Tried forcing the container to use host libraries:

    • Running the container with:
      sudo docker run --gpus all -it --env LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu my_niftypet_runtime
    • Still getting the NVML error.
  3. Tried other images:

    • I attempted using nvidia/cuda:10.2.89-base-ubuntu18.04, but it seems unavailable on Docker Hub.
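
Related to step 2 above, a rough sketch for checking whether the container sees more than one copy of the NVML library. It assumes a custom image may carry its own libnvidia-ml.so next to the one mounted from the host; the function names and the searched directories are illustrative choices, not something the toolkit prescribes.

```shell
#!/bin/sh
# Hypothetical helper: read library paths from stdin (one per line) and print
# the distinct driver versions found in libnvidia-ml.so.<version> names.
distinct_nvml_versions() {
    sed -n 's/.*libnvidia-ml\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p' | sort -u
}

# Search common library locations (an assumed list) and summarize the versions.
scan_nvml_libs() {
    find /usr /lib /lib64 -name 'libnvidia-ml.so.*' 2>/dev/null | distinct_nvml_versions
}
```

Running scan_nvml_libs inside the container and getting more than one version, or a version other than the host driver's 470.256.02, would suggest the image ships its own driver libraries.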

Questions & Help Needed

  • How can I ensure that the container correctly uses the host’s NVIDIA libraries to avoid the Driver/library version mismatch error?
  • Is there any specific Docker image or configuration recommended for older GPUs like the GTX TITAN Black that require CUDA 10.2?
  • Could my Docker version (27.3.1) or NVIDIA Container Toolkit version (1.17.4-1) be incompatible with my setup?

This issue is blocking my work. Any help would be greatly appreciated!

Thank you in advance! 😊
