Closed
Labels
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
Description
Hello,
I am facing an issue where my host machine detects the GPU correctly, but inside the Docker container, I get the following error when running nvidia-smi:
Failed to initialize NVML: Driver/library version mismatch
I have tried multiple configurations and different CUDA base images, but I can't resolve this issue. I believe the problem is related to library version conflicts between the host and the container.
Host System (Outside Docker)
- OS: Ubuntu 22
- GPU: NVIDIA GeForce GTX TITAN Black (GK110B, Compute Capability 3.5)
- Driver: 470.256.02
- CUDA Version (from nvidia-smi): 11.4 (the highest CUDA version the host driver supports)
- Docker Version: 27.3.1
- NVIDIA Container Toolkit Installed: Yes, version 1.17.4-1
Container Configuration
- Base Image Used: (I have tried multiple)
  - nvidia/cuda:10.2-runtime-ubuntu18.04
  - nvidia/cuda:10.2-base-ubuntu18.04
  - nvidia/cuda:10.2-runtime
- Container OS: Ubuntu 18.04
- CUDA Version inside container: 10.2
- NVIDIA Container Toolkit Installed: Yes
- Run command:
sudo docker run --gpus all -it --name my_container -v /home/user/my_project:/workspace my_niftypet_runtime
Debugging Attempts
- Verified NVIDIA Container Toolkit is installed on the host:
  dpkg -l | grep nvidia-container
  Output:
  ii libnvidia-container-tools 1.17.4-1
  ii libnvidia-container1:amd64 1.17.4-1
  ii nvidia-container-toolkit 1.17.4-1
  ii nvidia-container-toolkit-base 1.17.4-1
- Tried forcing the container to use host libraries:
  - Running the container with:
    sudo docker run --gpus all -it --env LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu my_niftypet_runtime
  - Still getting the NVML error.
- Tried other images:
  - I attempted using nvidia/cuda:10.2.89-base-ubuntu18.04, but it seems unavailable on Docker Hub.
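One further check that might narrow this down: this NVML error is commonly reported when the loaded NVIDIA kernel module and the user-space libnvidia-ml library are at different versions. The sketch below prints both so they can be compared on the host and again inside the container. The function names are hypothetical, and it assumes the libraries live in /usr/lib/x86_64-linux-gnu (the path used above) and that /proc/driver/nvidia/version is readable where the script runs, which may not hold in every container.

```shell
#!/bin/sh
# Hypothetical helper names; a diagnostic sketch, not an official tool.

# Read the text of /proc/driver/nvidia/version on stdin and print the
# first dotted number on the "NVRM version" line (the kernel module version).
kernel_module_version() {
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Print the version suffix of a path such as .../libnvidia-ml.so.470.256.02
nvml_library_version() {
    name=$(basename "$1")
    echo "${name#libnvidia-ml.so.}"
}

# Print the kernel module version and every versioned NVML library found.
# The library directory is an assumption and can be passed as an argument.
report_versions() {
    proc_file=/proc/driver/nvidia/version
    lib_dir=${1:-/usr/lib/x86_64-linux-gnu}
    if [ -r "$proc_file" ]; then
        echo "kernel module: $(kernel_module_version < "$proc_file")"
    else
        echo "kernel module: $proc_file not readable"
    fi
    for lib in "$lib_dir"/libnvidia-ml.so.*.*; do
        # Skip the unexpanded pattern when no library matches.
        [ -e "$lib" ] || continue
        echo "NVML library:  $(nvml_library_version "$lib") ($lib)"
    done
}

report_versions "$@"
```

If the two versions printed inside the container differ from each other, or an extra libnvidia-ml copy shows up that does not match 470.256.02, that copy would be a plausible source of the mismatch, for example one installed into the image itself rather than injected from the host.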
Questions & Help Needed
- How can I ensure that the container correctly uses the host’s NVIDIA libraries to avoid the Driver/library version mismatch error?
- Is there any specific Docker image or configuration recommended for older GPUs like the GTX TITAN Black that require CUDA 10.2?
- Could my Docker version (27.3.1) or NVIDIA Container Toolkit version (1.17.4-1) be incompatible with my setup?
This issue is blocking my work. Any help would be greatly appreciated!
Thank you in advance! 😊