Error running NVIDIA Container with Docker on Ubuntu 22.04 #1051

@ashish-kumar-hpe

Description

Hi,

I installed a new Ubuntu 22.04 machine and performed the following actions:

  1. Installed the GPU driver as per the instructions here: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/#ubuntu
  2. Installed Docker on Ubuntu: https://docs.docker.com/engine/install/ubuntu/
  3. Installed the NVIDIA Container Toolkit as per the following instructions: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
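
As a quick sanity check on the three install steps above, a minimal sketch like the following lists which of the expected binaries are missing from `PATH`. The helper name `missing_tools` is hypothetical, and the choice of binaries (`nvidia-smi` for the driver, `docker` for Docker, `nvidia-ctk` and `nvidia-container-cli` for the container toolkit) is an assumption about what each step typically installs, not something confirmed by this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command name from the arguments
# that cannot be found on PATH. Always returns 0.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
    return 0
}

# Assumed mapping: one or two binaries per install step
# (driver, Docker, NVIDIA Container Toolkit).
missing="$(missing_tools nvidia-smi docker nvidia-ctk nvidia-container-cli)"
if [ -n "$missing" ]; then
    printf 'not on PATH:\n%s\n' "$missing"
else
    echo "all expected binaries are on PATH"
fi
```

A binary being on `PATH` only shows that the package is installed; it does not show that the Docker daemon has been configured to use the NVIDIA runtime.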

uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS"
PRETTY_NAME="Ubuntu 24.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.2 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

nvidia-smi
Fri Apr 25 10:52:11 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:59:00.0 Off |                    0 |
| N/A   31C    P8             24W /  350W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    Off |   00000000:C2:00.0 Off |                    0 |
| N/A   32C    P8             24W /  350W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
root@pcai-grkr1:~#

docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
Unable to find image 'nvidia/cuda:12.8.0-base-ubuntu22.04' locally
12.8.0-base-ubuntu22.04: Pulling from nvidia/cuda
6414378b6477: Pull complete
ad69d3880477: Pull complete
2d01ee89ef0b: Pull complete
7d21de8cade1: Pull complete
4b650590013c: Pull complete
Digest: sha256:12242992c121f6cab0ca11bccbaaf757db893b3065d7db74b933e59f321b2cf4
Status: Downloaded newer image for nvidia/cuda:12.8.0-base-ubuntu22.04
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
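
Since the error says `libnvidia-ml.so.1` cannot be opened, and the toolkit log reports that it uses the ld cache at `/etc/ld.so.cache`, one thing that could be checked is whether the library is registered in that cache. The sketch below is only a diagnostic idea, not a confirmed root cause. The helper name `ldcache_paths` is hypothetical, and the parsing assumes the usual `ldconfig -p` line layout of `name (flags) => path`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldconfig -p`-style text on stdin and print
# the path(s) that the given library name maps to.
# Assumes lines of the form: "<name> (<flags>) => <path>".
ldcache_paths() {
    awk -v lib="$1" '$1 == lib { print $NF }'
}

# Query the host's ld cache; tolerate ldconfig being absent from PATH.
paths="$(ldconfig -p 2>/dev/null | ldcache_paths libnvidia-ml.so.1 || true)"
if [ -n "$paths" ]; then
    printf 'libnvidia-ml.so.1 resolves to:\n%s\n' "$paths"
else
    echo "libnvidia-ml.so.1 was not found in the ld cache"
fi
```

If the library is missing from the cache, running `sudo ldconfig` and re-checking would be a reasonable next step. If it is present, the problem more likely lies in how the container runtime sees the host filesystem.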

cat /var/log/nvidia-container-toolkit.log

-- WARNING, the following logs are for debugging purposes only --

I0425 10:59:51.731803 14273 nvc.c:393] initializing library context (version=1.15.0, build=6c8f1df7fd32cea3280cf2a2c6e931c9b3132465)
I0425 10:59:51.731853 14273 nvc.c:364] using root /
I0425 10:59:51.731859 14273 nvc.c:365] using ldcache /etc/ld.so.cache
I0425 10:59:51.731864 14273 nvc.c:366] using unprivileged user 65534:65534
I0425 10:59:51.731882 14273 nvc.c:410] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0425 10:59:51.732038 14273 nvc.c:412] dxcore initialization failed, continuing assuming a non-WSL environment
I0425 10:59:51.762413 14282 nvc.c:278] loading kernel module nvidia
I0425 10:59:51.762573 14282 nvc.c:282] running mknod for /dev/nvidiactl
I0425 10:59:51.762633 14282 nvc.c:286] running mknod for /dev/nvidia0
I0425 10:59:51.762658 14282 nvc.c:286] running mknod for /dev/nvidia1
I0425 10:59:51.762681 14282 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0425 10:59:51.767123 14282 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0425 10:59:51.767210 14282 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0425 10:59:51.768652 14282 nvc.c:301] loading kernel module nvidia_uvm
I0425 10:59:51.768672 14282 nvc.c:305] running mknod for /dev/nvidia-uvm
I0425 10:59:51.768720 14282 nvc.c:310] loading kernel module nvidia_modeset
I0425 10:59:51.768744 14282 nvc.c:314] running mknod for /dev/nvidia-modeset
I0425 10:59:51.769504 14283 rpc.c:71] starting driver rpc service
I0425 10:59:51.769958 14273 rpc.c:132] driver rpc service terminated with signal 15
I0425 10:59:51.770011 14273 nvc.c:452] shutting down library context

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)