Unable to use CUDA Versions 11 when host has CUDA 10.2 Installed #214

@zyh3826

1. Issue or feature description

system env:

centos7
NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2
nvidia-ctk 1.13.3 commit: c5a93b8d7063a8b1a04872a4e46d449e788ca4de

2. Steps to reproduce the issue

just docker run --rm --gpus all nvidia/cuda:cuda11.8.0-cudnn8-devel-ubuntu20.04

3. Information to attach (optional if deemed irrelevant)

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.8, please update your driver to a newer version, or use an earlier cuda container: unknown.

My understanding is that nvidia-docker injects all of the NVIDIA driver libraries from the host into the container, so why does this error occur?
Please help. Thanks a lot.
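
For reference, the error above can be read as a plain version comparison. The libraries injected into the container are the host's 440.33.01 driver libraries, so the CUDA version available inside the container is still the one the host driver reports (10.2 in the log below), while the image asks for cuda>=11.8. The following is a simplified sketch of that comparison only, not the actual nvidia-container-cli implementation; the function names parse_version and cuda_requirement_met are hypothetical and exist only for illustration.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '11.8' into a tuple of ints: (11, 8)."""
    return tuple(int(part) for part in text.split("."))


def cuda_requirement_met(requirement, host_cuda):
    """Check a 'cuda>=X.Y' style condition against the host's CUDA version.

    Illustrative only: handles just the 'cuda>=' form seen in the error message.
    """
    match = re.fullmatch(r"cuda>=(\d+(?:\.\d+)*)", requirement)
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    return parse_version(host_cuda) >= parse_version(match.group(1))


# Values taken from this report: the host driver reports CUDA 10.2,
# while the image requires cuda>=11.8.
print(cuda_requirement_met("cuda>=11.8", "10.2"))  # False
print(cuda_requirement_met("cuda>=10.2", "10.2"))  # True
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as "9.2" sorting after "10.2".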

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --

I0706 07:02:40.035096 2579104 nvc.c:376] initializing library context (version=1.13.3, build=f21fbe1a5f831936aab2796ebd08f5fb6d6c2df3)
I0706 07:02:40.035733 2579104 nvc.c:350] using root /
I0706 07:02:40.035779 2579104 nvc.c:351] using ldcache /etc/ld.so.cache
I0706 07:02:40.035841 2579104 nvc.c:352] using unprivileged user 3021:3021
I0706 07:02:40.036214 2579104 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0706 07:02:40.036681 2579104 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0706 07:02:40.053457 2579105 nvc.c:273] failed to set inheritable capabilities
W0706 07:02:40.053780 2579105 nvc.c:274] skipping kernel modules load due to failure
I0706 07:02:40.056131 2579106 rpc.c:71] starting driver rpc service
I0706 07:02:43.476112 2579120 rpc.c:71] starting nvcgo rpc service
I0706 07:02:43.480061 2579104 nvc_info.c:797] requesting driver information with ''
I0706 07:02:43.485304 2579104 nvc_info.c:175] selecting /usr/lib64/vdpau/libvdpau_nvidia.so.440.33.01
I0706 07:02:43.485960 2579104 nvc_info.c:175] selecting /usr/lib64/libnvoptix.so.440.33.01
I0706 07:02:43.486246 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-tls.so.440.33.01
I0706 07:02:43.486447 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-rtcore.so.440.33.01
I0706 07:02:43.486638 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.440.33.01
I0706 07:02:43.486889 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-opticalflow.so.440.33.01
I0706 07:02:43.487146 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-opencl.so.440.33.01
I0706 07:02:43.487313 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-ml.so.440.33.01
I0706 07:02:43.487554 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-ifr.so.440.33.01
I0706 07:02:43.487801 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-glvkspirv.so.440.33.01
I0706 07:02:43.487984 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-glsi.so.440.33.01
I0706 07:02:43.488155 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-glcore.so.440.33.01
I0706 07:02:43.488333 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-fbc.so.440.33.01
I0706 07:02:43.488567 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-fatbinaryloader.so.440.33.01
I0706 07:02:43.488734 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-encode.so.440.33.01
I0706 07:02:43.488976 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-eglcore.so.440.33.01
I0706 07:02:43.489154 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-compiler.so.440.33.01
I0706 07:02:43.489331 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-cfg.so.440.33.01
I0706 07:02:43.489566 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-cbl.so.440.33.01
I0706 07:02:43.489735 2579104 nvc_info.c:175] selecting /usr/lib64/libnvidia-allocator.so.440.33.01
I0706 07:02:43.489984 2579104 nvc_info.c:175] selecting /usr/lib64/libnvcuvid.so.440.33.01
I0706 07:02:43.490789 2579104 nvc_info.c:175] selecting /usr/lib64/libcuda.so.440.33.01
I0706 07:02:43.491246 2579104 nvc_info.c:175] selecting /usr/lib64/libGLX_nvidia.so.440.33.01
I0706 07:02:43.491422 2579104 nvc_info.c:175] selecting /usr/lib64/libGLESv2_nvidia.so.440.33.01
I0706 07:02:43.491591 2579104 nvc_info.c:175] selecting /usr/lib64/libGLESv1_CM_nvidia.so.440.33.01
I0706 07:02:43.491764 2579104 nvc_info.c:175] selecting /usr/lib64/libEGL_nvidia.so.440.33.01
I0706 07:02:43.491977 2579104 nvc_info.c:175] selecting /usr/lib/vdpau/libvdpau_nvidia.so.440.33.01
I0706 07:02:43.492205 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-tls.so.440.33.01
I0706 07:02:43.492388 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-ptxjitcompiler.so.440.33.01
I0706 07:02:43.492636 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-opticalflow.so.440.33.01
I0706 07:02:43.492877 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-opencl.so.440.33.01
I0706 07:02:43.493060 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-ml.so.440.33.01
I0706 07:02:43.493297 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-ifr.so.440.33.01
I0706 07:02:43.493532 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-glvkspirv.so.440.33.01
I0706 07:02:43.493704 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-glsi.so.440.33.01
I0706 07:02:43.493868 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-glcore.so.440.33.01
I0706 07:02:43.494052 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-fbc.so.440.33.01
I0706 07:02:43.494281 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-fatbinaryloader.so.440.33.01
I0706 07:02:43.494444 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-encode.so.440.33.01
I0706 07:02:43.494674 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-eglcore.so.440.33.01
I0706 07:02:43.494846 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-compiler.so.440.33.01
I0706 07:02:43.495033 2579104 nvc_info.c:175] selecting /usr/lib/libnvidia-allocator.so.440.33.01
I0706 07:02:43.495275 2579104 nvc_info.c:175] selecting /usr/lib/libnvcuvid.so.440.33.01
I0706 07:02:43.495581 2579104 nvc_info.c:175] selecting /usr/lib/libcuda.so.440.33.01
I0706 07:02:43.495867 2579104 nvc_info.c:175] selecting /usr/lib/libGLX_nvidia.so.440.33.01
I0706 07:02:43.496049 2579104 nvc_info.c:175] selecting /usr/lib/libGLESv2_nvidia.so.440.33.01
I0706 07:02:43.496220 2579104 nvc_info.c:175] selecting /usr/lib/libGLESv1_CM_nvidia.so.440.33.01
I0706 07:02:43.496393 2579104 nvc_info.c:175] selecting /usr/lib/libEGL_nvidia.so.440.33.01
W0706 07:02:43.496483 2579104 nvc_info.c:401] missing library libnvidia-nscq.so
W0706 07:02:43.496529 2579104 nvc_info.c:401] missing library libcudadebugger.so
W0706 07:02:43.496571 2579104 nvc_info.c:401] missing library libnvidia-pkcs11.so
W0706 07:02:43.496615 2579104 nvc_info.c:401] missing library libnvidia-pkcs11-openssl3.so
W0706 07:02:43.496662 2579104 nvc_info.c:401] missing library libnvidia-nvvm.so
W0706 07:02:43.496704 2579104 nvc_info.c:401] missing library libnvidia-ngx.so
W0706 07:02:43.496746 2579104 nvc_info.c:405] missing compat32 library libnvidia-cfg.so
W0706 07:02:43.496796 2579104 nvc_info.c:405] missing compat32 library libnvidia-nscq.so
W0706 07:02:43.496837 2579104 nvc_info.c:405] missing compat32 library libcudadebugger.so
W0706 07:02:43.496884 2579104 nvc_info.c:405] missing compat32 library libnvidia-pkcs11.so
W0706 07:02:43.496926 2579104 nvc_info.c:405] missing compat32 library libnvidia-pkcs11-openssl3.so
W0706 07:02:43.496982 2579104 nvc_info.c:405] missing compat32 library libnvidia-nvvm.so
W0706 07:02:43.497024 2579104 nvc_info.c:405] missing compat32 library libnvidia-ngx.so
W0706 07:02:43.497066 2579104 nvc_info.c:405] missing compat32 library libnvidia-rtcore.so
W0706 07:02:43.497114 2579104 nvc_info.c:405] missing compat32 library libnvoptix.so
W0706 07:02:43.497156 2579104 nvc_info.c:405] missing compat32 library libnvidia-cbl.so
I0706 07:02:43.497696 2579104 nvc_info.c:301] selecting /usr/bin/nvidia-smi
I0706 07:02:43.497818 2579104 nvc_info.c:301] selecting /usr/bin/nvidia-debugdump
I0706 07:02:43.497929 2579104 nvc_info.c:301] selecting /usr/bin/nvidia-persistenced
I0706 07:02:43.498115 2579104 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-control
I0706 07:02:43.498234 2579104 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-server
W0706 07:02:43.499486 2579104 nvc_info.c:427] missing binary nv-fabricmanager
W0706 07:02:43.499586 2579104 nvc_info.c:470] missing firmware path /usr/lib/firmware/nvidia/440.33.01/gsp*.bin
I0706 07:02:43.499671 2579104 nvc_info.c:560] listing device /dev/nvidiactl
I0706 07:02:43.499698 2579104 nvc_info.c:560] listing device /dev/nvidia-uvm
I0706 07:02:43.499725 2579104 nvc_info.c:560] listing device /dev/nvidia-uvm-tools
I0706 07:02:43.499750 2579104 nvc_info.c:560] listing device /dev/nvidia-modeset
W0706 07:02:43.500462 2579104 nvc_info.c:351] missing ipc path /var/run/nvidia-persistenced/socket
W0706 07:02:43.500546 2579104 nvc_info.c:351] missing ipc path /var/run/nvidia-fabricmanager/socket
W0706 07:02:43.500845 2579104 nvc_info.c:351] missing ipc path /tmp/nvidia-mps
I0706 07:02:43.500882 2579104 nvc_info.c:853] requesting device information with ''
I0706 07:02:43.513312 2579104 nvc_info.c:744] listing device /dev/nvidia0 (GPU-8caa2b82-cd80-2418-0aad-0e584b4ed5f7 at 00000000:3d:00.0)
I0706 07:02:43.521320 2579104 nvc_info.c:744] listing device /dev/nvidia1 (GPU-6101ed84-8464-24f7-20bc-3bad1159f75b at 00000000:41:00.0)
I0706 07:02:43.529586 2579104 nvc_info.c:744] listing device /dev/nvidia2 (GPU-c783acba-5bc9-c17b-e708-b6a713621b0b at 00000000:b1:00.0)
I0706 07:02:43.538091 2579104 nvc_info.c:744] listing device /dev/nvidia3 (GPU-8b201437-7709-72f1-868a-a5fd9249d9b9 at 00000000:b5:00.0)
NVRM version:   440.33.01
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          GeForce RTX 2080 Ti
Brand:          GeForce
GPU UUID:       GPU-8caa2b82-cd80-2418-0aad-0e584b4ed5f7
Bus Location:   00000000:3d:00.0
Architecture:   7.5

Device Index:   1
Device Minor:   1
Model:          GeForce RTX 2080 Ti
Brand:          GeForce
GPU UUID:       GPU-6101ed84-8464-24f7-20bc-3bad1159f75b
Bus Location:   00000000:41:00.0
Architecture:   7.5

Device Index:   2
Device Minor:   2
Model:          GeForce RTX 2080 Ti
Brand:          GeForce
GPU UUID:       GPU-c783acba-5bc9-c17b-e708-b6a713621b0b
Bus Location:   00000000:b1:00.0
Architecture:   7.5

Device Index:   3
Device Minor:   3
Model:          GeForce RTX 2080 Ti
Brand:          GeForce
GPU UUID:       GPU-8b201437-7709-72f1-868a-a5fd9249d9b9
Bus Location:   00000000:b5:00.0
Architecture:   7.5
I0706 07:02:43.539337 2579104 nvc.c:434] shutting down library context
I0706 07:02:43.539465 2579120 rpc.c:95] terminating nvcgo rpc service
I0706 07:02:43.540686 2579104 rpc.c:135] nvcgo rpc service terminated successfully
I0706 07:02:43.962700 2579106 rpc.c:95] terminating driver rpc service
I0706 07:02:43.962890 2579104 rpc.c:135] driver rpc service terminated successfully
  • Kernel version from uname -a
Linux ai184 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=UUID=2b559db5-ea31-4f09-b78b-c01284fee685 ro crashkernel=auto rhgb quiet
[    0.000000] Reserving 176MB of memory at 608MB for crashkernel (System RAM: 261789MB)
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=UUID=2b559db5-ea31-4f09-b78b-c01284fee685 ro crashkernel=auto rhgb quiet
[    0.000000] Memory: 5533732k/270532608k available (7788k kernel code, 2460412k absent, 4621948k reserved, 5954k data, 1984k init)
[    0.000000] x86/pti: Unmapping kernel while in userspace
[    0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    0.094921] Spectre V2 : Mitigation: IBRS (kernel)
[    0.439208] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.439211] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[    3.696098] Loaded X.509 cert 'CentOS Linux kernel signing key: e1fdb0e2a7e861a1d1ca80a23dcf0dba3aa4adf5'
[    3.705844] BERT: Boot Error Record Table support is disabled. Enable it by using bert_enable as kernel parameter.
[    3.707394] Freeing unused kernel memory: 1984k freed
[    3.708190] Write protecting the kernel read-only data: 12288k
[    3.710256] Freeing unused kernel memory: 392k freed
[    3.712692] Freeing unused kernel memory: 536k freed
[    3.885703] systemd[1]: Starting Create list of required static device nodes for the current kernel...
[    3.891116] systemd[1]: Started Create list of required static device nodes for the current kernel.
[    4.473966] [TTM] Zone  kernel: Available graphics memory: 131799676 kiB
[   16.458749] nvidia: loading out-of-tree module taints kernel.
[   16.458760] nvidia: module license 'NVIDIA' taints kernel.
[   16.458762] Disabling lock debugging due to kernel taint
[   16.576377] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[22429.972701] perf: interrupt took too long (2530 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[51878.620788] perf: interrupt took too long (3186 > 3162), lowering kernel.perf_event_max_sample_rate to 62000
[70787.553264] perf: interrupt took too long (3998 > 3982), lowering kernel.perf_event_max_sample_rate to 50000
[114608.867467] perf: interrupt took too long (4998 > 4997), lowering kernel.perf_event_max_sample_rate to 40000
[187099.384473] perf: interrupt took too long (6251 > 6247), lowering kernel.perf_event_max_sample_rate to 31000
[278468.371783] perf: interrupt took too long (7821 > 7813), lowering kernel.perf_event_max_sample_rate to 25000
  • Driver information from nvidia-smi -a
    This output is very long; I will provide it if asked.
  • Docker version from docker version
    20.10.7
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
libnvidia-container1-1.13.3-1.x86_64
nvidia-container-toolkit-1.13.3-1.x86_64
nvidia-container-toolkit-base-1.13.3-1.x86_64
libnvidia-container-tools-1.13.3-1.x86_64
nvidia-docker2-2.13.0-1.noarch
  • NVIDIA container library version from nvidia-container-cli -V
cli-version: 1.13.3
lib-version: 1.13.3
build date: 2023-06-27T18:49+0000
build revision: f21fbe1a5f831936aab2796ebd08f5fb6d6c2df3
build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-44)
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • NVIDIA container library logs (see troubleshooting)
  • Docker command, image and tag used
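
The two fields that matter for this report are the "NVRM version" and "CUDA version" lines printed by nvidia-container-cli info above. As a minimal sketch, assuming the output keeps the "Name version:   value" layout shown in this issue, they could be pulled out of a captured log like this. The helper name parse_info_versions is hypothetical, and the sample text is copied from the output above.

```python
import re


def parse_info_versions(info_text):
    """Extract the 'NVRM version' and 'CUDA version' fields from info output."""
    versions = {}
    for line in info_text.splitlines():
        # Matches lines such as 'CUDA version:   10.2'; log lines are ignored.
        match = re.match(r"^(NVRM|CUDA) version:\s*(\S+)", line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


# Sample copied from the output in this issue.
sample = """\
NVRM version:   440.33.01
CUDA version:   10.2

Device Index:   0
Model:          GeForce RTX 2080 Ti
"""

print(parse_info_versions(sample))  # {'NVRM': '440.33.01', 'CUDA': '10.2'}
```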
