nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown. #556

@zhguli1200

Hi there.

1. Issue or feature description

I get the following error when running a sample workload:

# command:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
# error output:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.

2. Steps to reproduce the issue

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

The last command then reproduces the error:

# error output:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.
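For anyone re-running the steps above, it may help to see what the `$distribution` variable expands to, since it decides which repository list is downloaded. The following is a minimal sketch, not part of the original report: the function name `repo_list_url` is made up for illustration, and the URL layout is copied from the `curl` command in the steps.

```shell
# Hypothetical helper (not from the original report): print the repository
# list URL that the setup command above would request on this host.
repo_list_url() {
    local os_release="${1:-/etc/os-release}"
    local distribution
    # Source the file inside the command substitution (a subshell) so that
    # ID and VERSION_ID do not leak into the calling shell.
    distribution=$(. "$os_release"; echo "$ID$VERSION_ID")
    echo "https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list"
}

# Show the URL for the current host when an os-release file is present.
if [ -r /etc/os-release ]; then
    repo_list_url
fi
```

On a host whose os-release file has `ID=ubuntu` and `VERSION_ID="22.04"` (which the `0ubuntu0.22.04.1` package versions in this report suggest), the distribution string is `ubuntu22.04`.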

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0621 02:29:23.330189 9108 nvc.c:393] initializing library context (version=1.15.0, build=6c8f1df7fd32cea3280cf2a2c6e931c9b3132465)
I0621 02:29:23.330230 9108 nvc.c:364] using root /
I0621 02:29:23.330234 9108 nvc.c:365] using ldcache /etc/ld.so.cache
I0621 02:29:23.330239 9108 nvc.c:366] using unprivileged user 1000:1000
I0621 02:29:23.330266 9108 nvc.c:410] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0621 02:29:23.330403 9108 nvc.c:412] dxcore initialization failed, continuing assuming a non-WSL environment
W0621 02:29:23.359870 9109 nvc.c:273] failed to set inheritable capabilities
W0621 02:29:23.359945 9109 nvc.c:274] skipping kernel modules load due to failure
I0621 02:29:23.360866 9110 rpc.c:71] starting driver rpc service
I0621 02:29:23.362883 9108 rpc.c:135] driver rpc service terminated successfully
nvidia-container-cli: initialization error: driver rpc error: failed to process request
I0621 02:29:23.362972 9108 nvc.c:452] shutting down library context
  • Kernel version from uname -a
Linux SHJS-PF4WFQ33 6.5.0-1024-oem #25-Ubuntu SMP PREEMPT_DYNAMIC Mon May 20 14:47:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Docker version from docker version
Client: Docker Engine - Community
 Version:           20.10.13
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 10 14:07:47 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.13
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       906f57f
  Built:            Thu Mar 10 14:05:38 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.33
  GitCommit:        d2d58213f83a351ca8f528a95fbd145f5654e957
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • NVIDIA container library version from nvidia-container-cli -V
cli-version: 1.15.0
lib-version: 1.15.0
build date: 2024-04-15T13:36+00:00
build revision: 6c8f1df7fd32cea3280cf2a2c6e931c9b3132465
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • Any relevant kernel output lines from dmesg
    dmesg.log

  • Driver information from nvidia-smi -a

==============NVSMI LOG==============

Timestamp                                 : Fri Jun 21 10:56:30 2024
Driver Version                            : 535.171.04
CUDA Version                              : 12.2

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : NVIDIA RTX 2000 Ada Generation Laptop GPU
    Product Brand                         : NVIDIA RTX
    Product Architecture                  : Ada Lovelace
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : Disabled
    Addressing Mode                       : None
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-4f80a2a3-a0a3-cf61-ef3f-5ca4cb3b6888
    Minor Number                          : 0
    VBIOS Version                         : 95.07.28.00.68
    MultiGPU Board                        : No
    Board ID                              : 0x100
    Board Part Number                     : N/A
    GPU Part Number                       : 28B8-975-A1
    FRU Part Number                       : N/A
    Module ID                             : 1
    Inforom Version
        Image Version                     : G002.0000.00.03
        OEM Object                        : 2.0
        ECC Object                        : N/A
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x28B810DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x230617AA
        GPU Link Info
            PCIe Generation
                Max                       : 4
                Current                   : 1
                Device Current            : 1
                Device Max                : 4
                Host Max                  : 5
            Link Width
                Max                       : 8x
                Current                   : 8x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 85000 KB/s
        Rx Throughput                     : 786000 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P8
    Clocks Event Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    Sparse Operation Mode                 : N/A
    FB Memory Usage
        Total                             : 8188 MiB
        Reserved                          : 247 MiB
        Used                              : 109 MiB
        Free                              : 7831 MiB
    BAR1 Memory Usage
        Total                             : 8192 MiB
        Used                              : 3 MiB
        Free                              : 8189 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 21 %
        Memory                            : 6 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable Parity     : N/A
            SRAM Uncorrectable SEC-DED    : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable Parity     : N/A
            SRAM Uncorrectable SEC-DED    : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
            SRAM Threshold Exceeded       : N/A
        Aggregate Uncorrectable SRAM Sources
            SRAM L2                       : N/A
            SRAM SM                       : N/A
            SRAM Microcontroller          : N/A
            SRAM PCIE                     : N/A
            SRAM Other                    : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 64 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 40 C
        GPU T.Limit Temp                  : 46 C
        GPU Shutdown T.Limit Temp         : -12 C
        GPU Slowdown T.Limit Temp         : -2 C
        GPU Max Operating T.Limit Temp    : 0 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating T.Limit Temp : N/A
    GPU Power Readings
        Power Draw                        : 4.36 W
        Current Power Limit               : 60.00 W
        Requested Power Limit             : 60.00 W
        Default Power Limit               : 60.00 W
        Min Power Limit                   : 5.00 W
        Max Power Limit                   : 80.00 W
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 765 MHz
    Applications Clocks
        Graphics                          : 2115 MHz
        Memory                            : 8001 MHz
    Default Applications Clocks
        Graphics                          : 2115 MHz
        Memory                            : 8001 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 3105 MHz
        SM                                : 3105 MHz
        Memory                            : 8001 MHz
        Video                             : 2415 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 640.000 mV
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2404
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 105 MiB
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                  Version                     Architecture Description
+++-=====================================-===========================-============-=========================================================
un  libgldispatch0-nvidia                 <none>                      <none>       (no description available)
ii  libnvidia-cfg1-535:amd64              535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                    <none>                      <none>       (no description available)
un  libnvidia-common                      <none>                      <none>       (no description available)
ii  libnvidia-common-535                  535.171.04-0ubuntu0.22.04.1 all          Shared files used by the NVIDIA libraries
un  libnvidia-compute                     <none>                      <none>       (no description available)
un  libnvidia-compute-495                 <none>                      <none>       (no description available)
un  libnvidia-compute-495-server          <none>                      <none>       (no description available)
ii  libnvidia-compute-535:amd64           535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA libcompute package
ii  libnvidia-compute-535:i386            535.171.04-0ubuntu0.22.04.1 i386         NVIDIA libcompute package
ii  libnvidia-container-tools             1.15.0-1                    amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64            1.15.0-1                    amd64        NVIDIA container runtime library
un  libnvidia-decode                      <none>                      <none>       (no description available)
ii  libnvidia-decode-535:amd64            535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-535:i386             535.171.04-0ubuntu0.22.04.1 i386         NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                      <none>                      <none>       (no description available)
ii  libnvidia-encode-535:amd64            535.171.04-0ubuntu0.22.04.1 amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-535:i386             535.171.04-0ubuntu0.22.04.1 i386         NVENC Video Encoding runtime library
un  libnvidia-extra                       <none>                      <none>       (no description available)
ii  libnvidia-extra-535:amd64             535.171.04-0ubuntu0.22.04.1 amd64        Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                        <none>                      <none>       (no description available)
ii  libnvidia-fbc1-535:amd64              535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-535:i386               535.171.04-0ubuntu0.22.04.1 i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                          <none>                      <none>       (no description available)
ii  libnvidia-gl-535:amd64                535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-535:i386                 535.171.04-0ubuntu0.22.04.1 i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-ml-dev:amd64                11.5.50~11.5.1-1ubuntu1     amd64        NVIDIA Management Library (NVML) development files
un  libnvidia-ml.so.1                     <none>                      <none>       (no description available)
un  nvidia-384                            <none>                      <none>       (no description available)
un  nvidia-390                            <none>                      <none>       (no description available)
un  nvidia-common                         <none>                      <none>       (no description available)
un  nvidia-compute-utils                  <none>                      <none>       (no description available)
ii  nvidia-compute-utils-535              535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA compute utilities
un  nvidia-container-runtime              <none>                      <none>       (no description available)
un  nvidia-container-runtime-hook         <none>                      <none>       (no description available)
ii  nvidia-container-toolkit              1.15.0-1                    amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base         1.15.0-1                    amd64        NVIDIA Container Toolkit Base
ii  nvidia-cuda-dev:amd64                 11.5.1-1ubuntu1             amd64        NVIDIA CUDA development files
un  nvidia-cuda-doc                       <none>                      <none>       (no description available)
ii  nvidia-cuda-gdb                       11.5.114~11.5.1-1ubuntu1    amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                   11.5.1-1ubuntu1             amd64        NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc               11.5.1-1ubuntu1             all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-dkms-535                       535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA DKMS package
un  nvidia-dkms-kernel                    <none>                      <none>       (no description available)
ii  nvidia-driver-535                     535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA driver metapackage
un  nvidia-driver-binary                  <none>                      <none>       (no description available)
ii  nvidia-firmware-535-535.171.04        535.171.04-0ubuntu0.22.04.1 amd64        Firmware files used by the kernel module
un  nvidia-firmware-535-server-535.171.04 <none>                      <none>       (no description available)
un  nvidia-kernel-common                  <none>                      <none>       (no description available)
ii  nvidia-kernel-common-535              535.171.04-0ubuntu0.22.04.1 amd64        Shared files used with the kernel module
un  nvidia-kernel-source                  <none>                      <none>       (no description available)
ii  nvidia-kernel-source-535              535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA kernel source package
un  nvidia-libopencl1                     <none>                      <none>       (no description available)
un  nvidia-libopencl1-dev                 <none>                      <none>       (no description available)
ii  nvidia-opencl-dev:amd64               11.5.1-1ubuntu1             amd64        NVIDIA OpenCL development files
un  nvidia-opencl-icd                     <none>                      <none>       (no description available)
un  nvidia-persistenced                   <none>                      <none>       (no description available)
ii  nvidia-prime                          0.8.17.1                    all          Tools to enable NVIDIA's Prime
ii  nvidia-profiler                       11.5.114~11.5.1-1ubuntu1    amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                       510.47.03-0ubuntu1          amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                <none>                      <none>       (no description available)
un  nvidia-smi                            <none>                      <none>       (no description available)
un  nvidia-utils                          <none>                      <none>       (no description available)
ii  nvidia-utils-535                      535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA driver support binaries
ii  nvidia-visual-profiler                11.5.114~11.5.1-1ubuntu1    amd64        NVIDIA Visual Profiler for CUDA and OpenCL
ii  xserver-xorg-video-nvidia-535         535.171.04-0ubuntu0.22.04.1 amd64        NVIDIA binary Xorg driver
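
The attachments above are long, so a reader triaging this report may want to cut them down first. The two filters below are a rough sketch under stated assumptions: the function names are invented for illustration, the leading letter of each debug log record is assumed to be its severity (I for info, W for warning, E for error, as the lines in this report suggest), and `nvc-debug.log` is a hypothetical file holding a saved copy of the debug log.

```shell
# Hypothetical helper: keep only warning/error records from the
# nvidia-container-cli debug log, plus the final CLI error line.
nvc_log_problems() {
    grep -E '^([WE][0-9]{4} |nvidia-container-cli: )'
}

# Hypothetical helper: print "name version" for every fully installed
# ("ii") package in `dpkg -l` output, skipping the "un" placeholder rows.
installed_versions() {
    awk '$1 == "ii" { print $2, $3 }'
}

# Usage (left commented out because both need the host from this report):
#   nvc_log_problems < nvc-debug.log
#   dpkg -l '*nvidia*' | installed_versions
```

Applied to the listings in this report, the first filter would leave the two `failed to set inheritable capabilities` / `skipping kernel modules load` warnings and the `driver rpc error` line. The second would show the installed packages side by side: the 535.171.04 driver packages, the 1.15.0-1 container toolkit, the 11.5 CUDA packages, and nvidia-settings 510.47.03. Whether that version mix is related to the failure is not established here.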

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)