Closed
Labels
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
Description
Hi there.
1. Issue or feature description
I get the following error when running a sample workload:
# command:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
# error output:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.
2. Steps to reproduce the issue
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
The error is then reproduced:
# error output:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.
3. Information to attach (optional if deemed irrelevant)
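One further item that may be worth attaching, although the template does not ask for it, is whether step 2 actually registered the runtime. `nvidia-ctk runtime configure --runtime=docker` normally records an `nvidia` entry under `runtimes` in `/etc/docker/daemon.json`. The sketch below only does a text match for that entry. The helper name `has_nvidia_runtime` is made up for illustration, and the path is the usual default, which may not apply to rootless or snap-packaged Docker.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given daemon.json mentions an
# "nvidia" runtime entry. This is a plain text match, not a JSON parse.
has_nvidia_runtime() {
    # Assumed default location; pass another path as the first argument.
    config_file="${1:-/etc/docker/daemon.json}"
    [ -r "$config_file" ] || return 1
    grep -q '"nvidia"[[:space:]]*:' "$config_file"
}

if has_nvidia_runtime; then
    echo "nvidia runtime entry found in daemon.json"
else
    echo "nvidia runtime entry not found (or daemon.json unreadable)"
fi
```

A match only shows that the configuration file was written; it says nothing about whether the driver RPC inside the hook will succeed.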
- Some nvidia-container information:
nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0621 02:29:23.330189 9108 nvc.c:393] initializing library context (version=1.15.0, build=6c8f1df7fd32cea3280cf2a2c6e931c9b3132465)
I0621 02:29:23.330230 9108 nvc.c:364] using root /
I0621 02:29:23.330234 9108 nvc.c:365] using ldcache /etc/ld.so.cache
I0621 02:29:23.330239 9108 nvc.c:366] using unprivileged user 1000:1000
I0621 02:29:23.330266 9108 nvc.c:410] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0621 02:29:23.330403 9108 nvc.c:412] dxcore initialization failed, continuing assuming a non-WSL environment
W0621 02:29:23.359870 9109 nvc.c:273] failed to set inheritable capabilities
W0621 02:29:23.359945 9109 nvc.c:274] skipping kernel modules load due to failure
I0621 02:29:23.360866 9110 rpc.c:71] starting driver rpc service
I0621 02:29:23.362883 9108 rpc.c:135] driver rpc service terminated successfully
nvidia-container-cli: initialization error: driver rpc error: failed to process request
I0621 02:29:23.362972 9108 nvc.c:452] shutting down library context
- Kernel version from uname -a:
Linux SHJS-PF4WFQ33 6.5.0-1024-oem #25-Ubuntu SMP PREEMPT_DYNAMIC Mon May 20 14:47:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Docker version from docker version:
Client: Docker Engine - Community
Version: 20.10.13
API version: 1.41
Go version: go1.16.15
Git commit: a224086
Built: Thu Mar 10 14:07:47 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.13
API version: 1.41 (minimum version 1.12)
Go version: go1.16.15
Git commit: 906f57f
Built: Thu Mar 10 14:05:38 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.33
GitCommit: d2d58213f83a351ca8f528a95fbd145f5654e957
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
- NVIDIA container library version from nvidia-container-cli -V:
cli-version: 1.15.0
lib-version: 1.15.0
build date: 2024-04-15T13:36+00:00
build revision: 6c8f1df7fd32cea3280cf2a2c6e931c9b3132465
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
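A note on the `nvidia-container-cli -k -d /dev/tty info` log earlier in this section: it reports `using unprivileged user 1000:1000` followed by `failed to set inheritable capabilities`, which suggests it was captured without root, whereas the Docker hook normally runs as root. A log captured under `sudo` may therefore be more comparable. The sketch below flags a saved log that carries that warning. The helper name `log_looks_unprivileged`, the `NVC_DEBUG_LOG` variable, and the file name `nvc-debug.log` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a saved nvidia-container-cli debug log
# contains the warning quoted above, which hints at a non-root capture.
log_looks_unprivileged() {
    grep -q 'failed to set inheritable capabilities' "$1"
}

# Assumed file name. A fresh log could be captured as root with, e.g.:
#   sudo nvidia-container-cli -k -d nvc-debug.log info
log_file="${NVC_DEBUG_LOG:-nvc-debug.log}"
if [ -r "$log_file" ] && log_looks_unprivileged "$log_file"; then
    echo "log appears to come from an unprivileged run; consider re-capturing with sudo"
fi
```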
- Any relevant kernel output lines from dmesg:
dmesg.log
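Since the template asks only for the relevant kernel lines, the full log can be narrowed to driver messages. The sketch below keeps lines that mention `NVRM` (the prefix the NVIDIA kernel module uses for its messages) or `nvidia`. The helper name `filter_nvidia_dmesg` is hypothetical, and the pattern is a guess at what is relevant, so it may miss related messages from other subsystems.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print only the lines
# that mention the NVIDIA driver (case-insensitive match).
filter_nvidia_dmesg() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'nvrm|nvidia' || true
}

# Usage on a live system (may need root): sudo dmesg | filter_nvidia_dmesg
# Usage on a saved file:                  filter_nvidia_dmesg < dmesg.log
```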
- Driver information from nvidia-smi -a:
==============NVSMI LOG==============
Timestamp : Fri Jun 21 10:56:30 2024
Driver Version : 535.171.04
CUDA Version : 12.2
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA RTX 2000 Ada Generation Laptop GPU
Product Brand : NVIDIA RTX
Product Architecture : Ada Lovelace
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Addressing Mode : None
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-4f80a2a3-a0a3-cf61-ef3f-5ca4cb3b6888
Minor Number : 0
VBIOS Version : 95.07.28.00.68
MultiGPU Board : No
Board ID : 0x100
Board Part Number : N/A
GPU Part Number : 28B8-975-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G002.0000.00.03
OEM Object : 2.0
ECC Object : N/A
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x28B810DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x230617AA
GPU Link Info
PCIe Generation
Max : 4
Current : 1
Device Current : 1
Device Max : 4
Host Max : 5
Link Width
Max : 8x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 85000 KB/s
Rx Throughput : 786000 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Event Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
Sparse Operation Mode : N/A
FB Memory Usage
Total : 8188 MiB
Reserved : 247 MiB
Used : 109 MiB
Free : 7831 MiB
BAR1 Memory Usage
Total : 8192 MiB
Used : 3 MiB
Free : 8189 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 21 %
Memory : 6 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable Parity : N/A
SRAM Uncorrectable SEC-DED : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable Parity : N/A
SRAM Uncorrectable SEC-DED : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
SRAM Threshold Exceeded : N/A
Aggregate Uncorrectable SRAM Sources
SRAM L2 : N/A
SRAM SM : N/A
SRAM Microcontroller : N/A
SRAM PCIE : N/A
SRAM Other : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 64 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 40 C
GPU T.Limit Temp : 46 C
GPU Shutdown T.Limit Temp : -12 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating T.Limit Temp : N/A
GPU Power Readings
Power Draw : 4.36 W
Current Power Limit : 60.00 W
Requested Power Limit : 60.00 W
Default Power Limit : 60.00 W
Min Power Limit : 5.00 W
Max Power Limit : 80.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 765 MHz
Applications Clocks
Graphics : 2115 MHz
Memory : 8001 MHz
Default Applications Clocks
Graphics : 2115 MHz
Memory : 8001 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 3105 MHz
SM : 3105 MHz
Memory : 8001 MHz
Video : 2415 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 640.000 mV
Fabric
State : N/A
Status : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2404
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 105 MiB
- NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*':
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=====================================-===========================-============-=========================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-cfg1-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
un libnvidia-common <none> <none> (no description available)
ii libnvidia-common-535 535.171.04-0ubuntu0.22.04.1 all Shared files used by the NVIDIA libraries
un libnvidia-compute <none> <none> (no description available)
un libnvidia-compute-495 <none> <none> (no description available)
un libnvidia-compute-495-server <none> <none> (no description available)
ii libnvidia-compute-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
ii libnvidia-compute-535:i386 535.171.04-0ubuntu0.22.04.1 i386 NVIDIA libcompute package
ii libnvidia-container-tools 1.15.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.15.0-1 amd64 NVIDIA container runtime library
un libnvidia-decode <none> <none> (no description available)
ii libnvidia-decode-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-decode-535:i386 535.171.04-0ubuntu0.22.04.1 i386 NVIDIA Video Decoding runtime libraries
un libnvidia-encode <none> <none> (no description available)
ii libnvidia-encode-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 NVENC Video Encoding runtime library
ii libnvidia-encode-535:i386 535.171.04-0ubuntu0.22.04.1 i386 NVENC Video Encoding runtime library
un libnvidia-extra <none> <none> (no description available)
ii libnvidia-extra-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 Extra libraries for the NVIDIA driver
un libnvidia-fbc1 <none> <none> (no description available)
ii libnvidia-fbc1-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-fbc1-535:i386 535.171.04-0ubuntu0.22.04.1 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
un libnvidia-gl <none> <none> (no description available)
ii libnvidia-gl-535:amd64 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-gl-535:i386 535.171.04-0ubuntu0.22.04.1 i386 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-ml-dev:amd64 11.5.50~11.5.1-1ubuntu1 amd64 NVIDIA Management Library (NVML) development files
un libnvidia-ml.so.1 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-390 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
un nvidia-compute-utils <none> <none> (no description available)
ii nvidia-compute-utils-535 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA compute utilities
un nvidia-container-runtime <none> <none> (no description available)
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.15.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.15.0-1 amd64 NVIDIA Container Toolkit Base
ii nvidia-cuda-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development files
un nvidia-cuda-doc <none> <none> (no description available)
ii nvidia-cuda-gdb 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.5.1-1ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-dkms-535 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA DKMS package
un nvidia-dkms-kernel <none> <none> (no description available)
ii nvidia-driver-535 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA driver metapackage
un nvidia-driver-binary <none> <none> (no description available)
ii nvidia-firmware-535-535.171.04 535.171.04-0ubuntu0.22.04.1 amd64 Firmware files used by the kernel module
un nvidia-firmware-535-server-535.171.04 <none> <none> (no description available)
un nvidia-kernel-common <none> <none> (no description available)
ii nvidia-kernel-common-535 535.171.04-0ubuntu0.22.04.1 amd64 Shared files used with the kernel module
un nvidia-kernel-source <none> <none> (no description available)
ii nvidia-kernel-source-535 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA kernel source package
un nvidia-libopencl1 <none> <none> (no description available)
un nvidia-libopencl1-dev <none> <none> (no description available)
ii nvidia-opencl-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA OpenCL development files
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-persistenced <none> <none> (no description available)
ii nvidia-prime 0.8.17.1 all Tools to enable NVIDIA's Prime
ii nvidia-profiler 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 510.47.03-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-utils <none> <none> (no description available)
ii nvidia-utils-535 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA driver support binaries
ii nvidia-visual-profiler 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
ii xserver-xorg-video-nvidia-535 535.171.04-0ubuntu0.22.04.1 amd64 NVIDIA binary Xorg driver
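The listing above mixes packages that are not installed (`un`) with installed ones (`ii`), and the installed set spans several version lines: the 535.171.04 driver packages, the distribution's 11.5 CUDA toolkit, and nvidia-settings 510.47.03. To make that easier to scan, the sketch below reduces `dpkg -l` output to installed package names and versions. The helper name `installed_nvidia_versions` is hypothetical, and it assumes the default `dpkg -l` column order (status, name, version).

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print
# "<package> <version>" for installed packages only (status "ii").
installed_nvidia_versions() {
    awk '$1 == "ii" { print $2, $3 }'
}

# Usage: dpkg -l '*nvidia*' | installed_nvidia_versions
```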