Description
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed
Machine Specs & Environment Setup
OS : Ubuntu 24.04 LTS
NVIDIA Driver : 550.90.07
CUDA Version : 12.4
GPU Model : A6000
Docker Version : 24.0.7, build afdd53b
Steps for Setting Up the NVIDIA Container Toolkit
I followed the steps mentioned in this documentation: Installing the Toolkit
Installing with Apt
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
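For reference, as a sanity check after the install (not part of the documented steps, just what I would expect to work) the toolkit components can be confirmed with:
nvidia-ctk --version
nvidia-container-cli info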
Configuring Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
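As far as I understand, the configure step should have registered an nvidia runtime in /etc/docker/daemon.json (assuming the default config path); this can be double-checked with:
cat /etc/docker/daemon.json
docker info | grep -i runtime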
Issue 1:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown.
Caused by:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
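My understanding is that nvidia-smi is injected into the container from the host by the toolkit, so as a possible diagnostic (just a guess at something useful) the following should show whether the host binary is visible to the runtime hook:
nvidia-container-cli list | grep nvidia-smi
which nvidia-smi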
Notes:
Tried running the above command after enabling persistence mode with sudo nvidia-smi -pm ENABLED, but that didn't resolve it. Also tried starting it as a daemon service with sudo nvidia-persistenced.
Then finally tried unloading and reloading the NVIDIA drivers:
sudo systemctl stop gdm
sudo rmmod nvidia_uvm
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
sudo modprobe nvidia
sudo modprobe nvidia_modeset
sudo modprobe nvidia_drm
sudo modprobe nvidia_uvm
sudo systemctl start gdm
sudo systemctl restart docker
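For completeness, after the reload I would expect the device nodes to be recreated; something like the following can confirm what is actually present (the paths are the usual defaults, not verified on every setup):
nvidia-smi
ls -l /dev/nvidia*
ls -l /dev/dri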
After reloading, I tried executing the same docker command and am now getting the different error mentioned below.
Issue 2:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: failed to inject devices: failed to stat CDI host device "/dev/dri/card1": no such file or directory: unknown.
Caused by:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Notes:
This issue appeared after unloading and reloading the NVIDIA drivers in the attempt to fix Issue 1 (which is still not resolved).
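One thing I suspect (not confirmed) is that the CDI spec generated before the module reload still references the old /dev/dri/card1 node; if so, regenerating it might help, e.g.:
ls /dev/dri
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
sudo systemctl restart docker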
Special Notes
nvidia-persistenced does not start when launched with sudo nvidia-persistenced; it throws the following error message:
nvidia-persistenced failed to initialize. Check syslog for more details.
Investigating the syslog turned up these entries:
sudo cat /var/log/syslog | grep nvidia-persistenced
2024-09-03T20:57:06.322985+00:00 toro docker.nvidia-container-toolkit[1787337]: time="2024-09-03T20:57:06Z" level=info msg="Selecting /run/nvidia-persistenced/socket as /run/nvidia-persistenced/socket"
2024-09-03T20:57:06.324024+00:00 toro docker.nvidia-container-toolkit[1787337]: time="2024-09-03T20:57:06Z" level=info msg="Selecting /var/lib/snapd/hostfs/usr/bin/nvidia-persistenced as /var/lib/snapd/hostfs/usr/bin/nvidia-persistenced"
2024-09-03T20:57:06.657932+00:00 toro docker.nvidia-container-toolkit[1787496]: time="2024-09-03T20:57:06Z" level=info msg="Selecting /run/nvidia-persistenced/socket as /run/nvidia-persistenced/socket"
2024-09-03T20:57:06.658053+00:00 toro docker.nvidia-container-toolkit[1787496]: time="2024-09-03T20:57:06Z" level=info msg="Selecting /var/lib/snapd/hostfs/usr/bin/nvidia-persistenced as /var/lib/snapd/hostfs/usr/bin/nvidia-persistenced"
2024-09-06T11:39:16.670934+00:00 toro nvidia-persistenced: device 0000:01:00.0 - persistence mode disabled.
2024-09-06T11:39:16.671113+00:00 toro nvidia-persistenced: device 0000:01:00.0 - NUMA memory offlined.
2024-09-06T11:39:24.050313+00:00 toro nvidia-persistenced: device 0000:01:00.0 - persistence mode enabled.
2024-09-06T11:39:24.050493+00:00 toro nvidia-persistenced: device 0000:01:00.0 - NUMA memory onlined.
2024-09-06T11:59:14.012734+00:00 toro nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
2024-09-06T11:59:14.012804+00:00 toro nvidia-persistenced: Shutdown (1902356)
2024-09-06T11:59:19.584467+00:00 toro nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
2024-09-06T11:59:19.584576+00:00 toro nvidia-persistenced: Shutdown (1902361)
2024-09-06T11:59:59.116947+00:00 toro nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
2024-09-06T11:59:59.117104+00:00 toro nvidia-persistenced: Shutdown (1902386)
2024-09-06T12:00:43.320943+00:00 toro nvidia-persistenced: device 0000:01:00.0 - persistence mode disabled.
2024-09-06T12:00:43.321232+00:00 toro nvidia-persistenced: device 0000:01:00.0 - NUMA memory offlined.
2024-09-06T12:00:46.881181+00:00 toro nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
2024-09-06T12:00:46.881291+00:00 toro nvidia-persistenced: Shutdown (1902413)
2024-09-06T12:01:13.921359+00:00 toro nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
2024-09-06T12:01:13.921592+00:00 toro nvidia-persistenced: Shutdown (1902426)
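The repeated "Failed to lock PID file" lines look like another nvidia-persistenced instance may already be holding the PID file; a quick check (assuming the default systemd unit name and PID file location) would be:
systemctl status nvidia-persistenced
pgrep -a nvidia-persistenced
sudo cat /var/run/nvidia-persistenced/nvidia-persistenced.pid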
I'm not sure if this is the root cause of the issues I'm experiencing; I hope this information helps with troubleshooting and finding a solution.