Describe the bug
After upgrading from 25.3.2 to 25.10.0, I get the following error:
CrashLoopBackOff (back-off 40s restarting failed container=hello-kubernetes pod=hello-kubernetes-69575f56b-9dzz4_test-ns(5e20c659-44d1-4c22-9d8f-560e3411fc58)) | Last state: Terminated with 128: StartError (failed to create containerd task: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices management.nvidia.com/gpu=GPU-feefc289-ca5f-9917-2bf1-9477651da944), started: Thu, Jan 1 1970 1:00:00 am, finished: Sun, Nov 9 2025 12:34:53 am
This is probably because CDI is not the default mode; if I set runtimeClassName: nvidia-cdi explicitly, the pod fails as shown above.
If I omit runtimeClassName, the pod starts and I can see the NVIDIA devices in /dev, but the userspace files (nvidia-smi and the driver libraries) are not injected.
Do I need to change something in the operator configuration?
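
For reference, this is my understanding of the relevant ClusterPolicy settings, as a minimal sketch based on the CDI section of the GPU Operator docs (the exact fields are my assumption, not a confirmed fix):

```yaml
# Sketch of the CDI-related ClusterPolicy settings (my reading of the
# GPU Operator docs, not a verified configuration).
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  cdi:
    enabled: true    # deploys CDI support and the nvidia-cdi runtime class
    default: false   # if true, CDI becomes the default mode for all GPU pods
```

If these fields are no longer the right way to enable CDI in 25.10.0, that may be the root of my problem.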
For the toolkit I use:
- name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
  value: "true"
- name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
  value: "false"
and for the devicePlugin:
env:
  - name: PASS_DEVICE_SPECS
    value: "true"
  - name: FAIL_ON_INIT_ERROR
    value: "true"
  - name: DEVICE_LIST_STRATEGY
    value: volume-mounts
  - name: DEVICE_ID_STRATEGY
    value: uuid
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: all
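
And a minimal sketch of the pod I'm deploying (name, namespace, and image are placeholders; the runtime class and the GPU resource request are the parts that matter):

```yaml
# Reproduction sketch: pod name, namespace, and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hello-kubernetes
  namespace: test-ns
spec:
  # Removing this line lets the pod start, but without the injected
  # userspace files (nvidia-smi, driver libraries).
  runtimeClassName: nvidia-cdi
  containers:
    - name: hello-kubernetes
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
```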
Environment (please provide the following information):
- GPU Operator Version: v25.10.0
- OS: Ubuntu 24.04
- Kernel Version: 6.8.0-generic
- Container Runtime Version: containerd 2.1.4
- Kubernetes Distro and Version: RKE 1.33.5