Description
I installed the nvidia-gpu-operator v25.10 on my K3s cluster. Most GPU operator pods start successfully, except for the CUDA validator, which fails with the following message:
cuda-validation Failed to allocate device vector A (error code no CUDA-capable device is detected)!
cuda-validation [Vector addition of 50000 elements]
stream closed EOF for gpu-operator/nvidia-cuda-validator-r6nsb (cuda-validation)
I downgraded to v25.3.2 and everything worked.
My host system is Gentoo. I installed the NVIDIA driver and nvidia-container-toolkit directly on the host using the system package manager.
I customised the operator with the following values:
driver:
  enabled: false
toolkit:
  enabled: false
devicePlugin:
  config:
    name: device-plugin-config
    create: true
    default: "time-slicing"
    data:
      time-slicing: |-
        version: v1
        flags:
          migStrategy: none
        sharing:
          timeSlicing:
            renameByDefault: false
            failRequestsGreaterThanOne: true
            resources:
              - name: nvidia.com/gpu
                replicas: 4
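
For reference, here is a minimal Python sketch of what the time-slicing settings above are expected to advertise once the device plugin is healthy. With `replicas: 4` and `renameByDefault: false`, each physical GPU should show up as four schedulable `nvidia.com/gpu` resources. The function name `advertised_resources` and the GPU counts are illustrative assumptions, not part of the operator; the sketch only mirrors the `sharing` section of the config as a plain dictionary.

```python
def advertised_resources(physical_gpus, sharing):
    """Return {resource_name: count} a node would be expected to advertise.

    Hypothetical helper for illustration only; it models the documented
    time-slicing behaviour and does not call any NVIDIA or Kubernetes API.
    """
    ts = sharing["timeSlicing"]
    # With renameByDefault enabled, shared resources get a ".shared" suffix.
    suffix = ".shared" if ts.get("renameByDefault", False) else ""
    return {
        res["name"] + suffix: physical_gpus * res["replicas"]
        for res in ts["resources"]
    }


# The `sharing` section from the values above, written as a dictionary.
sharing = {
    "timeSlicing": {
        "renameByDefault": False,
        "failRequestsGreaterThanOne": True,
        "resources": [{"name": "nvidia.com/gpu", "replicas": 4}],
    }
}

# Assumed node with a single physical GPU.
print(advertised_resources(1, sharing))
```

Because `failRequestsGreaterThanOne` is `true`, a pod requesting more than one `nvidia.com/gpu` is expected to be rejected instead of receiving several slices of the same GPU.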