Description
Hi!
So far I've been experimenting with time-slicing, and I need a way to guarantee that when a pod requests gpu.shared: 2 (or any other number > 1), it gets access to two different physical GPUs, not the same GPU twice. Testing suggests this already happens through some implicit affinity, but I can't find an explicit way to enforce it through deployment manifests.
Additionally, for GPUs with different capacities, I'd like to be able to set a different number of time-slicing replicas per GPU.
I've tried using the devices field in the sharing.timeSlicing config, but the logs show that the only accepted value is "all".
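For reference, the device-plugin config I've been testing looks roughly like this (the replica count is just an example, and the devices entry is the field mentioned above):

```yaml
version: v1
sharing:
  timeSlicing:
    renameByDefault: true            # exposes the replicas as nvidia.com/gpu.shared
    resources:
      - name: nvidia.com/gpu
        replicas: 2                  # shared slices advertised per physical GPU
        # I tried scoping this entry to specific GPUs, but the plugin logs
        # indicate that "all" is the only value accepted here:
        devices: ["all"]
```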
Is there a way to explicitly control GPU allocation so that a pod is guaranteed access to multiple distinct GPUs when needed? Or is there another approach besides time-slicing that would address this use case?
I'm trying to achieve a setup where:
- All GPUs are shared
- Some pods need access to multiple different GPUs (not the same GPU multiple times); see the request sketch after this list
- Other pods need access to just one GPU per pod
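To make those two cases concrete, the resource requests I have in mind look like this (assuming renameByDefault: true, so the shared resource is nvidia.com/gpu.shared; pod names and image are placeholders):

```yaml
# Pod that should be backed by two *distinct* physical GPUs
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-workload          # placeholder name
spec:
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
      resources:
        limits:
          nvidia.com/gpu.shared: 2  # today this only *happens* to land on two GPUs
---
# Pod that only needs a single shared GPU
apiVersion: v1
kind: Pod
metadata:
  name: single-gpu-workload         # placeholder name
spec:
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
      resources:
        limits:
          nvidia.com/gpu.shared: 1
```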
For context, I've verified that when using gpu.shared: 2, pods do seem to access different GPUs, but I'd like a more explicit way to guarantee and control this behavior.
Currently I have the gpu-operator v24.9.1 deployed via Helm on node(s) with 2+ GPUs, and Kubernetes is v1.30.6+rke2r1.
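In case it matters, the device-plugin config above is referenced from the operator's Helm values roughly like this (ConfigMap name and key are placeholders):

```yaml
devicePlugin:
  config:
    name: time-slicing-config       # ConfigMap holding the config above
    default: any                    # key within that ConfigMap
```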