rootless podman sees all GPUs despite cgroups setup #585

@arogozhnikov

Description

There is a related discussion for docker in #211, but for docker it is expected that the root daemon has access to all GPUs.

In my case, I run podman within SLURM, which uses cgroups to control access to devices.
CPU isolation works correctly, but GPU isolation does not; compare:

srun --label --nodes=2 --ntasks-per-node=1 --gpus-per-task=2 bash -c \
  'nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv'
0: name, utilization.gpu [%], memory.used [MiB]
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: name, utilization.gpu [%], memory.used [MiB]
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB

vs

srun --label --nodes=2 --ntasks-per-node=1 --gpus-per-task=2 podman run --rm --device nvidia.com/gpu=all docker.io/nvidia/cuda:12.2.2-base-ubuntu22.04 bash -c \
  'nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv'
1: name, utilization.gpu [%], memory.used [MiB]
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
1: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: name, utilization.gpu [%], memory.used [MiB]
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
0: NVIDIA A100-SXM4-80GB, 0 %, 0 MiB
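A possible workaround sketch while this is unresolved: instead of `nvidia.com/gpu=all`, pass only the indices SLURM assigned to the step. This assumes SLURM exports `CUDA_VISIBLE_DEVICES` for the step (it normally does when GPUs are requested via `--gpus-per-task`), and that the CDI spec accepts per-index device names like `nvidia.com/gpu=0`; the `gpu_flags` helper below is hypothetical, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a comma-separated GPU index list (as SLURM
# exports in CUDA_VISIBLE_DEVICES) into per-index CDI --device flags,
# so podman requests only the GPUs this step was actually allocated.
gpu_flags() {
  local ids="$1" flags="" id
  IFS=',' read -ra arr <<< "$ids"
  for id in "${arr[@]}"; do
    flags+="--device nvidia.com/gpu=${id} "
  done
  printf '%s' "${flags% }"
}

# Usage inside the srun step would look something like:
#   podman run --rm $(gpu_flags "${CUDA_VISIBLE_DEVICES:-}") \
#     docker.io/nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
```

Note this only narrows what podman asks CDI to inject; it does not by itself explain why the cgroup device restriction is not enforced for the injected nodes.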

Labels: lifecycle/stale