Description
Discussed in #3559
Originally posted by MCancian November 1, 2025
Sorry for the AI-assisted bug report, but I'm primarily a political scientist! I'm using GPU passthrough in a container for my OCR work, but new processes that I start afterwards on the host can't use the GPUs. Here is the Claude Code bug report:
Description
The nvidia-container-toolkit is not available on Bluefin DX NVIDIA variant, preventing proper GPU sharing between the host and Podman containers. This causes GPU access issues where host services cannot use the GPUs after containers with GPU passthrough are started, despite sufficient VRAM being available.
System Information
- Distribution: Bluefin DX (Developer eXperience)
- Variant: bluefin-dx-nvidia-open
- Version: gts-41.20251019 (Silverblue)
- Fedora Base: Fedora 41
- Kernel: 6.16.8-100.fc41.x86_64
- Podman Version: 5.6.2
- Build ID: 282a7f2
GPU Configuration
- GPU 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (97887 MiB VRAM)
- GPU 1: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (97887 MiB VRAM)
- Driver Version: 580.95.05 (Open kernel modules)
- Compute Mode: Default (shared) on both GPUs
Problem
When using Podman containers with GPU passthrough via direct device mapping (--device=/dev/nvidia0, etc.), the GPUs become unavailable to host services even though:
- Both GPUs are in shared compute mode
- Significant VRAM remains available
- The container is only using resources from one GPU
This is a known limitation of direct device passthrough. The standard solution is to use nvidia-container-toolkit with CDI (Container Device Interface), which properly manages GPU contexts and allows sharing between host and containers.
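For context, the CDI-based flow with the toolkit would look roughly like the following. This is a sketch of the upstream nvidia-container-toolkit workflow, not something that works on this image today, and the Fedora image used for the test run is only an example:
# One-time: generate a CDI spec describing the installed GPUs
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Inspect the CDI device names that were generated
$ nvidia-ctk cdi list
# Run a container against a CDI device; the host keeps normal GPU access
$ podman run --rm --device nvidia.com/gpu=all registry.fedoraproject.org/fedora:41 nvidia-smi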
Expected Behavior
The nvidia-container-toolkit package should be included in Bluefin DX NVIDIA images to enable proper GPU resource management with containers.
Current State
nvidia-container-toolkit not found:
$ which nvidia-ctk nvidia-container-toolkit nvidia-container-runtime
/usr/bin/which: no nvidia-ctk in (/path/to/bin...)
/usr/bin/which: no nvidia-container-toolkit in (/path/to/bin...)
/usr/bin/which: no nvidia-container-runtime in (/path/to/bin...)RPM packages search shows no runtime package installed:
$ rpm -qa | grep -i nvidia-container
(no output)

A DNF search only shows the golang devel package:
$ dnf search nvidia-container-toolkit
Matched fields: name
golang-github-nvidia-container-toolkit-devel.noarch Build and run containers leveraging NVIDIA GPUs

The actual runtime package (nvidia-container-toolkit) is not available in the enabled repositories.
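A related check, added here for illustration rather than taken from the report: Podman reads CDI specs from /etc/cdi and /var/run/cdi, and without nvidia-ctk there is nothing on the image to generate them, so those directories are expected to be empty or missing:
$ ls /etc/cdi /var/run/cdi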
Workaround Currently Using
Currently forced to use direct device passthrough in devcontainer configuration:
"runArgs": [
"--device=/dev/nvidia0",
"--device=/dev/nvidiactl",
"--device=/dev/nvidia-uvm",
"--security-opt", "label=disable",
"--runtime=runc"
]
This works, but it prevents host GPU access while containers are running.
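For reference, the workaround sits in the devcontainer configuration roughly as follows; the name and base image below are placeholders rather than the actual project settings, and only runArgs is taken from the report:
{
  // Placeholder name and image; only "runArgs" reflects the workaround above
  "name": "gpu-ocr-workaround",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "runArgs": [
    "--device=/dev/nvidia0",
    "--device=/dev/nvidiactl",
    "--device=/dev/nvidia-uvm",
    "--security-opt", "label=disable",
    "--runtime=runc"
  ]
}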
Proposed Solution
Include nvidia-container-toolkit in the Bluefin DX NVIDIA image, either:
- Pre-installed in the base image, or
- Available via rpm-ostree install nvidia-container-toolkit
This would enable proper CDI-based GPU sharing:
"runArgs": [
"--device=nvidia.com/gpu=all",
"--security-opt", "label=disable"
]
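Assuming the package becomes layerable, the remaining host-side setup would be a one-time CDI spec generation, roughly:
# Layer the toolkit and reboot into the new deployment
$ rpm-ostree install nvidia-container-toolkit
$ systemctl reboot
# Then generate the CDI spec once (same step as sketched under Problem above)
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml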
Impact
This issue affects:
- Developers running ML/AI workloads in containers while needing host GPU access
- Users with multiple GPUs who want efficient resource utilization
- DevContainer users following NVIDIA's recommended Podman GPU practices
Additional Context
- NVIDIA devices are present and working: /dev/nvidia0, /dev/nvidia1, /dev/nvidiactl, /dev/nvidia-uvm
- Driver installation is correct (via ublue-os-nvidia-addons)
- This is specifically about container runtime integration, not driver issues
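For completeness, a quick host-side verification of these points (shown as an illustration, not captured from the affected machine) could be:
# Confirm the device nodes exist and the driver responds on the host
$ ls -l /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm
$ nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv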