Description
Discussed in #3559
Originally posted by MCancian November 1, 2025
Sorry for the AI-assisted bug report, but I'm primarily a political scientist! I'm using GPU passthrough in a container for my OCR work, but new processes that I start afterwards on the host can't use the GPUs. Here is the Claude Code bug report:
Description
The nvidia-container-toolkit is not available on Bluefin DX NVIDIA variant, preventing proper GPU sharing between the host and Podman containers. This causes GPU access issues where host services cannot use the GPUs after containers with GPU passthrough are started, despite sufficient VRAM being available.
System Information
- Distribution: Bluefin DX (Developer eXperience)
- Variant: bluefin-dx-nvidia-open
- Version: gts-41.20251019 (Silverblue)
- Fedora Base: Fedora 41
- Kernel: 6.16.8-100.fc41.x86_64
- Podman Version: 5.6.2
- Build ID: 282a7f2
GPU Configuration
- GPU 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (97887 MiB VRAM)
- GPU 1: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (97887 MiB VRAM)
- Driver Version: 580.95.05 (Open kernel modules)
- Compute Mode: Default (shared) on both GPUs
Problem
When using Podman containers with GPU passthrough via direct device mapping (--device=/dev/nvidia0, etc.), the GPUs become unavailable to host services even though:
- Both GPUs are in shared compute mode
- Significant VRAM remains available
- The container is only using resources from one GPU
This is a known limitation of direct device passthrough. The standard solution is to use nvidia-container-toolkit with CDI (Container Device Interface), which properly manages GPU contexts and allows sharing between host and containers.
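For context, the CDI-based flow with the toolkit would look roughly like the following. This is a sketch of the upstream nvidia-container-toolkit workflow, not something that works on this image today, and the Fedora image used for the test run is only an example:
# One-time: generate a CDI spec describing the installed GPUs
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Inspect the CDI device names that were generated
$ nvidia-ctk cdi list
# Run a container against a CDI device; the host keeps normal GPU access
$ podman run --rm --device nvidia.com/gpu=all registry.fedoraproject.org/fedora:41 nvidia-smi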
Expected Behavior
The nvidia-container-toolkit package should be included in Bluefin DX NVIDIA images to enable proper GPU resource management with containers.
Current State
nvidia-container-toolkit not found:
$ which nvidia-ctk nvidia-container-toolkit nvidia-container-runtime
/usr/bin/which: no nvidia-ctk in (/path/to/bin...)
/usr/bin/which: no nvidia-container-toolkit in (/path/to/bin...)
/usr/bin/which: no nvidia-container-runtime in (/path/to/bin...)RPM packages search shows no runtime package installed:
$ rpm -qa | grep -i nvidia-container
(no output)

A DNF search only shows the golang devel package:
$ dnf search nvidia-container-toolkit
Matched fields: name
golang-github-nvidia-container-toolkit-devel.noarch Build and run containers leveraging NVIDIA GPUs

The actual runtime package (nvidia-container-toolkit) is not available in the enabled repositories.
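A related check, added here for illustration rather than taken from the report: Podman reads CDI specs from /etc/cdi and /var/run/cdi, and without nvidia-ctk there is nothing on the image to generate them, so those directories are expected to be empty or missing:
$ ls /etc/cdi /var/run/cdi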
Workaround Currently Using
Currently forced to use direct device passthrough in devcontainer configuration:
"runArgs": [
"--device=/dev/nvidia0",
"--device=/dev/nvidiactl",
"--device=/dev/nvidia-uvm",
"--security-opt", "label=disable",
"--runtime=runc"
]
This works, but it prevents host GPU access while containers are running.
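For reference, the workaround sits in the devcontainer configuration roughly as follows; the name and base image below are placeholders rather than the actual project settings, and only runArgs is taken from the report:
{
  // Placeholder name and image; only "runArgs" reflects the workaround above
  "name": "gpu-ocr-workaround",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "runArgs": [
    "--device=/dev/nvidia0",
    "--device=/dev/nvidiactl",
    "--device=/dev/nvidia-uvm",
    "--security-opt", "label=disable",
    "--runtime=runc"
  ]
}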
Proposed Solution
Include nvidia-container-toolkit in the Bluefin DX NVIDIA image, either:
- Pre-installed in the base image, or
- Available via rpm-ostree install nvidia-container-toolkit
This would enable proper CDI-based GPU sharing:
"runArgs": [
"--device=nvidia.com/gpu=all",
"--security-opt", "label=disable"
]
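Assuming the package becomes layerable, the remaining host-side setup would be a one-time CDI spec generation, roughly:
# Layer the toolkit and reboot into the new deployment
$ rpm-ostree install nvidia-container-toolkit
$ systemctl reboot
# Then generate the CDI spec once (same step as sketched under Problem above)
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml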
Impact
This issue affects:
- Developers running ML/AI workloads in containers while needing host GPU access
- Users with multiple GPUs who want efficient resource utilization
- DevContainer users following NVIDIA's recommended Podman GPU practices
Additional Context
- NVIDIA devices are present and working: /dev/nvidia0, /dev/nvidia1, /dev/nvidiactl, /dev/nvidia-uvm
- Driver installation is correct (via ublue-os-nvidia-addons)
- This is specifically about container runtime integration, not driver issues
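For completeness, a quick host-side verification of these points (shown as an illustration, not captured from the affected machine) could be:
# Confirm the device nodes exist and the driver responds on the host
$ ls -l /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm
$ nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv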