
@p5
Member

@p5 p5 commented Nov 1, 2025

Due to the switch to RPM 6, installing the nvidia-container-toolkit
fails in F43, and it was subsequently removed from the
ublue-os/akmods repo.
We can add it here with a little hacking around the verification.
This is not an ideal solution, and it must be reverted once NVIDIA
starts building with a modern RPM version.

See NVIDIA/nvidia-container-toolkit#1307

p5 added 2 commits November 1, 2025 14:06
@p5 p5 linked an issue Nov 1, 2025 that may be closed by this pull request
@p5 p5 marked this pull request as ready for review November 1, 2025 15:19
Copilot AI review requested due to automatic review settings November 1, 2025 15:19
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Nov 1, 2025
@p5 p5 marked this pull request as draft November 1, 2025 15:19
@dosubot

dosubot bot commented Nov 1, 2025

Related Documentation

Checked 14 published document(s) in 1 knowledge base(s). No updates required.


@dosubot dosubot bot added the bug Something isn't working label Nov 1, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds a workaround for installing nvidia-container-toolkit on Fedora 43 by temporarily disabling RPM package verification. The change addresses a known upstream issue where the RPM fails verification checks.

  • Adds Fedora 43-specific installation logic for nvidia-container-toolkit with disabled RPM verification
  • Includes reference to upstream issue and intent to revert when resolved
  • Uses temporary RPM macro configuration to bypass signature verification
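The PR diff itself is not reproduced in this thread. As a rough sketch of what a "temporary RPM macro configuration" could look like, the helpers below assume three things:

- rpm's %_pkgverify_level macro is the relevant knob.
- A drop-in file under /etc/rpm is an acceptable place to set it.
- The "none" level is what the PR uses.

The file name, the helper names and the chosen level are all hypothetical and are not taken from the PR. It is also not confirmed here whether this alone is enough under dnf5, which applies its own per-repo gpgcheck setting.

```shell
#!/usr/bin/env bash
# Sketch only: temporarily relax rpm's package-verification policy for one
# install, then restore it. File name, helper names and the "none" level
# are assumptions, not the PR's actual implementation.
set -euo pipefail

# rpm reads drop-in macro files matching /etc/rpm/macros.*; RPM_MACRO_DIR
# lets the helpers be exercised outside a real system.
macro_file() {
  printf '%s/macros.zz-tmp-pkgverify\n' "${RPM_MACRO_DIR:-/etc/rpm}"
}

# Write the override. %_pkgverify_level accepts all, signature, digest or none.
relax_pkgverify() {
  local f
  f="$(macro_file)"
  mkdir -p "$(dirname "$f")"
  printf '%%_pkgverify_level none\n' > "$f"
}

# Delete the override so rpm falls back to its default policy again.
restore_pkgverify() {
  rm -f "$(macro_file)"
}
```

A build step would then call relax_pkgverify, register trap restore_pkgverify EXIT, and run dnf install -y nvidia-container-toolkit (assuming NVIDIA's repository is already configured). The trap removes the override even if the install fails, which keeps the hack scoped to this one package.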

@castrojo
Collaborator

Do we still need this?

@klmcwhirter

klmcwhirter commented Nov 19, 2025

Yes, the nvidia-container-toolkit is needed, as the CDI (Container Device Interface) subsystem is required for container integration.

I cannot comment on the PR itself, only on the need for the RPM.

I traced my errors with ramalama (and other CUDA container work, e.g. with PyTorch) back to this after upgrading to NVIDIA driver 580.105.x.

The fix is to run nvidia-ctk to regenerate the CDI config.
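A minimal sketch of that regeneration step follows. The generated spec records the host driver's library paths, so it goes stale when the driver is updated. The helper name is hypothetical. The nvidia-ctk subcommands are the ones used later in this thread. The /etc/cdi/nvidia.yaml default matches the path the spec is copied to below.

```shell
#!/usr/bin/env bash
# Sketch: regenerate the NVIDIA CDI spec after a driver update.
# regen_cdi_spec is a hypothetical helper name.
set -euo pipefail

regen_cdi_spec() {
  # System-wide CDI location used elsewhere in this thread.
  local spec="${1:-/etc/cdi/nvidia.yaml}"
  if ! command -v nvidia-ctk >/dev/null 2>&1; then
    echo "nvidia-ctk not found; is nvidia-container-toolkit installed?" >&2
    return 1
  fi
  sudo nvidia-ctk cdi generate --output="$spec"
  # Show the device names the new spec exposes (e.g. nvidia.com/gpu=all).
  nvidia-ctk cdi list
}
```

After booting the new driver, running regen_cdi_spec once should be enough. A container can then request the GPU through CDI, for example with podman run --rm --device nvidia.com/gpu=all followed by any CUDA-enabled image.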

See ramalama-cuda - CUDA Updates.

NOTE that the golang package does seem to exist in the F42 repo now. I confirmed this in a running fedora-toolbox distrobox:

sudo dnf info golang-github-nvidia-container-toolkit
Updating and loading repositories:
Repositories loaded.
Available packages
Name           : golang-github-nvidia-container-toolkit
Epoch          : 0
Version        : 1.17.4
Release        : 1.fc42
Architecture   : x86_64
Download size  : 3.2 MiB
Installed size : 14.8 MiB
Source         : golang-github-nvidia-container-toolkit-1.17.4-1.fc42.src.rpm
Repository     : updates
Summary        : Build and run containers leveraging NVIDIA GPUs
URL            : https://github.com/NVIDIA/nvidia-container-toolkit
License        : Apache-2.0
Description    : The NVIDIA Container Toolkit allows users to build and run NVIDIA GPU
               : accelerated containers. The toolkit includes a container runtime library and
               : utilities to automatically configure containers to leverage NVIDIA GPUs.
Vendor         : Fedora Project

I also discovered a workaround - a huge hack - that helped me further troubleshoot the touch points.

WARNING TO NON-MAINTAINER READERS: DO NOT DO THIS. I am only sharing this in the hope that it gives some insight into what is needed at a bare minimum. What this PR attempts is the right approach, once J0rge's question about the dnf6 workaround is answered.

My hack of a workaround:

The RPM package contains /usr/bin/nvidia-ctk, /usr/bin/nvidia-cdi-hook, some files in /usr/share, and some steps to install systemd units.

While running the dnf info above, I noticed that I could execute nvidia-ctk successfully from inside the distrobox, so I did the following from within the distrobox:

Note: I was booted into ostree:1, which is bluefin-dx-nvidia-open:stable at 42.20251111.

  • created ~/etc/cdi as a staging area
  • copied the 2 binaries from /usr/sbin to ~/etc/cdi
  • ran nvidia-ctk cdi generate --output=~/etc/cdi/nvidia.yaml
  • exited the distrobox
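The warning above still applies: this is for maintainers retracing the troubleshooting only. With that caveat, the staging part of the steps above might be scripted roughly as follows. The helper name is hypothetical. The source directory is passed in so it can mirror the /usr/sbin used in the list.

```shell
#!/usr/bin/env bash
# Sketch of the in-distrobox staging step: copy the two CDI binaries into
# a staging directory. stage_cdi_tools is a hypothetical helper.
set -euo pipefail

stage_cdi_tools() {
  local src_dir="$1" stage_dir="$2" bin
  mkdir -p "$stage_dir"
  for bin in nvidia-ctk nvidia-cdi-hook; do
    # cp keeps the source's permission bits (minus umask) on the new file.
    cp -- "$src_dir/$bin" "$stage_dir/$bin"
  done
}
```

Inside the distrobox this would be stage_cdi_tools /usr/sbin "$HOME/etc/cdi", followed by "$HOME/etc/cdi/nvidia-ctk" cdi generate --output="$HOME/etc/cdi/nvidia.yaml". Using $HOME instead of ~ avoids relying on tilde expansion after --output=, which bash does not perform for an argument of that form.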

Then, back on the host - still running 42.20251111:

  • modified ~/etc/cdi/nvidia.yaml by changing the path for only the 2 binaries from /usr/sbin to /var/home/klmcw/etc/cdi
  • sudo cp ~/etc/cdi/nvidia.yaml /etc/cdi/nvidia.yaml
  • ./nvidia-ctk cdi list # saw correct output
  • ramalama serve -c 4096 -n llama32-3b llama3.2:3b
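The least obvious step above is editing nvidia.yaml so that only the two binaries point at the staging directory. A sketch of that edit with GNU sed is below. The helper name is hypothetical. It assumes the directory arguments contain no '|', '&' or regex metacharacters such as '.'. Because the pattern includes the directory prefix, bare binary names in hook args and unrelated paths such as other host libraries are left untouched.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the paths of nvidia-ctk and nvidia-cdi-hook in a CDI spec
# from one directory to another. rewrite_hook_paths is hypothetical.
set -euo pipefail

rewrite_hook_paths() {
  local spec="$1" from_dir="$2" to_dir="$3" bin
  for bin in nvidia-ctk nvidia-cdi-hook; do
    # '|' is the sed delimiter because the paths contain '/';
    # GNU sed's -i edits the file in place.
    sed -i "s|${from_dir}/${bin}|${to_dir}/${bin}|g" "$spec"
  done
}
```

For the case above this would be rewrite_hook_paths ~/etc/cdi/nvidia.yaml /usr/sbin /var/home/klmcw/etc/cdi, before the sudo cp into /etc/cdi.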

Success! I was able to interact with llama3.2:3b in the browser.

Next, I powered off and booted 43.20251119.1.

I repeated sudo cp ~/etc/cdi/nvidia.yaml /etc/cdi.

Running ~/etc/cdi/nvidia-ctk cdi list again showed correct output.

ramalama serve -c 4096 -n llama32-3b llama3.2:3b

Again, success.

In conclusion, because only a short time had elapsed since the upgrade, I was able to put a workaround in place with just the 2 binaries and the fixed-up nvidia.yaml file.

The rest of the RPM install (the /usr/share files and systemd units) is still missing, though. See nvidia-container-toolkit.spec for details.

FYI - both binaries have the same short list of external dependencies:

$ ldd ~/etc/cdi/nvidia-ctk 
	linux-vdso.so.1 (0x00007f8e6e331000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f8e6e30f000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8e6e30b000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8e6e2f9000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f8e6e105000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f8e6e333000)





Development

Successfully merging this pull request may close these issues.

nvidia-container-toolkit Missing from Bluefin DX NVIDIA Image

4 participants