fix: install nvidia-container-toolkit in F43 #3561
base: main
Due to the switch to RPM 6, installing the nvidia-container-toolkit fails in F43, and it was subsequently removed from the ublue-os/akmods repo. We can add it here with a little hacking around the verification. This is not an ideal solution, and it must be reverted when Nvidia starts building with a modern RPM version.
Pull Request Overview
This PR adds a workaround for installing nvidia-container-toolkit on Fedora 43 by temporarily disabling RPM package verification. The change addresses a known upstream issue where the RPM fails verification checks.
- Adds Fedora 43-specific installation logic for nvidia-container-toolkit with disabled RPM verification
- Includes reference to upstream issue and intent to revert when resolved
- Uses temporary RPM macro configuration to bypass signature verification
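The F43-only install with verification temporarily relaxed could look roughly like the shell sketch below. It is not the PR's actual script. The helper names `fedora_version` and `with_pkgverify_disabled`, the drop-in path under `/etc/rpm`, and the assumption that `%_pkgverify_level` is the macro to override are all illustrative choices.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print VERSION_ID from an os-release file (default /etc/os-release).
# Sourced in a subshell so its variables do not leak into the caller.
fedora_version() {
  local file="${1:-/etc/os-release}"
  ( . "$file" && printf '%s\n' "${VERSION_ID:-}" )
}

# Run a command while a drop-in macro file relaxes RPM package verification,
# then remove the drop-in whether or not the command succeeded.
# ASSUMPTIONS (not from the PR): the drop-in lives in /etc/rpm (overridable
# via MACRO_DIR) and %_pkgverify_level is the macro that needs overriding.
with_pkgverify_disabled() {
  local macro_file="${MACRO_DIR:-/etc/rpm}/macros.zz-nopkgverify"
  printf '%%_pkgverify_level none\n' > "$macro_file"
  local status=0
  "$@" || status=$?
  rm -f "$macro_file"
  return "$status"
}
```

A possible call, assuming NVIDIA's nvidia-container-toolkit repo is already configured: `[ "$(fedora_version)" = 43 ] && with_pkgverify_disabled dnf -y install nvidia-container-toolkit`. Wrapping the override around a single command keeps it confined to one transaction, which makes the later revert easy.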
Do we still need this?
Yes, the nvidia-container-toolkit is needed because the CDI subsystem is required for container integration. I cannot comment on the PR itself, only on the need for the RPM. After upgrading to NVIDIA driver 580.105.x, I traced my errors with ramalama (and with other CUDA container work, such as pytorch) to this. The fix is to run nvidia-ctk to regenerate the CDI config. See ramalama-cuda - CUDA Updates.
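Regenerating the CDI config after a driver update might be wrapped as shown below. `nvidia-ctk cdi generate --output=...` and `nvidia-ctk cdi list` are real toolkit subcommands. The helper name `regenerate_cdi_spec` is hypothetical, and so are the `NVIDIA_CTK` and `CDI_SPEC` overrides, which exist only so the function can be dry-run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Overridable for dry runs. By default, use the real tool and the spec path
# from NVIDIA's CDI instructions.
NVIDIA_CTK="${NVIDIA_CTK:-nvidia-ctk}"
CDI_SPEC="${CDI_SPEC:-/etc/cdi/nvidia.yaml}"

# Regenerate the CDI spec for the currently installed driver (run as root),
# then list the device names that CDI-aware engines can request.
regenerate_cdi_spec() {
  mkdir -p "$(dirname "$CDI_SPEC")"
  "$NVIDIA_CTK" cdi generate --output="$CDI_SPEC"
  "$NVIDIA_CTK" cdi list
}
```

After this runs, podman (and therefore ramalama) can request the GPU through CDI with a device name such as `nvidia.com/gpu=all`.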
sudo dnf info golang-github-nvidia-container-toolkit
Updating and loading repositories:
Repositories loaded.
Available packages
Name : golang-github-nvidia-container-toolkit
Epoch : 0
Version : 1.17.4
Release : 1.fc42
Architecture : x86_64
Download size : 3.2 MiB
Installed size : 14.8 MiB
Source : golang-github-nvidia-container-toolkit-1.17.4-1.fc42.src.rpm
Repository : updates
Summary : Build and run containers leveraging NVIDIA GPUs
URL : https://github.com/NVIDIA/nvidia-container-toolkit
License : Apache-2.0
Description : The NVIDIA Container Toolkit allows users to build and run NVIDIA GPU
: accelerated containers. The toolkit includes a container runtime library and
: utilities to automatically configure containers to leverage NVIDIA GPUs.
Vendor : Fedora Project
I also found a workaround (a huge hack) that helped me troubleshoot the touch points further. WARNING TO NON-MAINTAINER READERS: DO NOT DO THIS. I am sharing it only in the hope that it gives some insight into what is needed at a bare minimum. What this PR attempts is the right approach, once J0rge's question about the dnf6 workaround is answered.
My hack of a workaround: the RPM package consists of /usr/bin/nvidia-ctk, /usr/bin/nvidia-cdi-hook, some files in /usr/share, and some steps to install systemd units. I noticed while doing the
Then, back on the host - still running 42.20251111:
Success! I was able to interact with llama3.2:3b in the browser. Next, I powered off and booted 43.20251119.1. I repeated
Again, success. Conclusion: because only a short time had passed since the upgrade, I was able to put a workaround in place with just the two binaries and the fixed-up nvidia.yaml file. The rest of the RPM install is still missing, though; see nvidia-container-toolkit.spec for details. FYI, both binaries have the same short list of external dependencies:
$ ldd ~/etc/cdi/nvidia-ctk
linux-vdso.so.1 (0x00007f8e6e331000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f8e6e30f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8e6e30b000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8e6e2f9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8e6e105000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8e6e333000)
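A small diagnostic like the one below could show how far a manual setup falls short of the RPM. It checks the files named in this thread and confirms that the two binaries link against the same libraries. The helper names, the `REQUIRED_PATHS` list, and the root-prefix argument are hypothetical. The /usr/share files and the systemd units are left out because their paths are not given here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Paths named in this thread: the two RPM binaries and the generated CDI spec.
REQUIRED_PATHS=(
  /usr/bin/nvidia-ctk
  /usr/bin/nvidia-cdi-hook
  /etc/cdi/nvidia.yaml
)

# Print each required path that is missing under ROOT (default "/").
# Return 1 if anything is missing.
missing_components() {
  local root="${1:-/}" missing=0 p
  for p in "${REQUIRED_PATHS[@]}"; do
    if [[ ! -e "${root%/}$p" ]]; then
      printf 'missing: %s\n' "$p"
      missing=1
    fi
  done
  return "$missing"
}

# Reduce `ldd` output on stdin to sorted library names, dropping paths and
# load addresses so that two binaries can be compared.
ldd_names() {
  awk '{ print $1 }' | sed 's#.*/##' | LC_ALL=C sort -u
}

# Succeed when two binaries depend on the same set of libraries.
same_deps() {
  diff <(ldd "$1" | ldd_names) <(ldd "$2" | ldd_names) >/dev/null
}
```

On a host, `missing_components` lists whatever is absent, and `same_deps /usr/bin/nvidia-ctk /usr/bin/nvidia-cdi-hook` should succeed if both binaries have the dependency list shown above.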
Due to the switch to RPM 6, installing the nvidia-container-toolkit
fails in F43 and was subsequently removed from the
ublue-os/akmods repo.
We can add it here with a little hacking around the verification.
This is not an ideal solution, and must be reverted when Nvidia
starts building with a modern RPM version.
See NVIDIA/nvidia-container-toolkit#1307