Description
I'm experimenting with nvidia-container-toolkit on Fedora CoreOS, specifically using podman to run GPU workloads.
The new v1.14 nvidia-container-toolkit and nvidia-container-toolkit-base packages now install fine with rpm-ostree, which is great.
However, running sudo nvidia-ctk --debug cdi generate to generate the CDI spec fails because it cannot locate the NVML shared library.
Inspecting /etc/ld.so.conf.d/libnvidia-container-tools-1.4.0-1.x86_64.conf, I see that it points to libraries in /usr/local/lib.
The new generic RPMs do not install there on Fedora CoreOS (because they can't).
I am running the driver container with the recommended shared mounts. If I manually edit this file to reference the driver container's library location, i.e. replacing /usr/local/lib with /run/nvidia/driver/usr/lib64, and then run ldconfig to reload the shared library cache, I am able to complete CDI spec generation.
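For anyone who wants to reproduce the workaround, here is a minimal sketch of the edit. The helper name write_ld_conf is hypothetical, and it overwrites the whole file, which assumes the file contains only the single library path. The paths in the usage comments are the ones from this report and may differ on other systems.

```shell
# Sketch of the manual workaround; write_ld_conf is a hypothetical helper.
# Assumption: the ld.so.conf.d fragment holds a single library directory,
# so overwriting the whole file is equivalent to replacing that one entry.
write_ld_conf() {
    conf_file="$1"   # ld.so.conf.d fragment to rewrite
    lib_dir="$2"     # directory that contains the NVML shared library
    printf '%s\n' "$lib_dir" > "$conf_file"
}

# Usage, as root, with the paths from this report:
#   write_ld_conf /etc/ld.so.conf.d/libnvidia-container-tools-1.4.0-1.x86_64.conf \
#       /run/nvidia/driver/usr/lib64
#   ldconfig                            # rebuild the shared library cache
#   ldconfig -p | grep libnvidia-ml     # check that NVML is now visible
#   nvidia-ctk --debug cdi generate     # retry CDI spec generation
```

The function only writes the file; the ldconfig and nvidia-ctk steps are left as comments because they need root and a running driver container.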
It would seem that the generic RPM installation should somehow anticipate the location of the libraries, or perhaps list several potential directories, including the one used by the driver container.
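To illustrate the "several potential locations" idea, here is a rough sketch of a lookup that returns the first candidate directory containing the NVML library. This is not how the toolkit or its RPMs currently behave. The function name find_nvml_dir is hypothetical, the file name libnvidia-ml.so.1 is my assumption for the NVML soname, and /usr/lib64 in the usage comment is an assumed extra candidate beyond the two paths mentioned above.

```shell
# Hypothetical helper: print the first directory that contains the NVML
# library and return 0, or return 1 if no candidate has it.
# Assumption: the library file is named libnvidia-ml.so.1.
find_nvml_dir() {
    for dir in "$@"; do
        if [ -e "$dir/libnvidia-ml.so.1" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage with an illustrative candidate list:
#   find_nvml_dir /usr/lib64 /usr/local/lib /run/nvidia/driver/usr/lib64
```

The order of the arguments sets the priority, so a host-installed driver could be preferred over the driver container location or the other way around.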