
failed to initalize NVML: ERROR_LIBRARY_NOT_FOUND #118

@fifofonix

Description


I'm experimenting with nvidia-container-toolkit on Fedora CoreOS, specifically using podman to run GPU workloads.

The new v1.14 nvidia-container-toolkit and nvidia-container-toolkit-base packages now install fine with rpm-ostree, which is great.

However, running sudo nvidia-ctk --debug cdi generate to generate the CDI spec fails because it cannot locate the NVML shared library.
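To confirm that this is a loader search-path problem rather than a missing driver, here is a minimal Python sketch that asks the dynamic loader to open the NVML library with the current search path. It assumes NVML is resolved by its usual soname, libnvidia-ml.so.1; nvml_loadable is a hypothetical helper, not part of nvidia-ctk.

```python
import ctypes
import ctypes.util

# Assumed soname the dynamic loader must resolve for NVML on Linux.
NVML_SONAME = "libnvidia-ml.so.1"


def nvml_loadable(soname: str = NVML_SONAME) -> bool:
    """Return True if dlopen() can find `soname` using the current loader search path."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # dlopen() failed: the library is not in any directory the loader knows about
        return False
    return True


if __name__ == "__main__":
    print(f"{NVML_SONAME} loadable: {nvml_loadable()}")
    # On Linux, find_library consults the ldconfig cache first, so the result
    # reflects the directories listed under /etc/ld.so.conf.d
    print(f"ldconfig cache lookup: {ctypes.util.find_library('nvidia-ml')}")
```

If this reports False while the driver container is running, the library exists but sits outside the directories the loader searches.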

Inspecting /etc/ld.so.conf.d/libnvidia-container-tools-1.4.0-1.x86_64.conf, I see that it points the loader at libraries in /usr/local/lib.
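To list which directories an ld.so.conf.d drop-in contributes, a small parser is enough. This is a sketch assuming the common format of one directory per line with '#' comments; it skips include and hwcap directives rather than expanding them, and parse_ld_conf is a hypothetical helper name.

```python
from pathlib import Path


def parse_ld_conf(text: str) -> list[str]:
    """Return the library directories listed in ld.so.conf-style text.

    Assumes one directory per line. '#' starts a comment, blank lines are
    skipped, and 'include'/'hwcap' directives are ignored, not expanded.
    """
    dirs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.split()[0] in ("include", "hwcap"):
            continue
        dirs.append(line)
    return dirs


if __name__ == "__main__":
    conf = Path("/etc/ld.so.conf.d/libnvidia-container-tools-1.4.0-1.x86_64.conf")
    if conf.exists():
        print(parse_ld_conf(conf.read_text()))
```

Per this report, the file above would yield ['/usr/local/lib'], a directory that does not contain the driver libraries on this setup.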

The new generic RPMs do not install there on Fedora CoreOS (because they can't).

I am running the driver container with the recommended shared mounts. If I manually edit this file to reference the driver container's library location instead, i.e. replacing the entry with /run/nvidia/driver/usr/lib64, and then run ldconfig to reload the shared library cache, CDI spec generation completes successfully.
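The manual workaround described above could be scripted roughly as follows. This is a sketch, not an official fix: repoint_conf and apply_workaround are hypothetical helpers, and it assumes the conf file lists /usr/local/lib on its own line, as in this report.

```python
import subprocess
from pathlib import Path

OLD_LIB_DIR = "/usr/local/lib"
# Library directory exposed by the driver container's shared mounts (from this report).
DRIVER_CONTAINER_LIB_DIR = "/run/nvidia/driver/usr/lib64"


def repoint_conf(text: str, old: str = OLD_LIB_DIR, new: str = DRIVER_CONTAINER_LIB_DIR) -> str:
    """Replace lines whose entry is exactly `old` with `new`; other lines are kept as-is."""
    out = [new if line.strip() == old else line for line in text.splitlines()]
    return "\n".join(out) + "\n"


def apply_workaround(conf_path: str) -> None:
    """Rewrite conf_path in place, then rebuild the loader cache (requires root)."""
    path = Path(conf_path)
    path.write_text(repoint_conf(path.read_text()))
    # ldconfig regenerates /etc/ld.so.cache so the new directory is searched
    subprocess.run(["ldconfig"], check=True)
```

Run as root, e.g. apply_workaround("/etc/ld.so.conf.d/libnvidia-container-tools-1.4.0-1.x86_64.conf"), then retry sudo nvidia-ctk --debug cdi generate.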

It seems the generic RPM installation should somehow anticipate the location of the libraries, or perhaps list several candidate directories, including the one used by the driver container.
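The idea of listing several candidate locations might look something like this sketch, which reports which directories in a list actually contain the NVML library. The candidate list is hypothetical: /usr/local/lib and /run/nvidia/driver/usr/lib64 come from this report, while /usr/lib64 and /usr/lib/x86_64-linux-gnu are assumed common distribution paths.

```python
from pathlib import Path

NVML_SONAME = "libnvidia-ml.so.1"

# Hypothetical candidate list; only the first two entries come from this report.
CANDIDATE_DIRS = [
    "/usr/local/lib",
    "/run/nvidia/driver/usr/lib64",
    "/usr/lib64",
    "/usr/lib/x86_64-linux-gnu",
]


def dirs_with_nvml(candidates=CANDIDATE_DIRS, soname: str = NVML_SONAME) -> list[str]:
    """Return, in order, the candidate directories that contain `soname`."""
    return [d for d in candidates if (Path(d) / soname).exists()]


if __name__ == "__main__":
    print(dirs_with_nvml())
```

Listing every candidate in the ld.so.conf.d file would also work, since the loader ignores directories that do not exist; probing first simply keeps the configuration minimal.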
