@jeefy jeefy commented Apr 28, 2024

Currently, NVIDIA's GPU Operator expects ldconfig.real to exist. See NVIDIA/nvidia-container-toolkit#147 for more info.

As a short-term workaround, you can modify /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml to point to /sbin/ldconfig. However, any time the pods cycle or the node reboots, the file is regenerated and points to the incorrect ldconfig again.
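
For reference, the symlink approach named in this PR's title could look roughly like the sketch below. This is not the actual diff: it assumes ldconfig lives at /sbin/ldconfig, and the function name `link_ldconfig_real` and its optional root-directory argument are hypothetical, added only so the snippet can be tried against a scratch directory.

```shell
# Hypothetical helper: create <root>/sbin/ldconfig.real as a symlink to ldconfig.
# <root> defaults to "" (the real filesystem); pass a scratch dir to try it safely.
link_ldconfig_real() {
  local root="${1:-}"
  local target="${root}/sbin/ldconfig"
  local link="${root}/sbin/ldconfig.real"

  # Nothing sensible to link to if ldconfig itself is missing.
  if [ ! -e "${target}" ]; then
    echo "missing ${target}" >&2
    return 1
  fi

  # If a real (non-symlink) ldconfig.real already exists, as on Ubuntu,
  # leave it untouched.
  if [ -e "${link}" ] && [ ! -L "${link}" ]; then
    return 0
  fi

  # Relative target, so the link resolves the same way inside or outside a chroot.
  # -f replaces a stale link; -n avoids following an existing symlink.
  ln -sfn ldconfig "${link}"
}
```

On a real node this would be called with no argument and with root privileges, e.g. `link_ldconfig_real`, most likely at image build time rather than on a running system.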

@jeefy jeefy force-pushed the gpu-operator-fix branch from f738faa to c93eca3 April 28, 2024 15:59
@jeefy jeefy changed the title from "symlink ldconfig to ldconfig.real for gpu-operator support" to "fix: symlink ldconfig to ldconfig.real for gpu-operator support" Apr 28, 2024
@bsherman bsherman (Collaborator) commented Apr 28, 2024

> Currently Nvidia's GPU Operator expects ldconfig.real to exist. See NVIDIA/nvidia-container-toolkit#147 for more info.
>
> Short-term you can modify /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml to point to /sbin/ldconfig however any time the pods cycle or the node reboots it regenerates the file and points to the incorrect ldconfig.

Odd bug... but reading the report, it seems to be an artifact of Ubuntu-first packaging support.

Thank you for the contribution @jeefy, but as this is NVIDIA-specific (at least, it seems to be), I want to scope it a bit more.

What are your thoughts on making it a post-install step for our ucore nvidia RPM?
(see: https://github.com/ublue-os/ucore-kmods/blob/main/ublue-os-ucore-nvidia.spec)

An alternative, though: maybe add the symlink, with a comment and explanation, here? https://github.com/ublue-os/ucore/blob/main/ucore/install-ucore-minimal.sh#L46

Edit: added alternative thought
