Skip to content

Conversation

@elezar
Copy link
Member

@elezar elezar commented Nov 11, 2025

This change fixes the update of the ldcache when running a non-Debian-like (e.g. RHEL, Fedora) container on a Debian-like (e.g. Debian, Ubuntua) system. Here the ldcache in the container does not initially include system libraries from /lib64 or /usr/lib64 since this ldconfig from the host that is executed in the container's namespace does not process those paths by default.

Testing

The expected behaviour when using runc directly:

$ docker run --rm --runtime=runc redhat/ubi9 bash -c "ldconfig -p | grep libc.so."
        libc.so.6 (libc6,AArch64, OS ABI: Linux 3.7.0) => /lib64/libc.so.6

The current behaviour:

$ docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all redhat/ubi9 bash -c "ldconfig -p | grep libc.so."

(note that libc.so.6 is not present in the ldcache since /lib64 (linked to /usr/lib64) is not processed).

The updated behaviour:

$ docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all redhat/ubi9 bash -c "ldconfig -p | grep libc.so."
        libc.so.6 (libc6,AArch64, OS ABI: Linux 3.7.0) => /lib64/libc.so.6

@elezar elezar added this to the v1.18.1 milestone Nov 11, 2025
@elezar
Copy link
Member Author

elezar commented Nov 11, 2025

/cherry-pick release-1.18

Copy link
Contributor

@cdesiniotis cdesiniotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments / questions, but no blockers.

@elezar elezar force-pushed the fix-ldcache-update-on-non-debian branch 2 times, most recently from 2794eaa to 0b28791 Compare November 14, 2025 10:49
func (l *Ldconfig) getSystemSerachPaths() []string {
func (l *Ldconfig) getSystemSearchPaths() []string {
if l.isDebianLikeContainer {
debianSystemSearchPaths()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been debugging issues on my homelab cluster with the latest CTK release, and I just spotted this code here. I think it's missing a return. I can send a PR for the existing code, or we can roll that fix into this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1457

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. In order to minimize backport churn, let's add the commit to this PR.

@jfroy
Copy link
Collaborator

jfroy commented Nov 17, 2025

I'm preparing a follow up / alternative to this patch to handle additional cases I've ran into recently. Should have it out as a branch in an hour or so.

cdesiniotis
cdesiniotis previously approved these changes Nov 18, 2025
@elezar elezar mentioned this pull request Nov 18, 2025
@elezar elezar force-pushed the fix-ldcache-update-on-non-debian branch 2 times, most recently from 7f73f39 to 22d7b5d Compare November 19, 2025 10:32
@elezar elezar changed the title Fix ldcache update on non debian systems Fix ldcache update when host and container distributions do not match Nov 19, 2025
@elezar elezar force-pushed the fix-ldcache-update-on-non-debian branch 2 times, most recently from d3dcb86 to 1e52761 Compare November 19, 2025 13:51
@elezar elezar force-pushed the fix-ldcache-update-on-non-debian branch from 1e52761 to 8e56cf7 Compare November 20, 2025 10:03
@ArangoGutierrez ArangoGutierrez self-requested a review November 20, 2025 10:26
@elezar elezar requested a review from cdesiniotis November 20, 2025 11:12
@elezar elezar dismissed stale reviews from cdesiniotis and ArangoGutierrez November 20, 2025 11:12

Additional commits.

Comment on lines +386 to +393
// TODO: Add other architectures that have custom `add_system_dir` macros (e.g. riscv)
// TODO: Replace with executing the container's dynamlic linker with `--list-diagnostics`?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we merge this PR we should turn this into GitHub issues to track this TODO's and get them implemented

Comment on lines +433 to +441
// TODO: Add other architectures that have custom `add_system_dir` macros (e.g. riscv)
// TODO: Replace with executing the container's dynamlic linker with `--list-diagnostics`?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we merge this PR we should turn this into GitHub issues to track this TODO's and get them implemented

@ArangoGutierrez
Copy link
Collaborator

Thanks a lot for your contributions on this topic @jfroy !

ldsoconfdFilenamePattern = "00-nvcr-*.conf"
// ldsoconfdSystemDirsFilenamePattern specifies the filename pattern for the drop-in conf file
// that includes the expected system directories for the container.
// This is chosen to have a high likelihood of being lexographically last in

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"lexicographically" :)

We could also use the prefix "yolo-" 🤡

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was me. Updating this and references to Debian and non-Debian for clarity.

Copy link

@jgehrcke jgehrcke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fascinating, thank you.

non-debian (including ubuntu)

From the perspective of this PR, Ubuntu is actually non-Debian?

@elezar
Copy link
Member Author

elezar commented Nov 20, 2025

Fascinating, thank you.

non-debian (including ubuntu)

From the perspective of this PR, Ubuntu is actually non-Debian?

No. Ubuntu is debian in this context. Let me update the description to be clearer.

@jfroy
Copy link
Collaborator

jfroy commented Nov 20, 2025

Fascinating, thank you.

non-debian (including ubuntu)

From the perspective of this PR, Ubuntu is actually non-Debian?

@elezar is that what you meant (I think this is quoting #1444 (comment))?

@jgehrcke Ubuntu is "debian-like" because /etc/debian_version exists on Ubuntu systems. This PR does not change that.

This change ensures that updating the ldcache in a container also
includes the system search paths for the container ldconfig.

In most cases, the hook will be executing a host ldconfig that may be
configured widely differently from what the container image expects. The
common case is Debian-like (e.g. Debian, Ubuntu) vs non-Debian-like
(e.g. RHEL, Fedora). But there are also hosts that configure ldconfig
to search in a glibc prefix (e.g. /usr/lib/glibc). To avoid all these
cases, write the container's expected system search paths to a drop-in
conf file that is likely to be last in lexicographic order.
Entries in the top-level ld.so.conf file may be processed after this
drop-in, but this hook does not modify the top-level file if it exists.

Signed-off-by: Jean-Francois Roy <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the fix-ldcache-update-on-non-debian branch from 8e56cf7 to 6592021 Compare November 20, 2025 15:49
@elezar
Copy link
Member Author

elezar commented Nov 20, 2025

@elezar is that what you meant (I think this is quoting #1444 (comment))?

Yes, the code also used Debian and non-Debian in comments (and a function name). I have updated to use Debian-like and non-Debian-like in the comments with examples of distros to make this clearer.

@elezar elezar merged commit dcb05d0 into NVIDIA:main Nov 20, 2025
30 of 33 checks passed
@github-actions
Copy link

🤖 Backport PR created for release-1.18: #1488

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants