fix(chart): derive ld.so.preload from devicePlugin.libPath to fix non-default path deployments #1714
Conversation
vgpu-init.sh copies /k8s-vgpu/lib/nvidia/ld.so.preload to devicePlugin.libPath on the host,
but the file in the image hardcodes /usr/local/vgpu/libvgpu.so regardless of the configured
libPath. On systems where libPath must be changed (e.g. Bottlerocket EKS nodes where /usr/local
is read-only), the copied ld.so.preload points to the wrong path, causing libvgpu.so to fail
to preload in every workload container after a device plugin pod restart.
Add ld.so.preload as a second data key in the existing device-plugin ConfigMap, rendered from
{{ .Values.devicePlugin.libPath }}/libvgpu.so. Mount it into the device-plugin container at
/k8s-vgpu/lib/nvidia/ld.so.preload using subPath on the existing deviceconfig volume. The
vgpu-init.sh MD5-based copy logic then picks up the correct path from the ConfigMap instead
of the image's hardcoded value.
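For reference, a minimal sketch of what the two template changes could look like. The ConfigMap name and the pre-existing config key are placeholders invented for illustration; only the ld.so.preload key, its rendered value, the mount path, and the subPath mount on the deviceconfig volume are taken from the description above.

```yaml
# configmap.yaml (sketch): add ld.so.preload alongside the existing device-plugin config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hami-device-plugin        # hypothetical name for the existing device-plugin ConfigMap
data:
  config.json: |                  # placeholder for the existing device-plugin configuration
    {}
  ld.so.preload: |
    {{ .Values.devicePlugin.libPath }}/libvgpu.so
```

And the corresponding mount in the device-plugin container, projected read-only via subPath so the rest of the deviceconfig volume is unaffected:

```yaml
# daemonsetnvidia.yaml (sketch): container-level volumeMounts fragment.
volumeMounts:
  - name: deviceconfig
    mountPath: /k8s-vgpu/lib/nvidia/ld.so.preload
    subPath: ld.so.preload
    readOnly: true
```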
Fixes Project-HAMi#1713
Signed-off-by: ilia-medvedev <ilia.medvedev@gong.io>
Welcome @ilia-medvedev! It looks like this is your first PR to Project-HAMi/HAMi 🎉
Code Review
This pull request modifies the HAMi device plugin Helm chart. It adds an ld.so.preload entry to configmap.yaml, which specifies the path to libvgpu.so using a Helm value. Correspondingly, daemonsetnvidia.yaml is updated to mount this ld.so.preload configuration into the container at /k8s-vgpu/lib/nvidia/ld.so.preload as a read-only volume. This change likely enables the preloading of libvgpu.so for the NVIDIA device plugin. No review comments were made, so there is no feedback to provide.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Thanks for the fix, have you tested it in your cluster?

@archlitchi I have indeed. Works.
Fixes #1713, #971
vgpu-init.sh copies /k8s-vgpu/lib/nvidia/ld.so.preload to devicePlugin.libPath on the host, but the file in the image hardcodes /usr/local/vgpu/libvgpu.so. When libPath is set to anything else (e.g. /var/lib/hami/vgpu on Bottlerocket EKS where /usr/local is read-only), the copied file points to the wrong path and libvgpu.so fails to preload in workload containers on every device plugin restart.

Fix: add ld.so.preload as a data key in the existing device-plugin ConfigMap, rendered from {{ .Values.devicePlugin.libPath }}/libvgpu.so, and mount it at /k8s-vgpu/lib/nvidia/ld.so.preload via subPath on the existing deviceconfig volume. vgpu-init.sh's MD5 check then picks up the correct path from the ConfigMap instead of the hardcoded image value. No new Kubernetes resources needed.
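For a non-default deployment like the Bottlerocket example above, the path would be overridden through the chart values. A minimal illustrative values snippet follows; the file name is arbitrary and devicePlugin.libPath is the only key this PR relies on.

```yaml
# my-values.yaml (illustrative): point libPath at a host directory that is writable
# on the node; the rendered ld.so.preload then contains /var/lib/hami/vgpu/libvgpu.so.
devicePlugin:
  libPath: /var/lib/hami/vgpu
```

With this change the copied ld.so.preload points at libvgpu.so under the configured libPath on the host, so preloading keeps working after device plugin restarts.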