@tariq1890 tariq1890 commented Nov 22, 2024

This change condenses the two-step driver install into a single step.

Currently, the driver image installs the userspace components and the kernel modules in separate steps. The split existed to allow signing the kernel modules with a custom private key and then relinking the signed modules, and to allow updating the kernel modules if the underlying host kernel was updated.

As none of these workflows apply today, we simplify the driver installation and allow for defining an API in gpu-operator through which users can easily pass custom runfile installation arguments.

I have tested driver upgrades and updates (both Open and ClosedRM modules) with these changes, and the driver container has been running with no issues.

@tariq1890 tariq1890 marked this pull request as draft November 22, 2024 02:20
@tariq1890 tariq1890 marked this pull request as ready for review November 22, 2024 02:53
--x-module-path=/tmp/null \
--x-library-path=/tmp/null \
--x-sysconfig-path=/tmp/null \
-m=${KERNEL_TYPE} \

Out of scope for this PR, but looking at this again, it is probably best not to set this option explicitly unless the user sets KERNEL_TYPE (or whatever API is exposed by the operator). Now that we rely on nvidia-installer to perform both the compilation and the installation, we can depend on the defaults it applies for the kernel module type. For example, on newer driver versions it automatically chooses to install the open modules on compatible systems.
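
A minimal sketch of what this suggestion could look like, assuming the arguments are assembled in a shell helper. `build_installer_args` is a hypothetical name, not a function from this PR, and the `kernel-open` value in the usage note is only an assumed example. The helper prints the three path options from the diff and adds `-m` only when `KERNEL_TYPE` is set to a non-empty value, so that otherwise the installer's own default applies.

```shell
# Sketch only: build_installer_args is a hypothetical helper, not code from this PR.
# It prints one installer argument per line.
build_installer_args() {
    # Options taken from the diff under review; always passed.
    printf '%s\n' \
        "--x-module-path=/tmp/null" \
        "--x-library-path=/tmp/null" \
        "--x-sysconfig-path=/tmp/null"

    # Pass -m only when the user supplied KERNEL_TYPE; when it is unset or
    # empty, omit the option and let the installer apply its default.
    if [ -n "${KERNEL_TYPE:-}" ]; then
        printf '%s\n' "-m=${KERNEL_TYPE}"
    fi
}
```

With `KERNEL_TYPE` unset the helper emits only the three path options; with `KERNEL_TYPE=kernel-open` it emits a fourth line, `-m=kernel-open`. Printing one argument per line keeps the sketch POSIX-compatible; a Bash script could build an array instead.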

@tariq1890 tariq1890 force-pushed the single-step-install branch 2 times, most recently from b4041f6 to 38a6df2 Compare November 23, 2024 01:50
@tariq1890 tariq1890 force-pushed the single-step-install branch 2 times, most recently from 6c9414b to 11b3b8e Compare November 25, 2024 20:16
@tariq1890 tariq1890 merged commit ad158a0 into main Nov 26, 2024
7 checks passed
@tariq1890 tariq1890 deleted the single-step-install branch November 26, 2024 06:36

3 participants