Sles15 sp7 alternative module path #123

Open
e4t wants to merge 4 commits into openSUSE:sles15-sp7 from e4t:sles15-sp7-alternative-module-path

@e4t

e4t commented Sep 29, 2025

This pull request contains:

  1. The changes required to support a separate KMP directory in /lib/modules/kmp. These changes may not be sufficient.
  2. The changes required to find all potential update candidates even if the KMPs have a 'versioned' module directory to avoid file conflicts with multiversion support enabled.
  3. Changes required when merging 1. and 2. together.

This requires testing as pieces may be missing. This set includes the changes in request #121.

If module subdirectories are versioned, we need to search a bit harder
for potential update candidates as the name of the module subdirectory
differs.
We therefore search all weak-updates subdirectories of the $krel kernel
for a module matching the same name (taking into account different
compressions).
Should we find multiple modules that match (which should not happen!)
we flag this but continue regardless. This way, the first element in
the list will be taken as the potential update candidate.

Signed-off-by: Egbert Eich <eich@suse.com>
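
The search described in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the shell code from the patch: the function names, the `weak-updates/<subdir>/<module>` layout of the example paths, and the set of compression suffixes (taken from the sed pattern quoted later in this thread) are assumptions.

```python
import re
from pathlib import PurePosixPath

# Compression suffixes assumed from the sed pattern in this thread: .gz, .xz, .zst
_COMPRESSION = re.compile(r"\.(gz|xz|zst)$")


def module_name(path: str) -> str:
    """Return the module file name with any compression suffix removed."""
    return _COMPRESSION.sub("", PurePosixPath(path).name)


def find_update_candidates(weak_update_paths, new_module):
    """List weak-updates entries whose module name equals that of new_module."""
    wanted = module_name(new_module)
    return [p for p in weak_update_paths if module_name(p) == wanted]


def pick_candidate(weak_update_paths, new_module):
    """Return (first match or None, flag that is True if several entries matched)."""
    matches = find_update_candidates(weak_update_paths, new_module)
    return (matches[0] if matches else None), len(matches) > 1


# Hypothetical weak-updates entries of one $krel kernel (illustrative paths only).
existing = [
    "/lib/modules/6.4.0-150700.51-default/weak-updates/"
    "nvidia-open-driver-G06-signed-570.144/nvidia-drm.ko.zst",
    "/lib/modules/6.4.0-150700.51-default/weak-updates/"
    "nvidia-open-driver-G06-signed-570.144/nvidia.ko.zst",
]
candidate, ambiguous = pick_candidate(
    existing,
    "/lib/modules/6.4.0-150700.51-default/updates/"
    "nvidia-open-driver-G06-signed-570.153.02/nvidia-drm.ko",
)
```

Matching on the file name alone, with the compression suffix stripped, is what lets the candidate be found even though the versioned subdirectory names differ.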
@hramrach
Contributor

Why does the /kmp/ need to be added?

Shouldn't modules installed in /usr/lib/modules/nvidia-G04-175.25 already get scanned for modules?

If not why not?

@e4t
Author

e4t commented Sep 29, 2025

> Why does the /kmp/ need to be added?
>
> Shouldn't modules installed in /usr/lib/modules/nvidia-G04-175.25 already get scanned for modules?
>
> If not why not?

Please see my answer in #121 (comment).

@hramrach
Contributor

> > Why does the /kmp/ need to be added?
> > Shouldn't modules installed in /usr/lib/modules/nvidia-G04-175.25 already get scanned for modules?
> > If not why not?
>
> Please see my answer in #121 (comment).

That is completely irrelevant to the question.

@e4t
Author

e4t commented Sep 29, 2025

> > > Why does the /kmp/ need to be added?
> > > Shouldn't modules installed in /usr/lib/modules/nvidia-G04-175.25 already get scanned for modules?
> > > If not why not?
> >
> > Please see my answer in #121 (comment).
>
> That is completely irrelevant to the question.

I beg your pardon?
It explains why we need (/usr)/lib/**kmp**/<kver> for module installation - instead of installing into (/usr)/lib/<kver>.

And no, /usr/lib/modules/<kver>/nvidia-G06-<modver_old> does not get scanned for potential update candidates already, because __previous_version_of_kmp() uses symlink_to_module() to look for a potential update candidate. This, however, is too simplistic as it will return /usr/lib/modules/<kver>/nvidia-G06-<modver_present_package>.
This was addressed by the 2nd patch in the series, which was neither part nor topic of #121, where you asked exactly the same question!

@hramrach
Contributor

I can't read your mind. I do not know what is in that /kmp directory.

So your design is to prepend /kmp to the kernel version when installing KMPs.

And my question is why do that instead of installing under a directory named after the KMP.

And yes, it is not clear why merely prepending /kmp is more likely to work than installing in a completely arbitrary directory under /lib/modules.

@e4t
Author

e4t commented Sep 29, 2025

> I can't read your mind. I do not know what is in that /kmp directory.
>
> So your design is to prepend /kmp to the kernel version when installing KMPs.
>
> And my question is why do that instead of installing under a directory named after the KMP.

We need to do both:

  1. Use (/usr)/lib/kmp/<kver> to avoid potentially having the same module multiple times in the same kernel '.../updates' directory from different KMPs built against the same kernel.
     Example:
       /lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.144/nvidia-drm.ko.zst
       /lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.153.02/nvidia-drm.ko.zst
       /lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.172.08/nvidia-drm.ko.zst
  2. Use versioned module sub-directories to avoid file conflicts between different KMPs built against the same kernel.
     Example: versions 570.153.02 and 570.172.08 would use the same directory for the same files:
       /lib/modules/kmp/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed/nvidia-drm.ko

> And yes, it is not clear why merely prepending /kmp is more likely to work than installing in a completely arbitrary directory under /lib/modules.

The two address totally different issues. The first one (i.e. 1. above) is explained in #121 (comment).
HTH

@hramrach
Contributor

I still don't understand, at all.

How does inserting /kmp/ help anything?

How do you get

/lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.144/nvidia-drm.ko.zst
/lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.153.02/nvidia-drm.ko.zst
/lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.172.08/nvidia-drm.ko.zst

if in your design you do not use multiversion, and as a result you can only have one nvidia-open-driver-G06-signed?

Why would removing /6.4.0-150700.51-default/updates from the above not work?

@e4t
Author

e4t commented Sep 29, 2025

> I still don't understand, at all.
>
> How does inserting /kmp/ help anything?
>
> How do you get
>
> /lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.144/nvidia-drm.ko.zst
> /lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.153.02/nvidia-drm.ko.zst
> /lib/modules/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.172.08/nvidia-drm.ko.zst
>
> if in your design you do not use multiversion, and as a result you can only have one nvidia-open-driver-G06-signed?

Where do I say this is for a design without multiversion? I've been pointing out that you will need all this - possibly more - to fix multiversion.
The alternative to fixing would be dropping it - which we are doing for the nvidia-open driver, now.

> Why would removing /6.4.0-150700.51-default/updates from the above not work?

Once you install a package that's been built without multiversion(kernel), old packages will be removed and all is well. None of the above will be needed.
However, if we decide to keep it - with undocumented hacks being required to remove it - we should fix it.
Even though we are planning to drop it for nvidia-open, since I've investigated the issue on weak-modules already, I'm providing these patches.

@e4t
Author

e4t commented Sep 29, 2025

> How does inserting /kmp/ help anything?

depmod searches in (/usr)/lib/modules/<kver> not in (/usr)/lib/modules/kmp/<kver>, doesn't it?

@hramrach
Contributor

Nor in (/usr)/lib/modules/<kmpname>-<kmpver>

@e4t
Author

e4t commented Sep 29, 2025

> Nor in (/usr)/lib/modules/<kmpname>-<kmpver>

You mean like /usr/lib/modules/nvidia-open-driver-G06-signed-570.144?
Of course not. Nobody uses such a path at the moment. If I've used it above it was a typo.

@hramrach
Contributor

But why not use this path that makes it clear what is installed in favor of the obscure kmp/ which also ties the installed modules to the specific implementation detail of using KMP as opposed to, say, DKMS?

@e4t
Author

e4t commented Sep 29, 2025

> But why not use this path that makes it clear what is installed in favor of the obscure kmp/ which also ties the installed modules to the specific implementation detail of using KMP as opposed to, say, DKMS?

Why would (/usr)/lib/modules/kmp/nvidia-open-driver-G06-signed-570.144 be more obscure than (/usr)/lib/modules/nvidia-open-driver-G06-signed-570.144?

Is (/usr)/lib/modules/<kmpname>-<kmpver> what DKMS does? I don't think so.
DKMS builds modules in /var/lib/dkms/<mod_name>/<mod_version>/build, then installs them right into the tree of the kernel it has built them for, i.e. (/usr)/lib/modules/<kver>/*, after backing up any pre-existing modules with the same name.
This is exactly what we don't want, as we are not backing things up but leaving them installed and using symlinks.

Where would the kernel version go in what you suggest?

  • Currently, we have (/usr)/lib/modules/<kver>/<module_name>.
  • With INSTALL_MOD_DIR we've used: (/usr)/lib/modules/<kver>/<module_name>-<module_ver>.
  • My proposal used: (/usr)/lib/modules/kmp/<kver>/<module_name>-<module_ver>.
  • Now, you propose: (/usr)/lib/modules/<module_name>-<module_ver>/<kver>?
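
To make the four layouts above easier to compare, here is a small Python sketch that expands each template and checks in which of them two driver versions built against the same kernel end up on the same path. The layout labels, the trailing module file name, and the helper names are illustrative assumptions; the templates themselves are copied from the list.

```python
# Path templates from the list above; the trailing "{module}" file name is
# added for illustration only.
LAYOUTS = {
    "current": "{prefix}/{kver}/{name}/{module}",
    "INSTALL_MOD_DIR": "{prefix}/{kver}/{name}-{ver}/{module}",
    "kmp prefix": "{prefix}/kmp/{kver}/{name}-{ver}/{module}",
    "name first": "{prefix}/{name}-{ver}/{kver}/{module}",
}


def install_path(layout, kver, name, ver, module, prefix="/lib/modules"):
    """Expand one layout template into a concrete install path."""
    return LAYOUTS[layout].format(
        prefix=prefix, kver=kver, name=name, ver=ver, module=module
    )


def colliding_layouts(kver, name, versions, module):
    """Return the layouts in which different module versions share one path."""
    collisions = []
    for layout in LAYOUTS:
        paths = {install_path(layout, kver, name, v, module) for v in versions}
        if len(paths) < len(versions):
            collisions.append(layout)
    return collisions


collisions = colliding_layouts(
    "6.4.0-150700.51-default",
    "nvidia-open-driver-G06-signed",
    ["570.153.02", "570.172.08"],
    "nvidia-drm.ko",
)
# Only the unversioned "current" layout maps both versions to the same file.
```

The sketch only covers file conflicts between packages; it says nothing about which directories depmod scans, which is the other half of the argument in this thread.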

@hramrach
Contributor

or /lib/modules/<kmp_name>-<module_ver>-<kver>

The point is that people intentionally installed the nvidia package, they don't care if it was delivered through KMP or DKMS, and very few people actually know what that is. Putting the installation method before what is installed needlessly obfuscates what is installed.

DKMS does not actually work correctly on (open)SUSE but before Linux version 6 the nVidia modules were installed through some DKMS-like method without actually using DKMS which needlessly rebuilt the module for every kernel. If it installed the module to a separate directory and used wm2 it could skip most builds, and DKMS could too if it is ever integrated.

e4t force-pushed the sles15-sp7-alternative-module-path branch from 1e3a4c2 to ed01225 on September 29, 2025 16:30
@hramrach
Contributor

I see what is wrong with this approach now.

The group that comes after /modules/ (and, with the change, after /modules/kmp/) captures the kernel version.

But the out-of-kernel KMPs do not have a kernel version.

That's why prepending '/kmp' was seen as a 'solution' to make this code 'work' but it does not. The modules that are out of kernel should be versionless.

@e4t
Author

e4t commented Sep 29, 2025

> or /lib/modules/<kmp_name>-<module_ver>-<kver>

In other words, the nvidia_drm.ko.zst would then reside in:
/lib/modules/nvidia-open-driver-G06-signed-570.144-6.4.0-150700.51-default/updates/ - correct?
Why would this be better than: /lib/modules/kmp/6.4.0-150700.51-default/updates/nvidia-open-driver-G06-signed-570.144 ?

Large portions of the code rely on the fact that the directory structure is:

<prefix>/<kver>/updates/<kmp_name>/<module>.ko

with this, we will have to support another variant:

<prefix>/<module_name>-<kver>/updates/<module>.ko

The changes required to support both variants in weak-updates2 would be quite intrusive with a greater risk of breakages.

> The point is that people intentionally installed the nvidia package, they don't care if it was delivered through KMP or DKMS, and very few people actually know what that is. Putting the installation method before what is installed needlessly obfuscates what is installed.

I'm not sure if people actually care.

> DKMS does not actually work correctly on (open)SUSE but before Linux version 6 the nVidia modules were installed through some DKMS-like method without actually using DKMS which needlessly rebuilt the module for every kernel. If it installed the module to a separate directory and used wm2 it could skip most builds, and DKMS could too if it is ever integrated.

NVIDIA's proprietary driver used a home-baked build/install method on SUSE and DKMS everywhere else. As they also provide a package with the open driver, they now use DKMS everywhere - including SUSE.
DKMS used outside of a package install wants to install into a read-only file system when transactional updates are enabled, so we'd have to fix it.
The way DKMS operates is quite different from what we are trying to achieve with multiversioned packages.

What we are trying to achieve with multiversioned packages is not wrong - and the SolidDriver people seem to like it. It is just very broken in some respects: that we can only publish a single module per kernel release and are thus bound to one release every four weeks is a major user experience problem.
With my patches I was trying to mitigate this flaw. But at the same time, I decided to discontinue multiversion because I knew that changing anything would be a huge bike shed and we would not achieve anything in time while the next NVIDIA driver update is just a day away.
Since I've made these changes already, I provided them so they don't get lost and my time isn't wasted. If there are flaws in my code, I'm happy to look into them; catering to different perceptions of 'obscurity' is not on my charter.
If you feel things need to be done differently, feel free to do so. I've got other things on my plate.
The third option is: we leave everything as broken as it is.

@hramrach
Contributor

Why /updates/? /lib/modules/nvidia-open-driver-G06-signed-570.144-6.4.0-150700.51-default is its own directory, it is not updating itself.

And supporting another variant is warranted, the modules that are installed out of kernel need to be handled differently.

That is, the group that captures the kernel version should be tightened to reject anything that does not start with at least a two-place version such as 6.0~some-non-numeric-stuff, and a second match attempt should be done without capturing the version at all.

@e4t
Author

e4t commented Sep 29, 2025

> Why /updates/? /lib/modules/nvidia-open-driver-G06-signed-570.144-6.4.0-150700.51-default is its own directory, it is not updating itself.

Ok, it would make it easier to discover what scheme is used - the 'new' one under (/usr)/lib/modules/<versioned_module_name>-<kver> or the old one (/usr)/lib/modules/<kver>.

> And supporting another variant is warranted, the modules that are installed out of kernel need to be handled differently.

Yes.

> That is, the group that captures the kernel version should be tightened to reject anything that does not start with at least a two-place version such as 6.0~some-non-numeric-stuff, and a second match attempt should be done without capturing the version at all.

I'm not fully sure if I understand, but we should be stricter on the requirements of the module directory. The weak link in the kernel should probably be in a directory (/usr)/lib/modules/<kver>/updates/<unversioned_module_name>/ as this will avoid duplicate links and will make it easier to identify an update candidate. For this, we probably should adopt the convention that <unversioned_module_name> is held wherever needed in a variable module_basename or similar.

@e4t
Author

e4t commented Sep 29, 2025

> I see what is wrong with this approach now.
>
> The group that comes after /modules/ (and, with the change, after /modules/kmp/) captures the kernel version.
>
> But the out-of-kernel KMPs do not have a kernel version.

Out-of-kernel KMPs do have a kernel version. It's the version they are built against. The kernel version determines the ABI - which should be stable for an enterprise product version but may not be.

> That's why prepending '/kmp' was seen as a 'solution' to make this code 'work' but it does not. The modules that are out of kernel should be versionless.

Why?

e4t added 3 commits September 30, 2025 08:33
This will install KMPs separately from the kernel they are built
for (the 'native' kernel) using weak-updates for all kernel versions
- including the native one.
The weak-modules2 script already incorporates logic to create
weak updates for only one module version (i.e. the latest one)
per kernel version.
This way we can build multiple KMP versions against the same
kernel even with `multiversion(kernel)` set: the native kernel
will no longer see multiple driver modules installed, which would
leave it to 'depmod' to figure out which one to use.

Signed-off-by: Egbert Eich <eich@suse.com>
Signed-off-by: Egbert Eich <eich@suse.com>
e4t force-pushed the sles15-sp7-alternative-module-path branch from ed01225 to d8dd7df on September 30, 2025 07:10
@hramrach
Contributor

hramrach commented Sep 30, 2025

> Out-of-kernel KMPs do have a kernel version. It's the version they are built against. The kernel version determines the ABI - which should be stable for an enterprise product version but may not be.

That's backwards. The ABI is determined by the imported symbol CRCs, which are checked by depmod; these determine on which kernels the module can be installed (those that export those symbols with those CRCs). That might or might not be the kernel the module was built against.

The version specifies the kernel from which the module cannot be removed because it's installed in its module directory.

In this piece of obfuscated code (sed -rn 's:^(/lib/modules)?/([^/]*)/(.*\.ko(\.[gx]z|\.zst)?)$:\1 \2 \3:p'), ([^/]*) matches the kernel version. It should be replaced by something like ([0-9]+[.][0-9]+[^/]*) to only match something that looks like a kernel version and not /kmp/ or /nvidia-open-driver-G06-signed-570.144-6.4.0-150700.51-default/.

There are two types of modules that wm2 recognizes:

^(/lib/modules)?/([^/]*)/(update/.*\.ko(\.[gx]z|\.zst)?)$ -> module part of out-of-tree KMP that is to be handled by wm2
^(/lib/modules)?/([^/]*)/(.*\.ko(\.[gx]z|\.zst)?)$ -> when not matched by above -> in-tree module, ignored

It is not clear why /lib/modules is optional, possibly a bug.

Now a third type of module is needed: \.ko(\.[gx]z|\.zst)?)$ -> when not matched by above -> versionless KMP module
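
The three-way classification proposed above could look roughly like this. It is a Python sketch rather than the actual sed/shell code in weak-modules2, and it rests on two assumptions: the kernel-version group is tightened to `[0-9]+[.][0-9]+[^/]*` as suggested, and the out-of-tree directory is spelled `updates/`, as in the paths quoted earlier in the thread (the patterns above write `update/`). The function and constant names are hypothetical.

```python
import re

KO = r"\.ko(\.[gx]z|\.zst)?"      # module suffix, optionally compressed
KVER = r"[0-9]+[.][0-9]+[^/]*"    # tightened kernel-version group

# Same precedence as described above: try the most specific pattern first.
OLD_STYLE_KMP = re.compile(rf"^(/lib/modules)?/({KVER})/(updates/.*{KO})$")
IN_TREE = re.compile(rf"^(/lib/modules)?/({KVER})/(.*{KO})$")
ANY_MODULE = re.compile(rf"{KO}$")


def classify(path: str) -> str:
    """Classify a module path by the first pattern that matches."""
    if OLD_STYLE_KMP.match(path):
        return "old-style KMP"
    if IN_TREE.match(path):
        return "in-tree"
    if ANY_MODULE.search(path):
        return "versionless KMP"
    return "not a module"


examples = {
    "/lib/modules/6.4.0-150700.51-default/updates/"
    "nvidia-open-driver-G06-signed/nvidia-drm.ko": "old-style KMP",
    "/lib/modules/6.4.0-150700.51-default/kernel/drivers/net/dummy.ko.zst": "in-tree",
    "/lib/modules/kmp/6.4.0-150700.51-default/updates/"
    "nvidia-open-driver-G06-signed-570.144/nvidia-drm.ko.zst": "versionless KMP",
    "/lib/modules/nvidia-open-driver-G06-signed-570.144-"
    "6.4.0-150700.51-default/nvidia-drm.ko.zst": "versionless KMP",
}
results = {path: classify(path) for path in examples}
```

With the tightened group, neither `kmp` nor a directory named after the KMP can be mistaken for a kernel release, so such paths fall through to the third case.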

@e4t
Author

e4t commented Sep 30, 2025

Ok, now I understand what you mean.
You are trying to solve a much bigger challenge than I did with the original patch.
You would like to go the full step: by no longer installing a KMP into its 'native' kernel ('native' referring to the kernel it's been built with), we don't need to associate this module with a particular kernel version as we already rely on the ABI checker to determine what kernels a module is compatible with.
This is definitely a valid approach but a much bigger step - not suitable for a maintenance update - at least not without extensive testing and possibly an ECO as we need to alert others - not because it is complicated but because it is error-prone. What makes it even more so is that we still need to support the old scheme.
Note on the side: we still need to be able to determine the kernel a KMP has been built with simply for traceability: for example, the KMP may have picked up something from its 'native' kernel (for instance through a macro) that later gets proven to be problematic.

@hramrach
Contributor

hramrach commented Sep 30, 2025

You are installing it outside of the kernel directory so you do need to solve that.

Or add yet another special case which is no better, probably even worse.

@e4t
Author

e4t commented Sep 30, 2025

> In this piece of obfuscated code (sed -rn 's:^(/lib/modules)?/([^/]*)/(.*\.ko(\.[gx]z|\.zst)?)$:\1 \2 \3:p'), ([^/]*) matches the kernel version. It should be replaced by something like ([0-9]+[.][0-9]+[^/]*) to only match something that looks like a kernel version and not /kmp/ or /nvidia-open-driver-G06-signed-570.144-6.4.0-150700.51-default/.

But you said we don't want to associate a module from a KMP with a kernel version any more, therefore, the kernel version would not appear in a module install path.

> There are two types of modules that wm2 recognizes:
>
> ^(/lib/modules)?/([^/]*)/(update/.*\.ko(\.[gx]z|\.zst)?)$ -> module part of out-of-tree KMP that is to be handled by wm2
> ^(/lib/modules)?/([^/]*)/(.*\.ko(\.[gx]z|\.zst)?)$ -> when not matched by above -> in-tree module, ignored

Does this still refer to the above sed line from driver-check.sh?
If we really want to stop associating KMPs with kernel versions much of the entire test is moot. We still need to keep it for 'old style' modules, but add a 2nd loop testing for the new scheme.

> It is not clear why /lib/modules is optional, possibly a bug.

Right. I've been wondering about this as well...

@hramrach
Contributor

Why is it moot? You can make the krel group optional, swap \1 \2 \3 to \1 \3 \2, then change 'prefix krel path' to 'prefix path krel', and use the parts that are relevant.

@e4t
Author

e4t commented Sep 30, 2025

> Why is it moot? You can make the krel group optional, swap \1 \2 \3 to \1 \3 \2, then change 'prefix krel path' to 'prefix path krel', and use the parts that are relevant.

You don't check $krel and your $path test would be different. Just the test for $prefix makes sense unaltered. This is why I suggest to do an entirely separate loop for this case.
But these are implementation details we don't need to discuss piece by piece here.

@mwilck
Contributor

mwilck commented Oct 2, 2025

@hramrach

> But the out-of-kernel KMPs do not have a kernel version.
>
> That's why prepending '/kmp' was seen as a 'solution' to make this code 'work' but it does not. The modules that are out of kernel should be versionless.

I am not getting this argument. As @e4t wrote, of course they do. It's part of their package version.

You're right that if we install the modules outside the tree of the kernel they were built for, we don't have to include the kernel version in the installation path.

But if we don't, multiversion(kernel) will obviously cease to work, because the same module compiled for different kernels will use the same installation path. We will get a file conflict.

Thus if we want to continue using multiversion(kernel), we have to include the kernel version in the path.

Or what am I overlooking here?
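
The file-conflict argument above can be illustrated with a short Python sketch. The second kernel release, the package labels, and both path templates are made up for illustration; the second template follows the /lib/modules/<kmp_name>-<module_ver>-<kver> form suggested earlier in the thread.

```python
from collections import defaultdict


def file_conflicts(packages, path_template):
    """Return install paths that are claimed by more than one package."""
    owners = defaultdict(list)
    for pkg in packages:
        owners[path_template.format(**pkg)].append(pkg["label"])
    return {path: who for path, who in owners.items() if len(who) > 1}


# The same module version built against two kernels. The second kernel
# release and the labels are hypothetical.
packages = [
    {"label": "build-for-kernel-A", "name": "nvidia-open-driver-G06-signed",
     "ver": "570.144", "kver": "6.4.0-150700.51-default",
     "module": "nvidia-drm.ko"},
    {"label": "build-for-kernel-B", "name": "nvidia-open-driver-G06-signed",
     "ver": "570.144", "kver": "6.4.0-150700.53-default",
     "module": "nvidia-drm.ko"},
]

without_kver = file_conflicts(packages, "/lib/modules/{name}-{ver}/{module}")
with_kver = file_conflicts(packages, "/lib/modules/{name}-{ver}-{kver}/{module}")
# without_kver holds one shared path; with_kver is empty.
```

Under these assumptions, both builds can only stay installed side by side when the kernel version appears somewhere in the path.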

@mwilck
Contributor

mwilck commented Oct 2, 2025

As we're talking about weak-modules2, I would prefer a patch set that doesn't mandate any specific installation path. As far as wm2 is concerned, any path should be fine.

The policy/convention where to install the modules, if we don't install them under {/usr,}/lib/modules/$kver/updates, should be set elsewhere.

@hramrach
Contributor

hramrach commented Oct 2, 2025

Yes, exactly. The kernel version should be included in the path somewhere to avoid the conflict but that does not concern wm2. The only requirement is that the KMP is NOT installed someplace like /lib/modules/KVER where KVER looks like something that could potentially be a valid kernel version because in that case it's considered the old style KMP.

@e4t
Author

e4t commented Oct 7, 2025

So the idea is to support both 'old style' KMPs, which install into /lib/modules/${kver}, and 'new style' modules that may install 'anywhere else' except /lib/modules/${kver}.
This 'anywhere else' would require another argument to be passed to module-init-tools/kernel-scriptlets/kmp-script and module-init-tools/weak-modules2, but also to %suse_kernel_module_package() and %kernel_module_package() in the kernel macros, as the weak-modules2 script can no longer make assumptions about the install location.

@mwilck
Contributor

mwilck commented Oct 7, 2025

I don't think that a new argument is required. wm2 should be able to work transparently with either KMP type.

I am currently working on a patch set. The current status (yet untested) is in the PED-12049 branch.
