@klueska klueska commented Jan 7, 2025

This pulls in a temporary change to ensure that the CI created within a GI is always of maximum size. It is a WIP because the change in go-nvlib is not yet finalized, so we can't actually vendor in the solution yet.

It works by reversing the loop that walks through CIs so that we visit any "newer" CI profiles before older ones. The assumption is that newer ones may provide a CI definition that has a larger memory slice count with the same compute slice count. Unfortunately, we don't have a way to distinguish this in the canonical naming convention, so the same name refers to both MIG devices -- hence the bug.
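
A minimal sketch of the reversed walk, for illustration only. It does not use the go-nvlib API: `ciProfile` and `pickLargestCI` are hypothetical names, and it assumes that "newer" profiles sit at higher indices in the list, so iterating from the end visits them first.

```go
package main

// ciProfile is a hypothetical stand-in for a compute instance (CI) profile.
// It is not a go-nvlib type.
type ciProfile struct {
	ID            int // position of the profile in the enumeration (assumed: higher = newer)
	ComputeSlices int // number of compute slices the profile provides
	MemorySlices  int // number of memory slices the profile provides
}

// pickLargestCI walks the profiles from the last index down to the first and
// returns the first one whose compute slice count matches. Under the
// assumption that newer profiles come later in the slice, a newer profile
// with the same compute slice count wins over an older one.
func pickLargestCI(profiles []ciProfile, computeSlices int) (ciProfile, bool) {
	for i := len(profiles) - 1; i >= 0; i-- {
		if profiles[i].ComputeSlices == computeSlices {
			return profiles[i], true
		}
	}
	// No profile offers the requested compute slice count.
	return ciProfile{}, false
}
```

With two profiles that both provide one compute slice, the later one in the list is selected. Like the change itself, this relies on ordering rather than on an explicit comparison of memory slice counts, which is part of why it is only a temporary workaround.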

We need a more robust / comprehensive solution to this issue, possibly introducing a "custom" naming convention to distinguish the cases.

@klueska klueska marked this pull request as draft January 7, 2025 13:05
Signed-off-by: Kevin Klues <[email protected]>
@klueska klueska changed the title WIP: Update vendoring from go-nvlib WIP: Temporary workaround to ensure biggest CI created inside GI Jan 7, 2025