Update vfio-manage to choose best VFIO driver #128
base: main
Pull Request Test Coverage Report for Build 19655564782 (Coveralls)
fcfa6cf to 7829c7f
internal/nvpci/nvpci.go
Outdated
modAliasPath := filepath.Join(d.Path, "modalias")
modAliasContent, err := os.ReadFile(modAliasPath)
if err != nil {
	return "", fmt.Errorf("failed to read modalias file for %s: %w", d.Address, err)
}

modAliasStr := strings.TrimSpace(string(modAliasContent))
modAlias, err := parseModAliasString(modAliasStr)
if err != nil {
	return "", fmt.Errorf("failed to parse modalias string %q for device %q: %w", modAliasStr, d.Address, err)
}
logrus.Debugf("modalias for device %q: %+v", d.Address, modAlias)

kernelVersion, err := getKernelVersion()
if err != nil {
	return "", fmt.Errorf("failed to get kernel version: %w", err)
}
logrus.Debugf("kernel version: %s", kernelVersion)

modulesAliasFilePath := filepath.Join("/lib/modules", kernelVersion, "modules.alias")
Could we extract our file paths into constants?
const (
	kernelModulesRoot    = "/lib/modules"
	modulesAliasFileName = "modules.alias"
)
We can also create helper functions to create the paths:
func getModulesAliasPath(kernelVersion string) string {
	return filepath.Join(kernelModulesRoot, kernelVersion, modulesAliasFileName)
}
func getDeviceModaliasPath(devicePath string) string {
	return filepath.Join(devicePath, "modalias")
}
This way, callers don't need to know the exact path structure.
internal/nvpci/modalias.go
Outdated
if matches, score := matchField(deviceModAlias.vendor, patternModAlias.vendor); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.device, patternModAlias.device); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.subvendor, patternModAlias.subvendor); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.subdevice, patternModAlias.subdevice); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.baseClass, patternModAlias.baseClass); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.subClass, patternModAlias.subClass); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.interface_, patternModAlias.interface_); !matches {
	return false, 0
} else {
	specificity += score
}

return true, specificity
Would it make sense to use slices of getters here to shorten this code?
type fieldGetter func(*modAlias) string
fields := []struct {
	getter fieldGetter
}{
	{func(m *modAlias) string { return m.vendor }},
	{func(m *modAlias) string { return m.device }},
	// ... etc
}
for _, field := range fields {
	deviceVal := field.getter(deviceModAlias)
	patternVal := field.getter(patternModAlias)
	if matches, score := matchField(deviceVal, patternVal); !matches {
		return false, 0
	}
	specificity += score
}
I have updated the implementation significantly. I believe it is simpler now. Let me know what you think.
for _, line := range lines {
	line = strings.TrimSpace(line)

	if !strings.HasPrefix(line, "alias vfio_pci:") {
Can we extract this to a named constant?
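One possible shape for that constant, as a minimal sketch: the names `vfioPCIAliasPrefix` and `isVFIOPCIAlias` are hypothetical and not taken from the PR, and the sample lines in `main` are illustrative rather than copied from a real modules.alias file.

```go
package main

import (
	"fmt"
	"strings"
)

// vfioPCIAliasPrefix is a hypothetical name for the literal used in the diff.
// It marks modules.alias lines that declare a vfio_pci alias.
const vfioPCIAliasPrefix = "alias vfio_pci:"

// isVFIOPCIAlias reports whether a modules.alias line declares a vfio_pci
// alias. Surrounding whitespace is ignored, as in the loop under review.
func isVFIOPCIAlias(line string) bool {
	return strings.HasPrefix(strings.TrimSpace(line), vfioPCIAliasPrefix)
}

func main() {
	// Illustrative lines only.
	fmt.Println(isVFIOPCIAlias("alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci"))
	fmt.Println(isVFIOPCIAlias("alias pci:v*d*sv*sd*bc*sc*i* some_driver"))
}
```

Keeping the prefix in one place also lets the parsing code strip it with strings.TrimPrefix without repeating the literal.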
ff28e1c to b5da7d9
b5da7d9 to a95b741
Rather than always binding GPUs to the vfio-pci driver, this commit
introduces logic to see if the running kernel has a VFIO variant
driver available that is a better match for the device. This is needed
on Grace-based systems, where the nvgrace_gpu_vfio_pci module must be
used instead of the vfio-pci module.
We read the modalias file for a given device, then look through
/lib/modules/${kernel_version}/modules.alias for the matching vfio_pci
alias with the fewest wildcard ('*') fields.
The code introduced in this commit is inspired by:
https://gitlab.com/libvirt/libvirt/-/commit/82e2fac297105f554f57fb589002933231b4f711
Signed-off-by: Christopher Desiniotis <[email protected]>
a95b741 to ad775b1
// extractField extracts the value before the next delimiter from the input string.
// Returns the extracted value, the remaining string (without the delimiter), and any error.
func extractField(input, delimiter string) (string, string, error) {
	idx := strings.Index(input, delimiter)
There might be other methods in the strings package you could use to simplify this further
subdevice  string // sd
baseClass  string // bc
subClass   string // sc
interface_ string // i
Suggested change:
- interface_ string // i
+ pciDevInterface string // i
OR
- interface_ string // i
+ deviceInterface string // i
Depends on #127
Testing
On a GB200 compute tray:
On a system with one L40 (configured in graphics mode) and one L4 GPU: