@elezar
Member

@elezar elezar commented Jun 5, 2025

Following the refactoring of device request extraction, we can now make CDI device requests consistent with other methods.

This change moves to using image.VisibleDevices instead of separate calls to CDIDevicesFromMounts and VisibleDevicesFromEnvVar. This also changes the way in which annotation requests are handled to be consistent with other mechanisms by including annotation devices in the list of VisibleDevices.

See:


// DevicesFromMounts returns a list of device specified as mounts.
func (i CUDA) DevicesFromMounts() []string {
// requestsFromMounts returns a list of device specified as mounts.
Member Author

I renamed this function since we're returning all requests including:

  • device requests (legacy and CDI)
  • imex channel requests

Collaborator

Note: DevicesFromMounts has been renamed and made private to the config package.

Member Author

Yes. It was not used outside this package.

@ArangoGutierrez ArangoGutierrez added this to the v1.18.0 milestone Jun 5, 2025
@ArangoGutierrez ArangoGutierrez self-requested a review June 5, 2025 14:38
@elezar elezar force-pushed the make-cdi-device-extraction-consistent branch from 5d667fc to 0e11147 Compare June 5, 2025 15:09
@coveralls
coveralls commented Jun 5, 2025

Pull Request Test Coverage Report for Build 15635726706

Details

  • 63 of 130 (48.46%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.1%) to 33.691%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| internal/runtime/runtime_factory.go | 25 | 32 | 78.13% |
| internal/config/image/cuda_image.go | 23 | 50 | 46.0% |
| internal/modifier/cdi.go | 5 | 38 | 13.16% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| internal/config/image/cuda_image.go | 1 | 72.77% |
| internal/modifier/cdi.go | 1 | 7.83% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 15585515442 | 0.1% |
| Covered Lines | 4297 |
| Relevant Lines | 12754 |

💛 - Coveralls

@elezar elezar force-pushed the make-cdi-device-extraction-consistent branch 2 times, most recently from 58c6c17 to 1b0f07a Compare June 5, 2025 19:47
@elezar elezar self-assigned this Jun 5, 2025
@ArangoGutierrez ArangoGutierrez requested a review from Copilot June 6, 2025 13:09
Copilot AI left a comment

Pull Request Overview

This PR refactors CDI device request handling to use image.VisibleDevices uniformly, adds a new cdiModifier wrapper for spec-based retrieval, and updates related tests.

  • Introduce cdiModifier to centralize spec parsing and device extraction
  • Replace direct calls to CDIDevicesFromMounts/VisibleDevicesFromEnvVar with VisibleDevices
  • Update tests in internal/modifier, internal/info, and internal/config to reflect the new behavior

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| internal/modifier/cdi.go | Add cdiModifier type, consolidate device extraction |
| internal/modifier/cdi_test.go | New tests for getDevicesFromSpec |
| internal/info/auto_test.go | Adjust TestResolveAutoMode expectations and setup |
| internal/config/image/cuda_image.go | Rename/move mount helpers, parse CDI mount requests |
| internal/config/image/cuda_image_test.go | Update expected mount-based CDI device inclusion |

Comments suppressed due to low confidence (1)

internal/config/image/cuda_image.go:219

  • The mount-based CDI device checks were removed, so OnlyFullyQualifiedCDIDevices no longer accounts for devices requested via mounts. Consider re-adding logic using requestsFromMounts and parsing cdiDeviceMountRequest to correctly detect mount-based CDI devices.
func (i CUDA) OnlyFullyQualifiedCDIDevices() bool {

case strings.HasPrefix(device, volumeMountDevicePrefixCDI):
name, err := cdiDeviceMountRequest(device).qualifiedName()
if err != nil {
i.logger.Warningf("Ignoring invalid mount request for CDI device %v: %w", device, err)

Copilot AI Jun 6, 2025

The %w verb is only supported by fmt.Errorf. In logging calls, use %v or %s to ensure the error is formatted correctly.

Suggested change
- i.logger.Warningf("Ignoring invalid mount request for CDI device %v: %w", device, err)
+ i.logger.Warningf("Ignoring invalid mount request for CDI device %v: %v", device, err)

Comment on lines 69 to 80
cdiModifier := &cdiModifier{
logger: logger,
acceptDeviceListAsVolumeMounts: cfg.AcceptDeviceListAsVolumeMounts,
acceptEnvvarUnprivileged: cfg.AcceptEnvvarUnprivileged,
annotationPrefixes: cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.AnnotationPrefixes,
defaultKind: cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.DefaultKind,
}
return cdiModifier.getDevicesFromSpec(ociSpec)
}

// TODO: We should rename this type.
type cdiModifier struct {

Copilot AI Jun 6, 2025

[nitpick] The cdiModifier type has a TODO to rename it. Consider selecting a more descriptive name or removing the TODO if this is the intended name.

Suggested change
- cdiModifier := &cdiModifier{
- logger: logger,
- acceptDeviceListAsVolumeMounts: cfg.AcceptDeviceListAsVolumeMounts,
- acceptEnvvarUnprivileged: cfg.AcceptEnvvarUnprivileged,
- annotationPrefixes: cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.AnnotationPrefixes,
- defaultKind: cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.DefaultKind,
- }
- return cdiModifier.getDevicesFromSpec(ociSpec)
- }
- // TODO: We should rename this type.
- type cdiModifier struct {
+ deviceHandler := &CDIDeviceHandler{
+ logger: logger,
+ acceptDeviceListAsVolumeMounts: cfg.AcceptDeviceListAsVolumeMounts,
+ acceptEnvvarUnprivileged: cfg.AcceptEnvvarUnprivileged,
+ annotationPrefixes: cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.AnnotationPrefixes,
+ defaultKind: cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.DefaultKind,
+ }
+ return deviceHandler.getDevicesFromSpec(ociSpec)
+ }
+ // This type handles CDI-related device configurations and annotations.
+ type CDIDeviceHandler struct {

@elezar elezar force-pushed the make-cdi-device-extraction-consistent branch from 1b0f07a to d68d401 Compare June 11, 2025 12:52
elezar added 7 commits June 13, 2025 14:05
This change updates the image.CUDA type to also extract CDI
device requests. These are only relevant IF CDI prefixes are
specifically set.

Signed-off-by: Evan Lezar <[email protected]>
Following the refactoring of device request extraction, we can
now make CDI device requests consistent with other methods.

This change moves to using image.VisibleDevices instead of
separate calls to CDIDevicesFromMounts and VisibleDevicesFromEnvVar.

Signed-off-by: Evan Lezar <[email protected]>
This change includes annotation devices in CUDA.VisibleDevices
with the highest priority. This allows for the CDI device
request extraction to be consistent across all request mechanisms.

Note that this does change behaviour in the following ways:
1. Annotations are considered when resolving the runtime mode.
2. Incorrectly formed device names in annotations are no longer treated as an error.

Signed-off-by: Evan Lezar <[email protected]>
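
As a rough illustration of "annotation devices with the highest priority", the sketch below picks the first request mechanism that yields any devices. This is not the toolkit's implementation: the requestSources type and visibleDevices method are hypothetical, the first-non-empty-wins rule is an assumption, and so is the relative order of mounts and the environment variable. The commit message only states that annotations come first.

```go
package main

import "fmt"

// requestSources is a hypothetical container holding the devices requested
// through each mechanism. It does not mirror the real image.CUDA type.
type requestSources struct {
	annotations []string
	mounts      []string
	envvar      []string
}

// visibleDevices returns the devices from the highest-priority mechanism that
// requested any. Annotations are checked first, as the commit message says.
// The order of the remaining two is an assumption made for this sketch.
func (r requestSources) visibleDevices() []string {
	for _, devices := range [][]string{r.annotations, r.mounts, r.envvar} {
		if len(devices) > 0 {
			return devices
		}
	}
	return nil
}

func main() {
	withAnnotation := requestSources{
		annotations: []string{"vendor.com/class=device0"},
		envvar:      []string{"all"},
	}
	// The annotation request shadows the environment variable request.
	fmt.Println(withAnnotation.visibleDevices())

	// Without annotations or mounts, the environment variable is used.
	fmt.Println(requestSources{envvar: []string{"all"}}.visibleDevices())
}
```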
@elezar elezar force-pushed the make-cdi-device-extraction-consistent branch from d68d401 to 8be03cf Compare June 13, 2025 13:28
@jgehrcke jgehrcke left a comment

🚀

Cool. I didn't really look at this and don't fully understand this. But I am sure it's good work and I support you and trust that overall it's best to move forward! Also, there are tests so what can possibly go wrong? :)

return requestedDevice, nil
}

parts := strings.SplitN(requestedDevice, "/", 3)

is this trying to parse vendor, class, device from

/cdi/<vendor>/<class>/<device>

?

ah, volumeMountDevicePrefixCDI = "cdi/".

and the argument 3 probably means "at most two splits", i.e. max three tokens.

Member Author

Yes, a mounted CDI device will have the path /var/run/nvidia-container-devices/cdi/vendor.com/class/device or /var/run/nvidia-container-devices/cdi/vendor.com/class=device. This code handles the former case where we split vendor.com/class/device into AT MOST three parts.

From the SplitN docs:

// SplitN slices s into substrings separated by sep and returns a slice of
// the substrings between those separators.
//
// The count determines the number of substrings to return:
// - n > 0: at most n substrings; the last substring will be the unsplit remainder;
// - n == 0: the result is nil (zero substrings);
// - n < 0: all substrings.
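
The SplitN behaviour described above can be sketched in a small standalone program. The helper name qualifiedNameFromMount is hypothetical and is not the toolkit's function; it assumes the input is the part of the mount path that follows the cdi/ prefix, and that the target form is vendor.com/class=device as in the two path examples given in this comment.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifiedNameFromMount is a hypothetical helper for illustration only. It
// takes the portion of a mount path after the "cdi/" prefix and returns a
// name of the form vendor/class=device.
func qualifiedNameFromMount(request string) (string, error) {
	// A request that already uses the class=device form is returned unchanged.
	if strings.Contains(request, "=") {
		return request, nil
	}
	// n == 3 yields at most three substrings: vendor, class, and the unsplit
	// remainder, which is treated as the device name.
	parts := strings.SplitN(request, "/", 3)
	if len(parts) != 3 {
		return "", fmt.Errorf("invalid CDI mount request %q: expected vendor/class/device", request)
	}
	return fmt.Sprintf("%s/%s=%s", parts[0], parts[1], parts[2]), nil
}

func main() {
	requests := []string{
		"vendor.com/class/device",
		"vendor.com/class=device",
		"vendor.com/class",
	}
	for _, r := range requests {
		name, err := qualifiedNameFromMount(r)
		fmt.Printf("%q -> %q (err: %v)\n", r, name, err)
	}
}
```

Because the last substring is the unsplit remainder, any extra slashes end up inside the device part rather than producing a fourth token.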

@elezar elezar merged commit bdcdcb7 into NVIDIA:main Jun 13, 2025
16 checks passed
@elezar elezar deleted the make-cdi-device-extraction-consistent branch June 13, 2025 14:05
4 participants