-
Notifications
You must be signed in to change notification settings - Fork 437
Generate CDI specs for coherent and non-coherent devices #1247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements the generation of separate CDI specifications for coherent and non-coherent GPU devices. The implementation adds device-level annotations to indicate coherence status and splits the generated CDI spec accordingly.
- Adds coherence annotations to GPU device specifications
- Generates separate CDI specs for coherent/non-coherent devices in addition to the base spec
- Introduces feature flag to control coherence annotation behavior
Reviewed Changes
Copilot reviewed 9 out of 24 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/nvcdi/options.go | Adds WithFeatureFlags function and deprecates single WithFeatureFlag |
| pkg/nvcdi/api.go | Defines new FeatureDisableCoherentAnnotations feature flag |
| pkg/nvcdi/full-gpu-nvml.go | Implements device coherence annotation logic |
| pkg/nvcdi/lib-nvml.go | Updates function calls to pass feature flags |
| pkg/nvcdi/mig-device-nvml.go | Disables coherent annotations for MIG devices |
| cmd/nvidia-ctk/cdi/generate/generate.go | Implements spec splitting logic and multi-spec generation |
| cmd/nvidia-ctk/cdi/generate/generate_test.go | Updates tests for new multi-spec generation |
| deployments/devel/go.mod | Removes golangci-lint dependency |
| deployments/devel/tools.go | Removes golangci-lint import |
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
|
We should add an attribute to the k8s-dra-driver for this too. |
c528d0f to
113317b
Compare
jgehrcke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a 'trust approval', in case you need to move forward for the next release candidate. I am not sure if we have consensus in the team to use these types of approvals -- but I personally think that sometimes there's a time for them (very open to discussing this arguably controversial perspective).
ArangoGutierrez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
113317b to
1dea726
Compare
1dea726 to
48286b9
Compare
|
/ok to test |
48286b9 to
d0c4c96
Compare
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.9-0 to 0.13.0-0. - [Release notes](https://github.com/NVIDIA/go-nvml/releases) - [Commits](NVIDIA/go-nvml@v0.12.9-0...v0.13.0-0) --- updated-dependencies: - dependency-name: github.com/NVIDIA/go-nvml dependency-version: 0.13.0-0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
With this change the nvidia-ctk cdi generate command generates CDI specs based on whether a device supports coherent access to system memory or not. In this case "regular" nvidia.com/gpu CDI specs are generated for all devices as well as nvidia.com/gpu.coherent and nvidia.com/gpu.noncoherent for devices that are either coherent or non-coherent. Adding the --feature-flag=disable-coherent-annotations command line argument to the nvidia-ctk cdi generate command will disable this. The "disable-coherent-annotations" feature flag can also be set in the nvcdi API in which case the generated CDI device specification will not include annotations indicating coherence. Signed-off-by: Evan Lezar <[email protected]>
d0c4c96 to
868963b
Compare
|
Is this supposed to be supported with device-plugin as well. |
This change disables the functionality for splitting generated CDI specifications based on device coherence by default. This was added in NVIDIA#1247, but due to discussions around whether coherence is an property that should be exposed, we are disabling this by default. Note that users can opt in to the feature by running the `nvidia-ctk cdi generate` with the `--feature-flag=enable-coherent-annotations` command line flag. Alternatively the `nvidia-ctk cdi generate` command can be run with the `NVIDIA_CTK_CDI_GENERATE_FEATURE_FLAGS` enviroment set to include "enable-coherent-annotations" (in a comma-separated list). Signed-off-by: Evan Lezar <[email protected]>
This change updates CDI spec generation to generation specifications for coherent and non-coherent devices. At its core, this is acheived by adding device-level annotations to the CDI specification for devices indicating whether a device is coherent or not. The
nvidia-ctk cdi generatecommand then splits the generated spec by the specified annotation and generates additional specs fornvidia.com/gpu.coherentand / ornvidia.com/gpu.noncoherentdevices.For example, when running on a device that supports coherence we see the following:
With the only difference between the two specs being the device kind:
The following specs are generated:
This is blocked by: