
@nojnhuh
Contributor

@nojnhuh nojnhuh commented Nov 21, 2025

This PR lays the groundwork for the example driver to demonstrate several kinds of devices, each implemented as a separate "profile." Most of the changes extract the GPU-specific details and expose hooks for plugging in different logic.

To show how this works with multiple profiles, I have a POC built on these changes that implements a new gpupart profile demonstrating partitionable devices: nojnhuh/dra-example-driver@profiles...gpupart

Fixes #92

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nojnhuh

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 21, 2025
@nojnhuh
Contributor Author

nojnhuh commented Nov 21, 2025

I wanted to open this PR before I'm off on vacation next week, but I intend to take a look at #128 first and incorporate those changes here before merging this. FYI @guptaNswati

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 21, 2025
@nojnhuh
Contributor Author

nojnhuh commented Nov 21, 2025

@sunya-ch If you could take a look and check whether your consumable capacity changes can align with this, that would be great!

@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Nov 24, 2025

@sunya-ch sunya-ch left a comment


@nojnhuh
Could we define a DeviceProfile interface/struct with abstract methods like GetDevices, ApplyConfig, and ValidateConfig, and then keep a mapping from profile name to its implementation?
If we take that approach, I could implement ConsumableNetDevProfile or ConsumableGPUProfile independently. What do you think?
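A minimal Go sketch of the suggested shape, under stated assumptions: the DeviceProfile interface, the Device type, gpuProfile, the profiles map, and lookupProfile are hypothetical names, not code from this PR. Config is passed as `any` only to keep the sketch self-contained; a real driver would presumably decode into its own typed config structs first.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Device is a hypothetical, simplified stand-in for whatever device
// representation a profile advertises.
type Device struct {
	Name string
}

// DeviceProfile is a hypothetical interface following the suggestion:
// each profile enumerates its devices and owns validation and
// application of its own config.
type DeviceProfile interface {
	GetDevices() []Device
	ValidateConfig(config any) error
	ApplyConfig(config any, devices []Device) error
}

// gpuProfile is a toy implementation used only for illustration.
type gpuProfile struct {
	numDevices int
}

func (p gpuProfile) GetDevices() []Device {
	devices := make([]Device, 0, p.numDevices)
	for i := 0; i < p.numDevices; i++ {
		devices = append(devices, Device{Name: fmt.Sprintf("gpu-%d", i)})
	}
	return devices
}

func (p gpuProfile) ValidateConfig(config any) error {
	if config == nil {
		return errors.New("gpu: config must not be nil")
	}
	return nil
}

func (p gpuProfile) ApplyConfig(config any, devices []Device) error {
	if err := p.ValidateConfig(config); err != nil {
		return err
	}
	// A real profile would prepare the given devices here.
	return nil
}

// profiles maps a profile name to its implementation.
var profiles = map[string]DeviceProfile{
	"gpu": gpuProfile{numDevices: 2},
}

// lookupProfile returns the profile registered under name, or an error
// listing the known profile names.
func lookupProfile(name string) (DeviceProfile, error) {
	p, ok := profiles[name]
	if !ok {
		known := make([]string, 0, len(profiles))
		for k := range profiles {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unknown profile %q (known: %v)", name, known)
	}
	return p, nil
}

func main() {
	p, err := lookupProfile("gpu")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.GetDevices()) // [{gpu-0} {gpu-1}]

	_, err = lookupProfile("netdev")
	fmt.Println(err) // unknown profile "netdev" (known: [gpu])
}
```

Because callers only see the interface, a ConsumableNetDevProfile or ConsumableGPUProfile could be added as one more map entry without touching the GPU code.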

resources: {}

kubeletPlugin:
# numDevices describes how many GPUs to advertise on each node when the "gpu"


I think we may want to pass numDevices to other profiles as well. Is there any reason to limit it to the gpu profile?
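One way to make numDevices profile-agnostic, sketched under assumptions: every profile constructor receives the same numDevices value, so the setting is no longer GPU-specific. Profile, indexedProfile, constructors, newProfile, and the "netdev" profile are hypothetical names for illustration only.

```go
package main

import "fmt"

// Profile is a hypothetical minimal profile interface for this sketch.
type Profile interface {
	DeviceNames() []string
}

// indexedProfile advertises numDevices devices named "<prefix>-<i>".
type indexedProfile struct {
	prefix     string
	numDevices int
}

func (p indexedProfile) DeviceNames() []string {
	names := make([]string, p.numDevices)
	for i := range names {
		names[i] = fmt.Sprintf("%s-%d", p.prefix, i)
	}
	return names
}

// constructors maps a profile name to a factory that receives the shared
// numDevices setting instead of a GPU-only value.
var constructors = map[string]func(numDevices int) Profile{
	"gpu":    func(n int) Profile { return indexedProfile{prefix: "gpu", numDevices: n} },
	"netdev": func(n int) Profile { return indexedProfile{prefix: "netdev", numDevices: n} },
}

// newProfile builds the named profile with the given device count.
func newProfile(name string, numDevices int) (Profile, error) {
	ctor, ok := constructors[name]
	if !ok {
		return nil, fmt.Errorf("unknown profile %q", name)
	}
	return ctor(numDevices), nil
}

func main() {
	p, err := newProfile("netdev", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.DeviceNames()) // [netdev-0 netdev-1 netdev-2]
}
```

With this shape, the chart value would only need to be renamed or documented as applying to whichever profile is selected.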

sb := gpu.ConfigSchemeBuilder
assert.NoError(t, sb.AddToScheme(configScheme))

s := httptest.NewServer(newMux(newConfigDecoder(), gpu.ValidateConfig, driverName))


Are we expecting each profile to have its own ValidateConfig function? If so, we could define it as an abstract method on the profile struct.
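If ValidateConfig becomes a method on each profile, the existing call site may not need to change shape: in Go, a method value such as p.ValidateConfig is an ordinary func value. The sketch below assumes newMux takes a validator of type func(any) error; ConfigValidator, gpuConfig (with its Replicas field), and admit are hypothetical stand-ins, not the driver's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// ConfigValidator is a hypothetical function type standing in for the
// validator parameter that newMux receives in the test above.
type ConfigValidator func(config any) error

// Profile exposes validation as a method, as suggested in the review.
type Profile interface {
	ValidateConfig(config any) error
}

// gpuConfig is a hypothetical config type with one illustrative field.
type gpuConfig struct {
	Replicas int
}

type gpuProfile struct{}

func (gpuProfile) ValidateConfig(config any) error {
	c, ok := config.(*gpuConfig)
	if !ok {
		return fmt.Errorf("gpu profile: unexpected config type %T", config)
	}
	if c.Replicas < 1 {
		return errors.New("gpu profile: replicas must be at least 1")
	}
	return nil
}

// admit is a hypothetical stand-in for the handler built by newMux: it
// accepts any function with the ConfigValidator signature.
func admit(validate ConfigValidator, config any) string {
	if err := validate(config); err != nil {
		return "denied: " + err.Error()
	}
	return "allowed"
}

func main() {
	var p Profile = gpuProfile{}
	// The method value p.ValidateConfig has type func(any) error, so it can
	// be passed wherever a ConfigValidator is expected.
	fmt.Println(admit(p.ValidateConfig, &gpuConfig{Replicas: 2})) // allowed
	fmt.Println(admit(p.ValidateConfig, "not a gpu config"))
	// denied: gpu profile: unexpected config type string
}
```

This keeps the handler decoupled from the profile interface: it still depends only on a function, and the selected profile supplies it.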


Development

Successfully merging this pull request may close these issues.

demonstrating advanced features

3 participants