jfroy commented Sep 23, 2024

This patch adds support for the DevicePluginCDIDevices feature gate by adding spec.operator.useDevicePluginCDIDevicesFeature to ClusterPolicy. When this field is set, the operator sets the DEVICE_LIST_STRATEGY device plug-in environment variable to cdi-cri.
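As a rough illustration of the field described above, the sketch below models an optional boolean on a stand-in operator spec, with a nil-safe accessor. The `OperatorSpec` type here is a simplified stand-in, and the names `DevicePluginCDIDevicesEnabled` and `parseOperatorSpec` are hypothetical; none of them is taken from the actual patch. The JSON key is inferred from the field path `spec.operator.useDevicePluginCDIDevicesFeature`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OperatorSpec is a simplified stand-in for the operator section of
// ClusterPolicy; the real type has many more fields.
type OperatorSpec struct {
	// A pointer lets "unset" be told apart from an explicit false.
	UseDevicePluginCDIDevicesFeature *bool `json:"useDevicePluginCDIDevicesFeature,omitempty"`
}

// DevicePluginCDIDevicesEnabled is a hypothetical helper that treats a nil
// spec or an unset field as false.
func (o *OperatorSpec) DevicePluginCDIDevicesEnabled() bool {
	return o != nil &&
		o.UseDevicePluginCDIDevicesFeature != nil &&
		*o.UseDevicePluginCDIDevicesFeature
}

// parseOperatorSpec decodes a JSON fragment into an OperatorSpec.
func parseOperatorSpec(data string) (*OperatorSpec, error) {
	var spec OperatorSpec
	if err := json.Unmarshal([]byte(data), &spec); err != nil {
		return nil, err
	}
	return &spec, nil
}

func main() {
	spec, err := parseOperatorSpec(`{"useDevicePluginCDIDevicesFeature": true}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.DevicePluginCDIDevicesEnabled())
}
```

Using a pointer mirrors the neighbouring `UseOpenShiftDriverToolkit *bool` field shown in the review diff, so an omitted field decodes to nil rather than to false.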

jfroy force-pushed the deviceplugincdidevices branch 2 times, most recently from 435a38f to 0b0151d, on September 23, 2024 14:58
jfroy changed the title from "Support for the DevicePluginCDIDevices feature" to "Support the DevicePluginCDIDevices feature gate" on Sep 23, 2024

jfroy commented Sep 23, 2024

@cdesiniotis @elezar

// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors.x-descriptors="urn:alm:descriptor:com.tectonic.ui:booleanSwitch"
UseOpenShiftDriverToolkit *bool `json:"use_ocp_driver_toolkit,omitempty"`

// UseDevicePluginCDIDevicesFeature indicates if the device plug-in should be configured to use the CDI devices feature

Member

One question on the UX for this: Should this be under the cdi object in the cluster policy?

Author

It's an option that only affects the operator, but is related to CDI. I don't know to which it binds more strongly. So I picked one. It's easy enough to change that in the PR, but of course hard after. Maybe other folks can chime in. Ultimately I will defer to you and your team.

} else {
	setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), DeviceListStrategyEnvName, "envvar,cdi-annotations")
}
setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), CDIAnnotationPrefixEnvName, "nvidia.cdi.k8s.io/")

Member

This block is also not relevant when using cdi-cri.

Author

Do you mean to say that useDevicePluginCDIDevicesFeature should be stronger than cdi.enabled? In the current design and implementation, useDevicePluginCDIDevicesFeature does nothing unless cdi.enabled is set.
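The precedence discussed in this thread can be sketched as a small decision function. This is a simplified illustration, not the operator's actual code: the function name `deviceListStrategy` is hypothetical, and leaving the variable unset when cdi.enabled is false is an assumption made for the sketch. The two strategy strings are the ones that appear in the PR description and the diff.

```go
package main

import "fmt"

// deviceListStrategy (hypothetical name) returns the value to use for
// DEVICE_LIST_STRATEGY and whether the variable should be set at all.
func deviceListStrategy(cdiEnabled, useCDIDevicesFeature bool) (string, bool) {
	if !cdiEnabled {
		// Assumption for this sketch: without cdi.enabled the feature
		// flag has no effect and the variable is left untouched.
		return "", false
	}
	if useCDIDevicesFeature {
		// Device IDs are passed through the CRI field.
		return "cdi-cri", true
	}
	// Value used in the else branch of the diff under review.
	return "envvar,cdi-annotations", true
}

func main() {
	cases := []struct{ cdi, feature bool }{
		{false, false}, {false, true}, {true, false}, {true, true},
	}
	for _, c := range cases {
		value, set := deviceListStrategy(c.cdi, c.feature)
		fmt.Printf("cdi.enabled=%t feature=%t set=%t value=%q\n", c.cdi, c.feature, set, value)
	}
}
```

Making useDevicePluginCDIDevicesFeature "stronger" than cdi.enabled, as the question above puts it, would amount to checking the feature flag before the `cdiEnabled` guard.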

This patch adds support for the `DevicePluginCDIDevices` feature gate by
adding `spec.operator.useDevicePluginCDIDevicesFeature` to
`ClusterPolicy`.  When this field is set, the operator sets the
`DEVICE_LIST_STRATEGY` device plug-in environment variable to `cdi-cri`.

Signed-off-by: Jean-Francois Roy <[email protected]>
jfroy force-pushed the deviceplugincdidevices branch from 0b0151d to ef79ad3 on November 6, 2024 22:10

copy-pr-bot bot commented Nov 6, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jfroy commented Feb 19, 2025

@cdesiniotis I imagine #1285 obsoletes this PR?

jfroy commented Feb 19, 2025

> @cdesiniotis I imagine #1285 obsoletes this PR?

Ah wait, no, it doesn't use "native CDI" but instead relies on annotations. I'll comment more in the internal document about this.

@cdesiniotis
Contributor

@jfroy I've updated #1285 to use the CRI instead of annotations, so yes, if we proceed with #1285 it will obsolete this PR. See the discussion here: #1285 (comment)

github-actions bot commented Nov 4, 2025

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions bot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Nov 4, 2025

jfroy commented Nov 18, 2025

This is no longer needed. 🚀

jfroy closed this on Nov 18, 2025