Closed
Description
Hello,
After deploying an up-to-date version of the driver through Helm, I encountered this failure when trying to start a pod that uses an MPS claim:
Warning FailedPrepareDynamicResources 4s kubelet Failed to prepare dynamic resources: NodePrepareResources failed for claim lab/mps-gpu-7bbd549f7b-z2vgr-mps-gpus-k6kf5: error preparing devices for claim 74689d61-7a6d-43d4-aa60-29c93c7ab7ea: prepare devices failed: error applying GPU config: error starting MPS control daemon: error checking if control daemon already started: failed to get deployment: deployments.apps "mps-control-daemon-74689d61-7a6d-43d4-aa60-29c93c7ab7ea-44f48" is forbidden: User "system:serviceaccount:nvidia-dra:nvidia-dra-k8s-dra-driver-service-account" cannot get resource "deployments" in API group "apps" in the namespace "nvidia-dra"
I did not experience this a few weeks ago with an identical setup. After checking the most recent changes to the Helm template, I found that the ClusterRole was modified by 4253b44 (part of #219) in a way that prevents the ServiceAccount from managing Deployments.
If I revert this change and update the ClusterRole, everything works as it did before.
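For illustration, a rule along these lines in the driver's ClusterRole would grant the access the error message says is missing. This is only a sketch: the `apps` API group and the `deployments` resource come from the error above, but the verb list is an assumption and may not match what the template contained before 4253b44.

```yaml
# Sketch of a ClusterRole rule entry (goes under the ClusterRole's `rules:` list).
# apiGroups and resources are taken from the "forbidden" error message;
# the verbs are an assumption and should be checked against the previous template.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "create", "delete"]
```

Once the ClusterRole is updated, `kubectl auth can-i get deployments.apps -n nvidia-dra --as=system:serviceaccount:nvidia-dra:nvidia-dra-k8s-dra-driver-service-account` should report `yes`.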