Regression on MPS setups following changes in clusterrole #229

@dekonnection

Description

Hello,

After deploying an up-to-date version of the driver through Helm, I encountered this failure when trying to start a pod that uses an MPS claim:

  Warning  FailedPrepareDynamicResources  4s    kubelet            Failed to prepare dynamic resources: NodePrepareResources failed for claim lab/mps-gpu-7bbd549f7b-z2vgr-mps-gpus-k6kf5: error preparing devices for claim 74689d61-7a6d-43d4-aa60-29c93c7ab7ea: prepare devices failed: error applying GPU config: error starting MPS control daemon: error checking if control daemon already started: failed to get deployment: deployments.apps "mps-control-daemon-74689d61-7a6d-43d4-aa60-29c93c7ab7ea-44f48" is forbidden: User "system:serviceaccount:nvidia-dra:nvidia-dra-k8s-dra-driver-service-account" cannot get resource "deployments" in API group "apps" in the namespace "nvidia-dra"
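For anyone who wants to confirm the same denial without starting a pod, a quick impersonation check against the API server (using the namespace and ServiceAccount names from the event above) should reproduce it:

```shell
# Ask the API server whether the driver's ServiceAccount may "get" Deployments
# in its own namespace; on an affected cluster this prints "no".
kubectl auth can-i get deployments.apps \
  --as=system:serviceaccount:nvidia-dra:nvidia-dra-k8s-dra-driver-service-account \
  -n nvidia-dra
```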

I did not experience this a few weeks ago with an identical setup, so after checking the most recent changes in the Helm templates, I found that the ClusterRole was modified by 4253b44 (part of #219) in a way that prevents the ServiceAccount from managing Deployments.

If I revert this change and update the ClusterRole, everything works as it did before.
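Until the chart is fixed, something like the following should also work as a stop-gap: grant the missing Deployment permissions to the ServiceAccount out of band, which is roughly what restoring the old ClusterRole rule does. This is only a sketch; the role and binding names below are made up for illustration, and the exact verb list needed by the MPS control daemon lifecycle is an assumption.

```shell
# Hypothetical stop-gap: re-grant Deployment permissions to the driver's
# ServiceAccount via a separate ClusterRole and ClusterRoleBinding.
# Names and verbs are assumptions, not taken from the chart.
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nvidia-dra-mps-deployments-fix
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nvidia-dra-mps-deployments-fix
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nvidia-dra-mps-deployments-fix
subjects:
  - kind: ServiceAccount
    name: nvidia-dra-k8s-dra-driver-service-account
    namespace: nvidia-dra
EOF
```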
