ManifestWorkReplicaSet Rollout Plugin #160
youngbupark
commented
Oct 28, 2025
- This is the initial draft of rollout plugin support in the ManifestWorkReplicaSet work controller.
- Note: rollback will be added when we propose the MWRS automatic rollback enhancement.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: youngbupark The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| The following service defines the contract between Work Controller and the plugin. Each call must be idempotent, stateless, and time-bounded (≤30 s) to ensure consistent controller reconciliation. Plugin server must implement the following APIs. The helpers to implement server and clients will be implemented in [ocm/sdk-go](https://github.com/open-cluster-management-io/sdk-go) repository. | ||
|
|
||
| ```proto | ||
| // RolloutPluginService is the service for the rollout plugin. |
Here is the initial commit of gRPC server proto - open-cluster-management-io/sdk-go#154
Note: The implementation can change as we develop.
| // RolloutPluginService is the service for the rollout plugin. | ||
| service RolloutPluginService { | ||
| // Initialize initializes the plugin. | ||
| rpc Initialize(InitializeRequest) returns (InitializeResponse); |
I think we need some clarification on error handling. What happens when a specific call fails? How would an MWRS consumer know about the failure and debug it?
Good call. I will add an error handling section.
| // If the validation is completed successfully, the plugin should return a OK result. | ||
| // If the validation is still in progress, the plugin should return a INPROGRESS result. | ||
| // If the validation is failed, the plugin should return a FAILED result. | ||
| rpc ValidateRollout(RolloutPluginRequest) returns (ValidateResponse); |
When will this be called in the MWRS reconciler? A flow showing when the MWRS controller calls each of these APIs would be helpful.
This doc includes a Mermaid sequence diagram that describes when each hook is called. Please check it out.
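The proto comments in this thread define three results for `ValidateRollout`: OK, INPROGRESS, and FAILED. The sketch below shows one way a caller could branch on them. The Go type names and the proceed/requeue/fail reactions are assumptions for illustration; the proposal's sequence diagram is the authority on what the controller actually does.

```go
package main

import "fmt"

// ValidateResult models the three results named in the proto comments.
// The Go names are illustrative, not the generated ones.
type ValidateResult int

const (
	ValidateOK ValidateResult = iota
	ValidateInProgress
	ValidateFailed
)

// nextStep is a hypothetical description of how the caller reacts.
type nextStep string

const (
	stepProceed nextStep = "proceed"
	stepRequeue nextStep = "requeue"
	stepFail    nextStep = "fail"
)

// decide maps a validation result to the caller's next step.
// Unknown values are treated as failures so that they are never ignored.
func decide(r ValidateResult) nextStep {
	switch r {
	case ValidateOK:
		return stepProceed
	case ValidateInProgress:
		return stepRequeue
	default:
		return stepFail
	}
}

func main() {
	for _, r := range []ValidateResult{ValidateOK, ValidateInProgress, ValidateFailed} {
		fmt.Printf("result=%d -> %s\n", r, decide(r))
	}
}
```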
| // BeginRollout is called before the manifestwork resource is applied. | ||
| // It is used to prepare the rollout. | ||
| rpc BeginRollout(RolloutPluginRequest) returns (google.protobuf.Empty); |
Does this mean any spec change on a ManifestWork will trigger this? What if the placement changes but the ManifestWork spec in the MWRS does not?
BeginRollout is called before the ManifestWork is created in each cluster (please see the Mermaid sequence diagram). It is called whenever MWRS creates or updates a ManifestWork in a cluster namespace.
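The rule stated in this reply (BeginRollout runs on every create or update of a per-cluster ManifestWork) can be sketched as below. Comparing revision names to detect an update is an assumption made for the example, as are all type and function names; the real controller may detect changes differently.

```go
package main

import "fmt"

// action describes what the controller would do for one cluster.
type action string

const (
	actionCreate action = "create"
	actionUpdate action = "update"
	actionNone   action = "none"
)

// planAction compares the desired revision with what is already applied.
// applied maps cluster name to the applied revision; a missing entry means
// no ManifestWork exists yet in that cluster namespace (hypothetical model).
func planAction(applied map[string]string, cluster, desired string) action {
	current, ok := applied[cluster]
	switch {
	case !ok:
		return actionCreate
	case current != desired:
		return actionUpdate
	default:
		return actionNone
	}
}

// callsBeginRollout reports whether the hook would fire for the action.
func callsBeginRollout(a action) bool {
	return a == actionCreate || a == actionUpdate
}

func main() {
	applied := map[string]string{"cluster1": "rev-1", "cluster2": "rev-2"}
	for _, c := range []string{"cluster1", "cluster2", "cluster3"} {
		a := planAction(applied, c, "rev-2")
		fmt.Printf("%s: %s, BeginRollout=%v\n", c, a, callsBeginRollout(a))
	}
}
```

Under this model, a placement change that adds a new cluster leads to a create, so the hook fires for that cluster even though the ManifestWork spec in the MWRS is unchanged.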
| - rolloutStatus: The current [cluster rollout status](https://github.com/open-cluster-management-io/sdk-go/blob/main/pkg/apis/cluster/v1alpha1/rollout.go#L23-L39) (e.g., ToApply, Progressing, Succeeded, Failed, TimeOut, Skip). | ||
| - manifestRevisionName: The name of the manifest revision applied to the cluster. | ||
|
|
||
| ### Configure custom plugin for work controller |
Once a plugin is enabled, what's the behavior of a normal mwrs rollout which does not need to call any plugin?
> Once a plugin is enabled, what's the behavior of a normal mwrs rollout which does not need to call any plugin?

The plugin will be applied to all MWRS. @haoqing0110, do you think it is better to make it opt-in?
| workDriver: kube | ||
| # plugin configuration | ||
| plugins: | ||
| - name: my-rollout |
@qiujian16 @haoqing0110 I agree to use a standalone service rather than the sidecar model. Here is the new cluster-manager resource model for registering plugins. It supports multiple plugins, and users can select a plugin if they need one.
looks good
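With multiple plugins registered and the per-MWRS `plugin` field optional, the controller needs to resolve a plugin by name. A minimal sketch of that lookup follows. `PluginConfig`, its `Address` field, and `selectPlugin` are hypothetical names for illustration; only the `name` key appears in the quoted configuration.

```go
package main

import "fmt"

// PluginConfig is a hypothetical in-memory form of one registered plugin.
// Only Name comes from the quoted config; Address is assumed for the example.
type PluginConfig struct {
	Name    string
	Address string
}

// selectPlugin resolves the plugin named by a rollout strategy.
// An empty name means the MWRS did not opt in, so no plugin is returned.
func selectPlugin(registered []PluginConfig, name string) (*PluginConfig, error) {
	if name == "" {
		return nil, nil
	}
	for i := range registered {
		if registered[i].Name == name {
			return &registered[i], nil
		}
	}
	return nil, fmt.Errorf("rollout plugin %q is not registered", name)
}

func main() {
	registered := []PluginConfig{{Name: "my-rollout", Address: "my-rollout.example.svc:9000"}}

	for _, name := range []string{"", "my-rollout", "missing"} {
		p, err := selectPlugin(registered, name)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case p == nil:
			fmt.Println("no plugin selected; plain rollout")
		default:
			fmt.Println("using plugin:", p.Name)
		}
	}
}
```

Returning an error for an unknown name, rather than silently falling back to a plain rollout, keeps a misconfigured MWRS visible to the user.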
| // ProgressRollout is called after the manifestwork is applied. | ||
| // Whenever the feedbacks are updated, this method will be called. | ||
| // The plugin can execute the rollout logic based on the feedback status changes. | ||
| rpc ProgressRollout(RolloutPluginRequest) returns (google.protobuf.Empty); |
@annelaucg I removed the rollback-specific operation as we discussed yesterday. It is much simpler now.
| workDriver: kube | ||
| # plugin configuration | ||
| plugins: | ||
| - name: my-rollout |
looks good
| # optional. secretRef is the reference for ca | ||
| secretRef: | ||
| name: my-rollout-ca | ||
| namespace: default |
The namespace might not be needed. The secret has to be placed in the open-cluster-management-hub namespace.
| Work->>Work: Create rollout handler | ||
| Work->>Work: Find rollout/removed/timeout candidate clusters (RolloutResult) | ||
| alt timeout clusters exists and .spec.placementRefs[*].rolloutStrategy.abortOnFailure is true | ||
| Note over Work, PluginServer: Start automatic abort |
We also need to set abort when the status is Degraded or ValidateFailed.
So this will also need to be set during the ProgressRollout and ValidateRollout functions, is that right?
No, this will be set by the MWRS controller. Please see the proposal in #164.
| rolloutStrategy: | ||
| type: Progressive | ||
| # plugin is optional. | ||
| plugin: my-rollout |
Do we need any failsafe if the plugin that the user adds causes issues? Or is that dependent on the user?
Can you please elaborate on the failsafe? In general, the user should be able to recognize the problem, since using a plugin is an opt-in feature. MWRS should show the error in its status if the plugin returns an error or is unavailable. It might be better to stop the reconciler loop rather than try to self-resolve the problem.
| namespace: {{ .ClusterManagerNamespace }} | ||
| data: | ||
| config.yaml: | | ||
| plugins: |
@qiujian16 can you please review the ConfigMap? I wonder if we need to define the plugins in a separate plugin config file.
| #### Error handling | ||
|
|
||
| gRPC status codes follow the [standard gRPC status codes](https://grpc.github.io/grpc/core/md_doc_statuscodes.html): 0 = OK, 1 = CANCELLED, 2 = UNKNOWN, 3 = INVALID_ARGUMENT, 4 = DEADLINE_EXCEEDED, etc. Work controller will also utilize the [standard gRPC retry](https://grpc.io/docs/guides/retry/) for `UNAVAILABLE` status code. | ||
| When work controller fails to call plugin APIs, the failure reason from plugin server is shown in `PluginLoaded` status condition message for debugging purpose. |
@qiujian16 Generally I would like to show the error message in the PluginLoaded status condition, but for errors that occur while executing a rollout, the Progressing status condition might be a better place than PluginLoaded. cc @annelaucg @qiujian16
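The two ideas in this thread (retry only on `UNAVAILABLE`, and report load-time versus rollout-time errors on different conditions) can be sketched as follows. The numeric codes are the standard gRPC values, where `UNAVAILABLE` is 14. The `Phase` type and the choice of which calls count as load-time are assumptions, and the condition split reflects the suggestion in the comment above, not a settled design.

```go
package main

import "fmt"

// Code mirrors the numeric gRPC status codes; only the values used here are listed.
type Code int

const (
	OK               Code = 0
	InvalidArgument  Code = 3
	DeadlineExceeded Code = 4
	Unavailable      Code = 14
)

// Phase is a hypothetical marker for where a plugin error occurred.
type Phase int

const (
	PhaseLoad    Phase = iota // e.g. connecting to the plugin (assumed)
	PhaseRollout              // e.g. BeginRollout, ProgressRollout, ValidateRollout (assumed)
)

// shouldRetry reports whether the call is retried, following the proposal's
// statement that retry is used for the UNAVAILABLE status code.
func shouldRetry(c Code) bool {
	return c == Unavailable
}

// conditionType picks the status condition that would carry the error message,
// following the split suggested in the review comment.
func conditionType(p Phase) string {
	if p == PhaseRollout {
		return "Progressing"
	}
	return "PluginLoaded"
}

func main() {
	fmt.Println("retry UNAVAILABLE:", shouldRetry(Unavailable))
	fmt.Println("retry INVALID_ARGUMENT:", shouldRetry(InvalidArgument))
	fmt.Println("load error ->", conditionType(PhaseLoad))
	fmt.Println("rollout error ->", conditionType(PhaseRollout))
}
```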