Apply provider manifests with SSA #292
base: main
Conversation
We were unconditionally triggering reconciles for all modifications to any managed object. This was producing a large number of unnecessary reconciles when the status of managed objects was updated.
I'll try and review this one in the coming days. Also, as a domain expert, it'd probably be good to have a review from @JoelSpeed.
// We reconcile all Deployment changes because we intend to reflect the
// status of any created Deployment in the ClusterOperator status.
Watches(
	&appsv1.Deployment{},
	handler.EnqueueRequestsFromMapFunc(toClusterOperator),
	builder.WithPredicates(ownedPlatformLabelPredicate(r.ManagedNamespace, r.Platform)),
I remember when we discussed this we decided to leave this in because of functionality we intend to implement in the future. Honestly I'd leave it out and add it back when we add something which needs it. Until then it's just adding noise.
Yes. See my comment above re:
TODO: Deployments State/Conditions should influence the overall ClusterOperator Status.
I think ideally we should be looking at doing this considering the upcoming GAing of the operator.
Maybe something we can tackle in tandem with the un-revert of: #273
In this PR we should at least see if we are introducing anything that would prevent us easily doing that.
if r.Error != nil {
	errs = errors.Join(errs, fmt.Errorf("error applying CAPI provider component %q at position %d: %w", r.File, i, r.Error))
for i, providerObject := range providerObjects {
	err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
bike shedding opportunity: what to call this field owner? I think this is good, but I'm just calling it out.
Should probably move the field owner to a constant somewhere.
In other places we used controllerName + suffix
cluster-capi-operator/pkg/controllers/machinemigration/machine_migration_controller.go
Lines 364 to 366 in 39748f2
if err := r.Status().Patch(ctx, m, util.ApplyConfigPatch(ac), client.ForceOwnership, client.FieldOwner(controllerName+"-AuthoritativeAPI")); err != nil {
	return fmt.Errorf("failed to patch Machine API machine status with authoritativeAPI %q: %w", authority, err)
}
/test e2e-openstack-capi-techpreview
UpdateFunc: func(e event.UpdateEvent) bool { return isClusterOperator(e.ObjectNew) },
GenericFunc: func(e event.GenericEvent) bool { return isClusterOperator(e.Object) },
DeleteFunc: func(e event.DeleteEvent) bool { return isClusterOperator(e.Object) },
CreateFunc: func(e event.CreateEvent) bool { return isClusterOperator(e.Object) },
Genuine question: is there anything in the ClusterOperator we might be interested in reconciling at the moment? Like anything in the status/conditions?
	return fmt.Errorf("error parsing CAPI provider deployment manifests %q: %w", d, err)
}

// TODO: Deployments State/Conditions should influence the overall ClusterOperator Status.
This is still something we should be looking at doing. At the moment we are blind on this front: if the core CAPI/provider CAPI Deployments fail, or are not Available and/or Ready, we still report Available rather than Degraded.
Ideally we should keep an eye on them (we already set up watches for these) and set Degraded/Not Degraded accordingly (i.e. setDegradedCondition()).
Is there any obvious difference that we need to be aware of or account for between r.Patch with client.Apply (SSA) vs resourceapply.ApplyDeployment?
I had the same question
I looked at it at the time, and convinced myself the answer is no. I could do that again...
However, intuitively this is fine: this is how all client tooling applies these manifests. Deployments do not require special consideration; perhaps they once did.
for i, providerObject := range providerObjects {
	err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
	if err != nil {
		gvk := providerObject.GroupVersionKind()
Why not use the GVK string function and then join that to the name?
err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
if err != nil {
	gvk := providerObject.GroupVersionKind()
	name := strings.Join([]string{gvk.Group, gvk.Version, gvk.Kind, providerObject.GetName()}, "/")
Does the namespace matter?
// We only want to be reconciled on creation of the cluster operator,
// because we wait for it before reconciling. The Create event also fires
// when the manager is started, so this will additionally ensure we are
// called at least once at startup.
I thought it was a generic event triggered by the List? Not create?
@mdbooth: The following tests failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale.
/remove-lifecycle stale
This change uses Server-Side Apply to apply provider manifests in place of the logic from library-go, which uses an Update with custom kind-specific client-side merge logic. Using SSA here should be equivalent to how these manifests would be applied by any other client.
The merge logic from library-go should no longer be required as long as the manifests are not specifying any fields which will be overwritten by an existing controller.
In particular, there should be no conflict with service-ca setting caBundle in various places, as long as the specified manifests do not include a caBundle. However, note that due to a validation bug in older versions of Kubernetes, some CRDs still specify an empty caBundle. These would have to be removed for this to work.
The expected flow of reconciles between cluster-capi-operator and service-ca-operator becomes:
However, objects with the provider label were being reconciled too often due to insufficient filtering. This PR includes a second commit which addresses that.