Conversation

@mdbooth
Contributor

mdbooth commented May 12, 2025

This change uses Server-Side Apply to apply provider manifests in place of the logic from library-go, which uses an Update with custom kind-specific client-side merge logic. Using SSA here should be equivalent to how these manifests would be applied by any other client.

The merge logic from library-go should no longer be required as long as the manifests are not specifying any fields which will be overwritten by an existing controller.

In particular, there should be no conflict with service-ca setting caBundle in various places, as long as the specified manifests do not include a caBundle. However, note that due to a validation bug in older versions of k8s, some CRDs do still specify an empty caBundle in their CRDs. These would have to be removed for this to work.
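For reference, the SSA apply itself is a single Patch call with the controller-runtime client. Below is a minimal sketch; applyManifest is an illustrative name rather than code from this PR, but the field owner string is the one used in the change.

package example

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyManifest applies a decoded provider manifest with Server-Side Apply.
func applyManifest(ctx context.Context, c client.Client, obj *unstructured.Unstructured) error {
	// SSA sends the full intended state and records this client as the field
	// owner. ForceOwnership takes over fields previously owned by another
	// manager (for example the old client-side merge logic) instead of
	// returning a conflict error.
	return c.Patch(ctx, obj, client.Apply,
		client.ForceOwnership,
		client.FieldOwner("cluster-capi-operator.openshift.io/installer"),
	)
}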

The expected flow of reconciles between cluster-capi-operator and service-ca-operator becomes:

  • cluster-capi-operator: apply initial manifests
  • service-ca-operator: adds caBundle to CRDs and validating webhooks
  • cluster-capi-operator: triggered by the update to manifests it manages, applies the manifests again
  • The final update will produce no change as long as no managed fields have been updated, so there will be no further reconciles.

However, objects with the provider label were being reconciled too often due to insufficient filtering. This PR includes a second commit which addresses that.

mdbooth added 2 commits May 12, 2025 14:58
We were unconditionally triggering reconciles for all modifications to
any managed object. This was producing a large number of unnecessary
reconciles when the status of managed objects was updated.
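One common way to avoid those status-only reconciles is to gate Update events on changes the controller actually cares about. The predicate below is a sketch of that technique using stock controller-runtime predicates; it is not necessarily the exact filter added in this commit.

package example

import (
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// specOrMetadataChanged ignores Update events where neither the generation,
// the labels, nor the annotations changed, which is the case for pure status
// updates: the API server only bumps metadata.generation on spec changes.
// Create, Delete and Generic events still pass through.
// Note: objects without a spec/status split (e.g. ConfigMaps) never bump
// generation, so this filter would need adjusting for those kinds.
var specOrMetadataChanged = predicate.Or(
	predicate.GenerationChangedPredicate{},
	predicate.LabelChangedPredicate{},
	predicate.AnnotationChangedPredicate{},
)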
@openshift-ci
Contributor

openshift-ci bot commented May 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign theobarberbany for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@damdo
Member

damdo commented May 12, 2025

I'll try and review this one in the coming days. Also, since he's a domain expert, it'd probably be good to have a review from @JoelSpeed.

Comment on lines +273 to +278
// We reconcile all Deployment changes because we intend to reflect the
// status of any created Deployment in the ClusterOperator status.
Watches(
&appsv1.Deployment{},
handler.EnqueueRequestsFromMapFunc(toClusterOperator),
builder.WithPredicates(ownedPlatformLabelPredicate(r.ManagedNamespace, r.Platform)),
Contributor Author

I remember when we discussed this, we decided to leave it in because of functionality we intend to implement in the future. Honestly, I'd leave it out and add it back when we add something which needs it. Until then it's just adding noise.

Member

Yes. See my comment above re:
TODO: Deployments State/Conditions should influence the overall ClusterOperator Status.

I think ideally we should be looking at doing this considering the upcoming GAing of the operator.

Maybe something we can tackle in tandem with the un-revert of: #273

In this PR we should at least see if we are introducing anything that would prevent us easily doing that.

if r.Error != nil {
errs = errors.Join(errs, fmt.Errorf("error applying CAPI provider component %q at position %d: %w", r.File, i, r.Error))
for i, providerObject := range providerObjects {
err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
Contributor Author

Bike-shedding opportunity: what to call this field owner? I think this is good, but I'm just calling it out.

Should probably move the field owner to a constant somewhere.

Member

In other places we used controllerName + suffix

if err := r.Status().Patch(ctx, m, util.ApplyConfigPatch(ac), client.ForceOwnership, client.FieldOwner(controllerName+"-AuthoritativeAPI")); err != nil {
return fmt.Errorf("failed to patch Machine API machine status with authoritativeAPI %q: %w", authority, err)
}
for these
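A sketch of what moving the owner to a constant could look like, following the controllerName + suffix pattern quoted above (the names and the value here are illustrative, not existing code):

// Illustrative only: a package-level field owner derived from the controller
// name, following the controllerName + suffix convention used elsewhere.
const (
	controllerName      = "cluster-capi-operator"
	installerFieldOwner = controllerName + "-installer"
)

// Usage (assuming the controller-runtime client package is imported as client):
// err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner(installerFieldOwner))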

@mdbooth
Contributor Author

mdbooth commented May 12, 2025

/test e2e-openstack-capi-techpreview

UpdateFunc: func(e event.UpdateEvent) bool { return isClusterOperator(e.ObjectNew) },
GenericFunc: func(e event.GenericEvent) bool { return isClusterOperator(e.Object) },
DeleteFunc: func(e event.DeleteEvent) bool { return isClusterOperator(e.Object) },
CreateFunc: func(e event.CreateEvent) bool { return isClusterOperator(e.Object) },
Member

Genuine question: is there anything in the ClusterOperator we might be interested in reconciling at the moment?
Like anything in the status/conditions?


return fmt.Errorf("error parsing CAPI provider deployment manifets %q: %w", d, err)
}

// TODO: Deployments State/Conditions should influence the overall ClusterOperator Status.
Member

This is still something we should be looking at doing. At the moment we are blind on this front: if the core CAPI / provider CAPI Deployments fail or are not Available and/or Ready, we still report Available rather than Degraded.
Ideally we should keep an eye on them (we already set up watches for these) and set Degraded/Not Degraded (?) accordingly (i.e. setDegradedCondition()).
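A sketch of the kind of check that could feed into setting Degraded (the helper name is assumed, not existing code; the actual wiring into setDegradedCondition() would live in the ClusterOperator status logic):

package example

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// deploymentAvailable reports whether a watched CAPI Deployment currently has
// its Available condition set to True. A reconciler could run this over the
// core and provider Deployments and mark the ClusterOperator Degraded when
// any of them is unavailable.
func deploymentAvailable(d *appsv1.Deployment) bool {
	for _, cond := range d.Status.Conditions {
		if cond.Type == appsv1.DeploymentAvailable {
			return cond.Status == corev1.ConditionTrue
		}
	}
	// No Available condition yet, e.g. a freshly created Deployment.
	return false
}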


if r.Error != nil {
errs = errors.Join(errs, fmt.Errorf("error applying CAPI provider component %q at position %d: %w", r.File, i, r.Error))
for i, providerObject := range providerObjects {
err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
Member

Is there any obvious difference that we need to be aware of or account for between r.Patch with client.Apply (SSA) and resourceapply.ApplyDeployment?

Contributor

I had the same question

Contributor Author

I looked at it at the time, and convinced myself the answer is no. I could do that again...

However, intuitively this is fine. Consider that this is how all client tooling applies this. Deployments do not require special consideration. Perhaps they once did.


for i, providerObject := range providerObjects {
err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
if err != nil {
gvk := providerObject.GroupVersionKind()
Contributor

Why not use the GVK string function and then join that to the name?

err := r.Patch(ctx, providerObject, client.Apply, client.ForceOwnership, client.FieldOwner("cluster-capi-operator.openshift.io/installer"))
if err != nil {
gvk := providerObject.GroupVersionKind()
name := strings.Join([]string{gvk.Group, gvk.Version, gvk.Kind, providerObject.GetName()}, "/")
Contributor

Does the namespace matter?
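For what it's worth, a variant that addresses both questions, using the GVK's String() form and the namespaced name (illustrative only; assumes the controller-runtime client package is imported as client):

// Illustrative alternative: GVK string plus namespace/name, so namespaced
// objects are unambiguous in the error message.
gvk := providerObject.GroupVersionKind()
name := gvk.String() + " " + client.ObjectKeyFromObject(providerObject).String()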

Comment on lines +35 to +38
// We only want to be reconciled on creation of the cluster operator,
// because we wait for it before reconciling. The Create event also fires
// when the manager is started, so this will additionally ensure we are
// called at least once at startup.
Contributor

I thought it was a Generic event triggered by the List, not a Create?
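For context, controller-runtime's informer-backed sources replay existing objects as Create events when the manager's cache first syncs, not as Generic events, so a create-only predicate still fires once for a pre-existing ClusterOperator at startup. A self-contained sketch follows; isClusterOperator is stubbed here so the example compiles, and the "cluster-api" name is an assumption for illustration.

package example

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// isClusterOperator stands in for the helper used in the PR.
func isClusterOperator(obj client.Object) bool {
	return obj.GetName() == "cluster-api"
}

// createOnly drops Update, Delete and Generic events. The initial cache sync
// delivers existing objects as Create events, so a ClusterOperator that
// already exists still triggers one reconcile when the manager starts.
var createOnly = predicate.Funcs{
	CreateFunc:  func(e event.CreateEvent) bool { return isClusterOperator(e.Object) },
	UpdateFunc:  func(event.UpdateEvent) bool { return false },
	DeleteFunc:  func(event.DeleteEvent) bool { return false },
	GenericFunc: func(event.GenericEvent) bool { return false },
}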

@openshift-ci
Contributor

openshift-ci bot commented Jul 10, 2025

@mdbooth: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                  Commit   Details  Required  Rerun command
ci/prow/e2e-aws-ovn                        37d52c9  link     true      /test e2e-aws-ovn
ci/prow/okd-scos-e2e-aws-ovn               37d52c9  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-azure-ovn-techpreview          37d52c9  link     false     /test e2e-azure-ovn-techpreview
ci/prow/e2e-aws-ovn-techpreview-upgrade    37d52c9  link     true      /test e2e-aws-ovn-techpreview-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Oct 9, 2025
@damdo
Member

damdo commented Oct 9, 2025

/remove-lifecycle stale

openshift-ci bot removed the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Oct 9, 2025