@mdbooth
Contributor

@mdbooth mdbooth commented Feb 13, 2025

This change uses Server-Side Apply to apply provider manifests in place of the logic from library-go, which uses an Update with custom kind-specific client-side merge logic. Using SSA here should be equivalent to how these manifests would be applied by any other client.

The merge logic from library-go should no longer be required as long as the manifests are not specifying any fields which will be overwritten by an existing controller.

In particular, there should be no conflict with service-ca setting caBundle in various places, as long as the specified manifests do not include a caBundle. However, note that due to a validation bug in older versions of k8s, some CRD manifests still specify an empty caBundle. These empty caBundle fields would have to be removed for this to work.
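
As an illustration of removing an empty caBundle before applying, here is a minimal sketch. It assumes the CRD manifest has already been decoded into a generic map and that the field lives at the apiextensions.k8s.io/v1 path spec.conversion.webhook.clientConfig.caBundle. The helper name stripEmptyCABundle is hypothetical and is not taken from this PR.

```go
package main

import "fmt"

// stripEmptyCABundle is a hypothetical helper (not from this PR). It removes
// spec.conversion.webhook.clientConfig.caBundle from a decoded CRD manifest
// when the field is present but null or empty, and reports whether it
// removed anything. A non-empty caBundle is left untouched.
func stripEmptyCABundle(crd map[string]any) bool {
	cur := crd
	for _, key := range []string{"spec", "conversion", "webhook", "clientConfig"} {
		next, ok := cur[key].(map[string]any)
		if !ok {
			return false // path does not exist in this manifest
		}
		cur = next
	}

	value, present := cur["caBundle"]
	if !present {
		return false
	}
	switch s := value.(type) {
	case nil:
		// An explicit null counts as empty.
	case string:
		if s != "" {
			return false // a real bundle was specified
		}
	default:
		return false // unexpected type: do not guess
	}

	delete(cur, "caBundle")
	return true
}

func main() {
	crd := map[string]any{
		"spec": map[string]any{
			"conversion": map[string]any{
				"webhook": map[string]any{
					"clientConfig": map[string]any{"caBundle": ""},
				},
			},
		},
	}
	fmt.Println("removed:", stripEmptyCABundle(crd))
}
```

With the field absent from the applied manifest, the applier never claims ownership of caBundle, so whatever service-ca injects is left alone.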

The expected flow of reconciles between cluster-capi-operator and service-ca-operator becomes:

  • cluster-capi-operator: apply initial manifests
  • service-ca-operator: adds caBundle to CRDs and validating webhooks
  • cluster-capi-operator: triggered by the update to objects it manages, applies the manifests again

The final update will produce no change as long as no managed fields have been updated, so there will be no further reconciles.
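
The flow above can be sketched with a toy model of field ownership. This is not the API server's implementation and does not use any Kubernetes client library: the object type, its apply and update methods, the flat field paths and the manager names are all assumptions made for illustration. In particular, real Server-Side Apply lets two managers share a field when they agree on its value, whereas this model just leaves the existing owner in place.

```go
package main

import "fmt"

// object is a toy stand-in for an API object: a flat map of field paths to
// values, plus a record of which manager owns each field.
type object struct {
	fields map[string]string
	owners map[string]string
}

func newObject() *object {
	return &object{fields: map[string]string{}, owners: map[string]string{}}
}

// apply mimics an SSA-style apply: it sets the desired fields, drops fields
// this manager owned but no longer specifies, and reports whether anything
// changed. It fails if another manager owns a field with a different value.
func (o *object) apply(manager string, desired map[string]string) (bool, error) {
	// Check for conflicts first so a failed apply changes nothing.
	for path, value := range desired {
		owner, owned := o.owners[path]
		if owned && owner != manager && o.fields[path] != value {
			return false, fmt.Errorf("conflict: %q is owned by %q", path, owner)
		}
	}

	changed := false
	for path, value := range desired {
		if current, ok := o.fields[path]; !ok || current != value {
			o.fields[path] = value
			changed = true
		}
		if _, owned := o.owners[path]; !owned {
			o.owners[path] = manager
		}
	}
	for path, owner := range o.owners {
		if _, wanted := desired[path]; owner == manager && !wanted {
			delete(o.fields, path)
			delete(o.owners, path)
			changed = true
		}
	}
	return changed, nil
}

// update mimics a plain write by another controller, which takes ownership
// of the field it sets.
func (o *object) update(manager, path, value string) {
	o.fields[path] = value
	o.owners[path] = manager
}

func main() {
	const caPath = "spec.conversion.webhook.clientConfig.caBundle"
	manifest := map[string]string{"spec.names.kind": "ExampleMachine"}

	crd := newObject()
	changed, _ := crd.apply("cluster-capi-operator", manifest)
	fmt.Println("initial apply changed:", changed) // initial apply changed: true

	crd.update("service-ca-operator", caPath, "injected-bundle")

	changed, _ = crd.apply("cluster-capi-operator", manifest)
	fmt.Println("second apply changed:", changed) // second apply changed: false
}
```

Because the manifest never mentions caBundle, the second apply neither conflicts with nor removes the injected value, which is why the reconcile loop settles.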

However, objects with the provider label were being reconciled too often due to insufficient filtering. This PR includes a second commit which addresses that.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Feb 13, 2025
@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign theobarberbany for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mdbooth
Contributor Author

mdbooth commented Feb 13, 2025

/test

@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2025

@mdbooth: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test build
/test e2e-aws-capi-customnoupgrade-migration
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial
/test e2e-aws-ovn-techpreview
/test e2e-azure-capi-techpreview
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-vsphere-capi-techpreview
/test images
/test lint
/test okd-scos-images
/test unit
/test vendor

The following commands are available to trigger optional jobs:

/test e2e-azure-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test okd-scos-e2e-aws-ovn
/test regression-clusterinfra-cucushift-rehearse-capi-aws-ipi
/test security

Use /test all to run all jobs.

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mdbooth
Contributor Author

mdbooth commented Feb 13, 2025

/test all

@mdbooth
Contributor Author

mdbooth commented Feb 13, 2025

Interesting observation: with this change kube-apiserver now reports that it's reloading all the provider CRDs every 5 minutes. However, it was also doing that before.

I think this change might be good, and the periodic CRD reload is a separate issue.

@mdbooth
Contributor Author

mdbooth commented Feb 13, 2025

Looks like everything interesting passed. I'm pushing an update which:

  • Fixes lints
  • Adds a proper field owner for the SSA operation
  • Adds a second commit which should dramatically reduce the number of reconciles

@mdbooth mdbooth marked this pull request as ready for review February 13, 2025 19:31
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Feb 13, 2025
@openshift-ci openshift-ci bot requested review from nrb and racheljpg February 13, 2025 19:32
@mdbooth
Contributor Author

mdbooth commented Feb 13, 2025

That last commit reduced the number of reconciles from 67 without this PR to 10.

However:

  • That's still 8 more than I'd expect
  • With only the first commit there were 1498 reconciles, which I can't explain

Still think this is an improvement.

@mdbooth
Contributor Author

mdbooth commented Feb 13, 2025

That last commit is probably too much for this PR. I'll split it into a separate PR if you want it.

Because we're now iterating directly over objects, it's trivial to add an owner reference to every object that we create. This allows us to be much more explicit in the objects we watch. It's also a more obvious fit for the behaviour of this controller: the controller doesn't really manage objects which have a capi provider label; it manages objects it created, whatever label they have.

It also gives us a way to implement the automatic removal of objects which are no longer referenced in a provider ConfigMap. This actually came up in CAPO when we removed our MutatingWebhook! Clusterctl implements this correctly, but users who were doing it manually had failed installations after upgrade because there was a floating MutatingWebhook referring to an endpoint which no longer existed.
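
To make the pruning idea concrete, here is a minimal sketch of finding objects that carry the controller's owner reference but are no longer listed in the provider manifests. The types objectKey and clusterObject, the function staleObjects, and the names used in main are hypothetical. A real implementation would list objects from the cluster and compare metadata.ownerReferences UIDs rather than use these stand-in structs.

```go
package main

import "fmt"

// objectKey identifies an object; Namespace is empty for cluster-scoped kinds.
type objectKey struct {
	Kind, Namespace, Name string
}

// clusterObject is a stand-in for an object read from the cluster, reduced
// to its identity and the UIDs found in its owner references.
type clusterObject struct {
	Key       objectKey
	OwnerUIDs []string
}

// ownedBy reports whether obj has an owner reference with the given UID.
func ownedBy(obj clusterObject, uid string) bool {
	for _, owner := range obj.OwnerUIDs {
		if owner == uid {
			return true
		}
	}
	return false
}

// staleObjects returns the keys of objects that ownerUID owns but that are
// absent from the desired set, in the order they appear in existing.
func staleObjects(existing []clusterObject, ownerUID string, desired map[objectKey]bool) []objectKey {
	var stale []objectKey
	for _, obj := range existing {
		if ownedBy(obj, ownerUID) && !desired[obj.Key] {
			stale = append(stale, obj.Key)
		}
	}
	return stale
}

func main() {
	const owner = "provider-owner-uid" // hypothetical UID of the owning object

	crd := objectKey{Kind: "CustomResourceDefinition", Name: "examplemachines.example.test"}
	webhook := objectKey{Kind: "MutatingWebhookConfiguration", Name: "example-mutating-webhook"}

	existing := []clusterObject{
		{Key: crd, OwnerUIDs: []string{owner}},
		{Key: webhook, OwnerUIDs: []string{owner}},
	}
	// The new provider manifests no longer contain the mutating webhook.
	desired := map[objectKey]bool{crd: true}

	for _, key := range staleObjects(existing, owner, desired) {
		fmt.Println("would delete:", key.Kind, key.Name)
	}
}
```

Objects without the owner reference are never candidates for deletion, even if they happen to carry the provider label.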

@mdbooth mdbooth force-pushed the apply-with-ssa branch 2 times, most recently from 8a4a39f to bb5ac06 on February 14, 2025 09:20
We were unconditionally triggering reconciles for all modifications to
any managed object. This was producing a large number of unnecessary
reconciles when the status of managed objects was updated.
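
One way to filter out such events, shown here only as a sketch and not necessarily what this commit does, is to compare the old and new object while ignoring status and write-side metadata. The helpers withoutStatus and needsReconcile and the list of ignored metadata keys are assumptions. In a controller-runtime controller the same check would typically sit inside an update predicate.

```go
package main

import (
	"fmt"
	"reflect"
)

// ignoredMetadata lists metadata keys that change as a side effect of writes
// and carry no desired state. The exact list is an assumption.
var ignoredMetadata = []string{"resourceVersion", "managedFields", "generation"}

// withoutStatus returns a shallow copy of obj with the status field and the
// ignored metadata keys removed. The input is not modified.
func withoutStatus(obj map[string]any) map[string]any {
	out := make(map[string]any, len(obj))
	for key, value := range obj {
		if key == "status" {
			continue
		}
		out[key] = value
	}
	if meta, ok := obj["metadata"].(map[string]any); ok {
		trimmed := make(map[string]any, len(meta))
		for key, value := range meta {
			trimmed[key] = value
		}
		for _, key := range ignoredMetadata {
			delete(trimmed, key)
		}
		out["metadata"] = trimmed
	}
	return out
}

// needsReconcile reports whether an update event changed anything other
// than status or the ignored metadata.
func needsReconcile(oldObj, newObj map[string]any) bool {
	return !reflect.DeepEqual(withoutStatus(oldObj), withoutStatus(newObj))
}

func main() {
	oldObj := map[string]any{
		"metadata": map[string]any{"name": "example", "resourceVersion": "1"},
		"spec":     map[string]any{"replicas": 1},
		"status":   map[string]any{"ready": false},
	}
	statusOnly := map[string]any{
		"metadata": map[string]any{"name": "example", "resourceVersion": "2"},
		"spec":     map[string]any{"replicas": 1},
		"status":   map[string]any{"ready": true},
	}
	fmt.Println("status-only update needs reconcile:", needsReconcile(oldObj, statusOnly))
}
```

Under this check a caBundle injected by service-ca still triggers a reconcile, since it is not a status change, while status updates on managed objects do not.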
@openshift-ci
Contributor

openshift-ci bot commented Apr 15, 2025

@mdbooth: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-aws-ovn-serial | 41daec5 | link | true | /test e2e-aws-ovn-serial
ci/prow/e2e-openstack-ovn-techpreview | 41daec5 | link | true | /test e2e-openstack-ovn-techpreview

Full PR test history. Your PR dashboard.

@openshift-merge-robot openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Apr 15, 2025
@openshift-merge-robot
Contributor

PR needs rebase.

@mdbooth
Contributor Author

mdbooth commented May 12, 2025

Replaced by #292 as I no longer have write access to this fork.

@mdbooth mdbooth closed this May 12, 2025
