Conversation

@jakobmoellerdev
Member

Add OwnerReferences between ResourceGraphDefinitions (RGDs) and generated CRDs to align with Kubernetes-native lifecycle management. Introduce a ValidatingAdmissionPolicy to prevent accidental RGD deletion and refine the allowCRDDeletion flag for clear controller-side cleanup behavior. Update documentation and tests accordingly.

Backing KREP for #749 and #745


Signed-off-by: Jakob Möller <[email protected]>
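
For readers unfamiliar with the mechanism: an `OwnerReference` on a generated CRD would look roughly like the sketch below. This is illustrative only; the group, version, and names are assumptions, not taken from this PR.

```yaml
# Illustrative sketch, not the PR's actual output. Group, version, and
# names are assumptions; the uid must match the owning RGD's real uid.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.kro.run          # hypothetical generated CRD
  ownerReferences:
    - apiVersion: kro.run/v1alpha1       # assumed RGD API group/version
      kind: ResourceGraphDefinition
      name: my-widget-rgd                # hypothetical RGD name
      uid: <uid-of-the-rgd>              # placeholder
      controller: true
      blockOwnerDeletion: true
```

With this reference in place, deleting the RGD lets the Kubernetes garbage collector clean up the CRD according to the chosen propagation policy.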
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 31, 2025
@k8s-ci-robot k8s-ci-robot requested a review from cheftako October 31, 2025 08:22
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jakobmoellerdev
Once this PR has been reviewed and has the lgtm label, please assign bridgetkromhout for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from matthchr October 31, 2025 08:22
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 31, 2025
Contributor

@tjamet tjamet left a comment


💯 on aligning with standard k8s behaviours.

I would be in favour of moving forward with this proposal with two changes:

* rename the parameter from `allowCRDDeletion` to `protectCRDDeletion` or similar
* [not mandatory] make the default behaviour to delete "dangling" CRDs


1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.
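
As a rough illustration of the deletion-protection idea in point 2, a `ValidatingAdmissionPolicy` denying RGD deletion could be sketched as follows. This is not the PR's actual manifest; the API group, names, and the CEL expression are assumptions.

```yaml
# Hypothetical sketch, not the PR's implementation. API group, names,
# and the CEL expression are assumptions for illustration.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: protect-rgd-deletion
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["kro.run"]            # assumed API group
        apiVersions: ["*"]
        operations: ["DELETE"]
        resources: ["resourcegraphdefinitions"]
  validations:
    - expression: "false"                 # deny every matching DELETE
      message: "RGD deletion is blocked; remove the policy binding first."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: protect-rgd-deletion
spec:
  policyName: protect-rgd-deletion
  validationActions: ["Deny"]
```

Because this is enforced by the API server at admission time, it applies to every client, not only to the KRO controller.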
Contributor


Does this mean the current implementation does not fully honor the flag?

What about making the default `true`?

There's also a discussion to be had about finalizers, which play a different role IMO (the best place is probably below, in the discarded section)

Suggested change
1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs (blocks deletion of RGDs with instances)
3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.


* When an RGD is deleted, its owned CRD is automatically garbage-collected. This is proper behavior, as RGDs can still be deleted with the `orphan` propagation policy to preserve CRDs.
* Manual CRD deletion (while RGD still exists) will trigger re-reconciliation, as ownership implies controller responsibility.
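
The propagation behavior described above is chosen per delete request. For example, with kubectl (resource names are illustrative, and these commands require a live cluster with KRO installed):

```shell
# Default background cascade: deleting the RGD garbage-collects the owned CRD.
kubectl delete resourcegraphdefinition my-rgd

# Orphan policy: delete the RGD but leave the generated CRD in place.
kubectl delete resourcegraphdefinition my-rgd --cascade=orphan

# Foreground: the RGD stays in a terminating state until the owned CRD is gone.
kubectl delete resourcegraphdefinition my-rgd --cascade=foreground
```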
Contributor


💯

Comment on lines +111 to +118
* When `allowCRDDeletion: false`:
KRO still sets owner references, but **does not itself delete CRDs**.
Deletion is handled by Kubernetes GC only when the RGD is removed.
The ValidatingAdmissionPolicy remains active to prevent accidental RGD deletion.

* When `allowCRDDeletion: true`:
KRO may directly delete CRDs it manages (e.g., during reconciliation or cleanup).
The Helm chart skips installation of the ValidatingAdmissionPolicy, allowing full manual control.
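
In Helm-values terms, the two modes above might be toggled like this. The exact key path is an assumption for illustration; check the chart's `values.yaml` for the real one.

```yaml
# Hypothetical values.yaml fragment; the key path is an assumption.
config:
  allowCRDDeletion: false   # KRO never deletes CRDs itself; only k8s GC
                            # cleans up when the owning RGD is deleted.
```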
Contributor


I'm not sure the semantics make sense to me.
It feels weird to have `allowCRDDeletion: false` and still have the CRD deleted anyhow (by Kubernetes GC).

There's also a case that could lead to inconsistent behaviour: when the RGD Kind, Group, or APIVersion changes. In that case, with `allowCRDDeletion: false`, we wouldn't delete the dangling CRD, yet the CRD would still be deleted on RGD deletion.

The proposed behaviour and lifecycle sound more like `confirmCRDDeletion` or `requireManualJudgementForCRDDeletion`.
Those semantics would also bring consistency when Kind, Group, or APIVersion changes.
Maybe the validation webhook needs to be on the CRD itself then?

| Option | Reason Rejected |
| ------------------------------------ |-----------------------------------------------------------------------------------------|
| Continue using label linkage only | No lifecycle tracking or automatic cleanup, against k8s API semantics |
| Finalizer-based blocking | Overcomplicates reconciliation; redundant with OwnerReferences and foreground finalizer |
Contributor


Can you expand on what makes reconciliation more complex, and on the redundancy with OwnerReferences?

From the Kubernetes documentation, finalizers seem to pursue exactly this objective: hold a resource until all clean-up is done.

> The target object remains in a terminating state while the control plane, or other components, take the actions defined by the finalizers

The only difference with the current approach is that the RGD would be marked for deletion and have to be entirely deleted before being re-applied, while the proposed approach would block deletion outright.

Member Author


That is correct. However, Kubernetes sets its own `foregroundDeletion` as part of client-based lifecycle handling: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion. The proposed approach effectively makes client deletion protected by a label and then allows users to control deletion behavior based on the propagation policy.

Contributor


I'm unsure how a label protects deletion.
It only applies to the KRO controller's own reconciliations, right?
I mean, it does not prevent anybody else from doing `kubectl delete RGD` or `kubectl delete RGDInstance`, correct?

Member Author


No, that's not correct. If we use a policy, this operates at the API-server level, so the deletion protection is enforced globally on the API, not just for the controller.

This proposal introduces three coordinated changes:

1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
Contributor

@ellistarn ellistarn Nov 13, 2025


I can understand protecting against the deletion of the Kinds managed by RGDs, but what's the thinking behind stopping a user who says `k delete rgd foo` from deleting the RGD `foo`? Delete means delete, right?

It strikes me that this could be something organizations choose to opt into more broadly (deletion protection for any k8s resource), but it's odd for KRO to take that opinion for users when other projects don't.

Member Author


The main reason is that, by default, Kubernetes deletes with background propagation: if we delete an RGD, that would automatically start deletion of the CRD as well. I only added this after there was an opinion against simply allowing a delete. I personally favor the simple delete too, but adjusted after community-call feedback.

Contributor


Indeed, the k8s standard leans towards no protection: delete everything, cascading.
Moreover, once marked for deletion there is no way back; you must complete deletion before being able to re-use the object again (`kubectl delete` starts by setting the `deletionTimestamp`, which can't be undone).

As there is a clear 1-1 ownership link RGD <-> CRD, I would indeed favour the standard behaviour.

Contributor


> I personally favor this too but adjusted after community call feedback

I think this is the kind of thing where you can easily find voices in both directions. You'll see this debate surface through many AWS APIs, too. Just try deleting an S3 bucket with objects in it.

Given the controversy, and the fact that it's broader than just our API, I'd stick with the existing standard and let a separate solution handle deletion protection holistically.


#### 3. Controller-side deletion control (`allowCRDDeletion`)

Previously, the flag `allowCRDDeletion` determined whether KRO would delete CRDs it had created if the RGD was removed or changed.
Contributor


I'm curious to explore the use cases for setting this flag one way or the other. If we're certain that all instances managed by the RGD are gone, is it really dangerous to remove the Kind? I don't see the upside of leaking it if we're confident we're not losing data, and I can see downsides of not removing the Kind, such as attempting a fresh installation of a new version where the Kind has changed substantially.

@k8s-ci-robot
Contributor

@jakobmoellerdev: The following tests failed; say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| presubmits-build-image | f122a33 | link | true | `/test presubmits-build-image` |
| presubmits-e2e-tests | f122a33 | link | true | `/test presubmits-e2e-tests` |
| presubmits-integration-tests | f122a33 | link | true | `/test presubmits-integration-tests` |
| presubmits-verify-lint | f122a33 | link | true | `/test presubmits-verify-lint` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
