Conversation

@jakobmoellerdev
Member

Add OwnerReferences between ResourceGraphDefinitions (RGDs) and generated CRDs to align with Kubernetes-native lifecycle management. Introduce a ValidatingAdmissionPolicy to prevent accidental RGD deletion and refine the allowCRDDeletion flag for clear controller-side cleanup behavior. Update documentation and tests accordingly.

Backing KREP for #749 and #745


Signed-off-by: Jakob Möller <[email protected]>
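
For readers unfamiliar with the mechanism: an `OwnerReference` on a generated CRD would look roughly like the sketch below. This is illustrative only; the group, version, and names are assumptions, not taken from this PR.

```yaml
# Illustrative sketch, not the PR's actual output. Group, version, and
# names are assumptions; the uid must match the owning RGD's real uid.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.kro.run          # hypothetical generated CRD
  ownerReferences:
    - apiVersion: kro.run/v1alpha1       # assumed RGD API group/version
      kind: ResourceGraphDefinition
      name: my-widget-rgd                # hypothetical RGD name
      uid: <uid-of-the-rgd>              # placeholder
      controller: true
      blockOwnerDeletion: true
```

With this reference in place, deleting the RGD lets the Kubernetes garbage collector clean up the CRD according to the chosen propagation policy.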
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 31, 2025
@k8s-ci-robot k8s-ci-robot requested a review from cheftako October 31, 2025 08:22
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jakobmoellerdev
Once this PR has been reviewed and has the lgtm label, please assign bridgetkromhout for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from matthchr October 31, 2025 08:22
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 31, 2025
Contributor

@tjamet tjamet left a comment


💯 on aligning with standard k8s behaviours.

I would be in favour of moving forward with this proposal with two changes:

* rename the parameter from `allowCRDDeletion` to `protectCRDDeletion` or similar
* [not mandatory] make the default behaviour to delete "dangling" CRDs


1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.
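
As a rough illustration of the deletion-protection idea in point 2, a `ValidatingAdmissionPolicy` denying RGD deletion could be sketched as follows. This is not the PR's actual manifest; the API group, names, and the CEL expression are assumptions.

```yaml
# Hypothetical sketch, not the PR's implementation. API group, names,
# and the CEL expression are assumptions for illustration.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: protect-rgd-deletion
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["kro.run"]            # assumed API group
        apiVersions: ["*"]
        operations: ["DELETE"]
        resources: ["resourcegraphdefinitions"]
  validations:
    - expression: "false"                 # deny every matching DELETE
      message: "RGD deletion is blocked; remove the policy binding first."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: protect-rgd-deletion
spec:
  policyName: protect-rgd-deletion
  validationActions: ["Deny"]
```

Because this is enforced by the API server at admission time, it applies to every client, not only to the KRO controller.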
Contributor


Does this mean the current implementation does not fully honor the flag?

What about making the default `true`?

There's also a discussion to be had about finalizers, which play a different role IMO (the best place is probably below, in the discarded section)

Suggested change
1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs (blocks deletion of RGDs with instances)
3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.


* When an RGD is deleted, its owned CRD is automatically garbage-collected. This is proper behavior, as RGDs can still be deleted with the `orphan` propagation policy to preserve CRDs.
* Manual CRD deletion (while RGD still exists) will trigger re-reconciliation, as ownership implies controller responsibility.
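
The propagation behavior described above is chosen per delete request. For example, with kubectl (resource names are illustrative, and these commands require a live cluster with KRO installed):

```shell
# Default background cascade: deleting the RGD garbage-collects the owned CRD.
kubectl delete resourcegraphdefinition my-rgd

# Orphan policy: delete the RGD but leave the generated CRD in place.
kubectl delete resourcegraphdefinition my-rgd --cascade=orphan

# Foreground: the RGD stays in a terminating state until the owned CRD is gone.
kubectl delete resourcegraphdefinition my-rgd --cascade=foreground
```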
Contributor


💯

Comment on lines +111 to +118
* When `allowCRDDeletion: false`:
KRO still sets owner references, but **does not itself delete CRDs**.
Deletion is handled by Kubernetes GC only when the RGD is removed.
The ValidatingAdmissionPolicy remains active to prevent accidental RGD deletion.

* When `allowCRDDeletion: true`:
KRO may directly delete CRDs it manages (e.g., during reconciliation or cleanup).
The Helm chart skips installation of the ValidatingAdmissionPolicy, allowing full manual control.
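
In Helm-values terms, the two modes above might be toggled like this. The exact key path is an assumption for illustration; check the chart's `values.yaml` for the real one.

```yaml
# Hypothetical values.yaml fragment; the key path is an assumption.
config:
  allowCRDDeletion: false   # KRO never deletes CRDs itself; only k8s GC
                            # cleans up when the owning RGD is deleted.
```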
Contributor


I'm not sure the semantics make sense to me.
It feels weird to have `allowCRDDeletion: false` and still have the CRD deleted anyhow (by Kubernetes GC).

There's also a case that could lead to inconsistent behaviour: when the RGD Kind, Group, or APIVersion changes. In that case, with `allowCRDDeletion: false`, we wouldn't delete the dangling CRD, yet the CRD would still be deleted on RGD deletion.

The proposed behaviour and lifecycle sound more like `confirmCRDDeletion` or `requireManualJudgementForCRDDeletion`.
Those semantics would also bring consistency when Kind, Group, or APIVersion changes.
Maybe the validation webhook needs to be on the CRD itself then?

| Option | Reason Rejected |
| ------------------------------------ |-----------------------------------------------------------------------------------------|
| Continue using label linkage only | No lifecycle tracking or automatic cleanup, against k8s API semantics |
| Finalizer-based blocking | Overcomplicates reconciliation; redundant with OwnerReferences and foreground finalizer |
Contributor


Can you expand on what makes reconciliation more complex, and on the redundancy with OwnerReferences?

From the Kubernetes documentation, finalizers seem to pursue exactly this objective: hold a resource until all clean-up is done.

> The target object remains in a terminating state while the control plane, or other components, take the actions defined by the finalizers

The only difference with the current approach is that the RGD would be marked for deletion and have to be entirely deleted before being re-applied, while the proposed approach would block deletion outright.

Member Author


That is correct. However, Kubernetes sets its own `foregroundDeletion` as part of client-based lifecycle handling: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion. The proposed approach effectively makes client deletion protected by a label and then allows users to control deletion behavior based on the propagation policy.

Contributor


I'm unsure how a label protects deletion.
It only applies to the KRO controller's own reconciliations, right?
I mean, it does not prevent anybody else from doing `kubectl delete RGD` or `kubectl delete RGDInstance`, correct?

Member Author


No, that's not correct. If we use a policy, this operates at the API-server level, so the deletion protection is enforced globally on the API, not just for the controller.

This proposal introduces three coordinated changes:

1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
Contributor

@ellistarn ellistarn Nov 13, 2025


I can understand protecting against the deletion of the Kinds managed by RGDs, but what's the thinking behind stopping a user who says `k delete rgd foo` from deleting the RGD `foo`? Delete means delete, right?

It strikes me that this could be something organizations choose to opt into more broadly (deletion protection for any k8s resource), but it's odd for KRO to take that opinion for users when other projects don't.

Member Author


The main reason is that, by default, Kubernetes deletes with background propagation: if we delete an RGD, that would automatically start deletion of the CRD as well. I only added this after there was an opinion against simply allowing a delete. I personally favor the simple delete too, but adjusted after community-call feedback.

Contributor


Indeed, the k8s standard leans towards no protection: delete everything, cascading.
Moreover, once marked for deletion there is no way back; you must complete deletion before being able to re-use the object again (`kubectl delete` starts by setting the `deletionTimestamp`, which can't be undone).

As there is a clear 1-1 ownership link RGD <-> CRD, I would indeed favour the standard behaviour.

Contributor


> I personally favor this too but adjusted after community call feedback

I think this is the kind of thing where you can easily find voices in both directions. You'll see this debate surface through many AWS APIs, too. Just try deleting an S3 bucket with objects in it.

Given the controversy, and the fact that it's broader than just our API, I'd stick with the existing standard and let a separate solution handle deletion protection holistically.


#### 3. Controller-side deletion control (`allowCRDDeletion`)

Previously, the flag `allowCRDDeletion` determined whether KRO would delete CRDs it had created if the RGD was removed or changed.
Contributor


I'm curious to explore the use cases for setting this flag one way or the other. If we're certain that all instances managed by the RGD are gone, is it really dangerous to remove the Kind? I don't see the upside of leaking it if we're confident we're not losing data, and I can see downsides of not removing the Kind, such as attempting a fresh installation of a new version where the Kind has changed substantially.

@k8s-ci-robot
Contributor

@jakobmoellerdev: The following tests failed; say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| presubmits-build-image | f122a33 | link | true | `/test presubmits-build-image` |
| presubmits-e2e-tests | f122a33 | link | true | `/test presubmits-e2e-tests` |
| presubmits-integration-tests | f122a33 | link | true | `/test presubmits-integration-tests` |
| presubmits-verify-lint | f122a33 | link | true | `/test presubmits-verify-lint` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
