# KREP-004: OwnerReferences and deletion policy for ResourceGraphDefinitions #763
## Conversation
Add `OwnerReferences` between ResourceGraphDefinitions (RGDs) and generated CRDs to align with Kubernetes-native lifecycle management. Introduce a `ValidatingAdmissionPolicy` to prevent accidental RGD deletion and refine the `allowCRDDeletion` flag for clear controller-side cleanup behavior. Update documentation and tests accordingly.

Backing KREP for #749 and #745

Signed-off-by: Jakob Möller <[email protected]>
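A minimal sketch, with illustrative names, of what the proposed ownership link could look like on a generated CRD (the owner kind and group follow kro's `kro.run/v1alpha1` API; the CRD name and UID are made up):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: applications.kro.run                      # illustrative CRD generated from an RGD
  ownerReferences:
    - apiVersion: kro.run/v1alpha1
      kind: ResourceGraphDefinition
      name: application                           # the RGD that generated this CRD
      uid: 2f9d8c4e-1234-5678-9abc-def012345678   # UID of the live RGD object
      controller: true
      blockOwnerDeletion: true
# spec: ... (generated schema omitted)
```

With this reference in place, deleting the RGD lets the Kubernetes garbage collector clean up the CRD; this works only because RGDs, like CRDs, are cluster-scoped.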
**tjamet** left a comment:
💯 on aligning with standard k8s behaviours.
I would be in favour of moving forward with this proposal with 2 changes:
- change the parameter from `allowCRDDeletion` to `protectCRDDeletion` or similar
- [not mandatory] default behaviour is to delete "dangling" CRDs
> 1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
> 2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
> 3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.
Does this mean the current implementation does not fully honor the flag?
What about making the default `true`?
There's also a discussion to be had about finalizers, which play a different role IMO (the best place is probably below in the discarded section).
Suggested change:

> 1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
> 2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs (blocks deletion of RGDs with instances)
> 3. **Controller deletion control** — The Helm flag `allowCRDDeletion` now cleanly governs whether KRO is permitted to delete CRDs as part of reconciliation.
> * When an RGD is deleted, its owned CRD is automatically garbage-collected. This is proper behavior, as we can still delete RGDs with policy "orphan" to preserve CRDs.
> * Manual CRD deletion (while RGD still exists) will trigger re-reconciliation, as ownership implies controller responsibility.
💯
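To make the "orphan" path mentioned in the quoted section concrete: a sketch of the delete options that `kubectl delete resourcegraphdefinition <name> --cascade=orphan` sends, which removes the RGD but leaves the owned CRD behind (the garbage collector strips the dangling owner reference instead of deleting the dependent):

```yaml
# DeleteOptions body for an orphaning delete; with "Background" (the default)
# or "Foreground", the owned CRD would be garbage-collected instead.
kind: DeleteOptions
apiVersion: v1
propagationPolicy: Orphan
```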
> * When `allowCRDDeletion: false`:
>   KRO still sets owner references, but **does not itself delete CRDs**.
>   Deletion is handled by Kubernetes GC only when the RGD is removed.
>   The ValidatingAdmissionPolicy remains active to prevent accidental RGD deletion.
>
> * When `allowCRDDeletion: true`:
>   KRO may directly delete CRDs it manages (e.g., during reconciliation or cleanup).
>   The Helm chart skips installation of the ValidatingAdmissionPolicy, allowing full manual control.
I'm not sure I can make sense of the semantics.

It feels weird to me to have `allowCRDDeletion: false` and yet have the CRD deleted anyhow (by k8s).

There's also a case that could lead to inconsistent behaviour: when the RGD Kind, Group or APIVersion changes. In this case, with `allowCRDDeletion: false` we wouldn't delete the dangling CRD, but the CRD would be deleted in case of RGD deletion.

The proposed behaviour and lifecycle sound more like `confirmCRDDeletion` or `requireManualJudgementForCRDDeletion`.

These semantics would also bring consistency when Kind, Group or APIVersion changes.

Maybe the validation webhook needs to be on the CRD itself then?
> | Option | Reason Rejected |
> | ------ | --------------- |
> | Continue using label linkage only | No lifecycle tracking or automatic cleanup; against k8s API semantics |
> | Finalizer-based blocking | Overcomplicates reconciliation; redundant with OwnerReferences and foreground finalizer |
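For context on the rejected finalizer row, a minimal sketch (the finalizer name is hypothetical, not from the proposal) of what finalizer-based blocking would look like on an RGD:

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: my-rgd                  # illustrative name
  finalizers:
    - kro.run/crd-cleanup       # hypothetical finalizer; the controller would
                                # remove it only once the generated CRD is gone
# spec: ... (schema and resources omitted)
```

A `kubectl delete` on this object would set `deletionTimestamp` and leave the RGD in a terminating state until the finalizer is removed, rather than rejecting the delete outright.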
Can you expand on what makes reconciliation more complex and redundant with OwnerReferences?

From Kubernetes' documentation, finalizers seem to pursue exactly this objective: hold a resource until all clean-up is done.

> The target object remains in a terminating state while the control plane, or other components, take the actions defined by the finalizers.

The only difference with the current approach is that the RGD would be marked for deletion and have to be entirely deleted before being re-applied, while the proposed approach would block deletion.
That is correct. However, Kubernetes sets its own `foregroundDeletion` finalizer as part of client-based lifecycle handling: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion. The proposed approach effectively makes client deletion protected by a label and then allows users to control deletion behavior based on propagation policy.
I'm unsure how a label protects deletion.
It only applies to the KRO controller's own reconciliations, right?
I mean, it does not prevent anybody else from doing `kubectl delete rgd` or `kubectl delete` on an RGD instance, correct?
No, that's not correct. If we use a policy, this operates at the API server level, so the deletion protection is enforced globally on the API, not just for the controller.
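A minimal sketch of what such an API-server-level guard could look like; the names are illustrative and the CEL expression is deliberately simplified (the actual policy would presumably only reject RGDs that still have instances):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: block-rgd-deletion            # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["kro.run"]
        apiVersions: ["*"]
        operations: ["DELETE"]
        resources: ["resourcegraphdefinitions"]
  validations:
    - expression: "false"             # simplified: reject every matched DELETE
      message: "Deletion of ResourceGraphDefinitions is blocked by policy."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: block-rgd-deletion
spec:
  policyName: block-rgd-deletion
  validationActions: ["Deny"]
```

Because the API server evaluates this before admitting the request, even a cluster-admin's `kubectl delete rgd` is rejected while the binding is in place.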
> This proposal introduces three coordinated changes:
>
> 1. **Ownership introduction** — CRDs now include an `OwnerReference` pointing to the creating RGD.
> 2. **Deletion protection** — A `ValidatingAdmissionPolicy` prevents accidental user deletion of RGDs.
I can understand protecting against the deletion of the Kinds managed by RGDs, but what's the thinking behind stopping a user who says `kubectl delete rgd foo` from deleting the RGD `foo`? Delete means delete, right?

It strikes me that this could be something organizations choose to opt into more broadly (deletion protection for any k8s resource), but it's odd that KRO would take that opinion for users when other projects don't.
The main reason is that by default Kubernetes deletes with background propagation. If we delete an RGD, that would automatically start deletion of the CRD as well during the delete. I only added this after there was an opinion against simply allowing a delete. I personally favor this too but adjusted after community call feedback.
Indeed, the k8s standard leans towards no protection: delete everything, cascading.

Moreover, once marked for deletion, there is no way back; you must complete deletion before being able to re-use the name again (`kubectl delete` starts by setting the `deletionTimestamp`, which can't be undone).

As there is a clear 1-1 ownership link RGD <-> CRD, I would indeed favour the standard behaviour.
> I personally favor this too but adjusted after community call feedback
I think this is the kind of thing where you can easily find voices in both directions. You'll see this debate surface through many AWS APIs, too. Just try deleting an S3 bucket with objects in it.
Given the controversy and the fact that it's broader than just our API, I'd just stick with the existing standard and let a separate solution handle deletion protection holistically.
> #### 3. Controller-side deletion control (`allowCRDDeletion`)
>
> Previously, the flag `allowCRDDeletion` determined whether KRO would delete CRDs it had created if the RGD was removed or changed.
I'm curious to explore the use cases for setting this flag one way or another. If we're certain that all of the instances managed by the RGD are gone, is it really dangerous to remove the Kind? I'm not seeing the upsides of leaking it if we're confident we're not losing data. I can see downsides of not removing the kind, such as attempting to do a fresh installation of a new version where the Kind has changed substantially.
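For reference, a hedged sketch of how the flag might be toggled at install time; the exact values path is an assumption, not confirmed from the chart:

```yaml
# values.yaml (illustrative key path)
config:
  allowCRDDeletion: false   # assumed default; when false, KRO never deletes
                            # CRDs itself and the ValidatingAdmissionPolicy
                            # is installed alongside the controller
```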