-
Notifications
You must be signed in to change notification settings - Fork 1k
feat: add PodDisruptionBudget support for Karmada control plane compo… #6933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @baiyutang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the high availability of Karmada control plane components by integrating PodDisruptionBudget (PDB) functionality. It allows operators to define disruption budgets for critical components, ensuring that a minimum number of replicas remain available during voluntary disruptions like node maintenance or upgrades. This feature aligns with existing Helm chart capabilities, providing a consistent and robust approach to managing component availability. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces PodDisruptionBudget (PDB) support for Karmada control plane components, which is a great feature for improving high availability. The implementation is well-structured, with centralized PDB management logic and validation. The changes also include necessary RBAC updates and extensive test modifications.
I've found a critical issue in the PDB creation/update logic that needs to be fixed. I also have a couple of medium-severity suggestions related to test completeness and code maintainability. Please see my detailed comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds comprehensive PodDisruptionBudget (PDB) support to the Karmada Operator, enabling high availability guarantees for all control plane components during planned disruptions.
Key Changes:
- Introduces
PodDisruptionBudgetConfigAPI toCommonSettingswith support forminAvailableandmaxUnavailablestrategies - Implements centralized PDB management logic in the
operator/pkg/controlplane/pdbpackage - Adds validation for PDB configuration including mutual exclusivity and replica checks
- Updates RBAC permissions to allow PDB resource management
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
operator/pkg/apis/operator/v1alpha1/type.go |
Adds PodDisruptionBudgetConfig to CommonSettings API |
operator/pkg/controlplane/pdb/pdb.go |
Implements centralized PDB management logic for all components |
operator/pkg/util/apiclient/idempotency.go |
Adds CreateOrUpdatePodDisruptionBudget helper function |
operator/pkg/controller/karmada/validating.go |
Adds PDB configuration validation including mutual exclusivity checks |
operator/pkg/controller/karmada/validating_test.go |
Adds comprehensive test coverage for PDB validation |
operator/pkg/controlplane/*/ |
Integrates PDB creation for all control plane components |
operator/pkg/constants/constants.go |
Adds component name constants for PDB label mapping |
operator/config/deploy/karmada-operator-clusterrole.yaml |
Adds PDB RBAC permissions |
charts/karmada-operator/templates/karmada-operator-clusterrole.yaml |
Adds PDB RBAC permissions for Helm installation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6933 +/- ##
==========================================
- Coverage 46.61% 46.55% -0.06%
==========================================
Files 698 699 +1
Lines 48003 48106 +103
==========================================
+ Hits 22375 22397 +22
- Misses 23943 24016 +73
- Partials 1685 1693 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@RainbowMango All workflow and test run successfully. |
|
cc @jabellard |
|
/assign |
RainbowMango
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
2b52180 to
935f72b
Compare
e48df2b to
ee3079e
Compare
|
@baiyutang Great thanks for your help. It generally looks great to me. I force-pushed to your branch to fix the lint issue and tidy the comments. |
RainbowMango
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@jabellard Would you like to take a look? or run a test for it, to confirm the functionality as well as the names, labels of the PDB resources?
Looking... |
jabellard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. Also, have you tried running an e2e test locally? I'd love to see the results of that.
jabellard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
|
New changes are detected. LGTM label has been removed. |
e342af3 to
c525644
Compare
| return fmt.Errorf("error when creating deployment for %s, err: %w", apiserverDeployment.Name, err) | ||
| } | ||
|
|
||
| if err := pdb.EnsurePodDisruptionBudget(client, util.KarmadaAPIServerName(name), namespace, cfg.CommonSettings.PodDisruptionBudgetConfig, apiserverDeployment.Spec.Template.Labels, apiserverDeployment); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the apiserverDeployment holds enough metadata for building the owner reference, like UID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to run a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, we can build the owner reference outside of EnsurePodDisruptionBudget, like:
pdbOwner := metav1.NewControllerRef(apiserverDeployment, apiserverDeployment.GroupVersionKind())There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it on my side, karmada-operator failed to create the PDB resources due to invalid uid:
{"ts":1764217373021.3862,"caller":"controller/controller.go:474","msg":"Reconciler error","controller":"karmada-operator-controller","controllerGroup":"operator.karmada.io","controllerKind":"Karmada","Karmada":{"name":"karmada-demo","namespace":"test"},"namespace":"test","name":"karmada-demo","reconcileID":"10728b88-4d58-4a0c-8de6-48a2aadc0116","err":"failed to install karmada apiserver component, err: failed to install karmada apiserver, err: failed to ensure PDB for apiserver component, err: failed to create PDB resource for karmada-demo-apiserver, err: PodDisruptionBudget.policy "karmada-demo-apiserver" is invalid: metadata.ownerReferences.uid: Invalid value: "": uid must not be empty","errCauses":[{"error":"failed to install karmada apiserver component, err: failed to install karmada apiserver, err: failed to ensure PDB for apiserver component, err: failed to create PDB resource for karmada-demo-apiserver, err: PodDisruptionBudget.policy "karmada-demo-apiserver" is invalid: metadata.ownerReferences.uid: Invalid value: "": uid must not be empty"}]}
| } | ||
|
|
||
| if err := pdb.EnsurePodDisruptionBudget(client, util.KarmadaAPIServerName(name), namespace, cfg.CommonSettings.PodDisruptionBudgetConfig, apiserverDeployment.Spec.Template.Labels, apiserverDeployment); err != nil { | ||
| return fmt.Errorf("failed to ensure PDB for apiserver component, err: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return fmt.Errorf("failed to ensure PDB for apiserver component, err: %w", err) | |
| return fmt.Errorf("failed to ensure PDB for component %s, err: %w", util.KarmadaAPIServerName(name), err) |
In case the karmada-operator manages more than one Karmada instance, this log makes it easier to figure out which instance is failing.
Same as below.
c525644 to
8a5355b
Compare
…nents Signed-off-by: baiyutang <[email protected]>
8a5355b to
5b4b485
Compare
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| "k8s.io/apimachinery/pkg/runtime/schema" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you forgot to remove this after fixing the lint issue.
| // Fetch persisted Deployment to get real UID | ||
| persisted, err := client.AppsV1().Deployments(namespace).Get(context.TODO(), apiserverDeployment.GetName(), metav1.GetOptions{}) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to fetch Deployment %s/%s for PDB owner, err: %w", namespace, apiserverDeployment.GetName(), err) | ||
| } | ||
| gvk := schema.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"} | ||
| ownerRef := *metav1.NewControllerRef(persisted, gvk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to fetch the deployment one more time, just:
diff --git a/operator/pkg/controlplane/apiserver/apiserver.go b/operator/pkg/controlplane/apiserver/apiserver.go
index 1087dd905..208189397 100644
--- a/operator/pkg/controlplane/apiserver/apiserver.go
+++ b/operator/pkg/controlplane/apiserver/apiserver.go
@@ -94,13 +94,7 @@ func installKarmadaAPIServer(client clientset.Interface, cfg *operatorv1alpha1.K
return fmt.Errorf("error when creating deployment for %s, err: %w", apiserverDeployment.Name, err)
}
- // Fetch persisted Deployment to get real UID
- persisted, err := client.AppsV1().Deployments(namespace).Get(context.TODO(), apiserverDeployment.GetName(), metav1.GetOptions{})
- if err != nil {
- return fmt.Errorf("failed to fetch Deployment %s/%s for PDB owner, err: %w", namespace, apiserverDeployment.GetName(), err)
- }
- gvk := schema.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"}
- ownerRef := *metav1.NewControllerRef(persisted, gvk)
+ ownerRef := *metav1.NewControllerRef(apiserverDeployment, apiserverDeployment.GroupVersionKind())
if err := pdb.EnsurePodDisruptionBudget(client, util.KarmadaAPIServerName(name), namespace, cfg.CommonSettings.PodDisruptionBudgetConfig, apiserverDeployment.Spec.Template.Labels, []metav1.OwnerReference{ownerRef}); err !=
nil {
return fmt.Errorf("failed to ensure PDB for apiserver component %s, err: %w", util.KarmadaAPIServerName(name), err)
}
diff --git a/operator/pkg/util/apiclient/idempotency.go b/operator/pkg/util/apiclient/idempotency.go
index b20150f96..fc1faeec7 100644
--- a/operator/pkg/util/apiclient/idempotency.go
+++ b/operator/pkg/util/apiclient/idempotency.go
@@ -123,13 +123,13 @@ func CreateOrUpdateService(client clientset.Interface, service *corev1.Service)
// CreateOrUpdateDeployment creates a Deployment if the target resource doesn't exist. If the resource exists already, this function will update the resource instead.
func CreateOrUpdateDeployment(client clientset.Interface, deployment *appsv1.Deployment) error {
- _, err := client.AppsV1().Deployments(deployment.GetNamespace()).Create(context.TODO(), deployment, metav1.CreateOptions{})
+ deployment, err := client.AppsV1().Deployments(deployment.GetNamespace()).Create(context.TODO(), deployment, metav1.CreateOptions{})
if err != nil {
if !apierrors.IsAlreadyExists(err) {
return err
}
- _, err := client.AppsV1().Deployments(deployment.GetNamespace()).Update(context.TODO(), deployment, metav1.UpdateOptions{})
+ deployment, err = client.AppsV1().Deployments(deployment.GetNamespace()).Update(context.TODO(), deployment, metav1.UpdateOptions{})
if err != nil {
return err
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we have all forced pushes. So there is a conflict. Maybe you can delete it locally and then checkout this branch, or accept incoming altogether
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let me handle it.
feat: add PodDisruptionBudget support for Karmada control plane components
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
This PR adds PodDisruptionBudget (PDB) support to the Karmada Operator, enabling users to configure high availability guarantees for all Karmada control plane components (karmada-controller-manager, karmada-scheduler, karmada-apiserver, etc.) during planned disruptions.
The feature mirrors the PDB functionality already available in Helm charts, providing consistent high availability guarantees across different installation methods. Users can now declaratively configure PDB strategies through the Karmada CRD, ensuring that control plane components remain available during node maintenance, cluster upgrades, or other planned operations.
Which issue(s) this PR fixes:
Fixes #
Part of #6282
Special notes for your reviewer:
**Does this PR introduce a user-fa