Skip to content

Conversation

@baiyutang
Copy link
Contributor

@baiyutang baiyutang commented Nov 18, 2025

feat: add PodDisruptionBudget support for Karmada control plane components

  • Add PodDisruptionBudgetConfig to CommonSettings API
  • Implement PDB creation/update/deletion logic for all components
  • Add validation for PDB configuration (mutual exclusivity, replicas check)
  • Support both minAvailable and maxUnavailable PDB strategies

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR adds PodDisruptionBudget (PDB) support to the Karmada Operator, enabling users to configure high availability guarantees for all Karmada control plane components (karmada-controller-manager, karmada-scheduler, karmada-apiserver, etc.) during planned disruptions.

The feature mirrors the PDB functionality already available in Helm charts, providing consistent high availability guarantees across different installation methods. Users can now declaratively configure PDB strategies through the Karmada CRD, ensuring that control plane components remain available during node maintenance, cluster upgrades, or other planned operations.

Which issue(s) this PR fixes:

Fixes #
Part of #6282

Special notes for your reviewer:

  • All unit tests pass successfully
  • PDB logic has been centralized in a shared package to eliminate code duplication
  • Uses standard Kubernetes labels (app.kubernetes.io/name/instance) for PDB selectors
  • Maintains backward compatibility - PDB fields are optional
  • Follows existing code patterns and uses established constants

**Does this PR introduce a user-fa

Copilot AI review requested due to automatic review settings November 18, 2025 06:40
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 18, 2025
@karmada-bot karmada-bot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Nov 18, 2025
@karmada-bot karmada-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 18, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @baiyutang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the high availability of Karmada control plane components by integrating PodDisruptionBudget (PDB) functionality. It allows operators to define disruption budgets for critical components, ensuring that a minimum number of replicas remain available during voluntary disruptions like node maintenance or upgrades. This feature aligns with existing Helm chart capabilities, providing a consistent and robust approach to managing component availability.

Highlights

  • PodDisruptionBudget (PDB) Support: Introduced PDB support for all Karmada control plane components, including karmada-controller-manager, karmada-scheduler, karmada-apiserver, and etcd, to enhance high availability during planned disruptions.
  • API Extension: Added PodDisruptionBudgetConfig to the CommonSettings API, allowing declarative configuration of PDB strategies directly through the Karmada CRD.
  • Centralized PDB Logic: Implemented PDB creation, update, and deletion logic in a new, shared pdb package to reduce code duplication and ensure consistency across components.
  • Validation: Added validation for PDB configurations, ensuring mutual exclusivity between minAvailable and maxUnavailable and preventing minAvailable from exceeding the component's replicas count.
  • Backward Compatibility: The new PDB fields are optional, ensuring backward compatibility for existing Karmada installations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copilot finished reviewing on behalf of baiyutang November 18, 2025 06:43
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces PodDisruptionBudget (PDB) support for Karmada control plane components, which is a great feature for improving high availability. The implementation is well-structured, with centralized PDB management logic and validation. The changes also include necessary RBAC updates and extensive test modifications.

I've found a critical issue in the PDB creation/update logic that needs to be fixed. I also have a couple of medium-severity suggestions related to test completeness and code maintainability. Please see my detailed comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive PodDisruptionBudget (PDB) support to the Karmada Operator, enabling high availability guarantees for all control plane components during planned disruptions.

Key Changes:

  • Introduces PodDisruptionBudgetConfig API to CommonSettings with support for minAvailable and maxUnavailable strategies
  • Implements centralized PDB management logic in the operator/pkg/controlplane/pdb package
  • Adds validation for PDB configuration including mutual exclusivity and replica checks
  • Updates RBAC permissions to allow PDB resource management

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
operator/pkg/apis/operator/v1alpha1/type.go Adds PodDisruptionBudgetConfig to CommonSettings API
operator/pkg/controlplane/pdb/pdb.go Implements centralized PDB management logic for all components
operator/pkg/util/apiclient/idempotency.go Adds CreateOrUpdatePodDisruptionBudget helper function
operator/pkg/controller/karmada/validating.go Adds PDB configuration validation including mutual exclusivity checks
operator/pkg/controller/karmada/validating_test.go Adds comprehensive test coverage for PDB validation
operator/pkg/controlplane/*/ Integrates PDB creation for all control plane components
operator/pkg/constants/constants.go Adds component name constants for PDB label mapping
operator/config/deploy/karmada-operator-clusterrole.yaml Adds PDB RBAC permissions
charts/karmada-operator/templates/karmada-operator-clusterrole.yaml Adds PDB RBAC permissions for Helm installation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link

codecov-commenter commented Nov 18, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 21.35922% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.55%. Comparing base (25cf101) to head (c525644).

Files with missing lines Patch % Lines
operator/pkg/controlplane/pdb/pdb.go 0.00% 52 Missing ⚠️
operator/pkg/util/apiclient/idempotency.go 0.00% 15 Missing ⚠️
operator/pkg/controlplane/apiserver/apiserver.go 0.00% 2 Missing and 2 partials ⚠️
operator/pkg/controlplane/controlplane.go 91.66% 1 Missing and 1 partial ⚠️
operator/pkg/controlplane/etcd/etcd.go 0.00% 1 Missing and 1 partial ⚠️
.../pkg/controlplane/metricsadapter/metricsadapter.go 0.00% 1 Missing and 1 partial ⚠️
operator/pkg/controlplane/search/search.go 0.00% 1 Missing and 1 partial ⚠️
operator/pkg/controlplane/webhook/webhook.go 0.00% 1 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6933      +/-   ##
==========================================
- Coverage   46.61%   46.55%   -0.06%     
==========================================
  Files         698      699       +1     
  Lines       48003    48106     +103     
==========================================
+ Hits        22375    22397      +22     
- Misses      23943    24016      +73     
- Partials     1685     1693       +8     
Flag Coverage Δ
unittests 46.55% <21.35%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@baiyutang
Copy link
Contributor Author

@RainbowMango All workflow and test run successfully.

@RainbowMango RainbowMango removed the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Nov 18, 2025
@RainbowMango
Copy link
Member

cc @jabellard
This is the implementation part.

@jabellard
Copy link
Member

/assign

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign

@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RainbowMango
Copy link
Member

@baiyutang Great thanks for your help. It generally looks great to me. I force-pushed to your branch to fix the lint issue and tidy the comments.

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
@jabellard Would you like to take a look? or run a test for it, to confirm the functionality as well as the names, labels of the PDB resources?

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2025
@RainbowMango RainbowMango added this to the v1.16 milestone Nov 25, 2025
@jabellard
Copy link
Member

/lgtm @jabellard Would you like to take a look? or run a test for it, to confirm the functionality as well as the names, labels of the PDB resources?

Looking...

Copy link
Member

@jabellard jabellard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Also, have you tried running an e2e test locally? I'd love to see the results of that.

Copy link
Member

@jabellard jabellard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2025
@karmada-bot
Copy link
Collaborator

New changes are detected. LGTM label has been removed.

return fmt.Errorf("error when creating deployment for %s, err: %w", apiserverDeployment.Name, err)
}

if err := pdb.EnsurePodDisruptionBudget(client, util.KarmadaAPIServerName(name), namespace, cfg.CommonSettings.PodDisruptionBudgetConfig, apiserverDeployment.Spec.Template.Labels, apiserverDeployment); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the apiserverDeployment holds enough metadata for building the owner reference, like UID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to run a test for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, we can build the owner reference outside of EnsurePodDisruptionBudget, like:

pdbOwner := metav1.NewControllerRef(apiserverDeployment, apiserverDeployment.GroupVersionKind())

Copy link
Member

@RainbowMango RainbowMango Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it on my side, karmada-operator failed to create the PDB resources due to invalid uid:

{"ts":1764217373021.3862,"caller":"controller/controller.go:474","msg":"Reconciler error","controller":"karmada-operator-controller","controllerGroup":"operator.karmada.io","controllerKind":"Karmada","Karmada":{"name":"karmada-demo","namespace":"test"},"namespace":"test","name":"karmada-demo","reconcileID":"10728b88-4d58-4a0c-8de6-48a2aadc0116","err":"failed to install karmada apiserver component, err: failed to install karmada apiserver, err: failed to ensure PDB for apiserver component, err: failed to create PDB resource for karmada-demo-apiserver, err: PodDisruptionBudget.policy "karmada-demo-apiserver" is invalid: metadata.ownerReferences.uid: Invalid value: "": uid must not be empty","errCauses":[{"error":"failed to install karmada apiserver component, err: failed to install karmada apiserver, err: failed to ensure PDB for apiserver component, err: failed to create PDB resource for karmada-demo-apiserver, err: PodDisruptionBudget.policy "karmada-demo-apiserver" is invalid: metadata.ownerReferences.uid: Invalid value: "": uid must not be empty"}]}

}

if err := pdb.EnsurePodDisruptionBudget(client, util.KarmadaAPIServerName(name), namespace, cfg.CommonSettings.PodDisruptionBudgetConfig, apiserverDeployment.Spec.Template.Labels, apiserverDeployment); err != nil {
return fmt.Errorf("failed to ensure PDB for apiserver component, err: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("failed to ensure PDB for apiserver component, err: %w", err)
return fmt.Errorf("failed to ensure PDB for component %s, err: %w", util.KarmadaAPIServerName(name), err)

In case the karmada-operator manages more than one Karmada instance, this log makes it easier to figure out which instance is failing.

Same as below.

Comment on lines +4 to +5
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime/schema"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot to remove this after fixing the lint issue.

Comment on lines +97 to +103
// Fetch persisted Deployment to get real UID
persisted, err := client.AppsV1().Deployments(namespace).Get(context.TODO(), apiserverDeployment.GetName(), metav1.GetOptions{})
if err != nil {
return fmt.Errorf("failed to fetch Deployment %s/%s for PDB owner, err: %w", namespace, apiserverDeployment.GetName(), err)
}
gvk := schema.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"}
ownerRef := *metav1.NewControllerRef(persisted, gvk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to fetch the deployment one more time, just:

diff --git a/operator/pkg/controlplane/apiserver/apiserver.go b/operator/pkg/controlplane/apiserver/apiserver.go 
index 1087dd905..208189397 100644
--- a/operator/pkg/controlplane/apiserver/apiserver.go
+++ b/operator/pkg/controlplane/apiserver/apiserver.go
@@ -94,13 +94,7 @@ func installKarmadaAPIServer(client clientset.Interface, cfg *operatorv1alpha1.K
                return fmt.Errorf("error when creating deployment for %s, err: %w", apiserverDeployment.Name, err)
        }
 
-       // Fetch persisted Deployment to get real UID
-       persisted, err := client.AppsV1().Deployments(namespace).Get(context.TODO(), apiserverDeployment.GetName(), metav1.GetOptions{})
-       if err != nil {
-               return fmt.Errorf("failed to fetch Deployment %s/%s for PDB owner, err: %w", namespace, apiserverDeployment.GetName(), err)
-       }
-       gvk := schema.GroupVersionKind{Group: "apps", Version: "v1", Kind: "Deployment"}
-       ownerRef := *metav1.NewControllerRef(persisted, gvk)
+       ownerRef := *metav1.NewControllerRef(apiserverDeployment, apiserverDeployment.GroupVersionKind())
        if err := pdb.EnsurePodDisruptionBudget(client, util.KarmadaAPIServerName(name), namespace, cfg.CommonSettings.PodDisruptionBudgetConfig, apiserverDeployment.Spec.Template.Labels, []metav1.OwnerReference{ownerRef}); err !=
 nil {
                return fmt.Errorf("failed to ensure PDB for apiserver component %s, err: %w", util.KarmadaAPIServerName(name), err)
        }
diff --git a/operator/pkg/util/apiclient/idempotency.go b/operator/pkg/util/apiclient/idempotency.go
index b20150f96..fc1faeec7 100644
--- a/operator/pkg/util/apiclient/idempotency.go
+++ b/operator/pkg/util/apiclient/idempotency.go
@@ -123,13 +123,13 @@ func CreateOrUpdateService(client clientset.Interface, service *corev1.Service)

 // CreateOrUpdateDeployment creates a Deployment if the target resource doesn't exist. If the resource exists already, this function will update the resource instead.
 func CreateOrUpdateDeployment(client clientset.Interface, deployment *appsv1.Deployment) error {
-       _, err := client.AppsV1().Deployments(deployment.GetNamespace()).Create(context.TODO(), deployment, metav1.CreateOptions{})
+       deployment, err := client.AppsV1().Deployments(deployment.GetNamespace()).Create(context.TODO(), deployment, metav1.CreateOptions{})
        if err != nil {
                if !apierrors.IsAlreadyExists(err) {
                        return err
                }

-               _, err := client.AppsV1().Deployments(deployment.GetNamespace()).Update(context.TODO(), deployment, metav1.UpdateOptions{})
+               deployment, err = client.AppsV1().Deployments(deployment.GetNamespace()).Update(context.TODO(), deployment, metav1.UpdateOptions{})
                if err != nil {
                        return err
                }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have all forced pushes. So there is a conflict. Maybe you can delete it locally and then checkout this branch, or accept incoming altogether

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let me handle it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants