fix(sandbox-manager): resolve post-claim 502s and add peer reconciliation for split-brain recovery #313

Open
AiRanthem wants to merge 2 commits into openkruise:master from AiRanthem:feature/better-peer-260428

@AiRanthem
Member

Summary

  • Fix a regression in which requests could receive a 502 right after a sandbox was claimed, by correcting proxy routing/ext-proc behavior and the sandbox manager's interaction with its infra layer.
  • Add peer reconciliation capability to recover from split-brain scenarios and improve cluster peer consistency.

Changes in This PR

1) Fix 502 errors after sandbox claiming

  • Update proxy request handling and route resolution logic:
    • pkg/proxy/ext_proc.go
    • pkg/proxy/routes.go
  • Add/adjust regression tests for proxy behavior:
    • pkg/proxy/ext_proc_test.go
  • Align sandbox manager infra behavior with updated routing expectations:
    • pkg/sandbox-manager/infra/sandboxcr/infra.go
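
The description above does not state the root cause of the post-claim 502s. Purely as an illustration of the kind of failure a route-resolution fix can address, the sketch below assumes a hypothetical scenario: the proxy keeps a local route table, and a request for a freshly claimed sandbox arrives before its route has been recorded. All names here (`routeTable`, `resolve`, `lookup`, `errNoRoute`) are invented for the example and are not taken from `pkg/proxy/routes.go`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNoRoute is what a caller would translate into a 502 (hypothetical).
var errNoRoute = errors.New("no route for sandbox")

// routeTable is a hypothetical local cache of sandbox ID -> upstream address.
type routeTable struct {
	mu     sync.RWMutex
	routes map[string]string
	// lookup is a hypothetical authoritative source consulted on a cache miss,
	// for example the sandbox resource that was just claimed.
	lookup func(id string) (string, bool)
}

// resolve returns the upstream for id. On a local miss it falls back to
// lookup and caches the answer instead of failing immediately.
func (t *routeTable) resolve(id string) (string, error) {
	t.mu.RLock()
	addr, ok := t.routes[id]
	t.mu.RUnlock()
	if ok {
		return addr, nil
	}
	if t.lookup != nil {
		if addr, ok := t.lookup(id); ok {
			t.mu.Lock()
			t.routes[id] = addr
			t.mu.Unlock()
			return addr, nil
		}
	}
	return "", errNoRoute
}

func main() {
	rt := &routeTable{
		routes: map[string]string{},
		lookup: func(id string) (string, bool) {
			if id == "sbx-1" {
				return "10.0.0.5:8080", true
			}
			return "", false
		},
	}
	fmt.Println(rt.resolve("sbx-1")) // resolved through the fallback, then cached
	fmt.Println(rt.resolve("sbx-2")) // unknown sandbox: errNoRoute
}
```

Whether the actual change takes this shape can only be confirmed from the diff; the sketch only shows why a miss in a local table need not surface as a 502.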

2) Peer reconciliation for split-brain healing

  • Introduce peer reconciliation implementation:
    • pkg/peers/reconcile.go
  • Add dedicated tests for reconciliation behavior:
    • pkg/peers/reconcile_test.go
  • Integrate reconciliation with existing peer/memberlist flow:
    • pkg/peers/memberlist.go
    • pkg/peers/peers.go
  • Update related tests and wire-up points:
    • pkg/peers/memberlist_test.go
    • pkg/sandbox-gateway/server/server.go
    • pkg/sandbox-manager/core.go
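
As a rough sketch of what a reconciliation pass for split-brain healing can look like, the example below compares the set of peers a node should see with the set it currently sees and asks to rejoin the missing ones. It is written under stated assumptions and is not the code in `pkg/peers/reconcile.go`: `missingPeers` and `reconcileOnce` are hypothetical names, the source of the expected peer list is left open, and the `join` callback merely mirrors the shape of `Memberlist.Join(existing []string) (int, error)` from hashicorp/memberlist, on the assumption that this is what `memberlist.go` wraps.

```go
package main

import (
	"fmt"
	"sort"
)

// missingPeers returns the addresses that are expected but not currently
// seen by the local node, sorted for stable output.
func missingPeers(expected, seen []string) []string {
	seenSet := make(map[string]struct{}, len(seen))
	for _, addr := range seen {
		seenSet[addr] = struct{}{}
	}
	var missing []string
	for _, addr := range expected {
		if _, ok := seenSet[addr]; !ok {
			missing = append(missing, addr)
		}
	}
	sort.Strings(missing)
	return missing
}

// reconcileOnce performs a single pass: it computes the missing peers and,
// if there are any, hands them to join. It returns whatever join reports.
func reconcileOnce(expected, seen []string, join func([]string) (int, error)) (int, error) {
	missing := missingPeers(expected, seen)
	if len(missing) == 0 {
		return 0, nil // views already agree, nothing to heal
	}
	return join(missing)
}

func main() {
	// Hypothetical addresses: after a partition this node only sees two of three peers.
	expected := []string{"10.0.0.1:7946", "10.0.0.2:7946", "10.0.0.3:7946"}
	seen := []string{"10.0.0.1:7946", "10.0.0.2:7946"}

	join := func(addrs []string) (int, error) {
		fmt.Println("rejoining:", addrs)
		return len(addrs), nil
	}

	n, err := reconcileOnce(expected, seen, join)
	fmt.Println("contacted:", n, "err:", err)
}
```

In a real component a pass like this would typically run on a timer, so that two halves of a partitioned cluster rediscover each other once connectivity returns.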

Test Plan

  • Updated proxy unit tests for post-claim routing/ext-proc behavior.
  • Added peer reconciliation unit tests, including split-brain recovery paths.
  • Run full upstream CI for integration-level verification.

kruise-bot requested review from furykerry and zmberg on April 28, 2026 09:15
@kruise-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zmberg for approval by writing /assign @zmberg in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kruise-bot

@AiRanthem: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

2 participants