
checker, scatter, server: add load-based split-scatter with range-aware group baseline #10621

Open

lhy1024 wants to merge 17 commits into tikv:master from lhy1024:split-scatter-2

Conversation

@lhy1024 (Contributor) commented Apr 24, 2026

What problem does this PR solve?

Issue Number: Ref #10592

What is changed and how does it work?

This PR adds an internal split-scatter flow for newly split regions on the monolithic PD path.

Scope in this PR:

  • Only splits with SplitReason_LOAD enter the new split-scatter flow.
  • HandleAskBatchSplit records the source region and newly allocated region IDs into a short-lived pending state.
  • The checker waits until the split result is observable in PD (heartbeat / source version advance), then dispatches internal scatter for the pending regions.
  • Internal scatter uses a range-aware live baseline plus running-operator delta, so regions from the same split batch can be scattered with the same group context.
  • The existing admin scatter model remains aligned with pingcap/master.

Out of scope in this PR:

The scheduling-service / NEXT_GEN gap is left as explicit TODOs in the forwarding code.
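
For orientation, here is a minimal sketch of the pending state and group naming described above. Field and function names are illustrative only; the actual types live in pkg/schedule/checker/split_scatter.go and differ in detail.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingSplit is an illustrative stand-in for the short-lived entry recorded
// per region of a load-triggered split batch.
type pendingSplit struct {
	regionID    uint64
	group       string    // shared group so the whole batch scatters with the same context
	waitVersion uint64    // dispatch only once the region epoch reaches this version
	expireAt    time.Time // give up if the split result never becomes observable
}

type splitScatterState struct {
	mu      sync.Mutex
	pending map[uint64]pendingSplit
}

// recordBatch mirrors the HandleAskBatchSplit step: remember the source region
// and the newly allocated region IDs under one group, with a TTL.
func (s *splitScatterState) recordBatch(sourceID uint64, newIDs []uint64, ttl time.Duration) {
	if len(newIDs) == 0 {
		return
	}
	group := fmt.Sprintf("split-scatter-%d-%d", sourceID, newIDs[0])
	expire := time.Now().Add(ttl)
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, id := range append(newIDs, sourceID) {
		s.pending[id] = pendingSplit{regionID: id, group: group, expireAt: expire}
	}
}

func main() {
	s := &splitScatterState{pending: map[uint64]pendingSplit{}}
	s.recordBatch(100, []uint64{101, 102}, 3*time.Minute)
	fmt.Println(len(s.pending), s.pending[101].group) // 3 split-scatter-100-101
}
```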

checker, scatter, server: add load-based split-scatter on monolithic PD path

Record load-based split batches in PD, wait until the split result is visible,
and dispatch internal scatter with range-aware grouping.

Keep admin scatter behavior aligned with pingcap/master. Leave
scheduling-service / NEXT_GEN support and the fuller priority-queue proposal
out of scope for follow-up work.

Check List

Tests

  • Unit test
  • Manual validation on testbed for load-based split -> internal split-scatter flow
    • TiKV 9.0.0-beta.2 (28890b9736b1553a162c63373fdc052bb60c9172) on testbed 8069489
    • PD image hash 523a77e2a0f5656c9d6cc78b1dc6b45649147dad
    • TiDB copr cache disabled (tikv-client.copr-cache.capacity-mb = 0)
    • Background schedulers removed during the run (balance-leader, balance-region, balance-hot-region, evict-*) and merge-schedule-limit set to 0
    • UNIFIED_THRESHOLD=0.3, single-region baseline: 1 -> 5 regions, PD created and finished internal-scatter-region operators
    • UNIFIED_THRESHOLD=0.2, single-region baseline: 1 -> 33 regions, split-scatter activity concentrated in the first ~3 minutes and then converged
[Three screenshots attached showing the testbed validation results.]

Release note

Load-based split-scatter currently applies only to monolithic PD. scheduling-service/NEXT_GEN forwarding does not propagate SplitReason for this flow yet.

Summary by CodeRabbit

  • New Features

    • Automatic internal scatter dispatch for newly split regions triggered by load, with grouping and gated dispatch.
  • Improvements

    • More accurate internal peer/leader selection to reduce placement gaps and better balance after splits.
    • Dispatch respects version gating, retry/backoff, and expiration to avoid premature or duplicate actions.
  • Tests

    • Extensive tests validating split-scatter recording, eligibility, dispatch lifecycle, backoff, and range-hint resolution.
  • Chores

    • Added a patrol-phase metric to track split-scatter dispatch duration.

ti-chi-bot Bot (Contributor) commented Apr 24, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot Bot added labels on Apr 24, 2026: release-note-none (Denotes a PR that doesn't merit a release note), do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress).
ti-chi-bot Bot (Contributor) commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign siddontang for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Records load-triggered batch splits, tracks pending split-scatter tasks with TTL/retry/waitVersion, dispatches internal scatter operators with shared group IDs using range hints, adds a Checker patrol phase metric, and includes unit and integration tests plus an internal ScatterInternal API.

Changes

  • TODO Annotations (pkg/mcs/scheduling/server/grpc_service.go, server/grpc_service.go): Add TODOs in the AskBatchSplit forwarding path noting missing propagation of split reason and future NEXT_GEN load-based split-scatter; no behavior change.
  • Checker Controller (pkg/schedule/checker/checker_controller.go): Register and invoke the split-scatter controller during patrol; record a patrol phase duration metric.
  • Split-Scatter Controller & API (pkg/schedule/checker/split_scatter.go, pkg/schedule/checker/split_scatter_group.go): New split-scatter controller: pending map with group/waitVersion/retryAt/expireAt, top-N collection, dispatch logic with backoff and eligibility checks; public Controller.RecordSplitScatterBatch; range-hint helpers and prefix-boundary utilities.
  • Checker Metrics (pkg/schedule/checker/metrics.go): Add a dispatch_split_scatter patrol phase label and initialize the corresponding histogram entry.
  • Scatter Core & API (pkg/schedule/scatter/region_scatterer.go): Introduce the ScatterInternal API, new operator descriptors (AdminScatterOperatorDesc, InternalScatterOperatorDesc), local per-request scatter state, and updated peer/leader selection and recording logic for internal scatter.
  • Scatter Tests (pkg/schedule/scatter/region_scatterer_test.go): Refactor/add tests for internal-scatter selection, seeded local state, hot-region gating, operator-failure accounting, and distribution delta assertions.
  • Checker Tests (pkg/schedule/checker/split_scatter_test.go): Add tests for RecordSplitScatterBatch, pending collection, dispatch/backoff, range-hint resolution, waitVersion gating, and stale-snapshot handling.
  • Cluster Integration (server/cluster/cluster_worker.go, server/cluster/split_scatter_test.go): Collect newRegionIDs in HandleAskBatchSplit; report SplitReason_LOAD batches to the checker; add integration tests asserting InternalScatter operator creation and group naming.

Sequence Diagram(s)

sequenceDiagram
    participant Store as TiKV Store
    participant ClusterWorker as Cluster Worker
    participant CheckerController as Checker Controller
    participant SplitScatter as Split-Scatter Controller
    participant Scatterer as Region Scatterer

    Store->>ClusterWorker: AskBatchSplit (SplitReason_LOAD)
    ClusterWorker->>ClusterWorker: collect newRegionIDs
    ClusterWorker->>CheckerController: RecordSplitScatterBatch(sourceRegionID, newRegionIDs)
    CheckerController->>SplitScatter: create pending entries (group, TTL, waitVersion)

    Store->>ClusterWorker: Region heartbeats (new regions)
    ClusterWorker->>CheckerController: patrol tick (dispatch phase)
    CheckerController->>SplitScatter: dispatchSplitScatterRegions()

    SplitScatter->>SplitScatter: filter non-expired & eligible, check waitVersion
    SplitScatter->>SplitScatter: verify replica counts, resolve range hint
    SplitScatter->>Scatterer: ScatterInternal(region, group, startKey, endKey)
    Scatterer->>Scatterer: select peers/leaders (local state), create operator
    Scatterer-->>SplitScatter: operator or error

    alt success
        SplitScatter->>SplitScatter: remove pending entry
    else insufficient replicas or throttled or failure
        SplitScatter->>SplitScatter: set retryAt/backoff
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

size/L, lgtm, approved

Suggested reviewers

  • okJiang
  • rleungx

Poem

🐰 I nibbled keys where regions split and grew,

I named each brood with a hopping hue.
TTL and wait, I pause then dart,
I scatter peers to balance the chart. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 8.22%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check ✅ Passed: the title accurately summarizes the main change: introducing load-based split-scatter with a range-aware group baseline across the checker, scatter, and server packages.
  • Linked Issues check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Description check ✅ Passed: the pull request description follows the repository template structure and includes the issue reference, a clear explanation of the changes, the commit message, test validation with images, and an appropriate release note.


ti-chi-bot Bot added labels on Apr 24, 2026: dco-signoff: yes (Indicates the PR's author has signed the dco), size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files).
…re group baseline

Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024 (Contributor, Author) commented Apr 27, 2026

@coderabbitai full review

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Full review triggered.

coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (6)
pkg/schedule/checker/checker_controller.go (1)

171-174: measure(... phaseCheckPending ...) now covers two distinct phases.

The histogram bucket phaseCheckPending previously timed only checkPendingProcessedRegions. Including dispatchSplitScatterRegions in the same measure block silently changes the metric's semantics and could mask regressions in either phase. Consider a dedicated phase for split-scatter dispatch (or at least a separate measure call inside this lambda).

♻️ Proposed split
-			measure(c.metrics.patrolPhaseHistograms[phaseCheckPending], func() {
-				c.splitScatter.dispatchSplitScatterRegions()
-				c.checkPendingProcessedRegions()
-			})
+			measure(c.metrics.patrolPhaseHistograms[phaseDispatchSplitScatter], func() {
+				c.splitScatter.dispatchSplitScatterRegions()
+			})
+			measure(c.metrics.patrolPhaseHistograms[phaseCheckPending], func() {
+				c.checkPendingProcessedRegions()
+			})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/checker/checker_controller.go` around lines 171 - 174, The
current measure call around c.splitScatter.dispatchSplitScatterRegions() and
c.checkPendingProcessedRegions() mixes two phases into
c.metrics.patrolPhaseHistograms[phaseCheckPending]; separate them so each phase
is timed independently: call measure(...) once around
c.splitScatter.dispatchSplitScatterRegions() (using a new histogram key like
phaseDispatchSplitScatter or another appropriate histogram in
c.metrics.patrolPhaseHistograms) and keep a separate measure(...) around
c.checkPendingProcessedRegions() that continues to use phaseCheckPending; update
any histogram key enums/constants to include the new phase name and ensure both
measure invocations reference the correct histogram entries.
pkg/schedule/checker/split_scatter.go (2)

149-165: recordSplitScatterBatch calls cluster.GetRegion while holding pendingMu — minor risk.

Line 161 reads from the cluster while the split-scatter write lock is held. GetRegion typically takes a separate basic-cluster RW lock, so today this is safe, but it's a non-trivial cross-lock acquisition pattern. Consider reading sourceRegion's version before taking pendingMu to keep the critical section purely local to the pending map and lock-ordering trivially correct.

♻️ Suggested ordering
 func (c *splitScatterController) recordSplitScatterBatch(sourceRegionID uint64, newRegionIDs []uint64) {
 	if len(newRegionIDs) == 0 {
 		return
 	}
 	group := makeSplitScatterGroup(sourceRegionID, newRegionIDs[0])
 	expireAt := time.Now().Add(splitScatterPendingTTL)
+	sourcePending := splitScatterPendingItem{regionID: sourceRegionID, group: group, waitVersion: 1, expireAt: expireAt}
+	if sourceRegion := c.cluster.GetRegion(sourceRegionID); sourceRegion != nil && sourceRegion.GetRegionEpoch() != nil {
+		sourcePending.waitVersion = sourceRegion.GetRegionEpoch().GetVersion() + 1
+	}
 	c.pendingMu.Lock()
 	defer c.pendingMu.Unlock()
 	for _, regionID := range newRegionIDs {
 		c.pending[regionID] = splitScatterPendingItem{regionID: regionID, group: group, expireAt: expireAt}
 	}
-	sourcePending := splitScatterPendingItem{regionID: sourceRegionID, group: group, waitVersion: 1, expireAt: expireAt}
-	if sourceRegion := c.cluster.GetRegion(sourceRegionID); sourceRegion != nil && sourceRegion.GetRegionEpoch() != nil {
-		sourcePending.waitVersion = sourceRegion.GetRegionEpoch().GetVersion() + 1
-	}
 	c.pending[sourceRegionID] = sourcePending
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/checker/split_scatter.go` around lines 149 - 165, The function
recordSplitScatterBatch currently calls c.cluster.GetRegion while holding
c.pendingMu which creates a cross-lock acquisition; to fix, call
c.cluster.GetRegion(sourceRegionID) and compute sourcePending.waitVersion (using
sourceRegion.GetRegionEpoch().GetVersion()+1 if non-nil) before acquiring
c.pendingMu, then lock c.pendingMu and populate c.pending with
splitScatterPendingItem entries (including the precomputed sourcePending) and
set expireAt; keep all other logic (makeSplitScatterGroup,
splitScatterPendingTTL) the same and only move the cluster read out of the
critical section.

33-37: Hard-coded tuning constants — PR objective calls for them to be configurable.

splitScatterDispatchLimit, splitScatterRetryBackoff, and splitScatterPendingTTL are package-level constants. Issue #10592 mentions "configurable limit" for the per-tick dispatch. For a feature that may need ops tuning under load, please thread these through config.CheckerConfigProvider (or a new sub-config) so they're adjustable without recompile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/checker/split_scatter.go` around lines 33 - 37, Replace the
package-level constants splitScatterDispatchLimit, splitScatterRetryBackoff, and
splitScatterPendingTTL with configurable fields supplied by the existing
config.CheckerConfigProvider (or add a small SplitScatter sub-config on that
provider); update split_scatter.go to read defaults from the provider (using
existing sensible defaults equal to the current constants) and reference the
provider values wherever splitScatterDispatchLimit, splitScatterRetryBackoff,
and splitScatterPendingTTL were used (e.g., in the dispatch loop, retry logic,
and TTL checks). Add accessor methods or struct fields on CheckerConfigProvider
(or a new SplitScatterConfig struct) so callers can obtain int dispatchLimit,
time.Duration retryBackoff, and time.Duration pendingTTL; ensure unit tests or
call sites are updated to use the provider-based values and preserve current
behavior when the provider returns zero/nil by falling back to the previous
constants as defaults.
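
As a rough illustration of what such a sub-config could look like (hypothetical names; the dispatch limit of 4 and the 3m TTL are the values mentioned elsewhere in this review, the backoff default is a placeholder):

```go
package main

import (
	"fmt"
	"time"
)

// splitScatterConfig is a hypothetical sub-config; the PR may keep these as
// constants or expose them through config.CheckerConfigProvider instead.
type splitScatterConfig struct {
	DispatchLimit int           // regions dispatched per patrol tick
	RetryBackoff  time.Duration // delay before retrying a failed dispatch
	PendingTTL    time.Duration // how long a pending entry may wait before being dropped
}

// withDefaults falls back to the current constant values when a field is
// unset, so existing behavior is preserved if the config is omitted.
func (c splitScatterConfig) withDefaults() splitScatterConfig {
	if c.DispatchLimit <= 0 {
		c.DispatchLimit = 4 // per-tick limit mentioned in the review
	}
	if c.RetryBackoff <= 0 {
		c.RetryBackoff = 10 * time.Second // placeholder, not taken from the PR
	}
	if c.PendingTTL <= 0 {
		c.PendingTTL = 3 * time.Minute // TTL mentioned in the review
	}
	return c
}

func main() {
	cfg := splitScatterConfig{DispatchLimit: 8}.withDefaults()
	fmt.Println(cfg.DispatchLimit, cfg.RetryBackoff, cfg.PendingTTL)
}
```
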
pkg/mcs/scheduling/server/grpc_service.go (1)

443-445: TODO is fine, but please consider opening a tracking issue.

The TODO references the NEXT_GEN/scheduling-service path — acceptable for this PR since the load-based split-scatter only runs in monolithic PD mode today. To avoid drift, consider linking a tracking issue ID in the comment so the gap doesn't get lost once this lands.

Want me to draft a follow-up issue referencing this TODO and the matching one in server/grpc_service.go?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mcs/scheduling/server/grpc_service.go` around lines 443 - 445, The TODO
about load-based split-scatter in grpc_service.go should be linked to a tracking
issue: create a short follow-up issue describing the gap (recording
split-scatter batches and plumbing SplitReason for scheduling-service/NEXT_GEN)
and then update the TODO comment near the existing "TODO(split-scatter)" (and
the matching TODO in server/grpc_service.go) to include the issue number/URL
(e.g. "TODO(split-scatter): tracking: ISSUE-1234") so future readers can find
the follow-up; ensure the comment mentions the feature ("load-based
split-scatter"), the affected path ("scheduling-service/NEXT_GEN"), and the
intended work ("record split-scatter batches and plumb SplitReason").
pkg/schedule/scatter/region_scatterer.go (2)

374-400: newInternalScatterState cost scales with total running ops, not just this group.

Every call to ScatterInternal rebuilds the scatter state from scratch:

  1. seedScatterStateByRange walks all stores and calls GetStorePeerCountByRange / GetStoreLeaderCountByRange per store.
  2. applyRunningScatterOpsDelta iterates the entire opController.GetOperators() set, even though only ops with Desc() == InternalScatterOperatorDesc and matching group contribute. For each match, it then calls r.cluster.GetRegion(op.RegionID()).

For a split-scatter batch dispatching N regions sequentially, this is O(N * total_running_ops). In a busy cluster (10k+ pending operators), this can become a noticeable per-region tax on a hot dispatch path.

Consider one of:

  • caching state per group for the duration of a batch and only re-applying the delta of newly added ops since the last call;
  • maintaining an index of internal-scatter ops keyed by group in the controller (or a lightweight registry inside RegionScatterer) to avoid the linear scan.

Optional — not a correctness issue, but worth a perf check before this lands behind a feature flag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/scatter/region_scatterer.go` around lines 374 - 400, The hot
path in newInternalScatterState currently scans all ops via
opController.GetOperators() in applyRunningScatterOpsDelta (filtering by
InternalScatterOperatorDesc and group) and calls r.cluster.GetRegion for each
match, causing O(N * total_running_ops) cost; fix by avoiding a full scan —
either cache the built scatterState per group for the lifetime of a scatter
batch and only apply deltas for newly added ops, or maintain an index of
internal-scatter operators keyed by group (in the controller or a lightweight
map inside RegionScatterer) so applyRunningScatterOpsDelta can iterate only
relevant ops; update newInternalScatterState to use the group-cache or indexed
lookup and change applyRunningScatterOpsDelta to iterate the filtered set (still
using finalPlacementAfterOperator and r.cluster.GetRegion for matched ops) and
ensure cache invalidation when ops finish or group batch ends.
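
A sketch of the group-keyed registry idea, with illustrative names (the real controller would additionally need to feed the matched operators' final placement into the baseline):

```go
package main

import (
	"fmt"
	"sync"
)

// opIndex sketches the lightweight registry the review suggests: running
// internal-scatter operators indexed by group, so building the group baseline
// does not have to scan every running operator in the controller.
type opIndex struct {
	mu      sync.RWMutex
	byGroup map[string]map[uint64]struct{} // group -> region IDs with running internal scatter ops
}

func (i *opIndex) add(group string, regionID uint64) {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.byGroup[group] == nil {
		i.byGroup[group] = map[uint64]struct{}{}
	}
	i.byGroup[group][regionID] = struct{}{}
}

// remove must run when the operator finishes or is canceled, otherwise the
// baseline would keep counting stale operators.
func (i *opIndex) remove(group string, regionID uint64) {
	i.mu.Lock()
	defer i.mu.Unlock()
	delete(i.byGroup[group], regionID)
}

func (i *opIndex) regions(group string) []uint64 {
	i.mu.RLock()
	defer i.mu.RUnlock()
	ids := make([]uint64, 0, len(i.byGroup[group]))
	for id := range i.byGroup[group] {
		ids = append(ids, id)
	}
	return ids
}

func main() {
	idx := &opIndex{byGroup: map[string]map[uint64]struct{}{}}
	idx.add("split-scatter-100-101", 101)
	idx.add("split-scatter-100-101", 102)
	fmt.Println(len(idx.regions("split-scatter-100-101"))) // 2
}
```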

765-808: Internal-scatter tie-break is hard to follow; consider extracting a comparator.

The compound condition

if storeCount < minCount ||
    (internalScatter && storeCount == minCount &&
        (storeRegionCount < bestStoreRegionCount ||
            (storeRegionCount == bestStoreRegionCount && (newPeer == nil || store.GetID() < newPeer.GetStoreId())))) {

mixes "strictly better" with three internal-only tie-breakers, making the intent (lower group count → fewer total regions → smaller store ID, deterministic) hard to verify at a glance. A small helper (e.g., betterCandidate(internalScatter, storeCount, minCount, storeRegionCount, bestStoreRegionCount, store, newPeer) bool) or a tuple comparison would make both the admin and internal paths self-documenting and easier to extend if a future tiebreaker (e.g., score) is added.

Also note that bestStoreRegionCount is logically meaningful only on the internal-scatter path, but it's being maintained on the admin path too. Splitting the admin/internal selection logic would remove that incidental coupling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/scatter/region_scatterer.go` around lines 765 - 808, The
selection condition in the loop inside (region_scatterer.go) is doing two
concerns at once (admin vs internal tie-breakers) and should be extracted into a
clear comparator; implement a helper like betterCandidate(internalScatter bool,
storeCount, minCount uint64, storeRegionCount, bestStoreRegionCount int, store
metapb.Store, newPeer *metapb.Peer) bool that returns true when the current
store is strictly a better candidate (first by lower storeCount, then, only when
internalScatter and storeCount==minCount, by lower storeRegionCount, then by
smaller store ID), replace the compound if with a call to that helper, and
ensure bestStoreRegionCount is only read/updated in the internalScatter path so
admin selection does not depend on region counts.
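
A sketch of such a comparator, with the store/peer types reduced to plain IDs so the example stays self-contained:

```go
package main

import "fmt"

// betterCandidate mirrors the tie-break order the review describes: strictly
// fewer group placements always wins; only on the internal-scatter path do the
// extra tie-breakers apply (fewer total regions, then smaller store ID).
// haveBest is false when no candidate has been picked yet.
func betterCandidate(internalScatter bool,
	storeCount, minCount uint64,
	storeRegionCount, bestStoreRegionCount int,
	storeID, currentBestStoreID uint64, haveBest bool) bool {
	if storeCount < minCount {
		return true
	}
	if !internalScatter || storeCount != minCount {
		return false
	}
	if storeRegionCount != bestStoreRegionCount {
		return storeRegionCount < bestStoreRegionCount
	}
	return !haveBest || storeID < currentBestStoreID
}

func main() {
	// Same group count, fewer total regions on store 2: store 2 wins on the internal path.
	fmt.Println(betterCandidate(true, 3, 3, 10, 12, 2, 5, true)) // true
	// On the admin path only the group count matters.
	fmt.Println(betterCandidate(false, 3, 3, 10, 12, 2, 5, true)) // false
}
```
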
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/schedule/checker/split_scatter_group.go`:
- Around line 54-63: Add a brief clarifying comment immediately before the
conditional that compares region.GetEndKey() to indexRange.endKey stating that
both endKey (from region.GetEndKey()) and indexRange.startKey/indexRange.endKey
(from codec.EncodeBytes()) are MemComparable-encoded, so the lexicographic
bytes.Compare is valid; leave the logic unchanged (the if using bytes.Compare
and the call to splitScatterPrefixRange(tablePrefix) should remain as-is).

In `@pkg/schedule/checker/split_scatter.go`:
- Around line 167-212: dispatchSplitScatterRegions currently retries pending
entries without backoff on two failure paths (ScatterInternal error and
opController.AddOperator false), causing busy-retry/log spam; ensure every
non-success path calls delayPendingSplitScatter with splitScatterRetryBackoff
before continuing. Specifically, inside dispatchSplitScatterRegions handle the
error return from c.regionScatterer.ScatterInternal by calling
c.delayPendingSplitScatter(pending.regionID, splitScatterRetryBackoff) before
the log/continue, and likewise when c.opController.AddOperator(op) returns false
call c.delayPendingSplitScatter(pending.regionID, splitScatterRetryBackoff)
before logging and continue; keep existing pending deletion logic unchanged for
success paths.
- Around line 78-127: collectTopPendingSplitScatter currently selects candidates
from the map c.pending in randomized map order and truncates to limit, so add a
clear TODO referencing issue `#10592` to indicate priority ordering is
intentionally deferred; specifically, inside collectTopPendingSplitScatter after
building the candidates slice (before truncating or returning) insert a TODO
comment mentioning `#10592` and that candidates should be sorted by the region
priority score (CPU/byte traffic) or stored in a heap for bounded priority
behavior (or implement sorting of candidates by score before truncation if you
choose to implement now), referencing symbols candidates, pending, and
splitScatterDispatchLimit to guide where to change.

In `@pkg/schedule/scatter/region_scatterer_test.go`:
- Around line 1330-1335: The test currently allows op to be nil and only checks
its descriptor conditionally, which can hide regressions in ScatterInternal;
change the assertion to require op to be non-nil (use re.NotNil(op)) immediately
after re.NoError(err) and then assert re.Equal(InternalScatterOperatorDesc,
op.Desc()); if a (nil, nil) result is actually acceptable in some setups (e.g.,
due to isSameDistribution), instead make the test pre-arrange the region/cluster
state so a scatter is required (or assert the scatter counter is incremented)
before relaxing the NotNil requirement for ScatterInternal in
region_scatterer_test.go.

In `@server/cluster/split_scatter_test.go`:
- Around line 137-141: The test currently relies on a 50ms sleep inside
dispatchSplitScatterInPatrol to hope the patrol run completes, which is flaky;
modify the test to wait deterministically for at least one patrol pending-check
pass before asserting no operator was created by polling a visible signal such
as cluster.GetPatrolRegionsDuration() (or a newly exposed patrol run counter)
until it advances, or loop with a short timeout checking that
GetPatrolRegionsDuration() has increased, then call
cluster.GetOperatorController().GetOperator(splitRegionID) and assert nil; keep
references to dispatchSplitScatterInPatrol, GetPatrolRegionsDuration (or the new
counter), GetOperatorController().GetOperator, and splitRegionID so the change
is localized and deterministic.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee393cde-82b5-49fb-a777-a32aa1f457f5

📥 Commits

Reviewing files that changed from the base of the PR and between ee0fa30 and 1d54bda.

📒 Files selected for processing (10)
  • pkg/mcs/scheduling/server/grpc_service.go
  • pkg/schedule/checker/checker_controller.go
  • pkg/schedule/checker/split_scatter.go
  • pkg/schedule/checker/split_scatter_group.go
  • pkg/schedule/checker/split_scatter_test.go
  • pkg/schedule/scatter/region_scatterer.go
  • pkg/schedule/scatter/region_scatterer_test.go
  • server/cluster/cluster_worker.go
  • server/cluster/split_scatter_test.go
  • server/grpc_service.go

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 82.45283% with 93 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.02%. Comparing base (b21a183) to head (cc4a132).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10621      +/-   ##
==========================================
+ Coverage   78.96%   79.02%   +0.05%     
==========================================
  Files         532      534       +2     
  Lines       71883    72443     +560     
==========================================
+ Hits        56766    57251     +485     
- Misses      11093    11144      +51     
- Partials     4024     4048      +24     
Flag Coverage Δ
unittests 79.02% <82.45%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.


Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 marked this pull request as ready for review on April 28, 2026 08:42
ti-chi-bot Bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress) on Apr 28, 2026.
lhy1024 changed the title from "checker: scatter recently split regions in PD" to "checker, scatter, server: add load-based split-scatter with range-aware group baseline" on Apr 28, 2026.
coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/schedule/scatter/region_scatterer_test.go (1)

521-573: ⚠️ Potential issue | 🟠 Major

Register per-case cleanup immediately after enabling scatterFail and creating the stream/context.

If any require fails after Line 543, the failpoint stays enabled and the cleanup at Lines 572-573 never runs. That can contaminate later tests in this package and leak goroutines/streams. Wrap each case in a subtest/closure with deferred cleanup, or register cleanup right after allocation/enabling.

As per coding guidelines, **/*_test.go: Tests must handle failpoints using github.com/pingcap/failpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/scatter/region_scatterer_test.go` around lines 521 - 573, After
creating ctx/cancel and stream and immediately after enabling the
"github.com/tikv/pd/pkg/schedule/scatter/scatterFail" failpoint for a test case,
register per-case deferred cleanup so resources and failpoints are always
cleared: i.e., right after ctx, cancel, stream := ... add defer stream.Close()
and defer cancel(), and when you call failpoint.Enable(".../scatterFail", ...)
add a matching deferred failpoint.Disable(".../scatterFail") (only when enable
succeeded) so the failpoint is disabled even if requires/asserts fail later;
refer to symbols ctx, cancel, stream, and the failpoint name
"github.com/tikv/pd/pkg/schedule/scatter/scatterFail" and ensure the disable
call runs before the loop continues.
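
A minimal sketch of that ordering, assuming the failpoint path from the comment and the usual testify/failpoint helpers (the real test's stream and cluster setup differ and are elided here):

```go
package scatter_test

import (
	"context"
	"testing"

	"github.com/pingcap/failpoint"
	"github.com/stretchr/testify/require"
)

// runScatterFailCase sketches the per-case cleanup ordering the comment asks
// for: cancel and failpoint.Disable are registered via t.Cleanup right after
// the resources are created, so they run even if a later require fails.
func runScatterFailCase(t *testing.T) {
	re := require.New(t)
	_, cancel := context.WithCancel(context.Background())
	t.Cleanup(cancel) // registered immediately, runs on any failure path

	const fp = "github.com/tikv/pd/pkg/schedule/scatter/scatterFail"
	re.NoError(failpoint.Enable(fp, `return(true)`))
	t.Cleanup(func() {
		re.NoError(failpoint.Disable(fp))
	})

	// ... case body: create the stream, trigger scatter, assert the failure path ...
}
```
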
♻️ Duplicate comments (1)
server/cluster/split_scatter_test.go (1)

100-115: ⚠️ Potential issue | 🟡 Minor

Derive the expected group from splitRegionID instead of hardcoding 1.

Line 115 bakes in the allocator’s current first ID, but the contract is split-scatter-{sourceRegionID}-{firstNewRegionID}. If another setup path consumes IDs earlier, this test fails even though the behavior is correct.

🛠️ Suggested fix
+	"fmt"
 	"strings"
 	"testing"
 	"time"
@@
-	re.Equal("split-scatter-100-1", opGroup)
+	re.Equal(fmt.Sprintf("split-scatter-100-%d", splitRegionID), opGroup)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cluster/split_scatter_test.go` around lines 100 - 115, The test
hardcodes the expected operator group string ("split-scatter-100-1"); instead
build it using the computed splitRegionID so the test reads the actual
firstNewRegionID: create an expectedGroup string (e.g. via
fmt.Sprintf("split-scatter-100-%d", splitRegionID) or strconv.FormatUint) and
replace the literal in the re.Equal check that compares opGroup from
op.GetAdditionalInfo("group") to the hardcoded value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/schedule/checker/split_scatter.go`:
- Around line 131-140: The fix is to ensure delayPendingSplitScatter (and
similar delete/update handlers) only back off or remove a pending entry if it
still matches the same snapshot/batch that the patrol pass observed: capture an
identifier or pointer from collectTopPendingSplitScatter (e.g., the pending
entry's batchID or the pending struct pointer) and pass that expected identity
into delayPendingSplitScatter / delete handlers, then inside those functions
obtain the current c.pending[regionID] under c.pendingMu and compare the
identities before mutating; if they differ, do nothing. This prevents an older
patrol snapshot from overwriting a newer RecordSplitScatter() update for the
same region.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bd186a34-515c-4844-89b8-498982b1ec93

📥 Commits

Reviewing files that changed from the base of the PR and between 1d54bda and e33216b.

📒 Files selected for processing (7)
  • pkg/schedule/checker/checker_controller.go
  • pkg/schedule/checker/metrics.go
  • pkg/schedule/checker/split_scatter.go
  • pkg/schedule/checker/split_scatter_group.go
  • pkg/schedule/checker/split_scatter_test.go
  • pkg/schedule/scatter/region_scatterer_test.go
  • server/cluster/split_scatter_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/schedule/checker/checker_controller.go

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/cluster/split_scatter_test.go`:
- Around line 59-80: The test stops patrol as soon as one split child gets an
operator, allowing the other child to be never dispatched; change the wait
condition so patrol is only cancelled when both children have operators: update
the predicate passed to dispatchSplitScatterInPatrol to return true only when
cluster.GetOperatorController().GetOperator(splitRegionIDs[0]) != nil AND
cluster.GetOperatorController().GetOperator(splitRegionIDs[1]) != nil (or
alternatively assert both operators are non-nil after patrol by removing the nil
skip and failing the test if either op is nil); adjust references to
splitRegionIDs and GetOperatorController/GetOperator accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fbcf49af-d825-4076-b949-a77d2b3c1598

📥 Commits

Reviewing files that changed from the base of the PR and between e33216b and 549ef4e.

📒 Files selected for processing (4)
  • pkg/schedule/checker/split_scatter.go
  • pkg/schedule/checker/split_scatter_test.go
  • pkg/schedule/scatter/region_scatterer_test.go
  • server/cluster/split_scatter_test.go

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/schedule/checker/split_scatter_test.go (1)

302-347: Consider de-duplicating region metadata construction in test helpers.

putSplitScatterRegionWithStores and putSplitScatterRegionWithoutLeader repeat the same region payload assembly; extracting one builder helper would reduce drift risk in future test updates.

♻️ Optional refactor sketch
+func newSplitScatterRegionMeta(regionID uint64, startKey, endKey string, peers []*metapb.Peer) *metapb.Region {
+	return &metapb.Region{
+		Id:       regionID,
+		StartKey: []byte(startKey),
+		EndKey:   []byte(endKey),
+		Peers:    peers,
+		RegionEpoch: &metapb.RegionEpoch{
+			ConfVer: 1,
+			Version: 1,
+		},
+	}
+}
+
 func putSplitScatterRegionWithStores(tc *mockcluster.Cluster, regionID uint64, startKey, endKey string, cpu uint64, stores ...uint64) {
 	peers := make([]*metapb.Peer, 0, len(stores))
 	for i, storeID := range stores {
 		peers = append(peers, &metapb.Peer{
 			Id:      regionID*10 + uint64(i) + 1,
 			StoreId: storeID,
 		})
 	}
-	region := &metapb.Region{
-		Id:       regionID,
-		StartKey: []byte(startKey),
-		EndKey:   []byte(endKey),
-		Peers:    peers,
-		RegionEpoch: &metapb.RegionEpoch{
-			ConfVer: 1,
-			Version: 1,
-		},
-	}
+	region := newSplitScatterRegionMeta(regionID, startKey, endKey, peers)
 	tc.PutRegion(core.NewRegionInfo(
 		region,
 		peers[0],
 		core.SetCPUUsage(cpu),
 	))
 }
 
 func putSplitScatterRegionWithoutLeader(tc *mockcluster.Cluster, regionID uint64, startKey, endKey string, cpu uint64) {
 	peers := []*metapb.Peer{
 		{Id: regionID*10 + 1, StoreId: 1},
 		{Id: regionID*10 + 2, StoreId: 2},
 		{Id: regionID*10 + 3, StoreId: 3},
 	}
-	region := &metapb.Region{
-		Id:       regionID,
-		StartKey: []byte(startKey),
-		EndKey:   []byte(endKey),
-		Peers:    peers,
-		RegionEpoch: &metapb.RegionEpoch{
-			ConfVer: 1,
-			Version: 1,
-		},
-	}
+	region := newSplitScatterRegionMeta(regionID, startKey, endKey, peers)
 	tc.PutRegion(core.NewRegionInfo(
 		region,
 		nil,
 		core.SetCPUUsage(cpu),
 	))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/checker/split_scatter_test.go` around lines 302 - 347, Both
putSplitScatterRegionWithStores and putSplitScatterRegionWithoutLeader duplicate
region/RegionInfo assembly; extract a shared builder helper (e.g.,
buildSplitScatterRegion or newRegionInfoForTest) that accepts regionID,
startKey, endKey, peers []*metapb.Peer, leader *metapb.Peer, and cpu uint64,
constructs the metapb.Region and returns core.NewRegionInfo(...,
core.SetCPUUsage(cpu)); then refactor putSplitScatterRegionWithStores to
generate peers from stores (keeping the Id/StoreId scheme) and call the new
helper with peers and leader=peers[0], and refactor
putSplitScatterRegionWithoutLeader to call the helper with the fixed peers and
leader=nil.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8af34fc2-4622-4d5c-ae65-91d667457bca

📥 Commits

Reviewing files that changed from the base of the PR and between 549ef4e and 365f3eb.

📒 Files selected for processing (1)
  • pkg/schedule/checker/split_scatter_test.go

Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Comment thread server/grpc_service.go
}
cli := forwardCli.getClient()
if cli != nil {
// TODO(split-scatter): propagate split reason and support load-based split-scatter
A Member commented:

This makes load-split scatter behave differently between monolithic PD and scheduling-service/NEXT_GEN deployments. If that gap is intentional for this PR, can we call it out with a feature gate or release note so operators are not surprised?

@lhy1024 (Contributor, Author) replied:

I think it's okay to add it. I didn't add it earlier because I was worried the PR would get too big, since it was intended as a patch.

now := time.Now()
c.pendingMu.RLock()

candidates := make([]splitScatterPendingItem, 0, len(c.pending))
A Member commented:

This runs on every patrol tick, but it scans the whole pending map before taking only 4 items. Under a load-split burst, that can become steady O(pending) work for up to the 3m TTL. Can we use a bounded queue keyed by retry/expire/priority, or at least cap the pending size?

@lhy1024 (Contributor, Author) replied:

In the initial version I added a priority queue sorted by CPU, but removed it for a similar reason: to keep this PR from growing too large. I will adopt a simpler implementation.
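
For reference, one possible shape of the bounded queue the reviewer mentions, sketched with illustrative types (the later revisions of this PR appear to use a simpler splitScatterPendingLimit cap instead):

```go
package main

import (
	"container/heap"
	"fmt"
)

// pendingEntry and pendingHeap are illustrative: a max-heap keyed by a
// priority score (e.g. CPU or traffic) so a patrol tick can pop at most
// dispatchLimit items without scanning the whole pending map.
type pendingEntry struct {
	regionID uint64
	priority float64
}

type pendingHeap []pendingEntry

func (h pendingHeap) Len() int           { return len(h) }
func (h pendingHeap) Less(i, j int) bool { return h[i].priority > h[j].priority } // max-heap: higher score first
func (h pendingHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *pendingHeap) Push(x any)        { *h = append(*h, x.(pendingEntry)) }
func (h *pendingHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

func main() {
	h := &pendingHeap{}
	heap.Init(h)
	for _, e := range []pendingEntry{{101, 0.4}, {102, 0.9}, {103, 0.1}} {
		heap.Push(h, e)
	}
	const dispatchLimit = 2
	for i := 0; i < dispatchLimit && h.Len() > 0; i++ {
		fmt.Println(heap.Pop(h).(pendingEntry).regionID) // 102, then 101
	}
}
```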

Comment thread pkg/schedule/checker/split_scatter.go Outdated
return nil, nil
}
op, err := operator.CreateScatterRegionOperator(scatterOperatorDesc, r.cluster, region, targetPeers, targetLeader, skipStoreLimit)
op, err := operator.CreateScatterRegionOperator(desc, r.cluster, region, targetPeers, targetLeader, skipStoreLimit)
A Member commented:

This internal path still creates an OpAdmin scatter operator with high priority. Since it is now triggered automatically by load splits, it may compete with regular hot/balance scheduling more aggressively than expected. Should this use a lower priority or be guarded by a config switch?

@lhy1024 (Contributor, Author) replied:

Internal scatter is a transient, one-off operation, so I think it's reasonable to give it higher priority?

lhy1024 added 3 commits April 29, 2026 16:46
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 added 2 commits April 29, 2026 17:45
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Comment thread server/grpc_service.go
}
cli := forwardCli.getClient()
if cli != nil {
// TODO(split-scatter): propagate split reason and support load-based split-scatter
@JmPotato (Member) commented Apr 30, 2026:

This TODO is not enough to cover the user-visible deployment-mode difference. Please also state in the PR body / release note that load-split auto scatter only works on monolithic PD, and NEXT_GEN/scheduling-service is not supported yet.

@lhy1024 (Contributor, Author) replied:

update

return nil, nil
}
op, err := operator.CreateScatterRegionOperator(scatterOperatorDesc, r.cluster, region, targetPeers, targetLeader, skipStoreLimit)
op, err := operator.CreateScatterRegionOperator(desc, r.cluster, region, targetPeers, targetLeader, skipStoreLimit)
@JmPotato (Member) commented Apr 30, 2026:

This still creates an OpAdmin operator, but the trigger is an automatic background split-scatter flow. Please either use a non-admin kind, or add tests that pin down the kind / priority / store-limit / metrics semantics.

@lhy1024 (Contributor, Author) replied:

add test

@lhy1024 (Contributor, Author) added:

internal scatter operator: SchedulerKind=OpAdmin priority=High

A Member replied:

OpAdmin is usually triggered by calling the API manually.

@lhy1024 (Contributor, Author) replied:

> OpAdmin is usually triggered by calling the API manually.

I changed the internal split-scatter path to build a non-admin scatter operator whose kind is derived from the generated steps. The public/admin scatter path still uses OpAdmin.
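
A sketch of what "kind derived from the generated steps" could look like, using illustrative stand-ins for PD's operator kinds and steps (the real types live in pkg/schedule/operator and differ):

```go
package main

import "fmt"

// Illustrative operator-kind flags; PD's actual OpKind values are defined in
// pkg/schedule/operator.
type opKind uint32

const (
	opLeader opKind = 1 << iota
	opRegion
	opAdmin
)

type step interface{ affectsLeader() bool }

type transferLeader struct{}
type movePeer struct{}

func (transferLeader) affectsLeader() bool { return true }
func (movePeer) affectsLeader() bool       { return false }

// kindFromSteps ORs together only the kinds the generated steps imply, instead
// of unconditionally tagging the internal split-scatter operator as OpAdmin.
func kindFromSteps(steps []step) opKind {
	var kind opKind
	for _, s := range steps {
		if s.affectsLeader() {
			kind |= opLeader
		} else {
			kind |= opRegion
		}
	}
	return kind
}

func main() {
	kind := kindFromSteps([]step{movePeer{}, transferLeader{}})
	fmt.Printf("leader=%v region=%v admin=%v\n",
		kind&opLeader != 0, kind&opRegion != 0, kind&opAdmin != 0)
}
```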

candidates = append(candidates, pending)
}
if len(candidates) > limit {
candidates = candidates[:limit]
@JmPotato (Member) commented Apr 30, 2026:

pending comes from a map, so truncating before sorting makes dispatch random. Please sort by (group, regionID) before truncation to make it deterministic and group-friendly.

@lhy1024 (Contributor, Author) replied:

add sorting
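
A sketch of the deterministic (group, regionID) ordering before truncation, with illustrative types:

```go
package main

import (
	"fmt"
	"sort"
)

type pendingItem struct {
	group    string
	regionID uint64
}

// sortPendingCandidates orders candidates by (group, regionID) before applying
// the per-tick limit, so map iteration order no longer decides which regions
// get dispatched first and regions of the same batch stay together.
func sortPendingCandidates(candidates []pendingItem, limit int) []pendingItem {
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].group != candidates[j].group {
			return candidates[i].group < candidates[j].group
		}
		return candidates[i].regionID < candidates[j].regionID
	})
	if len(candidates) > limit {
		candidates = candidates[:limit]
	}
	return candidates
}

func main() {
	got := sortPendingCandidates([]pendingItem{
		{"split-scatter-200-201", 202},
		{"split-scatter-100-101", 102},
		{"split-scatter-100-101", 101},
	}, 2)
	fmt.Println(got) // [{split-scatter-100-101 101} {split-scatter-100-101 102}]
}
```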

Comment thread server/cluster/split_scatter_test.go Outdated
re.NoError(cluster.processRegionHeartbeat(core.ContextTODO(), newSplitScatterRegion(splitRegionIDs[1], []byte("t"), []byte(""), 80)))

dispatchSplitScatterInPatrol(t, cluster, cancelPatrol, func() bool {
return cluster.GetOperatorController().GetOperator(splitRegionIDs[0]) != nil ||
@JmPotato (Member) commented Apr 30, 2026:

This stops once either child has an operator, and the later loop skips nil operators. The test can pass even if the other child is never scattered. Please wait for both children, or lower the test claim.

if !ok {
continue
}
if !pending.expireAt.IsZero() && !now.Before(pending.expireAt) {
@JmPotato (Member) commented Apr 30, 2026:

TTL expiration is a silent skip here. Please add split_scatter_pending_expired_total so TTL/capacity-related missed scatters are observable.

@lhy1024 (Contributor, Author) replied:

add counter

}
}
if len(c.pending)+newPendingCount > splitScatterPendingLimit {
log.Info("skip recording split scatter batch due to pending limit",
@JmPotato (Member) commented Apr 30, 2026:

Dropping pending entries because of the cap should not be log-only. Please add split_scatter_pending_dropped_total, together with the expired counter, to expose the v1 capacity limit.

@lhy1024 (Contributor, Author) replied:

done
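
For illustration, the two counters requested above could be plain Prometheus counters along these lines; the metric names come from the review comments, while the namespace/subsystem are placeholders that may differ from what the PR actually registers:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	// Bumped on the TTL-expiry path instead of skipping silently.
	splitScatterPendingExpired = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "pd",
		Subsystem: "checker",
		Name:      "split_scatter_pending_expired_total",
		Help:      "Pending split-scatter entries dropped because their TTL expired.",
	})
	// Bumped when a batch is skipped due to the pending capacity limit.
	splitScatterPendingDropped = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "pd",
		Subsystem: "checker",
		Name:      "split_scatter_pending_dropped_total",
		Help:      "Split-scatter batches skipped because the pending map hit its capacity limit.",
	})
)

func main() {
	prometheus.MustRegister(splitScatterPendingExpired, splitScatterPendingDropped)
	splitScatterPendingExpired.Inc()
	splitScatterPendingDropped.Inc()
	fmt.Println("counters registered")
}
```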

Signed-off-by: lhy1024 <admin@liudos.us>
ti-chi-bot Bot added the release-note label (Denotes a PR that will be considered when it comes time to generate release notes) and removed the release-note-none label (Denotes a PR that doesn't merit a release note) on Apr 30, 2026.
lhy1024 added 5 commits April 30, 2026 11:44
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
(cherry picked from commit c706a13)
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
c.splitScatter.dispatchSplitScatterRegions()
}

func TestRecordSplitScatterBatchCollectsPendingRegions(t *testing.T) {
@rleungx (Member) commented Apr 30, 2026:

Is there a test to cover that one region keeps splitting?

@lhy1024 (Contributor, Author) replied:

added

Signed-off-by: lhy1024 <admin@liudos.us>
ti-chi-bot Bot (Contributor) commented Apr 30, 2026

@lhy1024: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • pull-unit-test-next-gen-3 (commit cc4a132, required): rerun with /test pull-unit-test-next-gen-3

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
