
client: use singleflight to reduce request press #10632

Open
bufferflies wants to merge 3 commits into tikv:master from bufferflies:client/singleflight_update


@bufferflies (Contributor) commented Apr 29, 2026

What problem does this PR solve?

Issue Number: Close #10633

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • Bug Fixes

    • Reduced duplicate backend requests by deduplicating concurrent calls, improving responsiveness and stability under high concurrency.
  • Chores

    • Added golang.org/x/sync v0.19.0 as a direct client dependency (used for singleflight request deduplication).

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 29, 2026
@coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3b972e6b-474f-4b2c-b9d6-97d64d41bf19

📥 Commits

Reviewing files that changed from the base of the PR and between 9bd3fed and 5aa61b8.

📒 Files selected for processing (1)
  • client/servicediscovery/service_discovery.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/servicediscovery/service_discovery.go

📝 Walkthrough

Walkthrough

Adds golang.org/x/sync to client dependencies and integrates a singleflight.Group into service discovery to deduplicate concurrent GetClusterInfo and GetMembers gRPC requests, preventing duplicate remote calls under concurrent access.
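A minimal sketch of the deduplication described above, using only the standard library: `group` is a hypothetical stand-in that mimics the blocking behavior of `singleflight.Group.Do` from golang.org/x/sync (the real `Do` additionally returns a `shared` flag), and `burst` is an illustrative driver. The key follows the operation-plus-URL form the PR uses (`"GetClusterInfo-" + url`).

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight execution for a key.
type call struct {
	wg  sync.WaitGroup
	val any
	err error
}

// group is a hypothetical stand-in for singleflight.Group's Do: concurrent
// callers that use the same key share a single execution of fn.
type group struct {
	mu sync.Mutex
	m  map[string]*call
}

func (g *group) Do(key string, fn func() (any, error)) (any, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // duplicate caller: block until the in-flight call finishes
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()

	g.mu.Lock()
	delete(g.m, key) // a later burst starts a fresh call
	g.mu.Unlock()
	c.wg.Done()
	return c.val, c.err
}

// burst fires n concurrent GetMembers requests for one PD URL and reports
// how many times the simulated RPC actually ran.
func burst(n int, url string) int64 {
	var g group
	var rpcs atomic.Int64
	var wg sync.WaitGroup
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all callers at once
			g.Do("GetMembers-"+url, func() (any, error) {
				rpcs.Add(1)
				time.Sleep(100 * time.Millisecond) // simulated RPC latency
				return "members from " + url, nil
			})
		}()
	}
	close(start)
	wg.Wait()
	return rpcs.Load()
}
```

Calling `burst(100, "http://pd-1:2379")` usually reports a single RPC. Deduplication only collapses calls that overlap in time: a caller arriving after the in-flight call returns starts a fresh one, so this reduces burst pressure without caching results.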

Changes

  • Dependency Management (client/go.mod): Adds direct requirement golang.org/x/sync v0.19.0.
  • Request Deduplication (client/servicediscovery/service_discovery.go): Introduces a singleflight.Group field and wraps GetClusterInfo and GetMembers RPC invocations with c.flight.Do(...) keyed by operation+URL; shared results are type-asserted and existing error/metrics handling is preserved.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant SD as ServiceDiscovery
    participant SF as singleflight.Group
    participant PD as PD gRPC

    Caller->>SD: Request GetClusterInfo (concurrent)
    SD->>SF: Do("GetClusterInfo:<url>", fn)
    alt First caller
        SF->>PD: Invoke GetClusterInfo RPC
        PD-->>SF: Response
        SF-->>SD: result (shared)
        SD-->>Caller: Parsed response
    else Concurrent callers
        SF-->>SD: Wait for shared result
        SD-->>Caller: Parsed response (from shared result)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped a race of calls tonight,
One singleflight made them all unite,
No extra hops across the wire,
Shared the answer, snug and light,
Hooray — fewer echoes, higher delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check: ❓ Inconclusive. The PR description includes the issue reference (Close #10633), but lacks a commit message and does not clearly articulate what is changed or how it works beyond the template structure. Resolution: Add a detailed commit message explaining the singleflight implementation and how it deduplicates concurrent calls. Describe the specific functions modified (getClusterInfo, getMembers) and the deduplication mechanism.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title 'client: use singleflight to reduce request press' clearly describes the main change: adding singleflight to the client to reduce request pressure. It matches the changeset scope and primary objective.
  • Linked Issues check: ✅ Passed. The code changes implement singleflight deduplication in getClusterInfo and getMembers functions to reduce concurrent redundant calls, directly addressing the requirement from issue #10633.
  • Out of Scope Changes check: ✅ Passed. The changes are scoped to adding the singleflight dependency and implementing deduplication in service_discovery.go. No unrelated or out-of-scope modifications are present.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


@coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/servicediscovery/service_discovery.go`:
- Around line 448-449: singleflight usage currently collapses calls by verb only
(flight.Do("GetClusterInfo") / flight.Do("GetMembers")), so a result or error from one
PD URL can leak to callers targeting another URL, and waiting callers lose their own
deadlines. Include the target URL in the key in getMembers and GetClusterInfo
(e.g., key := fmt.Sprintf("%s:%s", verb, url)), and additionally replace Do with DoChan
and wait on both the DoChan result and the caller's ctx.Done() so each caller
preserves its own deadline; make these changes in getMembers, GetClusterInfo and
their callers that loop over c.GetServiceURLs(), and apply the same pattern at the
other locations noted in the comment.

ℹ️ Review info
Run ID: 2f7a0e2f-a5a7-4a42-8a41-e9e4781188ab

📥 Commits

Reviewing files that changed from the base of the PR and between f941a7e and bdd8fde.

⛔ Files ignored due to path filters (1)
  • client/go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • client/go.mod
  • client/servicediscovery/service_discovery.go

Comment thread client/servicediscovery/service_discovery.go
@bufferflies (Contributor, Author):
/retest

@codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.02%. Comparing base (b21a183) to head (bdd8fde).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10632      +/-   ##
==========================================
+ Coverage   78.96%   79.02%   +0.05%     
==========================================
  Files         532      532              
  Lines       71883    71969      +86     
==========================================
+ Hits        56766    56875     +109     
+ Misses      11093    11087       -6     
+ Partials     4024     4007      -17     
Flag Coverage Δ
unittests 79.02% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

Comment thread client/servicediscovery/service_discovery.go Outdated
  start := time.Now()
  defer func() { metrics.InternalCmdDurationGetMembers.Observe(time.Since(start).Seconds()) }()
- members, err := pdpb.NewPDClient(cc).GetMembers(ctx, &pdpb.GetMembersRequest{})
+ res, err, _ := c.flight.Do("GetMembers", func() (interface{}, error) {
Review comment (Member):

Same issue here: please include the target URL in the singleflight key. Otherwise a GetMembers call for one PD endpoint can share the response or error from another endpoint.
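The cross-endpoint leak can be reproduced with a small stdlib-only stand-in for `singleflight.Group.Do` (errors omitted). `group`, `getMembers`, `whatBSees`, `verbOnlyKey`, and `perURLKey` are illustrative names, not code from the PR; `perURLKey` mirrors the `"GetClusterInfo-" + url` style of key the author adopted. With the verb-only key, a caller targeting pd-2 that overlaps an in-flight pd-1 call receives pd-1's members.

```go
package main

import (
	"sync"
	"time"
)

// call tracks one in-flight execution and how many callers joined it.
type call struct {
	wg   sync.WaitGroup
	val  any
	dups int
}

// group is a hypothetical stdlib stand-in for singleflight.Group.Do.
type group struct {
	mu sync.Mutex
	m  map[string]*call
}

func (g *group) Do(key string, fn func() any) any {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.dups++
		g.mu.Unlock()
		c.wg.Wait() // receive whatever the in-flight call returns
		return c.val
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()
	c.val = fn()
	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	c.wg.Done()
	return c.val
}

// dups reports how many duplicate callers are waiting on key.
func (g *group) dups(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.m[key]; ok {
		return c.dups
	}
	return 0
}

func verbOnlyKey(string) string   { return "GetMembers" }        // one key for every endpoint
func perURLKey(url string) string { return "GetMembers-" + url } // key scoped to the target URL

// getMembers simulates a GetMembers RPC to url. The RPC stays in flight until
// another caller joins its key (or 200ms pass), keeping the demo deterministic.
func getMembers(g *group, keyFn func(string) string, url string, onStart func()) string {
	key := keyFn(url)
	return g.Do(key, func() any {
		onStart()
		deadline := time.Now().Add(200 * time.Millisecond)
		for g.dups(key) == 0 && time.Now().Before(deadline) {
			time.Sleep(time.Millisecond)
		}
		return "members from " + url
	}).(string)
}

// whatBSees starts a request to urlA, then an overlapping request to urlB,
// and returns the response the urlB caller received.
func whatBSees(keyFn func(string) string, urlA, urlB string) string {
	var g group
	started := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		getMembers(&g, keyFn, urlA, func() { close(started) })
	}()
	<-started // urlA's RPC is now in flight
	got := getMembers(&g, keyFn, urlB, func() {})
	<-done
	return got
}
```

Scoping the key to the URL still collapses bursts against a single endpoint, which is the case the PR targets, while keeping responses and errors from different PD members apart.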

Reply (Contributor, Author):

Done, added the URL to the key.

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot Bot (Contributor) commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from okjiang. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai Bot left a comment

♻️ Duplicate comments (1)
client/servicediscovery/service_discovery.go (1)

926-928: ⚠️ Potential issue | 🟠 Major

Preserve each caller’s timeout while deduplicating requests.

Line 926 and Line 952 use singleflight.Group.Do, so duplicate callers block until the first call finishes and cannot honor their own ctx.Done() while waiting. This can exceed caller timeout budgets during contention.

Suggested fix (keep constant keys, switch to DoChan + caller-context select)
@@
-	res, err, _ := c.flight.Do("GetClusterInfo", func() (any, error) {
-		return pdpb.NewPDClient(cc).GetClusterInfo(ctx, &pdpb.GetClusterInfoRequest{})
-	})
-	if err != nil {
+	ch := c.flight.DoChan("GetClusterInfo", func() (any, error) {
+		return pdpb.NewPDClient(cc).GetClusterInfo(ctx, &pdpb.GetClusterInfoRequest{})
+	})
+	var r singleflight.Result
+	select {
+	case <-ctx.Done():
+		metrics.InternalCmdFailedDurationGetClusterInfo.Observe(time.Since(start).Seconds())
+		attachErr := errors.Errorf("error:%s target:%s status:%s", ctx.Err(), cc.Target(), cc.GetState().String())
+		return nil, errs.ErrClientGetClusterInfo.Wrap(attachErr).GenWithStackByCause()
+	case r = <-ch:
+	}
+	if r.Err != nil {
 		metrics.InternalCmdFailedDurationGetClusterInfo.Observe(time.Since(start).Seconds())
-		attachErr := errors.Errorf("error:%s target:%s status:%s", err, cc.Target(), cc.GetState().String())
+		attachErr := errors.Errorf("error:%s target:%s status:%s", r.Err, cc.Target(), cc.GetState().String())
 		return nil, errs.ErrClientGetClusterInfo.Wrap(attachErr).GenWithStackByCause()
 	}
-	clusterInfo := res.(*pdpb.GetClusterInfoResponse)
+	clusterInfo := r.Val.(*pdpb.GetClusterInfoResponse)
@@
-	res, err, _ := c.flight.Do("GetMembers", func() (any, error) {
-		return pdpb.NewPDClient(cc).GetMembers(ctx, &pdpb.GetMembersRequest{})
-	})
-	if err != nil {
+	ch := c.flight.DoChan("GetMembers", func() (any, error) {
+		return pdpb.NewPDClient(cc).GetMembers(ctx, &pdpb.GetMembersRequest{})
+	})
+	var r singleflight.Result
+	select {
+	case <-ctx.Done():
+		metrics.InternalCmdFailedDurationGetMembers.Observe(time.Since(start).Seconds())
+		attachErr := errors.Errorf("error:%s target:%s status:%s", ctx.Err(), cc.Target(), cc.GetState().String())
+		return nil, errs.ErrClientGetMember.Wrap(attachErr).GenWithStackByCause()
+	case r = <-ch:
+	}
+	if r.Err != nil {
 		metrics.InternalCmdFailedDurationGetMembers.Observe(time.Since(start).Seconds())
-		attachErr := errors.Errorf("error:%s target:%s status:%s", err, cc.Target(), cc.GetState().String())
+		attachErr := errors.Errorf("error:%s target:%s status:%s", r.Err, cc.Target(), cc.GetState().String())
 		return nil, errs.ErrClientGetMember.Wrap(attachErr).GenWithStackByCause()
 	}
-	members := res.(*pdpb.GetMembersResponse)
+	members := r.Val.(*pdpb.GetMembersResponse)
In golang.org/x/sync/singleflight (v0.19.0), does Group.Do allow duplicate callers to stop waiting when their context is canceled, or is DoChan+select on ctx.Done() required to preserve per-caller deadlines?

As per coding guidelines, "Use context-aware timeouts and backoff for retries" and "Prevent goroutine leaks: pair with cancellation; consider errgroup".

Also applies to: 952-954

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/servicediscovery/service_discovery.go` around lines 926 - 928, The
current use of c.flight.Do(...) (e.g., the GetClusterInfo call that wraps
pdpb.NewPDClient(cc).GetClusterInfo) blocks duplicate callers until the first
finishes and prevents honoring each caller’s ctx.Done(); replace Group.Do with
Group.DoChan and, after calling c.flight.DoChan("GetClusterInfo", func() (any,
error) { ... }), select between the result channel and the caller's ctx.Done()
so a canceled/deadlined caller returns promptly, and apply the same pattern to
the other occurrence (the call at the other duplicated key around lines
952–954); ensure the key strings remain constant and that you cancel/ignore the
result if ctx.Done() fires to avoid goroutine/resource leaks.

ℹ️ Review info
Run ID: db1b73ef-f8b5-49de-9e6e-995d4013ed3f

📥 Commits

Reviewing files that changed from the base of the PR and between bdd8fde and 9bd3fed.

📒 Files selected for processing (1)
  • client/servicediscovery/service_discovery.go

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot Bot (Contributor) commented Apr 30, 2026

@bufferflies: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-unit-test-next-gen-3, commit: 5aa61b8, required: true, rerun command: /test pull-unit-test-next-gen-3


  defer func() { metrics.InternalCmdDurationGetClusterInfo.Observe(time.Since(start).Seconds()) }()
- clusterInfo, err := pdpb.NewPDClient(cc).GetClusterInfo(ctx, &pdpb.GetClusterInfoRequest{})
+ key := "GetClusterInfo-" + url
+ res, err, _ := c.flight.Do(key, func() (any, error) {
Review comment (Contributor):

singleflight.Do still makes later callers wait for the first same-URL RPC, so their own timeout/cancel may not be respected.

Could we use DoChan + select ctx.Done() to preserve caller deadline semantics?


Labels

dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

client: reduce getClusterInfo and getmember call on concurrent.

4 participants