Skip to content

Conversation

duanyyyyyyy
Copy link
Contributor

@duanyyyyyyy duanyyyyyyy commented Jul 29, 2025

[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections

Why I'm doing:

The BRPC Stub cache will never be evicted and will always stored in the cache.
Therefore there will be some bug here.
In the case that when users are using K8s to deploy the StarRocks backend, after sometime running there will be many log like this.
image
After check the BRPC code in https://github.com/apache/brpc/blob/1.8.0/src/brpc/socket.cpp#L1340
And from the grafana monitor
image
This cluster have 148 BE nodes, normally it will be about 148 BRPC stub but after some times running, some pods restarted and the IP changed, some of the node BRPC stub creased to over 200.
And as we all known the endpoint in BRPC are made of ip and host, that means if the IP is changed the endpoint will still in the cache.

What I'm doing:

Here I introduce a BrpcStubManager to periodically check and cleanup expired BRPC stub for BrpcStubCache, HttpBrpcStubCache and LakeServiceBrpcStubCache

  1. Record the last access time when get the stub for given endpoint
  2. Loop all the endpoint in _stub_map or _last_access_time_map check if now - last_access_time is larger than expire time or not
  3. If less, will not clean up the endpoint
  4. If larger or equal, will try to clean up the endpoint

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

@duanyyyyyyy duanyyyyyyy requested a review from a team as a code owner July 29, 2025 16:44
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 3b4e99e to e354a96 Compare July 29, 2025 16:44
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 6 times, most recently from 314606b to 35038ec Compare July 30, 2025 17:24
@kevincai kevincai self-requested a review July 31, 2025 09:18
@duanyyyyyyy
Copy link
Contributor Author

Hi @kevincai master, are there any suggestion for this case? Or any comment for this PR?

@kevincai
Copy link
Contributor

kevincai commented Aug 1, 2025

Hi @kevincai master, are there any suggestion for this case? Or any comment for this PR?

will take a look today.

@github-actions github-actions bot added the 3.5 label Aug 1, 2025
@xhumanoid
Copy link
Contributor

@duanyyyyyyy gentle ping =)

@duanyyyyyyy duanyyyyyyy reopened this Aug 11, 2025
@duanyyyyyyy
Copy link
Contributor Author

@duanyyyyyyy gentle ping =)

sry for forget.

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 35038ec to 81b4f55 Compare August 12, 2025 18:53
@duanyyyyyyy duanyyyyyyy requested a review from a team as a code owner August 12, 2025 18:53
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 5 times, most recently from a3dbfc0 to ae36e29 Compare August 13, 2025 00:54
@duanyyyyyyy
Copy link
Contributor Author

@duanyyyyyyy gentle ping =)

Seems some unrelated test failed, let me retry for it

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from ae36e29 to 40fb8af Compare August 13, 2025 03:46
@github-actions github-actions bot added the 4.0 label Aug 13, 2025
@kevincai kevincai requested a review from Copilot August 13, 2025 08:13
Copilot

This comment was marked as outdated.

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 5 times, most recently from 5299e30 to 0de6839 Compare September 3, 2025 10:13
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from 0de6839 to 5f76db7 Compare September 3, 2025 12:21
@duanyyyyyyy
Copy link
Contributor Author

@kevincai Master, any other comments?

@kevincai kevincai requested review from stdpain and Copilot September 4, 2025 13:22
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a cleanup mechanism for BRPC stub caches to prevent accumulation of unused connections in Kubernetes environments where pod IPs change frequently. The implementation periodically cleans up expired BRPC stubs from three cache types.

Key Changes:

  • Introduces timer-based cleanup tasks for BRPC stub expiration
  • Adds configurable expiration time (brpc_stub_expire_s) with 60-minute default
  • Refactors cache data structures to support scheduled cleanup

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
be/src/util/brpc_stub_cache.h Adds cleanup task classes and timer support to cache headers
be/src/util/brpc_stub_cache.cpp Implements timer-based cleanup logic for all three BRPC cache types
be/src/common/config.h Adds configurable expiration time parameter
be/src/runtime/exec_env.cpp Updates BrpcStubCache constructor to accept ExecEnv parameter
be/test/util/brpc_stub_cache_test.cpp Adds comprehensive tests for cleanup functionality
docs/en/administration/management/BE_configuration.md Documents new configuration parameter
be/test/http/stream_load_test.cpp Updates test to use new constructor signature
be/test/http/transaction_stream_load_test.cpp Updates test to use new constructor signature

Copy link
Contributor

@kevincai kevincai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good to me.

@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch 4 times, most recently from 781cb5f to df5464f Compare September 4, 2025 16:27
@duanyyyyyyy duanyyyyyyy force-pushed the add_cleaner_brpc_stub branch from df5464f to 0c7c17d Compare September 4, 2025 17:11
Copy link

github-actions bot commented Sep 4, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Sep 4, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Sep 4, 2025

[BE Incremental Coverage Report]

pass : 95 / 101 (94.06%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/util/brpc_stub_cache.cpp 92 98 93.88% [66, 182, 183, 194, 259, 270]
🔵 be/src/runtime/exec_env.cpp 1 1 100.00% []
🔵 be/src/util/brpc_stub_cache.h 2 2 100.00% []

@duanyyyyyyy
Copy link
Contributor Author

@stdpain Master, any comments?

@stdpain stdpain merged commit 11bb192 into StarRocks:main Sep 9, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants