[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections #61417
Hi @kevincai master, are there any suggestions for this case? Or any comments on this PR?
Will take a look today.
@duanyyyyyyy gentle ping =)
It seems some unrelated tests failed; let me retry.
@kevincai Master, any other comments?
Pull Request Overview
This PR adds a cleanup mechanism for BRPC stub caches to prevent accumulation of unused connections in Kubernetes environments where pod IPs change frequently. The implementation periodically cleans up expired BRPC stubs from three cache types.
Key Changes:
- Introduces timer-based cleanup tasks for BRPC stub expiration
- Adds configurable expiration time (`brpc_stub_expire_s`) with 60-minute default
- Refactors cache data structures to support scheduled cleanup
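The expiration check summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ExpiringStubCache`, `Stub`, `get_stub` and `cleanup_expired` are hypothetical names, the endpoint is modeled as a plain string, and the current time is passed in as seconds so the logic is easy to test. In the PR the sweep would be driven by a timer and the threshold would come from `brpc_stub_expire_s`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a BRPC stub; the real stub types are not modeled here.
struct Stub {
    std::string endpoint;
};

// Illustrative cache that records the last access time of every entry.
class ExpiringStubCache {
public:
    explicit ExpiringStubCache(int64_t expire_s) : _expire_s(expire_s) {}

    // Returns the cached stub for `endpoint`, creating it on first use,
    // and refreshes its last access time.
    std::shared_ptr<Stub> get_stub(const std::string& endpoint, int64_t now_s) {
        std::lock_guard<std::mutex> guard(_lock);
        Entry& entry = _stub_map[endpoint];
        if (entry.stub == nullptr) {
            entry.stub = std::make_shared<Stub>(Stub{endpoint});
        }
        entry.last_access_s = now_s;
        return entry.stub;
    }

    // Removes every entry that has been idle for longer than the expiration
    // time and returns how many entries were removed.
    std::size_t cleanup_expired(int64_t now_s) {
        std::lock_guard<std::mutex> guard(_lock);
        std::size_t removed = 0;
        for (auto it = _stub_map.begin(); it != _stub_map.end();) {
            if (now_s - it->second.last_access_s > _expire_s) {
                it = _stub_map.erase(it);  // erase returns the next valid iterator
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(_lock);
        return _stub_map.size();
    }

private:
    struct Entry {
        std::shared_ptr<Stub> stub;
        int64_t last_access_s = 0;
    };

    int64_t _expire_s;
    std::mutex _lock;
    std::unordered_map<std::string, Entry> _stub_map;
};
```

With a 3600-second expiration, a stub last used at second 0 is removed by a sweep at second 3601, while a stub used at second 3000 survives that sweep. Because callers hold a `std::shared_ptr`, a stub that is erased while a request is in flight stays alive until the caller releases it.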
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
be/src/util/brpc_stub_cache.h | Adds cleanup task classes and timer support to cache headers |
be/src/util/brpc_stub_cache.cpp | Implements timer-based cleanup logic for all three BRPC cache types |
be/src/common/config.h | Adds configurable expiration time parameter |
be/src/runtime/exec_env.cpp | Updates BrpcStubCache constructor to accept ExecEnv parameter |
be/test/util/brpc_stub_cache_test.cpp | Adds comprehensive tests for cleanup functionality |
docs/en/administration/management/BE_configuration.md | Documents new configuration parameter |
be/test/http/stream_load_test.cpp | Updates test to use new constructor signature |
be/test/http/transaction_stream_load_test.cpp | Updates test to use new constructor signature |
overall looks good to me.
…ctions Signed-off-by: duanyyyyyyy <[email protected]>
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 95 / 101 (94.06%)
@stdpain Master, any comments?
[Enhancement] Add a cleaner for BrpcStubCache to cleanup unused connections
Why I'm doing:
The BRPC stub cache is never evicted: once a stub is created, it stays in the cache forever.
This causes a problem.
When users deploy the StarRocks backend on K8s, many logs like this appear after the cluster has been running for some time:
After checking the BRPC code at https://github.com/apache/brpc/blob/1.8.0/src/brpc/socket.cpp#L1340
And from the Grafana monitor:
This cluster has 148 BE nodes, so normally there are about 148 BRPC stubs. After running for some time, some pods restarted and their IPs changed, and the number of BRPC stubs on some nodes increased to over 200.
As we all know, an endpoint in BRPC is made up of an IP and a port, which means that if a pod's IP changes, the old endpoint still remains in the cache.
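To make the accumulation concrete, here is a small sketch assuming an endpoint is identified by an (IP, port) pair. `Endpoint` and `NoEvictionStubCache` are hypothetical names used only for illustration, and the addresses are made up; the sketch does not use the real BRPC types.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical endpoint key: an (IP, port) pair standing in for the BRPC endpoint type.
using Endpoint = std::pair<std::string, int>;

// Illustrative cache without eviction: entries are only ever added, never removed.
class NoEvictionStubCache {
public:
    // Looks up (or creates) the entry for `endpoint` and counts the access.
    void touch(const Endpoint& endpoint) { ++_access_count[endpoint]; }

    std::size_t size() const { return _access_count.size(); }

    bool contains(const Endpoint& endpoint) const {
        return _access_count.count(endpoint) > 0;
    }

private:
    std::map<Endpoint, int> _access_count;
};
```

If a pod is first reached at `{"10.0.0.1", 8060}` and, after a restart, at `{"10.0.0.9", 8060}`, the cache ends up with two entries for one logical node, and the first entry is never used again.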
What I'm doing:
Here I introduce a BrpcStubManager to periodically check and clean up expired BRPC stubs for BrpcStubCache, HttpBrpcStubCache and LakeServiceBrpcStubCache.
- `_stub_map` or `_last_access_time_map`
- check whether `now - last_access_time` is larger than the expire time
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: