Add TTL values for client caches of key locations and auto remove expired entries #12514
base: main
This lets the client automatically remove stale entries, e.g., when a storage server has since changed its IP address.
Result of foundationdb-pr-clang-ide on Linux RHEL 9
Result of foundationdb-pr-macos on macOS Ventura 13.x
Result of foundationdb-pr-clang-arm on Linux CentOS 7
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
Result of foundationdb-pr-clang on Linux RHEL 9
Result of foundationdb-pr on Linux RHEL 9
This doesn't seem to be effective in simulation runs. I will come back to this.
saintstack left a comment
LGTM. One question below.
    clientDBInfoMonitor = monitorClientDBInfoChange(this, clientInfo, &proxiesChangeTrigger);
    tssMismatchHandler = handleTssMismatches(this);
    locationCacheCleanup = cleanupLocationCache(this);
Should we add a cancel() call for this actor in the destructor (e.g. https://github.com/apple/foundationdb/blob/main/fdbclient/DatabaseContext.actor.cpp#L1573)?
We don't need to call cancel() explicitly, because the locationCacheCleanup destructor will automatically cancel the actor. An explicit cancel() call is usually made to proactively clean up state and avoid destruction-order problems. To be on the safe side, I'll add the cancel call in the destructor.
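To illustrate the point about automatic versus explicit cancellation, here is a minimal sketch in plain C++. It is not Flow's Future or the actual DatabaseContext code: `TaskHandle` and `Context` are hypothetical names, and the callback stands in for whatever cancelling the actor does. The handle cancels in its own destructor, and an earlier explicit cancel() simply makes that later destructor call a no-op.

```cpp
#include <functional>
#include <utility>

// Hypothetical handle for a background task. It only illustrates
// cancel-on-destruction; it is NOT Flow's Future type.
class TaskHandle {
public:
    TaskHandle() = default;
    explicit TaskHandle(std::function<void()> onCancel) : onCancel_(std::move(onCancel)) {}
    TaskHandle(const TaskHandle&) = delete;
    TaskHandle& operator=(const TaskHandle&) = delete;

    // Destroying the handle cancels the task if it has not been cancelled yet.
    ~TaskHandle() { cancel(); }

    // Idempotent: the callback runs at most once, however often cancel() is called.
    void cancel() {
        if (onCancel_) {
            auto callback = std::move(onCancel_);
            onCancel_ = nullptr;
            callback();
        }
    }

private:
    std::function<void()> onCancel_;
};

// Hypothetical owner, standing in for an object that holds the cleanup task.
struct Context {
    TaskHandle cleanupTask;

    explicit Context(std::function<void()> onCancel) : cleanupTask(std::move(onCancel)) {}

    // Explicit cancel at a known point in the destructor body, before any
    // members are destroyed. The member's own destructor then does nothing.
    ~Context() { cleanupTask.cancel(); }
};
```

With this shape, removing the explicit call in `~Context()` would still cancel the task exactly once; the explicit call only pins down when it happens relative to the destruction of other members.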
    using Locations = MultiInterface<ReferencedInterface<StorageServerInterface>>;
    explicit LocationInfo(const std::vector<Reference<ReferencedInterface<StorageServerInterface>>>& v)
    : Locations(v) {}
    : Locations(v),
Please correct me if I'm wrong, but do I understand correctly that a LocationInfo holds information for one or more StorageServerInterface objects, and that the expire time is set for the LocationInfo as a whole rather than for each individual StorageServerInterface in it?
That's correct. Each shard is stored on multiple storage servers, so a LocationInfo points to all replicas.
This seems to be a field added for the StorageCache feature, which was removed.
Currently, the client-side cache only removes entries when the cache size limit is reached. However, if a storage server has changed its IP address and the client is not accessing that server, the old entry in the cache will never be removed. Thus the client will keep trying to connect to the old IP address (maybe until some timeout value?).
The change here adds a TTL to cached entries and removes entries that have not been renewed or used recently.
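As a rough illustration of the idea, the sketch below shows a TTL-based cache in standard C++. It is not the code in this PR: `TtlLocationCache`, `CachedLocation`, and their members are hypothetical names, time is passed in as plain seconds, and renewing the TTL on every successful lookup is an assumption about what "renewed/used" means here. As in the PR, one expiry time covers the whole entry, not each replica.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached key-location entry (not FoundationDB's LocationInfo).
struct CachedLocation {
    std::vector<std::string> replicaAddresses; // storage servers holding the shard
    double expireTime;                         // absolute time in seconds; one value for the whole entry
};

// Minimal TTL cache keyed by the begin key of a range.
class TtlLocationCache {
public:
    explicit TtlLocationCache(double ttlSeconds) : ttl_(ttlSeconds) { assert(ttlSeconds > 0.0); }

    // Insert or overwrite an entry; it expires ttl_ seconds after `now`.
    void insert(const std::string& beginKey, std::vector<std::string> replicas, double now) {
        entries_[beginKey] = CachedLocation{ std::move(replicas), now + ttl_ };
    }

    // Returns nullptr on a miss or an expired entry. A hit renews the TTL
    // (assumption: "used" entries count as renewed).
    const CachedLocation* lookup(const std::string& beginKey, double now) {
        auto it = entries_.find(beginKey);
        if (it == entries_.end() || it->second.expireTime <= now) {
            return nullptr;
        }
        it->second.expireTime = now + ttl_;
        return &it->second;
    }

    // Periodic sweep: erase every expired entry and report how many were removed.
    std::size_t removeExpired(double now) {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.expireTime <= now) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return entries_.size(); }

private:
    double ttl_;
    std::map<std::string, CachedLocation> entries_;
};
```

A background task would call `removeExpired` on a timer, so an entry for a server the client no longer talks to is dropped even when the cache never reaches its size limit.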
20251024-210832-jzhou-96ffca1323c55610 compressed=True data_size=37340840 duration=4572667 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=2:24:43 sanity=False started=100000 stopped=20251024-233315 submitted=20251024-210832 timeout=5400 username=jzhou
20251120-192454-jzhou-1c85ca93ab32246b compressed=True data_size=37340794 duration=5168185 ended=100000 fail=5 fail_fast=10 max_runs=100000 pass=99995 priority=100 remaining=0 runtime=2:03:10 sanity=False started=100000 stopped=20251120-212804 submitted=20251120-192454 timeout=5400 username=jzhou
Only saw 1 failure in the log, which is TestUnexpectedlyNotFinished, but reproduction passed, likely due to ssd-sharded-rocksdb being slow.