fix(pseudo-peer): cache block hashes during header serving#120
Closed
AlliedToasters wants to merge 1 commit intohl-archive-node:node-builderfrom
Closed
fix(pseudo-peer): cache block hashes during header serving#120AlliedToasters wants to merge 1 commit intohl-archive-node:node-builderfrom
AlliedToasters wants to merge 1 commit intohl-archive-node:node-builderfrom
Conversation
… Bodies disconnect The GetBlockHeaders handler now caches hash→number mappings for every header it serves, and the blockhash LRU cache limit is increased from 1M to 15M entries. Previously, GetBlockBodies requests (which arrive by hash) frequently missed the cache and triggered slow backfill scans that blocked the single-threaded pseudo peer event loop. The main node's protocol breach timeout then disconnected the unresponsive peer every ~45 seconds. With this fix the Bodies stage completes without disconnects — tested on testnet syncing ~11M blocks in ~3 minutes. Also replaces the rate-limited public RPC fallback with a hard error, since all blocks should be available locally and the RPC fallback masked the underlying cache population issue. Fixes hl-archive-node#109
4 tasks
Collaborator
|
This was due to a different subtle bug in cache warming and I'll fix it in #122. Thanks for the contribution! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
GetBlockHeadersresponses so that subsequentGetBlockBodiesrequests (which arrive by hash) resolve instantly from the cachefallback_to_official_rpc) with a hard error, since the RPC fallback masked the underlying cache population issueProblem
During the Bodies stage, the main node sends
GetBlockBodiesrequests containing block hashes. The pseudo peer needs to resolve these hashes to block numbers to fetch the data from its block source. Previously, theGetBlockHeadershandler did not cache hash→number mappings, so the Bodies handler triggered slow backfill scans — fetching and hashing blocks sequentially to find the target hash. This blocked the single-threaded pseudo peer event loop for minutes at a time, causing the main node's protocol breach timeout (~120s) to disconnect the peer.The result was a connect/disconnect cycle every ~45 seconds during the Bodies stage, with the RPC fallback exhausting its daily rate limit and making backfills even slower.
Fix
The
GetBlockHeadershandler now computes and caches the hash for every header it serves. Since the main node always requests headers before bodies for a given block range, the cache is pre-populated by the timeGetBlockBodiesarrives. The LRU cache is also increased to 15M entries to avoid eviction across large chain ranges.Test plan
cargo checkpassesFixes #109
🤖 Generated with Claude Code