Skip to content

chore: Various quality of life changes around CN <-> BN communication#25203

Merged
timfn-hg merged 7 commits intomainfrom
timfn/more-bn-comm-fixes
May 1, 2026
Merged

chore: Various quality of life changes around CN <-> BN communication#25203
timfn-hg merged 7 commits intomainfrom
timfn/more-bn-comm-fixes

Conversation

@timfn-hg
Copy link
Copy Markdown
Contributor

@timfn-hg timfn-hg commented Apr 30, 2026

Description:
This PR introduces several changes to improve various parts of the CN/BN communication layer. Most of the changes are related to logging and otherwise not very impactful. The full changes include:

  • Standardize logging formats.
  • Reduce connection state transition logging from INFO to DEBUG to reduce log volume.
  • Introduce a request counter that is scoped to a single streaming connection for better correlation of each request. Previously, only requests associated with a block would have a unique request number that is shared with the block node. This new "connection request number" is captured only in the consensus node and is not part of the correlation ID passed to the block node. In the logs, this new ID is appended to the existing correlation ID so it doesn't impact searching for common correlation IDs between the consensus node and block node.
  • The request latency metric is converted to microseconds instead of milliseconds. In many cases, sending a request to the block node may take less than a millisecond, so by capturing the latency in a more granular resolution it will make observability a little more useful.
  • Exhaustive error handling when sending a request to the block node. Previously, there was a scenario in which a request being sent could time out and if the connection transitioned to a terminal state, there would be no indication of the timeout. Furthermore, because there was no escape when this happened, later in the code when the request duration was calculated it would resolve to a very large negative number. This is the result of capturing the starting time, but not capturing the ending time and so the duration would be captured as something like 0 - 193083817, which is interpreted as a duration of negative hundreds of days.
  • Introduce a cool down when a block node is well outside the range of the consensus node. Previously, when we checked the server status API to determine what the wanted block was, if a block node was very far behind we would not put the node in a cool down period. This meant that if a higher priority block node was always behind, we would try to re-fetch the status every time the connection monitor was invoked. Now, we will place the node in a cool down if we determine the node is far behind. The duration of the cool down is based on how far behind the block node is: if the block node exceeds the "low" threshold, then it is placed in a basic cool down; if it exceeds the "high" threshold, then it is placed in the longer, extended cool down.
  • Introduces the streaming connection heartbeat updates in more places. If we are under high load and frequently have enough items to keep sending, the heartbeat may not get updated frequently enough and thus can trigger the stalled connection detector prematurely.

Related issue(s):

Fixes #

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

…d when a request is sent to the block node

Signed-off-by: Tim Farber-Newman <tim.farber-newman@swirldslabs.com>
…BUG; introduce cool down for behind block nodes

Signed-off-by: Tim Farber-Newman <tim.farber-newman@swirldslabs.com>
@timfn-hg timfn-hg added this to the v0.75 milestone Apr 30, 2026
@timfn-hg timfn-hg self-assigned this Apr 30, 2026
@trunk-io
Copy link
Copy Markdown

trunk-io Bot commented Apr 30, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@lfdt-bot
Copy link
Copy Markdown

lfdt-bot commented Apr 30, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 30, 2026

Not up to standards ⛔

🔴 Issues 19 minor

Alerts:
⚠ 19 issues (≤ 0 issues of at least minor severity)

Results:
19 new issues

Category Results
CodeStyle 19 minor

View in Codacy

🟢 Metrics 9 complexity

Metric Results
Complexity 9

View in Codacy

🟢 Coverage 79.59% diff coverage · -0.01% coverage variation

Metric Results
Coverage variation -0.01% coverage variation (-1.00%)
Diff coverage 79.59% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (198285b) 100192 78875 78.72%
Head commit (ba43a83) 100256 (+64) 78914 (+39) 78.71% (-0.01%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#25203) 147 117 79.59%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 75.51020% with 36 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...s/impl/streaming/BlockNodeStreamingConnection.java 74.78% 24 Missing and 6 partials ⚠️
...cks/impl/streaming/BlockNodeConnectionManager.java 62.50% 3 Missing ⚠️
...ks/impl/streaming/AbstractBlockNodeConnection.java 75.00% 1 Missing ⚠️
...cks/impl/streaming/BlockNodeServiceConnection.java 0.00% 1 Missing ⚠️
...om/hedera/node/app/metrics/BlockStreamMetrics.java 50.00% 1 Missing ⚠️

Impacted file tree graph

@@             Coverage Diff              @@
##               main   #25203      +/-   ##
============================================
- Coverage     74.90%   74.88%   -0.02%     
  Complexity    11524    11524              
============================================
  Files          2586     2586              
  Lines        100289   100353      +64     
  Branches      11087    11098      +11     
============================================
+ Hits          75119    75149      +30     
- Misses        21364    21389      +25     
- Partials       3806     3815       +9     
Files with missing lines Coverage Δ Complexity Δ
.../node/app/blocks/impl/streaming/BlockBufferIO.java 87.70% <100.00%> (ø) 0.00 <0.00> (ø)
.../app/blocks/impl/streaming/BlockBufferService.java 73.74% <100.00%> (-0.89%) 0.00 <0.00> (ø)
...dera/node/app/blocks/impl/streaming/BlockNode.java 96.09% <100.00%> (+0.20%) 0.00 <0.00> (ø)
.../blocks/impl/streaming/BlockNodeConfigService.java 95.68% <100.00%> (ø) 0.00 <0.00> (ø)
...ra/node/config/data/BlockNodeConnectionConfig.java 100.00% <ø> (ø) 0.00 <0.00> (ø)
...ks/impl/streaming/AbstractBlockNodeConnection.java 91.17% <75.00%> (-0.83%) 0.00 <0.00> (ø)
...cks/impl/streaming/BlockNodeServiceConnection.java 90.47% <0.00%> (ø) 0.00 <0.00> (ø)
...om/hedera/node/app/metrics/BlockStreamMetrics.java 72.59% <50.00%> (ø) 0.00 <0.00> (ø)
...cks/impl/streaming/BlockNodeConnectionManager.java 93.22% <62.50%> (-0.40%) 0.00 <0.00> (ø)
...s/impl/streaming/BlockNodeStreamingConnection.java 83.51% <74.78%> (-3.01%) 0.00 <0.00> (ø)

Impacted file tree graph

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread hedera-node/docs/design/app/blocks/BlockNodeStreamingMetrics.md
derektriley
derektriley previously approved these changes May 1, 2026
timfn-hg added 2 commits May 1, 2026 11:55
… down to be percent based instead of a fixed number of blocks

Signed-off-by: Tim Farber-Newman <tim.farber-newman@swirldslabs.com>
@timfn-hg timfn-hg marked this pull request as ready for review May 1, 2026 16:56
@timfn-hg timfn-hg requested a review from a team as a code owner May 1, 2026 16:56
@swirlds-automation swirlds-automation dismissed derektriley’s stale review May 1, 2026 16:56

4 file(s) changed in commit 69330e6

Signed-off-by: Tim Farber-Newman <tim.farber-newman@swirldslabs.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Flaky Test(s) Detected

One or more flaky tests were detected in this run. These tests have been reported before.

Test Ticket
com.hedera.services.bdd.suites.freeze.JumpstartFileSuite#jumpstartsCorrectLiveWrappedRecordBlockHashes #24927
com.hedera.services.bdd.suites.validation.LogValidationTest#logsContainNoUnexpectedProblems #25154

Signed-off-by: Tim Farber-Newman <tim.farber-newman@swirldslabs.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Flaky Test(s) Detected

One or more flaky tests were detected in this run. These tests have been reported before.

Test Ticket
com.hedera.services.bdd.suites.freeze.JumpstartFileSuite#jumpstartsCorrectLiveWrappedRecordBlockHashes #24927
com.hedera.services.bdd.suites.validation.LogValidationTest#logsContainNoUnexpectedProblems #25154

@timfn-hg
Copy link
Copy Markdown
Contributor Author

timfn-hg commented May 1, 2026

@timfn-hg timfn-hg merged commit a9641f8 into main May 1, 2026
82 of 84 checks passed
@timfn-hg timfn-hg deleted the timfn/more-bn-comm-fixes branch May 1, 2026 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants