
Conversation

@AshinGau (Contributor) commented Aug 21, 2025

Problem Description

In sub-second block time scenarios, we identified a critical performance degradation related to block persistence. The problem manifests as a vicious cycle that significantly impacts overall node performance.

Root Cause Analysis

The issue stems from the interaction between canonical block accumulation and the persistence mechanism in get_canonical_blocks_to_persist(). Here's the problematic flow:

  1. Canonical Block Accumulation: In sub-second block time environments, canonical blocks accumulate rapidly in memory
  2. Large Batch Persistence: The get_canonical_blocks_to_persist() function returns all available canonical blocks for persistence without size limits
  3. Expensive Commit Operation: UnifiedStorageWriter::commit() becomes increasingly expensive with larger batch sizes
  4. Potential Commit Blocking: Large batch sizes can cause UnifiedStorageWriter::commit() to occasionally hang, further preventing memory cleanup
  5. Delayed Memory Cleanup: Canonical blocks can only be cleaned from memory after the entire batch is persisted via CanonicalInMemoryState::remove_persisted_blocks()
  6. Performance Degradation: The combination of large batches, potential commit blocking, and delayed cleanup creates a snowball effect
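To make steps 1–2 concrete, here is a minimal, hypothetical sketch (illustrative names only, not reth's actual implementation) of how an uncapped persistence target lets the batch grow without bound once persistence falls behind the canonical head:

```rust
/// Hypothetical sketch of an *uncapped* persistence batch size.
/// Everything here (names, signature) is illustrative, not reth's API.
fn uncapped_batch_size(
    canonical_head: u64,
    last_persisted: u64,
    memory_block_buffer_target: u64,
) -> u64 {
    // Persist everything up to (head - buffer target)...
    let target = canonical_head.saturating_sub(memory_block_buffer_target);
    // ...so the batch is however far persistence has fallen behind,
    // with no upper bound.
    target.saturating_sub(last_persisted)
}
```

With a buffer target of 2, a head at block 1000 and persistence stuck at block 500, this yields a single batch of 498 blocks; and the backlog keeps growing while the expensive commit is still in flight.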

The Vicious Cycle

Sub-second block time
    ↓
Rapid canonical block accumulation
    ↓
Large batch sizes in get_canonical_blocks_to_persist()
    ↓
Expensive UnifiedStorageWriter::commit() operations
    ↓
Potential commit hanging/blocking
    ↓
Slower block persistence completion
    ↓
Delayed CanonicalInMemoryState::remove_persisted_blocks() execution
    ↓
Memory not released promptly
    ↓
System performance degradation
    ↓
Even slower persistence operations
    ↓
More canonical blocks accumulate
    ↓
Larger batch sizes (cycle continues)

Solution

Implementation

We introduced a batch size limit to prevent the snowball effect:

const MAX_BLOCKS_TO_PERSIST: u64 = 8;

let target_number = canonical_head_number
    .saturating_sub(self.config.memory_block_buffer_target())
    .min(last_persisted_number + MAX_BLOCKS_TO_PERSIST);
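Using the same hypothetical model as above (illustrative names, not reth's actual API), the effect of the clamp can be checked with concrete numbers:

```rust
const MAX_BLOCKS_TO_PERSIST: u64 = 8;

/// Capped persistence target: never schedule more than
/// MAX_BLOCKS_TO_PERSIST blocks past the last persisted block.
/// Hypothetical helper for illustration only.
fn capped_target(
    canonical_head: u64,
    last_persisted: u64,
    memory_block_buffer_target: u64,
) -> u64 {
    canonical_head
        .saturating_sub(memory_block_buffer_target)
        .min(last_persisted + MAX_BLOCKS_TO_PERSIST)
}
```

For a large backlog (head 1000, last persisted 500, buffer target 2) the target becomes min(998, 508) = 508, i.e. a batch of at most 8 blocks per persistence round; when persistence is caught up (head 100, last persisted 95, buffer target 2) the target is min(98, 103) = 98, a batch of 3, so the cap never delays a node that is keeping pace.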

Experimental Results

Experimental environment: One million accounts, two blocks generated per second, with 6000 transactions per block

Before optimization: accumulated canonical blocks lead to a gradual increase in memory usage.
[image: memory usage chart, before optimization]
After optimization: canonical blocks are released in a timely manner, and memory usage stabilizes after reaching its peak.
[image: memory usage chart, after optimization]

@mediocregopher (Collaborator) left a comment:

This LGTM, will wait for a second approval

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Aug 21, 2025
@mediocregopher mediocregopher added the C-perf A change motivated by improving speed, memory usage or disk footprint label Aug 21, 2025
/// The max number of blocks to persist in batch.
/// Memory is released after blocks are persisted in a batch, so limiting the batch size
/// prevents memory accumulation and ensures timely cleanup in high-throughput scenarios.
const MAX_BLOCKS_TO_PERSIST: u64 = 8;
A Member left a comment on this snippet:

is there a reason why we are using 8 here? maybe a comment about the config would be good
