
[execute] Spawned block-processing tasks swallow panics, deadlocking the entire pipeline #295

@keanji-x

Summary

A panic inside Core::process(), which is spawned via a fire-and-forget tokio::spawn at lib.rs:269-273, is never observed by the pipeline because the JoinHandle is dropped without being awaited; apart from whatever the panic hook logs, the failure is silent. The panicking task never calls notify() on any of the four barriers (execute_block_barrier, merklize_barrier, seal_barrier, make_canonical_barrier), so every subsequent block hangs forever.
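The lost-panic behavior can be reproduced in a small std-only sketch. It uses std::thread as a stand-in for tokio::spawn, since both detach the task when the JoinHandle is dropped; `process` below is a hypothetical stand-in for Core::process(), not the real code. In tokio, the equivalent of the joined path would be awaiting the JoinHandle (a panic then surfaces as a JoinError whose is_panic() returns true) or spawning into a tokio::task::JoinSet and draining it with join_next().

```rust
use std::thread;

/// Hypothetical stand-in for `Core::process()`: panics on an invariant
/// violation, like the production `assert!` calls listed in this issue.
fn process(block: u64) -> u64 {
    assert!(block % 2 == 0, "invariant violated for block {block}");
    block
}

/// Current pattern: the JoinHandle is dropped right away, which detaches the
/// thread. The caller returns normally and never learns about the panic.
fn spawn_detached(block: u64) {
    drop(thread::spawn(move || process(block)));
}

/// Supervised pattern: joining the handle turns a worker panic into an `Err`
/// the caller can act on (log it, close the barriers, shut down).
fn spawn_and_join(block: u64) -> Result<u64, String> {
    thread::spawn(move || process(block)).join().map_err(|payload| {
        // Panic payloads are normally a `&'static str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "non-string panic payload".to_string()
        }
    })
}
```

`spawn_detached(3)` returns normally even though its worker panics, while `spawn_and_join(3)` returns an `Err` carrying the panic message.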

Related sub-issues

  1. Bare .unwrap() on barrier waits (L476, L478, L496, L522): if a prior block panicked, its barrier is never notified, so these waits fail and the unwraps panic in turn, cascading the failure to every in-flight block.
  2. seal_barrier not closed on shutdown (L250-254): when the ordered-block channel closes, run() closes three of the four barriers but omits seal_barrier, leaving any task waiting on it hung permanently.
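Both sub-issues can be illustrated with a hypothetical barrier built on std::sync::{Mutex, Condvar}. The method names notify, wait_timeout, and close mirror the calls mentioned in this issue, but the signatures, the WaitError type, and the Barriers grouping are assumptions for illustration, not the real types in lib.rs. Returning a Result from wait_timeout lets callers propagate failure instead of unwrapping it, and grouping the four barriers behind one close_all method makes it harder for the shutdown path to skip seal_barrier.

```rust
use std::collections::HashSet;
use std::sync::{Condvar, Mutex, PoisonError};
use std::time::Duration;

/// Why a wait ended without the awaited block completing (hypothetical).
#[derive(Debug, PartialEq)]
pub enum WaitError {
    Timeout,
    Closed,
}

#[derive(Default)]
struct State {
    done: HashSet<u64>, // block numbers that have been notified
    closed: bool,
}

/// Hypothetical per-block barrier; not the real type in lib.rs.
#[derive(Default)]
pub struct Barrier {
    state: Mutex<State>,
    cv: Condvar,
}

impl Barrier {
    pub fn notify(&self, block: u64) {
        // Recover from poisoning so an unrelated panic cannot cascade here.
        self.state.lock().unwrap_or_else(PoisonError::into_inner).done.insert(block);
        self.cv.notify_all();
    }

    /// Wakes every current and future waiter with `WaitError::Closed`.
    pub fn close(&self) {
        self.state.lock().unwrap_or_else(PoisonError::into_inner).closed = true;
        self.cv.notify_all();
    }

    /// Returns a `Result` so call sites can use `?` instead of `.unwrap()`.
    pub fn wait_timeout(&self, block: u64, timeout: Duration) -> Result<(), WaitError> {
        let guard = self.state.lock().unwrap_or_else(PoisonError::into_inner);
        // Keep waiting while the block is not done and the barrier is open.
        let (guard, _) = self
            .cv
            .wait_timeout_while(guard, timeout, |s| !s.done.contains(&block) && !s.closed)
            .unwrap_or_else(PoisonError::into_inner);
        if guard.done.contains(&block) {
            Ok(())
        } else if guard.closed {
            Err(WaitError::Closed)
        } else {
            Err(WaitError::Timeout)
        }
    }
}

/// The four barriers named in this issue, grouped so shutdown cannot skip one.
#[derive(Default)]
pub struct Barriers {
    pub execute_block: Barrier,
    pub merklize: Barrier,
    pub seal: Barrier,
    pub make_canonical: Barrier,
}

impl Barriers {
    /// Close every barrier, including `seal`, when the ordered-block channel closes.
    pub fn close_all(&self) {
        for b in [&self.execute_block, &self.merklize, &self.seal, &self.make_canonical] {
            b.close();
        }
    }
}
```

With a Result-returning wait, a call site can use `?` or a match that logs and abandons the current block, so a task stuck on a closed barrier wakes with `WaitError::Closed` instead of hanging or panicking.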

Reproduction

  1. Trigger any assert!/assert_eq! failure inside process() (e.g., epoch mismatch at L401, execute_height invariant at L461).
  2. Observe that no subsequent blocks are processed — they all hang on execute_block_barrier.wait_timeout.

Impact

  • Severity: Critical
  • Complete pipeline halt with no recovery path other than node restart.
  • Multiple assert!/assert_eq! calls sit in production paths that are not gated by #[cfg(debug_assertions)] (L401, L459, L461, L700, L778, L945), so this failure mode is reachable in release builds.

Suggested investigation areas

  • Await the spawned tasks, or manage them with a JoinSet, so that panics propagate to the caller.
  • Convert production-path assert! to graceful error handling.
  • Add seal_barrier.close() to the shutdown path.
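Converting a production-path assertion into a recoverable error could look roughly like this. The ProcessError type, the check_epoch helper, and the fallible process signature are hypothetical; the real epoch check at L401 may compare different values.

```rust
use std::fmt;

/// Hypothetical error type for invariant violations in `process()`.
#[derive(Debug, PartialEq)]
pub enum ProcessError {
    EpochMismatch { expected: u64, actual: u64 },
}

impl fmt::Display for ProcessError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProcessError::EpochMismatch { expected, actual } => {
                write!(f, "epoch mismatch: expected {expected}, got {actual}")
            }
        }
    }
}

impl std::error::Error for ProcessError {}

/// Replaces a production-path `assert_eq!(block_epoch, current_epoch)`:
/// instead of panicking the task, the violation is returned to the caller.
pub fn check_epoch(block_epoch: u64, current_epoch: u64) -> Result<(), ProcessError> {
    if block_epoch == current_epoch {
        Ok(())
    } else {
        Err(ProcessError::EpochMismatch { expected: current_epoch, actual: block_epoch })
    }
}

/// Sketch of a fallible `process()`: `?` hands the error to the caller, which
/// can log it and shut the pipeline down (closing all barriers) instead of
/// leaving later blocks waiting forever.
pub fn process(block_epoch: u64, current_epoch: u64) -> Result<(), ProcessError> {
    check_epoch(block_epoch, current_epoch)?;
    // ... execute, merklize, seal, make canonical ...
    Ok(())
}
```

Where a violated invariant means state is corrupted and continuing is unsafe, panicking may still be appropriate, but the panic should then be observed by whoever spawned the task and the barriers closed, so the node fails fast instead of hanging.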

Files

  • crates/pipe-exec-layer-ext-v2/execute/src/lib.rs
