Skip to content

Fixes for Embers#466

Open
dylon wants to merge 18 commits intorust/devfrom
dylon/embers-demo-fixes
Open

Fixes for Embers#466
dylon wants to merge 18 commits intorust/devfrom
dylon/embers-demo-fixes

Conversation

@dylon
Copy link
Copy Markdown
Collaborator

@dylon dylon commented Mar 31, 2026

This PR resolves COST_MISMATCH errors observed on observer nodes during block replay by implementing per-bind peek semantics, along with several supporting fixes for state hash consistency, exploratory deploy reliability, and replay correctness. It also transforms the RSpace from a globally-serialized tuple space into a concurrent architecture with per-channel parallelism (see docs/rspace/concurrent-rspace-architecture.md).

Changes

Core: Concurrent Par evaluation (rholang/src/rust/interpreter/reduce.rs)

  • Concurrent evaluation with two-phase dispatch: Par branches evaluate concurrently via FuturesUnordered. When a COMM fires during Phase 1 (matching), body dispatch is deferred to a shared queue. After all matching completes, Phase 2 dispatches bodies sequentially — preventing the −644 COST_MISMATCH caused by body evaluation interleaving between RSpace (play) and ReplayRSpace (replay). Nested eval calls create their own Phase 1/Phase 2 cycle.
  • Content-hash candidate ordering: Replaced thread_rng() shuffle with deterministic sorting by source.hash (Blake2b256). Cryptographic hashes provide uniform distribution (fair, no starvation) while being deterministic (consensus-safe). Required because random shuffle caused different COMMs on different runs under concurrent evaluation.
  • locally_free clearing: Channels are normalized (locally_free cleared) before produce and consume operations. This fixes hash mismatches between the hot store (which uses Par::Eq/Hash, ignoring locally_free) and the history store (which uses bincode::serialize, including locally_free), ensuring channel identity is consistent across both stores.
  • locally_free serde normalization: The protobuf-generated locally_free field is additionally normalized at the serialization layer via a custom serialize_with handler that always serializes as empty bytes, ensuring consistent Blake2b256 hashes regardless of stale bit-vector content.
  • Stack size increase (.cargo/config.toml): RUST_MIN_STACK increased from 8MB to 32MB, and tokio worker thread stacks set to 32MB (node/src/main.rs), to accommodate deeper COMM cascades (50+ levels).

Core: Concurrent RSpace (rspace++, rholang)

  • Atomic cost manager: Replace Arc<Mutex> with AtomicI64 + CAS loop, fixing a TOCTOU bug where two concurrent branches could both deduct past zero. Normalize consume-triggered COMM cost accounting to produce-triggered semantics, making gas costs order-independent.
  • Hot store Mutex removal: Remove Mutex and Mutex wrappers from InMemHotStore. DashMap's per-shard RwLocks now serve their intended purpose.
  • ISpace &self: Change all 12 &mut self methods in the ISpace trait to &self with interior mutability (Arc<Mutex<...>>, Arc<RwLock<...>>), enabling shared concurrent access without an outer Mutex.
  • Per-channel-group locks: Add DashMap<u64, Arc<Mutex<()>>> for per-channel-group locking (Rust equivalent of Scala's TwoStepLock + MultiLock). Join patterns serialize through their group lock; single-channel operations use DashMap shard locks only. Deadlock prevention via sorted channel hash ordering.
  • Remove interpreter lock: Remove Arc<tokio::sync::Mutex> from RhoISpace. Delete 19 .try_lock().unwrap() call sites. The ISpace is now accessed without any global lock.
  • Sequential internal dispatch: continue_produce_process and continue_consume_process for persistent COMMs changed from join_all to sequential dispatch-then-re-produce/re-consume, ensuring deterministic ordering within a single COMM's lifecycle.

Core: Per-bind peek semantics (p_input_normalizer.rs, reduce.rs, rspace++)

  • Per-bind peek tracking: Peek flag changed from a single boolean on Receive to per-bind tracking via BTreeSet. Each bind in a multi-bind receive can independently use peek (<<-), with data preserved only for peeked channels.
  • ReceiveBind.peek field: Added peek: bool field to ReceiveBind protobuf message for per-bind peek propagation.
  • RSpace peek semantics: Updated store_persistent_data and remove_matched_datum_and_join in both rspace.rs and replay_rspace.rs to respect the per-bind peeks set — peeked channels preserve their data, non-peeked channels have data consumed normally.

Core: Genesis pre-state hash handling (block_approver_protocol.rs, initializing.rs, interpreter_util.rs)

  • Dynamic genesis hash: Replaced hardcoded empty_state_hash_fixed() constant with the genesis block's own pre_state_hash. The hardcoded constant breaks when evaluation order or trie hashing changes. The pre-state hash is now dynamically computed during compute_genesis() and stored on the RuntimeManager.
  • genesis_pre_state_hash parameter: Added to compute_parents_post_state() so genesis validation uses the block's own hash rather than requiring the runtime manager to have computed genesis.
  • InvalidPreStateHash replay failure: Blocks referencing non-existent state roots are now treated as invalid blocks (Right(None)) rather than internal errors (Left). UnknownRootError is detected immediately without retrying (it's deterministic).

Core: spawn_runtime_at() (runtime_manager.rs)

  • New method: spawn_runtime_at(hash) creates an RSpace child directly at a target state hash, avoiding the bug where spawn() + reset() leaves the history reader stale (history repository is updated but the hot store's reader still points to the old root).
  • Registry init flag: spawn_runtime and spawn_replay_runtime now pass init_registry=false since the registry is already in the state from genesis — re-initializing per block would corrupt state.

Core: Exploratory deploy fix (runtime.rs)

  • Fresh key pair per call: Exploratory deploy now generates a fresh secp256k1 key pair and wall-clock timestamp for each invocation, ensuring the deploy's RNG seed is unique and the externally-computed return channel always matches the ret channel created inside eval_new.

API: Trie stats endpoint (web_api.rs, shared_handlers.rs)

  • GET /trie-stats: New LFS diagnostic endpoint that returns block number, state hash, and read-only status for comparing validator and observer state completeness.

API: Block propose retry (block_api.rs)

  • Retry on "already in progress": When a propose is already in progress, the system now retries with configurable delay instead of silently returning.

Infrastructure: macOS LMDB semaphore cleanup (resources.rs)

  • atexit handler: On macOS, LMDB uses POSIX named semaphores that are never cleaned up when lazy_static values aren't dropped. Added startup cleanup of orphaned semaphores and an atexit handler to prevent ENOSPC after repeated test runs.

Diagnostics

  • Extensive structured tracing added throughout the evaluation pipeline: RNG split behavior, COMM firing details, channel hashing, hot store pending state, registry probes, cost trace sequencing, checkpoint roots, and exploratory deploy flow.

Test coverage

  • Concurrent COMM: Validates COMM fires correctly when send and receive coexist in a single Par under concurrent evaluation.
  • Genesis pre-state hash replay: Verifies replay_compute_state succeeds using the block's own pre_state_hash. Uses with_isolated_runtime_manager for LMDB state isolation.
  • spawn_runtime_at consistency: Confirms data written by a deploy is readable at the new state but not at the old state.
  • Mixed peek multi-bind: Validates per-bind peek preserves peeked channel data while consuming non-peeked channel data.
  • locally_free clearing: Verifies COMM fires despite stale locally_free bits on substituted channels.
  • Exploratory deploy integration test: End-to-end test deploying a persistent contract, finalizing, then invoking via exploratory deploy.
  • Updated expected hashes and costs: Runtime spec hash updated; empty_state_hash test changed to verify non-trivial hash rather than hardcoded comparison; cost accounting values updated for receives-first evaluation and per-bind peek semantics.
  • RSpace peek property tests: New proptest-based tests validating peek data preservation, non-peek data removal, and mixed peek/non-peek selective removal across produce-then-consume and consume-then-produce orderings.
  • Stress tested: 5/5 passes on full util::rholang suite (39 tests per run, zero failures) with concurrent evaluation.

dylon added 18 commits March 30, 2026 18:29
- Revert ReplayRSpace rig() to Scala dual-indexing: index each COMM
  under all its IOEvents (consume + all produces), not just the
  triggering one, so COMMs are findable from either side during replay
- Change reducer eval_par term ordering to receives-first: consumes
  store continuations and register joins before produces try to match,
  preventing COMM_MATCH_FAIL cascades from missing continuations
- Fix Produce equality in ReplayRSpace matches() and
  was_repeated_enough_times() to compare by hash only, matching Scala's
  Produce.equals() override — Rust-only fields (is_deterministic,
  output_value, failed) are not part of the hash and caused false
  negatives in comm.produces.contains() checks
- Fix hot_store to_map() to include channels with only continuations
  (no data), which were previously excluded from the iteration
- Remove broken workarounds that masked the root cause: deferred
  produces, fallback produce matching, put_join/remove_join simulation
  in locked_consume COMM path
- Add per-bind peek support in eval_receive (BTreeSet<i32> peeks
  instead of single bool), matching Scala's per-channel peek semantics
- Add comprehensive replay diagnostics gated behind tracing targets:
  COMM trigger-side mismatch detection, COST_TRACE_OP with channel
  hashes and eval nesting depth, COMM_MATCH_FAIL detailed state dumps,
  hot store per-channel mutation trace, checkpoint state hashing,
  validator produce/consume instrumentation
- Update cost_accounting_spec expected values for persistent produce
  contracts affected by receives-first evaluation order
- Update reduce_spec tests to use structural assertions instead of
  exact random_state byte comparison (bytes depend on eval ordering)
- Add exploratory deploy API tests
The receives-first evaluation ordering in d9124e0 changed state hashes
and introduced a regression where UnknownRootError during block replay
was treated as an internal error instead of an invalid block.

Bug fixes:
- Treat UnknownRootError as invalid block (Right(None)) instead of
  internal error (Left) by adding InvalidPreStateHash replay failure
  variant — blocks referencing non-existent state roots are invalid,
  not errored, and retrying is pointless
- Update hardcoded expected hash in runtime_spec to match receives-first
  evaluation ordering

Test coverage for d9124e0 gaps:
- Receives-first COMM ordering: validates COMM fires correctly when
  send and receive coexist in a single Par
- Genesis pre-state hash replay: verifies replay_compute_state succeeds
  using the block's own pre_state_hash (not a hardcoded constant)
- spawn_runtime_at consistency: confirms data written by a deploy is
  readable via spawn_runtime_at at the new state but not at the old
- Mixed peek multi-bind: validates per-bind peek preserves peeked
  channel data while consuming non-peeked channel data
- locally_free clearing: verifies COMM fires despite stale locally_free
  bits on substituted channels inside new blocks

Infrastructure:
- Add macOS LMDB semaphore cleanup to prevent ENOSPC after repeated
  test runs (atexit handler unlinks orphaned /MDB{r,w} semaphores)
- Remove unused Consume and IOEvent imports from replay_rspace_tests.rs
- Remove dead merge_rand / par_merge_rand / split_rand variables left
  over from weakened random_state assertions in reduce_spec.rs
- Gate ALLOCATOR_TRIM_TOTAL_METRIC import behind cfg(linux) to match
  its usage site in block_processor.rs
RUST_MIN_STACK only affects the main thread and std::thread::Builder
threads. Tokio workers use the system default (2MB on Linux), which is
insufficient for receives-first evaluation's deep COMM cascades (50+
levels). This caused the bootstrap node to crash with SIGSEGV (exit
code 139) during CI integration tests.
…Phase 1)

Replace Arc<Mutex<Cost>> with AtomicI64 + CAS loop for lock-free cost
accounting. The old implementation had a TOCTOU bug: it locked, read,
deducted, unlocked, re-locked, and checked — two concurrent branches
could both deduct past zero between the two lock acquisitions.

The CAS loop atomically checks remaining budget and deducts in a single
compare_exchange_weak, eliminating the race window. This is a
prerequisite for concurrent Par evaluation (Phase 6) where multiple
futures charge costs simultaneously.

Also normalizes consume-triggered COMM cost accounting to produce-
triggered semantics in charging_rspace.rs, ensuring gas costs are
deterministic regardless of which side fires a COMM. This makes the
total cost order-independent (commutative) for concurrent evaluation.

Phase 1 of 6: Maximally parallel RSpace via lock removal and interior
mutability. Next: remove hot store outer Mutex (Phase 2).
…hase 2)

The InMemHotStore wrapped its DashMaps in Arc<Mutex<HotStoreState>>,
serializing ALL hot store operations through a single global lock.
This defeated DashMap's per-shard read-write locks which already
provide thread-safe concurrent access.

Changes:
- Remove Mutex<HotStoreState> wrapper from InMemHotStore
- Remove Mutex<HistoryStoreCache> wrapper (also uses DashMaps)
- Access DashMaps directly via self.state.data.get() etc.
- Use DashMap::clear() for clear() instead of field reassignment
- Update tests to access state directly without lock().unwrap()

The HotStore trait already used &self (not &mut self), so no trait
changes were needed. All 294 rspace++ tests pass.

Phase 2 of 6: Maximally parallel RSpace via lock removal and interior
mutability. Next: ISpace trait &self + interior mutability (Phase 3).
All 12 mutable methods in the ISpace trait now take &self instead of
&mut self, enabling shared concurrent access without an outer Mutex.

Interior mutability changes in RSpace and ReplayRSpace:
- event_log: Vec<Event> → Arc<Mutex<Vec<Event>>>
- produce_counter: BTreeMap → Arc<Mutex<BTreeMap>>
- history_repository: Arc<Box<...>> → Arc<Mutex<Arc<Box<...>>>>
- store: Arc<Box<...>> → Arc<Mutex<Arc<Box<...>>>>

Added get_store() and get_history_repository() helper methods for
lock-free read access to the inner Arc clones. Updated all callers
across rspace++, rholang, and casper crates.

All 294 rspace++ tests and 39 casper rholang tests pass.

Phase 3 of 6: Maximally parallel RSpace via lock removal and interior
mutability. Next: per-channel-group locks for joins (Phase 4).
Replace the global Mutex on the RSpace store with fine-grained
per-channel-group locks. Each channel group (determined by the join
pattern) gets its own Mutex, keyed by a sorted hash of the channels.

Changes to RSpace and ReplayRSpace:
- store: Arc<Mutex<Arc<Box<HotStore>>>> → Arc<RwLock<Arc<Box<HotStore>>>>
  (RwLock allows concurrent reads; write only during checkpoint/reset)
- history_repository: same Mutex → RwLock change
- New channel_locks: DashMap<u64, Arc<Mutex<()>>> for per-group locking
- locked_produce: acquires per-group lock for each join group iteration
- locked_consume: acquires per-group lock for the consume's channel set
- Deadlock prevention via sorted channel hash ordering

For channels WITHOUT joins (the vast majority — private unforgeable
names used by single send/receive pairs), the per-group lock has
minimal contention. Only multi-channel join patterns serialize.

All 294 rspace++ tests and 20 casper runtime tests pass.

Phase 4 of 6: Maximally parallel RSpace via lock removal and interior
mutability. Next: remove interpreter-level space lock (Phase 5).
Remove the Arc<tokio::sync::Mutex<dyn ISpace>> wrapper from RhoISpace
and RhoReplayISpace. Since ISpace methods now use &self (Phase 3) and
per-channel-group locks handle concurrency (Phase 4), the global
interpreter Mutex is redundant.

Changes:
- RhoISpace: Arc<Mutex<Box<dyn ISpace>>> → Arc<Box<dyn ISpace>>
- Remove 19 .try_lock().unwrap() call sites across reduce.rs,
  rho_runtime.rs, contract_call.rs, interpreter.rs, and lib.rs
- Access self.space methods directly via &self
- Remove corresponding drop(space_locked) calls

The ISpace is now accessed without ANY global lock. Concurrent
produce/consume operations on independent channels no longer serialize.

All 115 rholang tests and 20 casper runtime tests pass.

Phase 5 of 6: Maximally parallel RSpace via lock removal and interior
mutability. Next: FuturesUnordered in eval_inner (Phase 6).
… (Phase 6)

Three changes that complete the concurrent evaluation architecture:

1. Content-hash candidate ordering: Replace thread_rng() shuffle in
   RSpace/ReplayRSpace with deterministic sorting by source.hash
   (Blake2b256Hash). Cryptographic hashes are uniformly distributed
   (fair, no starvation) while being deterministic (same data → same
   order). Required for consensus — random shuffle caused different
   COMMs on different runs.

2. FuturesUnordered in eval_inner: Replace sequential for-loop with
   concurrent FuturesUnordered. Receives are listed first in the terms
   vector to bias early continuation registration. Per-channel-group
   locks (Phase 4) ensure join atomicity.

3. Two-phase dispatch: Solve body evaluation interleaving that caused
   -644 COST_MISMATCH between RSpace (play) and ReplayRSpace (replay).
   Phase 1: produce/consume run concurrently, COMM body dispatch is
   DEFERRED to a shared queue. Phase 2: after all Phase 1 futures
   complete, deferred bodies dispatch sequentially. Nested eval calls
   create their own Phase 1/Phase 2 cycle.

Also:
- Fix non-deterministic DashMap iteration in replay_rspace produce
  lookup (use .get() instead of .clone().into_iter().find())
- Add with_isolated_runtime_manager for LMDB state isolation

Results (full util::rholang suite, 5 consecutive runs):
- Sequential for-loop:         3/3 pass (no concurrency)
- FuturesUnordered (no defer): 2/5 pass (body interleaving → -644)
- FuturesUnordered + 2-phase:  5/5 pass (concurrent + deterministic)

Phase 6 of 6: Maximally parallel RSpace via lock removal and interior
mutability. All phases complete.
Comprehensive design document covering the 6-phase refactor from
globally-serialized to per-channel-parallel RSpace evaluation.

Includes architectural diagrams, pseudocode, Scala equivalence
mapping, sequence diagrams for the body interleaving problem and
two-phase dispatch solution, and determinism guarantees.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant