
Fix LMDB problems in Rust code #281

Draft
spreston8 wants to merge 2 commits into rust/dev from fix-lmdb


@spreston8 (Collaborator)

Fix LMDB Environment Management and Test Infrastructure

Problem

Casper tests were failing with infrastructure errors:

  • EnvAlreadyOpened - LMDB environments opened multiple times
  • "Too many open files" (os error 24) - File descriptor exhaustion
  • "No space left on device" (os error 28) - LMDB mmap failures on macOS
  • Test race conditions causing intermittent failures

Root Causes

  1. LMDB env reuse bug: rspace-history and rspace-roots share the same LMDB path, but the code opened a separate environment for each
  2. heed 0.11 limitations: the outdated LMDB bindings had mmap issues on macOS
  3. Genesis caching: tests bypassed the genesis cache, creating an excessive number of LMDB environments
  4. Metrics isolation: Rust tests shared a global metrics recorder (Scala creates per-test instances)
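Root cause 1 can be illustrated with a small std-only model. LMDB's documentation warns against opening the same environment twice in one process; the sketch below models that rule with a hypothetical process-wide registry keyed by path. Env, OpenError, registry, open_new_env and get_or_open_env are illustrative names only, not heed's API, and nothing here touches the filesystem.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical stand-in for an LMDB environment handle (not heed's real type).
#[derive(Debug)]
struct Env {
    path: PathBuf,
}

// Hypothetical error mirroring the EnvAlreadyOpened failure seen in the tests.
#[derive(Debug, PartialEq)]
enum OpenError {
    EnvAlreadyOpened(PathBuf),
}

// Process-wide table of open environments, keyed by directory path.
fn registry() -> &'static Mutex<HashMap<PathBuf, Arc<Env>>> {
    static REGISTRY: OnceLock<Mutex<HashMap<PathBuf, Arc<Env>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Buggy pattern: each store opens its own environment, so the second
// store that shares a path fails.
fn open_new_env(path: &Path) -> Result<Arc<Env>, OpenError> {
    let mut envs = registry().lock().unwrap();
    if envs.contains_key(path) {
        return Err(OpenError::EnvAlreadyOpened(path.to_path_buf()));
    }
    let env = Arc::new(Env { path: path.to_path_buf() });
    envs.insert(path.to_path_buf(), Arc::clone(&env));
    Ok(env)
}

// Fixed pattern: hand back the environment already opened for this path.
fn get_or_open_env(path: &Path) -> Arc<Env> {
    let mut envs = registry().lock().unwrap();
    let env = envs
        .entry(path.to_path_buf())
        .or_insert_with(|| Arc::new(Env { path: path.to_path_buf() }));
    Arc::clone(env)
}
```

With get_or_open_env, rspace-history and rspace-roots both receive the same Arc<Env> and can create their databases inside it, whereas a second open_new_env call for that path returns EnvAlreadyOpened.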

Changes

LMDB Fixes

  • rspace_store_manager.rs: Reuse a single LMDB env for databases that share the same path
  • lmdb_store_manager.rs: Simplified env lifecycle management
  • lmdb_dir_store_manager.rs: Store and reuse managers by env name (matching Scala)
  • Upgraded heed 0.11 → 0.22 and added read_txn_without_tls() to set the MDB_NOTLS flag
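The lmdb_dir_store_manager.rs change (store and reuse managers by env name) amounts to a keyed cache. A minimal sketch, assuming a std Mutex around a HashMap; DirStoreManager, StoreManager, manager_for, created_count and the env names used with them are hypothetical, not the PR's actual types.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical per-environment store manager; a real one would own an LMDB env.
#[derive(Debug)]
struct StoreManager {
    env_name: String,
}

// Hypothetical directory-level manager that caches one StoreManager per env name.
#[derive(Default)]
struct DirStoreManager {
    managers: Mutex<HashMap<String, Arc<StoreManager>>>,
    created: AtomicUsize, // how many StoreManagers were actually constructed
}

impl DirStoreManager {
    // Returns the cached manager for env_name, creating it only on first request.
    fn manager_for(&self, env_name: &str) -> Arc<StoreManager> {
        let mut managers = self.managers.lock().unwrap();
        if let Some(existing) = managers.get(env_name) {
            return Arc::clone(existing);
        }
        self.created.fetch_add(1, Ordering::SeqCst);
        let manager = Arc::new(StoreManager { env_name: env_name.to_string() });
        managers.insert(env_name.to_string(), Arc::clone(&manager));
        manager
    }

    fn created_count(&self) -> usize {
        self.created.load(Ordering::SeqCst)
    }
}
```

Because the lookup and the insert happen under one lock, two callers asking for the same env name cannot both construct a manager.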

Test Fixes

  • Added per-module OnceCell genesis caching aligned with Scala's per-class pattern
  • API tests: Use separate temp directories for DAG and RuntimeManager
  • approve_block_protocol_test.rs: Added #[serial] to prevent metrics race condition
  • multi_parent_casper_merge_spec.rs: Fixed async runtime nesting panic
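The per-module genesis caching can be sketched with std::sync::OnceLock, the thread-safe std counterpart to OnceCell: the expensive setup runs once per module and every test borrows the cached value. GenesisContext, build_genesis, genesis and BUILD_COUNT are hypothetical names; the PR's own cache may use a different OnceCell type, and an async-aware cell would be needed if genesis creation is async.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical genesis fixture; a real one would hold blocks, keys, stores, etc.
#[derive(Debug)]
struct GenesisContext {
    shard_id: String,
}

// Counts how many times the expensive setup actually ran.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical expensive setup; in the real tests this is what opens LMDB environments.
fn build_genesis() -> GenesisContext {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    GenesisContext { shard_id: "test-shard".to_string() }
}

// One cached genesis per test module, built lazily on first use.
fn genesis() -> &'static GenesisContext {
    static GENESIS: OnceLock<GenesisContext> = OnceLock::new();
    GENESIS.get_or_init(build_genesis)
}
```

OnceLock::get_or_init lets only one initializer run and makes concurrent callers wait for it, so even tests running on parallel threads share one genesis and therefore one set of LMDB environments.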

@AndriiS-DevBrother (Collaborator) left a comment


LGTM, just one comment about a potential issue.

Comment on lines +52 to +74
```rust
if self.env.is_none() {
    self.env = Some(self.create_env()?);
}
let env = self.env.as_ref().unwrap();

// Check if database already exists
{
    let dbs = self.dbs.lock().await;
    if let Some(db) = dbs.get(&name) {
        return Ok(Arc::new(LmdbKeyValueStore::new(env.clone(), db.clone())));
    }
}

// Create the database (heed v0.22 requires a write transaction)
let mut wtxn = env.write_txn()?;
let db = env.create_database(&mut wtxn, Some(&name))?;
wtxn.commit()?;

// Store database reference
{
    let mut dbs = self.dbs.lock().await;
    dbs.insert(name, db.clone());
}
```
Collaborator

I'm not deeply familiar with Rust, but an AI review flagged this as a potential issue:

LMDB env race and handle mixing: concurrent store calls can each take the is_none branch, create separate Envs, and then overwrite self.env while DB handles created earlier remain tied to the first env. Subsequent operations may then pair those DB handles with a different env, which LMDB forbids and which can manifest as corruption or panics. Guard env creation with a dedicated lock or OnceCell, or create the env eagerly during construction.

Suggested fix: make env creation single-flight (e.g., an OnceCell<Env<WithoutTls>> with get_or_try_init, or an async mutex around creation) and optionally initialize it eagerly in new. Ensure DB creation also uses that single env (consider a short mutex around create_database).
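A std-only sketch of the single-flight suggestion, assuming OS threads and std::sync::Mutex in place of the async mutex used in the snippet: holding the lock across the is-none check and the creation means only one Env is ever built, and every DB handle records the env it came from. Env, Db, StoreManager, create_env and store are hypothetical stand-ins, not heed types; in async code, tokio::sync::OnceCell::get_or_try_init would fill the same role.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for an LMDB environment (not heed's Env).
#[derive(Debug)]
struct Env {
    id: usize,
}

// Hypothetical stand-in for a database handle; remembers which env created it.
#[derive(Debug, Clone)]
struct Db {
    env_id: usize,
    name: String,
}

#[derive(Default)]
struct StoreManager {
    // Held across the "is it none?" check and the creation, so concurrent
    // callers can never build two environments.
    env: Mutex<Option<Arc<Env>>>,
    dbs: Mutex<HashMap<String, Db>>,
    envs_created: AtomicUsize,
}

impl StoreManager {
    // Hypothetical fallible constructor standing in for opening the LMDB env.
    fn create_env(&self) -> Result<Env, String> {
        let id = self.envs_created.fetch_add(1, Ordering::SeqCst);
        Ok(Env { id })
    }

    // Single-flight: the first caller creates the env, later callers reuse it.
    fn env(&self) -> Result<Arc<Env>, String> {
        let mut slot = self.env.lock().map_err(|e| e.to_string())?;
        if let Some(env) = &*slot {
            return Ok(Arc::clone(env));
        }
        let env = Arc::new(self.create_env()?);
        *slot = Some(Arc::clone(&env));
        Ok(env)
    }

    // Opens (or reuses) a named database, always inside the single shared env.
    fn store(&self, name: &str) -> Result<(Arc<Env>, Db), String> {
        let env = self.env()?;
        // Holding the dbs lock across check-and-insert keeps DB creation single-flight too.
        let mut dbs = self.dbs.lock().map_err(|e| e.to_string())?;
        let db = dbs
            .entry(name.to_string())
            .or_insert_with(|| Db { env_id: env.id, name: name.to_string() })
            .clone();
        Ok((env, db))
    }
}
```

A Mutex<Option<...>> is used here rather than std's OnceLock because OnceLock::get_or_try_init is not yet stable and create_env is fallible; if creation fails, the slot stays empty and the next caller retries.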

Comment thread rspace++/libs/rspace_rhotypes/src/lib.rs
@spreston8 marked this pull request as draft on March 31, 2026 20:55
