feat: combine generate-state-bloat and init-state into single flow #3345
Adds `tempo generate-state-bloat` CLI command that derives TIP20 storage slots and writes them directly into the database via ETL collectors, bypassing the intermediate binary dump file.

Extracts `StorageLoader` from `init_state.rs` to share ETL collection, genesis merge, DB writes, and trie computation between both commands.

Closes RETH-665
Amp-Thread-ID: https://ampcode.com/threads/T-019d2ae1-3209-7268-9621-014b6a132778
ETL's `par_sort_unstable_by` does not preserve insertion order for equal keys. Add a 1-byte priority suffix (0x00 for genesis, 0x01 for dump) so dump entries deterministically win over genesis for overlapping slots.

Also: increase `WORKER_CHUNK_SIZE` 100→4096 for the single hash worker, remove unused `Address` from `slot_bytes`, make `log_collection_progress` private.

Amp-Thread-ID: https://ampcode.com/threads/T-019d2d59-8dcd-70de-a88f-4b09b1684c15
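The priority-suffix idea in this commit message can be illustrated with a small sketch. It is not the PR's code: `etl_key`, `merge`, the constant names, and the `u64` values are hypothetical stand-ins, and it uses the standard library's `sort_unstable_by` where the real pipeline uses a parallel sort. The point is only that once the priority byte is part of the key, no two entries for the same slot compare equal, so an unstable sort can no longer reorder them.

```rust
// Hypothetical sketch: names, key layout, and u64 values are illustrative only.
const PRIORITY_GENESIS: u8 = 0x00;
const PRIORITY_DUMP: u8 = 0x01;

/// Append a one-byte priority to the 32-byte slot key, so entries for the
/// same slot from different sources never have identical sort keys.
fn etl_key(slot_key: &[u8; 32], priority: u8) -> Vec<u8> {
    let mut key = Vec::with_capacity(33);
    key.extend_from_slice(slot_key);
    key.push(priority);
    key
}

/// Sort by the suffixed key, then collapse entries for the same slot.
/// Because 0x01 sorts after 0x00, the dump entry is seen last and wins.
fn merge(mut entries: Vec<(Vec<u8>, u64)>) -> Vec<([u8; 32], u64)> {
    // Unstable sort is safe here: full keys are unique per (slot, source).
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));

    let mut out: Vec<([u8; 32], u64)> = Vec::new();
    for (key, value) in entries {
        let mut slot = [0u8; 32];
        slot.copy_from_slice(&key[..32]); // strip the priority suffix
        match out.last_mut() {
            // Same slot as the previous entry: higher priority overwrites.
            Some(last) if last.0 == slot => last.1 = value,
            _ => out.push((slot, value)),
        }
    }
    out
}
```

Feeding in a genesis entry and a dump entry for the same slot, in any insertion order, yields the dump value; slots present in only one source pass through unchanged.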
// Process blocks from binary file
loop {
    // Read next block header; EOF means no more blocks.
    let mut header_buf = [0u8; 40];
    match reader.read_exact(&mut header_buf) {
        Ok(()) => {}
the chunk is moved over to InitFromBinaryDump
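The excerpt above is cut off after the `Ok(())` arm. As a rough sketch of the "EOF means no more blocks" pattern it describes, the following loop reads fixed-size headers until `read_exact` reports `UnexpectedEof`. The function name `count_headers` is hypothetical, the 40-byte header is treated as opaque because its field layout is not shown, and the error arms are an assumption about how such a loop is typically finished, not the PR's actual code.

```rust
use std::io::{self, Read};

/// Header size taken from the excerpt; the fields inside are not shown there.
const HEADER_LEN: usize = 40;

/// Count how many complete headers can be read before the stream ends.
fn count_headers<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    loop {
        let mut header_buf = [0u8; HEADER_LEN];
        match reader.read_exact(&mut header_buf) {
            // A full header was read; a real loader would parse it here.
            Ok(()) => count += 1,
            // read_exact signals end of input with UnexpectedEof.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            // Any other I/O error is propagated to the caller.
            Err(e) => return Err(e),
        }
    }
    Ok(count)
}
```

One caveat with this pattern: a header truncated partway through also surfaces as `UnexpectedEof`, so a loader that needs to detect corrupt files would have to distinguish a clean end of input from a partial read.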
9d5952f to 27a7ed6
cc @decofe ✅ Benchmark complete! Bench Comparison: 27a7ed6 vs 27a7ed6
dfbf7d3 to ee34c92
Self::GenerateStateBloat(cmd) => {
    let runtime = runner.runtime();
    runner.run_blocking_until_ctrl_c(
        cmd.execute::<tempo_node::node::TempoNode>(runtime),
    )?;
    Ok(())
}
thinking this might be needed in future
ee34c92 to 62c2806
# Generate bloat file
let bloat_file = $"($abs_localnet)/state_bloat.bin"
if $bloat > 0 {
    print $"Generating state bloat \(($bloat) MiB\)..."
    let token_args = ($TIP20_TOKEN_IDS | each { |id| ["--token" $"($id)"] } | flatten)
    cargo run -p tempo-xtask --profile $profile -- generate-state-bloat --size $bloat --out $bloat_file ...$token_args
}
We don't need this anymore because we now generate a shared .bin file from the baseline worktree's xtask, which both the baseline and feature sides then load via init-from-binary-dump.
62c2806 to e29b1da
Move GenerateStateBloat from a separate file into init_state.rs so both state-init commands share the same module. StorageLoader and LoadStats become private to the module. Amp-Thread-ID: https://ampcode.com/threads/T-019d2d83-eba4-740c-8fea-04cbf1cad24f
e29b1da to aa52f7b
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa52f7b3ee
Pull request overview
This PR streamlines the “state bloat” workflow by adding a tempo generate-state-bloat CLI subcommand that derives TIP20 storage entries and writes them directly into the DB via ETL, and by refactoring shared loading logic so both the legacy binary-dump loader and the new direct generator reuse the same pipeline.
Changes:
- Add `tempo generate-state-bloat` subcommand and wire it into the CLI.
- Refactor init-state logic into a shared `StorageLoader` that handles ETL, genesis merge, DB writes, and trie/state-root computation.
- Update `tempo.nu` bench/dev flows to generate bloat directly into the database (no intermediate `.bin`).
Reviewed changes
Copilot reviewed 1 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| `tempo.nu` | Switch bench/dev scripts from binary dump generation + init-from-binary-dump to generate-state-bloat writing directly into DB. |
| `bin/tempo/src/tempo_cmd.rs` | Expose the new GenerateStateBloat subcommand and execute it via the CLI runner. |
| `bin/tempo/src/init_state.rs` | Introduce StorageLoader; keep binary-dump loader; implement direct bloat generation with parallel address derivation and ETL-based DB writes. |
| `bin/tempo/Cargo.toml` | Add dependencies needed for mnemonic/BIP32 derivation and parallelism (alloy-signer, coins-bip32, rayon, etc.). |
| `Cargo.lock` | Lockfile updates for the new dependencies. |
Comments suppressed due to low confidence (1)
tempo.nu:101
- The early-return condition uses `($datadir)/db` existence as a proxy for “bloat already loaded”, but `tempo init` alone will also create that path. If a user previously initialized without bloat and then reruns with `--bloat`, this will incorrectly skip generation and print a misleading message. Consider either checking for a dedicated marker (e.g., a file in the datadir/meta) or at least changing the message to indicate you’re skipping because the DB already exists (not because bloat is present).
# Skip if this node already has a database with bloat loaded
if ($db_path | path exists) {
print $"State bloat already loaded into ($datadir | path basename)"
return
Skip zero-value entries in the shared loader so direct generation matches init-from-binary-dump. Restore the shared binary-dump path for comparison benches so older refs still initialize and both sides start from identical prebuilt state. Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com> Amp-Thread-ID: https://ampcode.com/threads/T-019d2e4d-4b5c-716b-a13f-c57cc63c2438
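A zero-valued storage slot is indistinguishable from an absent slot in Ethereum-style state, which is why writing such entries would make the direct path diverge from one that drops them. The filter described in this commit message could look roughly like the sketch below; `skip_zero_values` and the `Slot`/`Value` aliases are hypothetical names, and raw 32-byte arrays stand in for whatever types the shared loader actually uses.

```rust
// Hypothetical sketch: type aliases and function name are illustrative only.
type Slot = [u8; 32];
type Value = [u8; 32];

/// Drop entries whose value is all zero bytes before they reach the
/// collector, so both loading paths write the same set of slots.
fn skip_zero_values(entries: impl IntoIterator<Item = (Slot, Value)>) -> Vec<(Slot, Value)> {
    entries
        .into_iter()
        // Keep an entry only if at least one byte of its value is non-zero.
        .filter(|(_, value)| value.iter().any(|byte| *byte != 0))
        .collect()
}
```

Applying the same filter in the shared loader, rather than in each command, keeps the two entry points from drifting apart again.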
Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com> Amp-Thread-ID: https://ampcode.com/threads/T-019d2e63-4b27-7416-ab6c-28cf05adde46
…66+yongkangc@users.noreply.github.com> Amp-Thread-ID: https://ampcode.com/threads/T-019d2e8f-d174-754b-8284-9a81530659cd
@decofe bench preset=tip20 duration=300 bloat=1 tps=10000
Refresh stale loader comments after the direct generate-state-bloat path replaced the old dump-only flow. Co-Authored-By: YK <46377366+yongkangc@users.noreply.github.com> Amp-Thread-ID: https://ampcode.com/threads/T-019d2e92-432c-731c-b396-8438d71b1c6e
Co-authored-by: YK <46377366+yongkangc@users.noreply.github.com> Amp-Thread-ID: https://ampcode.com/threads/T-019d2e9b-1515-7459-888c-42f7502cc533
cc @yongkangc ✅ Benchmark complete! Bench Comparison: 401bb01 vs 80973ce
Closes RETH-665
Adds `tempo generate-state-bloat` CLI command that derives TIP20 storage slots and writes them directly into the database via ETL collectors, bypassing the intermediate binary dump file.

Extracts `StorageLoader` from `init_state.rs` to share ETL collection, genesis merge, DB writes, and trie computation between both commands.
Before

flowchart LR
    A1[bench.yml] -->|nu tempo.nu bench --bloat N| A2[tempo.nu]
    A2 -->|cargo run -p tempo-xtask generate-state-bloat --out file.bin| A3[xtask]
    A3 -->|writes| A4[.bin file]
    A2 -->|tempo init-from-binary-dump file.bin| A5[tempo CLI]
    A4 -->|reads| A5
    A5 -->|ETL → DB| A6[(Database)]

After

flowchart LR
    B1[bench.yml] -->|nu tempo.nu bench --bloat N| B2[tempo.nu]
    B2 -->|tempo generate-state-bloat --size N| B3[tempo CLI]
    B3 -->|derive → ETL → DB| B4[(Database)]