Conversation

@shesek (Collaborator) commented Dec 1, 2025

This PR implements reorg handling that actively removes DB entries originating from stale blocks as the reorg occurs instead of discarding them at read time, plus several optimizations made possible by it. Some notes:

  1. The original motivation for this PR - optimizing TxHistory lookups - already made a DB reindex/migration necessary (to clean up stale entries), so I ended up bundling in a couple of further related optimizations that required DB schema changes, for the TxEdge and TxConf indexes.

  2. This implementation makes the distinction that entries created by stale blocks in the history DB should get undone during reorgs, while txstore DB entries are kept.

    This is partially because txstore entries from stale blocks could be useful for archival purposes (they're not all fully available through the APIs, but could be made to be) and are cheap to keep around (not many and not in the way of history index lookups), but more importantly also because non-unique keys cannot be 'undone' by deletion when handling reorgs (and keeping them in the txstore seemed easier than making an exception for them).

    Because of this (and because a migration was already needed..), the address prefix search index (A) was moved to the txstore DB, while the confirmations index (C) was moved to the history DB.

  3. Care was taken to ensure consistency of the public APIs - it is never possible for blocks to be visible (e.g. in GET /blocks/tip or GET /block/:hash) without the corresponding history entries from those blocks being indexed and visible too (e.g. in GET /address/:address/txs or GET /tx/:txid/:vout/outspend), or vice-versa.

    The consistency is guaranteed by ensuring that in-progress DB writes always refer to heights that don't yet exist in (or were removed from) the HeaderList, which keeps partial in-progress writes publicly 'invisible' until processing is completed. This also means that the visible chain tip will temporarily drop down to the common ancestor while the reorg is being processed, which was not the case previously.

    (Preserving both a monotonic chain tip and full consistency under the new design is possible but would require holding an exclusive lock on the HeaderList for the entire reorg processing duration, which seems undesirable.)

  4. The DB schema version was bumped from 1 to 2, with a migration script available as a db-migrate-v1-to-v2 binary. Example use:

    cargo run --bin db-migrate-v1-to-v2 -- -vvv --network mainnet --db-dir db

    Migration of Elements DBs is supported too, using --features liquid.

It makes more sense there, since it doesn't depend on any of the data
added to the txstore in the first `add` stage.

And it needs to be there, for the followup commit that assumes all
entries in the history db can be safely deleted when undoing blocks.
@shesek force-pushed the 202511-undo-reorgs branch 2 times, most recently from 992d996 to ae23bc5 (December 1, 2025 05:39)
@shesek requested review from RCasatta (December 1, 2025 06:00)
@RCasatta (Collaborator) commented Dec 1, 2025

I still need to review the code... After reading the description, I wonder if you considered moving to a single db with multiple column families, since we are migrating anyway.

-pub fn lookup_tx_spends(&self, tx: Transaction) -> Vec<Option<SpendingInput>> {
+pub fn lookup_tx_spends(&self, tx: &Transaction) -> Vec<Option<SpendingInput>> {
     let txid = tx.compute_txid();
     let outpoints = tx
nit: Consider creating optional outpoints for every output at the start (where None = unspendable) to avoid duplicate allocation when re-creating outpoints for spendable outputs

shesek (Collaborator, Author) replied:

I tried refactoring to avoid reconstructing the OutPoints, but it ended up rather complicated (requires changing ChainQuery::lookup_spends() to take borrowed OutPoints and a bunch of data mingling in lookup_tx_spends).

Since OutPoints are just a fixed-size [u8; 32] + u32 that's cheap to copy/reconstruct (no heap allocation and well below the cache line), I'm not sure if it's worth the extra complexity.

shesek added 11 commits December 4, 2025 07:10
Prior to this change, history index entries created by stale blocks
would remain in the history DB and only get discarded at read time.

This change explicitly removes history entries when a reorg occurs,
so we can assume all indexed entries correspond to blocks currently
still part of the best chain.

This enables optimizing some db lookups (in the followup commits),
since readers no longer need to account for stale entries.

(Note schema.md was only corrected to match the existing schema, 'D'
rows were already being kept for both the history and txstore dbs.)
Iterating history db entries now involves a single sequential db scan
(plus reads into the in-memory HeaderList), without the per-tx random
access db reads that were previously needed to verify confirmation status.
Changed from an index of `txid -> Set<blockhash>` to `txid -> blockheight`

- Instead of a list of blocks seen to include the txid (including
  stale blocks), map the txid directly to the single block that
  confirmed it and is still part of the best chain.

- Identify blocks by their height instead of their hash. Previously it
  was necessary to keep the hash to ensure it is still part of the best
  chain, but now we can assume that it is.

- Move the index from the txstore db to the history db, so that its
  entries will get undone during reorgs.
Changed from an index of `funding_txid:vout -> Set<spending_txid:vin>`
to `funding_txid:vout -> spending_txid:vin||spending_height`

- Instead of a list of inputs seen to spend the outpoint, map the
  outpoint directly to the single spending input that is still part of
  the best chain.

- Keep the height of the spending transaction, too. This reduces the
  number of db reads per spend lookup from 2 to 1.
Now possible with the V2 schema, since the exact TxEdge row key can be
derived from the funding_txid:vout alone (previously the key also
included the spending_txid, requiring a prefix scan for each lookup).
Now possible with the V2 schema, since the exact TxConf row key can be
derived from the txid alone (previously the key also included the block,
requiring a prefix scan for each lookup).

This isn't used anywhere yet, but will be used in a followup commit for
the DB migration script (and could potentially be used for a new public
API endpoint).

Exposed as a standalone function so that it can be used directly with a
`DB`, without having to construct the full `ChainQuery` with a `Daemon`.
- Change lookup_txns to use MultiGet

- Use lookup_txns for block transactions and reconstruction too
  (GET /block/:hash/txs and GET /block/:hash/raw)

(This was already possible with the V1 schema, but related to and
builds upon the other V2 changes.)

Plus some related changes:

- Remove expensive sanity check assertion in lookup_txn (involved txid
  computation and wasn't really necessary)

- Add test for raw block reconstruction
Previously each key/value read during iteration was getting duplicated 😮

(This doesn't strictly belong to the PR it's included in, but it will
greatly benefit the DB migration script.)
}

/*
use bitcoin::hex::DisplayHex;

Debug code not removed yet

shesek (Collaborator, Author) replied:

I figured it doesn't really hurt to keep what I used to debug the migration script around for future reference; it's not really in the way, since that file isn't going to be touched much once we're migrated. I can remove it if it seems unnecessary :)

}
// Write batches without flushing (sync and WAL disabled)
trace!("[1/4] writing batch of {} ops", batch.len());
txstore_db.write_batch(batch, DBFlush::Disable);

We're going to test the in-memory impact of queued batches on some dev deployments to see if we can get the additional memory requirements for a migration

shesek (Collaborator, Author) replied:

The migration script itself shouldn't be overly memory-intensive; it's set to buffer batches in chunks of 15,000 (BATCH_SIZE) to keep them reasonably sized and avoid excessive memory use.

My understanding is that there isn't much to be gained by making the batches gigantic anyway, since they're written with WAL disabled and the per-batch overhead is very small.

It could, however, make sense to run the migration with more memory for RocksDB's write_buffer_size (configurable via --db-write-buffer-size-mb), so that RocksDB can buffer writes more efficiently internally and won't have to flush to SST as often.

.tx_confirming_block(&history.get_txid())
.map(|blockid| (history, blockid))
// skip over entries that point to non-existing heights (may happen during reorg handling)
let header = headers.header_by_height(history.key.confirmed_height as usize)?;

Liquid can have, at most, one reorged block. Not sure if it's worth changing this part.

If we do want to handle it, couldn't the confirmed_height be stale? Another block could have re-orged the tip.

This does save a lookup in tx_confirming_block (self.store.history_db.get(&TxConfRow::key(txid))?;) though since we can get the confirmed_height directly

shesek (Collaborator, Author) replied:

Liquid can have, at most, one reorged block.

The comment is actually misleading, this can also happen while new blocks are being added, not just with reorged blocks. So even on Liquid there could be multiple in-progress blocks that are being indexed into the DB but not yet part of the HeaderList (e.g. if connectivity is lost or electrs is restarted and we need to catch up).

I updated the comment(s) to be more accurate in 416cb4c.

couldn't the confirmed_height be stale?

Under the new design, if the confirmed_height exists in the HeaderList then it is guaranteed not to be stale.

This does save a lookup in tx_confirming_block though since we can get the confirmed_height directly

Yes :) Avoiding the tx_confirming_block calls (and its random access DB reads) is the major win enabled by this PR.

Also adds HeaderList::best_height() to help avoid off-by-one errors for
the chain length vs tip height (like I initially made when implementing
this >.<), and to make getting the tip height of an empty HeaderList an
explicit error (previously it would overflow and return usize::MAX).
@shesek (Collaborator, Author) commented Dec 5, 2025

I wonder if you considered to move to a single db with multiple column families since we are migrating

We definitely should move to using column families, but it's a rather invasive change and I'm not sure it should be bundled with this PR.

It would also be a much larger/slower migration, basically reading out the entire DB and writing it back. I'm not sure if it crosses that threshold, but I guess that at some point, for large/complex migrations, it might make more sense to just bite the bullet and do a full re-index instead?

Prior to this fix, `Indexer::update()` would panic on the
`assert_eq!(tip, *headers.tip())` assertion when handling reorgs that
shorten the existing chain without adding any blocks to replace them.

This should not normally happen, but might due to manual `invalidateblock`.
For example, this will reproduce the panic:
`bitcoin-cli invalidateblock $(bitcoin-cli getbestblockhash)`
@shesek force-pushed the 202511-undo-reorgs branch from ae23bc5 to 416cb4c (December 5, 2025 03:23)
@shesek (Collaborator, Author) commented Dec 5, 2025

I pushed a couple bug fixes, dafcd6d (stale entries in lookup_confirmations) and 4c72b9f (handling of chain-shortening reorgs).

The latter bug already exists in the main new-index, but it should not be normally possible to hit it without manual intervention (e.g. bitcoin-cli invalidateblock $(bitcoin-cli getbestblockhash), which causes an assertion panic before the fix).

@Micfinch left a comment:

Look, I'm in training, so help me out.
