Block Database #4027

Open
wants to merge 28 commits into
base: dl/lru-on-evict
28 commits
4566e4a
feat: add eviction callback in LRU cache
DracoLi Jul 15, 2025
68527f1
blockdb setup & readme
DracoLi Jun 1, 2025
eb3f50c
feat: block db implementation & readme
DracoLi Jun 9, 2025
07c1e5c
refactor: rename store to database
DracoLi Jun 9, 2025
5fd75ea
feat: add tests and update blockdb to have separate methods to read h…
DracoLi Jun 22, 2025
4fe0e19
feat: data splitting & fix linting
DracoLi Jun 23, 2025
99d0ee2
fix: close db before deleting the file
DracoLi Jun 23, 2025
26ccb70
fix: recovery issues with data files splitting & feedback
DracoLi Jun 26, 2025
1497732
use lru for file cache and fix recovery issues
DracoLi Jun 30, 2025
718559b
refactor: use t.TempDir
DracoLi Jun 30, 2025
b3b797c
refactor: move database methods to database.go
DracoLi Jul 6, 2025
d8237c9
rename blockHeader -> blockEntryHeader and improve recovery logic
DracoLi Jul 6, 2025
9e44cce
make MaxDataFiles configurable
DracoLi Jul 6, 2025
978a2b5
add more logging
DracoLi Jul 6, 2025
9329e0a
move data and index dir to config and rename config
DracoLi Jul 6, 2025
e45c2e6
fix lint
DracoLi Jul 7, 2025
7f2fb82
fix struct alignment and add tests
DracoLi Jul 7, 2025
d9fa843
fix: separate errors for directories
DracoLi Jul 9, 2025
5a52340
consistent block height tracking
DracoLi Jul 9, 2025
2d6d85f
remove truncate config
DracoLi Jul 10, 2025
b0c4938
add additional tests
DracoLi Jul 10, 2025
e1f29db
fix lint and improve test error msg
DracoLi Jul 10, 2025
1df6bc1
remove assertion in go routine
DracoLi Jul 10, 2025
e94a72e
limit concurrent calls to persistIndexHeader
DracoLi Jul 11, 2025
748dbf4
add warning log if config values differ from index header
DracoLi Jul 11, 2025
e17d951
change warn logs to info
DracoLi Jul 11, 2025
3923a26
add error log for index entry failure
DracoLi Jul 14, 2025
d63f80d
Merge branch 'dl/lru-on-evict' into dl/blockdb
DracoLi Jul 15, 2025
2 changes: 1 addition & 1 deletion go.mod
@@ -19,6 +19,7 @@ require (
github.com/ava-labs/ledger-avalanche/go v0.0.0-20241009183145-e6f90a8a1a60
github.com/ava-labs/libevm v1.13.14-0.3.0.rc.1
github.com/btcsuite/btcd/btcutil v1.1.3
github.com/cespare/xxhash/v2 v2.3.0
github.com/cockroachdb/pebble v0.0.0-20230928194634-aa077af62593
github.com/compose-spec/compose-go v1.20.2
github.com/decred/dcrd/dcrec/secp256k1/v4 v4.1.0
@@ -92,7 +93,6 @@ require (
github.com/bits-and-blooms/bitset v1.10.0 // indirect
github.com/btcsuite/btcd/btcec/v2 v2.3.2 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cockroachdb/errors v1.9.1 // indirect
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b // indirect
github.com/cockroachdb/redact v1.1.3 // indirect
196 changes: 196 additions & 0 deletions x/blockdb/README.md
@@ -0,0 +1,196 @@
# BlockDB

BlockDB is a specialized database optimized for blockchain blocks.

## Key Functionalities

- **O(1) Performance**: Both reads and writes complete in constant time
- **Parallel Operations**: Multiple threads can read and write blocks concurrently without blocking
- **Flexible Write Ordering**: Supports out-of-order block writes for bootstrapping
- **Configurable Durability**: Optional `syncToDisk` mode guarantees immediate recoverability
- **Automatic Recovery**: Detects and recovers unindexed blocks after unclean shutdowns

## Design

BlockDB uses a single index file and multiple data files. The index file maps block heights to locations in the data files, while data files store the actual block content. Data storage can be split across multiple data files based on the maximum data file size.

```
┌─────────────────┐         ┌─────────────────┐
│   Index File    │         │  Data File 1    │
│   (.idx)        │         │    (.dat)       │
├─────────────────┤         ├─────────────────┤
│ Header          │         │ Block 0         │
│ - Version       │  ┌─────>│ - Header        │
│ - Min Height    │  │      │ - Data          │
│ - Max Height    │  │      ├─────────────────┤
│ - Data Size     │  │      │ Block 1         │
│ - ...           │  │  ┌──>│ - Header        │
├─────────────────┤  │  │   │ - Data          │
│ Entry[0]        │  │  │   ├─────────────────┤
│ - Offset ───────┼──┘  │   │     ...         │
│ - Size          │     │   └─────────────────┘
│ - Header Size   │     │
├─────────────────┤     │
│ Entry[1]        │     │
│ - Offset ───────┼─────┘
│ - Size          │
│ - Header Size   │
├─────────────────┤
│ ...             │
└─────────────────┘
```

### File Formats

#### Index File Structure

The index file consists of a fixed-size header followed by fixed-size entries:

```
Index File Header (64 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Version                        │ 8 bytes │
│ Max Data File Size             │ 8 bytes │
│ Min Block Height               │ 8 bytes │
│ Max Contiguous Height          │ 8 bytes │
│ Max Block Height               │ 8 bytes │
│ Next Write Offset              │ 8 bytes │
│ Reserved                       │ 16 bytes│
└────────────────────────────────┴─────────┘

Index Entry (16 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Data File Offset               │ 8 bytes │
│ Block Data Size                │ 4 bytes │
│ Header Size                    │ 4 bytes │
└────────────────────────────────┴─────────┘
```

#### Data File Structure

Each block in the data file is stored with a block entry header followed by the raw block data:

```
Block Entry Header (26 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Height                         │ 8 bytes │
│ Size                           │ 4 bytes │
│ Checksum                       │ 8 bytes │
│ Header Size                    │ 4 bytes │
│ Version                        │ 2 bytes │
└────────────────────────────────┴─────────┘
```
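
As a rough illustration of the 26-byte layout above, the sketch below packs and unpacks a block entry header with `encoding/binary`. The field order follows the table; the little-endian byte order, the method names, and the example values are assumptions made for this sketch, not details confirmed by the PR.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// blockEntryHeaderSize is the on-disk size from the table above:
// 8 + 4 + 8 + 4 + 2 = 26 bytes.
const blockEntryHeaderSize = 26

// blockEntryHeader holds the fields listed in the table above.
type blockEntryHeader struct {
	Height     uint64
	Size       uint32
	Checksum   uint64
	HeaderSize uint32
	Version    uint16
}

// marshal packs the header into 26 bytes.
// Assumption: fields are little-endian and stored in table order.
func (h blockEntryHeader) marshal() []byte {
	buf := make([]byte, blockEntryHeaderSize)
	binary.LittleEndian.PutUint64(buf[0:8], h.Height)
	binary.LittleEndian.PutUint32(buf[8:12], h.Size)
	binary.LittleEndian.PutUint64(buf[12:20], h.Checksum)
	binary.LittleEndian.PutUint32(buf[20:24], h.HeaderSize)
	binary.LittleEndian.PutUint16(buf[24:26], h.Version)
	return buf
}

// unmarshalBlockEntryHeader reverses marshal.
func unmarshalBlockEntryHeader(buf []byte) (blockEntryHeader, error) {
	if len(buf) < blockEntryHeaderSize {
		return blockEntryHeader{}, errors.New("buffer shorter than block entry header")
	}
	return blockEntryHeader{
		Height:     binary.LittleEndian.Uint64(buf[0:8]),
		Size:       binary.LittleEndian.Uint32(buf[8:12]),
		Checksum:   binary.LittleEndian.Uint64(buf[12:20]),
		HeaderSize: binary.LittleEndian.Uint32(buf[20:24]),
		Version:    binary.LittleEndian.Uint16(buf[24:26]),
	}, nil
}

func main() {
	// Example values are arbitrary.
	in := blockEntryHeader{Height: 100, Size: 17, Checksum: 42, HeaderSize: 7, Version: 1}
	out, err := unmarshalBlockEntryHeader(in.marshal())
	if err != nil {
		fmt.Println("Error decoding header:", err)
		return
	}
	fmt.Println("round trip matches:", in == out)
}
```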

### Block Overwrites

BlockDB allows overwriting blocks at existing heights. When a block is overwritten, the new block is appended to the data file and the index entry is updated to point to the new location, leaving the old block data as unreferenced "dead" space. However, since blocks are immutable and rarely overwritten (e.g., during reorgs), this trade-off should have minimal impact in practice.
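
The overwrite behaviour can be pictured with a small in-memory model. This is only a sketch: `store`, `location`, and the `dead` counter are hypothetical names introduced here, and a byte slice stands in for the append-only data file.

```go
package main

import "fmt"

// location records where a block lives in the append-only data.
type location struct {
	offset uint64
	size   uint64
}

// store is a toy model: data only ever grows, index maps height to location.
type store struct {
	data  []byte
	index map[uint64]location
	dead  uint64 // bytes no longer referenced by any index entry
}

func newStore() *store {
	return &store{index: make(map[uint64]location)}
}

// write appends the block and repoints the index entry for height.
// An earlier block at the same height stays in data as dead space.
func (s *store) write(height uint64, block []byte) {
	if old, ok := s.index[height]; ok {
		s.dead += old.size
	}
	s.index[height] = location{offset: uint64(len(s.data)), size: uint64(len(block))}
	s.data = append(s.data, block...)
}

// read returns the block currently indexed at height.
func (s *store) read(height uint64) ([]byte, bool) {
	loc, ok := s.index[height]
	if !ok {
		return nil, false
	}
	return s.data[loc.offset : loc.offset+loc.size], true
}

func main() {
	s := newStore()
	s.write(5, []byte("old block"))
	s.write(5, []byte("reorged block"))

	block, _ := s.read(5)
	fmt.Printf("height 5 -> %q, data bytes: %d, dead bytes: %d\n", block, len(s.data), s.dead)
}
```

In this model the dead bytes are never reclaimed, which matches the trade-off described above: space is wasted only when an overwrite actually happens.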

### Fixed-Size Index Entries

Each index entry is exactly 16 bytes on disk, containing the offset, size, and header size. This fixed size enables direct calculation of where each block's index entry is located, providing O(1) lookups. The index also stays compact for blockchains with high block heights: even at height 1 billion, the index file would only be ~16 GB.
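
The lookup arithmetic can be sketched as follows. The 64-byte header and 16-byte entry sizes come from the file format tables; the assumption that entries are laid out contiguously starting at the minimum block height, and the function and error names, are illustrative rather than taken from the implementation.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	indexHeaderSize = 64 // bytes, from the index file header table
	indexEntrySize  = 16 // bytes, from the index entry table
)

var errBelowMinHeight = errors.New("height is below the minimum block height")

// indexEntryOffset returns the byte position of the index entry for height.
// Assumption: the entry for minHeight directly follows the header and
// entries for higher heights follow contiguously.
func indexEntryOffset(height, minHeight uint64) (uint64, error) {
	if height < minHeight {
		return 0, errBelowMinHeight
	}
	return indexHeaderSize + (height-minHeight)*indexEntrySize, nil
}

func main() {
	off, err := indexEntryOffset(100, 0)
	if err != nil {
		fmt.Println("Error computing offset:", err)
		return
	}
	fmt.Println("index entry for height 100 starts at byte", off)
}
```

No search is involved: one subtraction, one multiplication, and one addition locate the entry, whatever the height.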

### Durability and Fsync Behavior

BlockDB provides configurable durability through the `syncToDisk` parameter:

**Data File Behavior:**

- **When `syncToDisk=true`**: The data file is fsync'd after every block write, guaranteeing durability against both process failures and kernel/machine failures.
- **When `syncToDisk=false`**: Data file writes are buffered, providing durability against process failures but not against kernel or machine failures.

**Index File Behavior:**

- **When `syncToDisk=true`**: The index file is fsync'd every `CheckpointInterval` blocks (when the header is written).
- **When `syncToDisk=false`**: The index file relies on OS buffering and is not explicitly fsync'd.

### Recovery Mechanism

On startup, BlockDB checks for signs of an unclean shutdown by comparing the data file size on disk with the indexed data size stored in the index file header. If the data files are larger than what the index claims, it indicates that blocks were written but the index wasn't properly updated before shutdown.

**Recovery Process:**

1. Starts scanning from where the index left off (`NextWriteOffset`)
2. For each unindexed block found:
   - Validates the block entry header and checksum
   - Writes the corresponding index entry
3. Calculates the max contiguous height and max block height
4. Updates the index header with the updated max contiguous height, max block height, and next write offset
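
The steps above can be sketched against an in-memory byte slice standing in for a data file. Several details here are assumptions, not facts from the PR: the little-endian encoding, index offsets pointing at the block entry header, stopping at the first entry that fails validation, and the FNV-1a checksum, which is a standard-library stand-in and not necessarily the hash BlockDB uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const entryHeaderSize = 26 // block entry header size from the table above

// indexEntry mirrors the three index entry fields.
type indexEntry struct {
	Offset     uint64
	Size       uint32
	HeaderSize uint32
}

// checksum is a stand-in (FNV-1a) chosen so the sketch needs only the
// standard library.
func checksum(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// appendEntry appends a block entry header plus the block bytes to data.
func appendEntry(data []byte, height uint64, block []byte, headerSize uint32) []byte {
	hdr := make([]byte, entryHeaderSize)
	binary.LittleEndian.PutUint64(hdr[0:8], height)
	binary.LittleEndian.PutUint32(hdr[8:12], uint32(len(block)))
	binary.LittleEndian.PutUint64(hdr[12:20], checksum(block))
	binary.LittleEndian.PutUint32(hdr[20:24], headerSize)
	binary.LittleEndian.PutUint16(hdr[24:26], 1) // hypothetical version number
	return append(append(data, hdr...), block...)
}

// recoverIndex scans data from nextWriteOffset, validates each entry, and
// records it in index. It returns the offset after the last valid entry.
func recoverIndex(data []byte, nextWriteOffset uint64, index map[uint64]indexEntry) uint64 {
	off := nextWriteOffset
	for off+entryHeaderSize <= uint64(len(data)) {
		hdr := data[off : off+entryHeaderSize]
		height := binary.LittleEndian.Uint64(hdr[0:8])
		size := binary.LittleEndian.Uint32(hdr[8:12])
		sum := binary.LittleEndian.Uint64(hdr[12:20])
		headerSize := binary.LittleEndian.Uint32(hdr[20:24])

		start := off + entryHeaderSize
		end := start + uint64(size)
		if end > uint64(len(data)) || headerSize > size || checksum(data[start:end]) != sum {
			break // truncated or corrupt entry: stop scanning here
		}
		index[height] = indexEntry{Offset: off, Size: size, HeaderSize: headerSize}
		off = end
	}
	return off
}

// maxContiguousHeight walks up from minHeight while every height is indexed.
// It reports false when minHeight itself is missing.
func maxContiguousHeight(index map[uint64]indexEntry, minHeight uint64) (uint64, bool) {
	if _, ok := index[minHeight]; !ok {
		return 0, false
	}
	h := minHeight
	for {
		if _, ok := index[h+1]; !ok {
			return h, true
		}
		h++
	}
}

func main() {
	var data []byte
	data = appendEntry(data, 0, []byte("h0:block zero"), 3)
	data = appendEntry(data, 2, []byte("h2:block two"), 3) // out-of-order write
	data = appendEntry(data, 1, []byte("h1:block one"), 3)
	data = append(data, 0xde, 0xad) // partial write left behind by a crash

	index := make(map[uint64]indexEntry)
	next := recoverIndex(data, 0, index)
	contiguous, _ := maxContiguousHeight(index, 0)
	fmt.Println("entries:", len(index), "next write offset:", next, "max contiguous height:", contiguous)
}
```

The returned offset and the recomputed heights correspond to the values that step 4 writes back to the index header.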

## Usage

### Creating a Database

```go
import (
	"errors"
	"fmt"

	"github.com/ava-labs/avalanchego/utils/logging"
	"github.com/ava-labs/avalanchego/x/blockdb"
)

config := blockdb.DefaultConfig().
	WithDir("/path/to/blockdb")
db, err := blockdb.New(config, logging.NoLog{})
if err != nil {
	fmt.Println("Error creating database:", err)
	return
}
defer db.Close()
```

### Writing and Reading Blocks

```go
// Write a block with header size
height := uint64(100)
blockData := []byte("header:block data")
headerSize := uint32(7) // First 7 bytes are the header
err := db.WriteBlock(height, blockData, headerSize)
if err != nil {
	fmt.Println("Error writing block:", err)
	return
}

// Read a block (blockData and err already exist, so assign with = rather than :=)
blockData, err = db.ReadBlock(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading block:", err)
	return
}

// Read block components separately
headerData, err := db.ReadHeader(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading header:", err)
	return
}
bodyData, err := db.ReadBody(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading body:", err)
	return
}
```

## TODO

- Implement a block cache for recently accessed blocks
- Use a buffered pool to avoid allocations on reads and writes
- Add metrics
- Add performance benchmarks
- Consider supporting missing data files (currently we error if any data files are missing)
118 changes: 118 additions & 0 deletions x/blockdb/config.go
@@ -0,0 +1,118 @@
// Copyright (C) 2019-2024, Ava Labs, Inc. All rights reserved.
// See the file LICENSE for licensing terms.

package blockdb

import "errors"

// DefaultMaxDataFileSize is the default maximum size of a data file in bytes (500 GiB).
const DefaultMaxDataFileSize = 500 * 1024 * 1024 * 1024

// DefaultMaxDataFiles is the default maximum number of data file descriptors cached.
const DefaultMaxDataFiles = 10

// DatabaseConfig contains configuration parameters for BlockDB.
type DatabaseConfig struct {
	// IndexDir is the directory where the index file is stored.
	IndexDir string

	// DataDir is the directory where the data files are stored.
	DataDir string

	// MinimumHeight is the lowest block height tracked by the database.
	MinimumHeight uint64

	// MaxDataFileSize sets the maximum size of a data file in bytes.
	MaxDataFileSize uint64

	// MaxDataFiles is the maximum number of data file descriptors cached.
	MaxDataFiles int

	// CheckpointInterval defines how frequently (in blocks) the index file header is updated (default: 1024).
	CheckpointInterval uint64

	// SyncToDisk determines if fsync is called after each write for durability.
	SyncToDisk bool
}

// DefaultConfig returns the default options for BlockDB.
func DefaultConfig() DatabaseConfig {
	return DatabaseConfig{
		IndexDir:           "",
		DataDir:            "",
		MinimumHeight:      0,
		MaxDataFileSize:    DefaultMaxDataFileSize,
		MaxDataFiles:       DefaultMaxDataFiles,
		CheckpointInterval: 1024,
		SyncToDisk:         true,
	}
}

// WithDir returns a copy of the config with both IndexDir and DataDir set to the given value.
func (c DatabaseConfig) WithDir(directory string) DatabaseConfig {
	c.IndexDir = directory
	c.DataDir = directory
	return c
}

// WithIndexDir returns a copy of the config with IndexDir set to the given value.
func (c DatabaseConfig) WithIndexDir(indexDir string) DatabaseConfig {
	c.IndexDir = indexDir
	return c
}

// WithDataDir returns a copy of the config with DataDir set to the given value.
func (c DatabaseConfig) WithDataDir(dataDir string) DatabaseConfig {
	c.DataDir = dataDir
	return c
}

// WithSyncToDisk returns a copy of the config with SyncToDisk set to the given value.
func (c DatabaseConfig) WithSyncToDisk(syncToDisk bool) DatabaseConfig {
	c.SyncToDisk = syncToDisk
	return c
}

// WithMinimumHeight returns a copy of the config with MinimumHeight set to the given value.
func (c DatabaseConfig) WithMinimumHeight(minHeight uint64) DatabaseConfig {
	c.MinimumHeight = minHeight
	return c
}

// WithMaxDataFileSize returns a copy of the config with MaxDataFileSize set to the given value.
func (c DatabaseConfig) WithMaxDataFileSize(maxSize uint64) DatabaseConfig {
	c.MaxDataFileSize = maxSize
	return c
}

// WithMaxDataFiles returns a copy of the config with MaxDataFiles set to the given value.
func (c DatabaseConfig) WithMaxDataFiles(maxFiles int) DatabaseConfig {
	c.MaxDataFiles = maxFiles
	return c
}

// WithCheckpointInterval returns a copy of the config with CheckpointInterval set to the given value.
func (c DatabaseConfig) WithCheckpointInterval(interval uint64) DatabaseConfig {
	c.CheckpointInterval = interval
	return c
}

// Validate checks if the database config is valid.
func (c DatabaseConfig) Validate() error {
	if c.IndexDir == "" {
		return errors.New("IndexDir must be provided")
	}
	if c.DataDir == "" {
		return errors.New("DataDir must be provided")
	}
	if c.CheckpointInterval == 0 {
		return errors.New("CheckpointInterval cannot be 0")
	}
	if c.MaxDataFiles <= 0 {
		return errors.New("MaxDataFiles must be positive")
	}
	if c.MaxDataFileSize == 0 {
		return errors.New("MaxDataFileSize must be positive")
	}
	return nil
}