# Block Database #4027

Pull request by DracoLi (open): merge 28 commits from `dl/blockdb` into base `dl/lru-on-evict`.
## Commits (28, all by DracoLi)

- `4566e4a` feat: add eviction callback in LRU cache
- `68527f1` blockdb setup & readme
- `eb3f50c` feat: block db implementation & readme
- `07c1e5c` refactor: rename store to database
- `5fd75ea` feat: add tests and update blockdb to have separate methods to read h…
- `4fe0e19` feat: data splitting & fix linting
- `99d0ee2` fix: close db before deleting the file
- `26ccb70` fix: recovery issues with data files splitting & feedback
- `1497732` use lru for file cache and fix recovery issues
- `718559b` refactor: use t.TempDir
- `b3b797c` refactor: move database methods to database.go
- `d8237c9` rename blockHeader -> blockEntryHeader and improve recovery logic
- `9e44cce` make MaxDataFiles configurable
- `978a2b5` add more logging
- `9329e0a` move data and index dir to config and rename config
- `e45c2e6` fix lint
- `7f2fb82` fix struct alignment and add tests
- `d9fa843` fix: separate errors for directories
- `5a52340` consistent block height tracking
- `2d6d85f` remove truncate config
- `b0c4938` add additional tests
- `e1f29db` fix lint and improve test error msg
- `1df6bc1` remove assertion in go routine
- `e94a72e` limit concurrent calls to persistIndexHeader
- `748dbf4` add warning log if config values differ from index header
- `e17d951` change warn logs to info
- `3923a26` add error log for index entry failure
- `d63f80d` Merge branch 'dl/lru-on-evict' into dl/blockdb
# BlockDB

BlockDB is a specialized database optimized for blockchain blocks.

## Key Functionalities

- **O(1) Performance**: Both reads and writes complete in constant time
- **Parallel Operations**: Multiple threads can read and write blocks concurrently without blocking
- **Flexible Write Ordering**: Supports out-of-order block writes for bootstrapping
- **Configurable Durability**: Optional `syncToDisk` mode guarantees immediate recoverability
- **Automatic Recovery**: Detects and recovers unindexed blocks after unclean shutdowns

## Design

BlockDB uses a single index file and multiple data files. The index file maps block heights to locations in the data files, while the data files store the actual block content. Data storage can be split across multiple data files based on the maximum data file size.
```
┌─────────────────┐          ┌─────────────────┐
│   Index File    │          │   Data File 1   │
│    (.idx)       │          │    (.dat)       │
├─────────────────┤          ├─────────────────┤
│ Header          │          │ Block 0         │
│ - Version       │     ┌───>│ - Header        │
│ - Min Height    │     │    │ - Data          │
│ - Max Height    │     │    ├─────────────────┤
│ - Data Size     │     │    │ Block 1         │
│ - ...           │     │ ┌─>│ - Header        │
├─────────────────┤     │ │  │ - Data          │
│ Entry[0]        │     │ │  ├─────────────────┤
│ - Offset ───────┼─────┘ │  │       ...       │
│ - Size          │       │  └─────────────────┘
│ - Header Size   │       │
├─────────────────┤       │
│ Entry[1]        │       │
│ - Offset ───────┼───────┘
│ - Size          │
│ - Header Size   │
├─────────────────┤
│       ...       │
└─────────────────┘
```
### File Formats

#### Index File Structure

The index file consists of a fixed-size header followed by fixed-size entries:

```
Index File Header (64 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Version                        │ 8 bytes │
│ Max Data File Size             │ 8 bytes │
│ Min Block Height               │ 8 bytes │
│ Max Contiguous Height          │ 8 bytes │
│ Max Block Height               │ 8 bytes │
│ Next Write Offset              │ 8 bytes │
│ Reserved                       │ 16 bytes│
└────────────────────────────────┴─────────┘

Index Entry (16 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Data File Offset               │ 8 bytes │
│ Block Data Size                │ 4 bytes │
│ Header Size                    │ 4 bytes │
└────────────────────────────────┴─────────┘
```
#### Data File Structure

Each block in the data file is stored with a block entry header followed by the raw block data:

```
Block Entry Header (26 bytes):
┌────────────────────────────────┬─────────┐
│ Field                          │ Size    │
├────────────────────────────────┼─────────┤
│ Height                         │ 8 bytes │
│ Size                           │ 4 bytes │
│ Checksum                       │ 8 bytes │
│ Header Size                    │ 4 bytes │
│ Version                        │ 2 bytes │
└────────────────────────────────┴─────────┘
```
### Block Overwrites

BlockDB allows overwriting blocks at existing heights. When a block is overwritten, the new block is appended to the data file and the index entry is updated to point to the new location, leaving the old block data as unreferenced "dead" space. However, since blocks are immutable and rarely overwritten (e.g., only during reorgs), this trade-off should have minimal impact in practice.

### Fixed-Size Index Entries

Each index entry is exactly 16 bytes on disk, containing the offset, size, and header size. This fixed size enables direct calculation of where each block's index entry is located, providing O(1) lookups. The index remains efficient even for blockchains with high block heights: at height 1 billion, the index file would only be ~16 GB.
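The O(1) lookup follows from simple arithmetic: with a 64-byte header and 16-byte entries (sizes taken from the layouts above), the position of any height's entry is a direct computation. A minimal sketch, assuming entries start at the database's minimum height (`entryOffset` is a hypothetical helper, not the PR's code):

```go
package main

import "fmt"

const (
	indexHeaderSize = 64 // fixed-size index file header
	indexEntrySize  = 16 // offset (8) + size (4) + header size (4)
)

// entryOffset computes the byte offset of the index entry for a height.
func entryOffset(height, minHeight uint64) uint64 {
	return indexHeaderSize + (height-minHeight)*indexEntrySize
}

func main() {
	fmt.Println(entryOffset(100, 0))           // 1664
	fmt.Println(entryOffset(1_000_000_000, 0)) // ~16 GB index file at height 1 billion
}
```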
### Durability and Fsync Behavior

BlockDB provides configurable durability through the `syncToDisk` parameter:

**Data File Behavior:**

- **When `syncToDisk=true`**: The data file is fsync'd after every block write, guaranteeing durability against both process failures and kernel/machine failures.
- **When `syncToDisk=false`**: Data file writes are buffered, providing durability against process failures but not against kernel or machine failures.

**Index File Behavior:**

- **When `syncToDisk=true`**: The index file is fsync'd every `CheckpointInterval` blocks (when the header is written).
- **When `syncToDisk=false`**: The index file relies on OS buffering and is not explicitly fsync'd.
### Recovery Mechanism

On startup, BlockDB checks for signs of an unclean shutdown by comparing the size of the data files on disk with the indexed data size stored in the index file header. If the data files are larger than what the index claims, blocks were written but the index wasn't properly updated before shutdown.

**Recovery Process:**

1. Starts scanning from where the index left off (`NextWriteOffset`)
2. For each unindexed block found:
   - Validates the block entry header and checksum
   - Writes the corresponding index entry
3. Calculates the max contiguous height and max block height
4. Updates the index header with the updated max contiguous height, max block height, and next write offset
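Step 3 of the recovery process reduces to finding the highest gap-free height starting from the minimum. A hedged sketch of that calculation, with the set of indexed heights modeled as a map (illustrative helper, not the PR's code):

```go
package main

import "fmt"

// maxContiguousHeight returns the largest h such that every height in
// [minHeight, h] is present, and false if even minHeight is missing.
func maxContiguousHeight(minHeight uint64, present map[uint64]bool) (uint64, bool) {
	if !present[minHeight] {
		return 0, false
	}
	h := minHeight
	for present[h+1] {
		h++
	}
	return h, true
}

func main() {
	// Heights 0-2 are contiguous; 4 is indexed but unreachable without 3.
	heights := map[uint64]bool{0: true, 1: true, 2: true, 4: true}
	h, ok := maxContiguousHeight(0, heights)
	fmt.Println(h, ok) // 2 true
}
```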
## Usage

### Creating a Database

```go
import (
	"errors"
	"fmt"

	"github.com/ava-labs/avalanchego/utils/logging"
	"github.com/ava-labs/avalanchego/x/blockdb"
)

config := blockdb.DefaultConfig().
	WithDir("/path/to/blockdb")
db, err := blockdb.New(config, logging.NoLog{})
if err != nil {
	fmt.Println("Error creating database:", err)
	return
}
defer db.Close()
```
### Writing and Reading Blocks

```go
// Write a block with header size
height := uint64(100)
blockData := []byte("header:block data")
headerSize := uint32(7) // First 7 bytes are the header
err := db.WriteBlock(height, blockData, headerSize)
if err != nil {
	fmt.Println("Error writing block:", err)
	return
}

// Read a block
blockData, err = db.ReadBlock(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading block:", err)
	return
}

// Read block components separately
headerData, err := db.ReadHeader(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading header:", err)
	return
}
bodyData, err := db.ReadBody(height)
if err != nil {
	if errors.Is(err, blockdb.ErrBlockNotFound) {
		fmt.Println("Block doesn't exist at this height")
		return
	}
	fmt.Println("Error reading body:", err)
	return
}
```
## TODO

- Implement a block cache for recently accessed blocks
- Use a buffered pool to avoid allocations on reads and writes
- Add metrics
- Add performance benchmarks
- Consider supporting missing data files (currently we error if any data files are missing)
---

```go
// Copyright (C) 2019-2024, Ava Labs, Inc. All rights reserved.
// See the file LICENSE for licensing terms.

package blockdb

import "errors"

// DefaultMaxDataFileSize is the default maximum size of a data file in bytes (500 GiB).
const DefaultMaxDataFileSize = 500 * 1024 * 1024 * 1024

// DefaultMaxDataFiles is the default maximum number of data file descriptors cached.
const DefaultMaxDataFiles = 10

// DatabaseConfig contains configuration parameters for BlockDB.
type DatabaseConfig struct {
	// IndexDir is the directory where the index file is stored.
	IndexDir string

	// DataDir is the directory where the data files are stored.
	DataDir string

	// MinimumHeight is the lowest block height tracked by the database.
	MinimumHeight uint64

	// MaxDataFileSize sets the maximum size of each data file in bytes.
	MaxDataFileSize uint64

	// MaxDataFiles is the maximum number of data file descriptors cached.
	MaxDataFiles int

	// CheckpointInterval defines how frequently (in blocks) the index file header is updated (default: 1024).
	CheckpointInterval uint64

	// SyncToDisk determines if fsync is called after each write for durability.
	SyncToDisk bool
}

// DefaultConfig returns the default options for BlockDB.
func DefaultConfig() DatabaseConfig {
	return DatabaseConfig{
		IndexDir:           "",
		DataDir:            "",
		MinimumHeight:      0,
		MaxDataFileSize:    DefaultMaxDataFileSize,
		MaxDataFiles:       DefaultMaxDataFiles,
		CheckpointInterval: 1024,
		SyncToDisk:         true,
	}
}

// WithDir returns a copy of the config with both IndexDir and DataDir set to the given value.
func (c DatabaseConfig) WithDir(directory string) DatabaseConfig {
	c.IndexDir = directory
	c.DataDir = directory
	return c
}

// WithIndexDir returns a copy of the config with IndexDir set to the given value.
func (c DatabaseConfig) WithIndexDir(indexDir string) DatabaseConfig {
	c.IndexDir = indexDir
	return c
}

// WithDataDir returns a copy of the config with DataDir set to the given value.
func (c DatabaseConfig) WithDataDir(dataDir string) DatabaseConfig {
	c.DataDir = dataDir
	return c
}

// WithSyncToDisk returns a copy of the config with SyncToDisk set to the given value.
func (c DatabaseConfig) WithSyncToDisk(syncToDisk bool) DatabaseConfig {
	c.SyncToDisk = syncToDisk
	return c
}

// WithMinimumHeight returns a copy of the config with MinimumHeight set to the given value.
func (c DatabaseConfig) WithMinimumHeight(minHeight uint64) DatabaseConfig {
	c.MinimumHeight = minHeight
	return c
}

// WithMaxDataFileSize returns a copy of the config with MaxDataFileSize set to the given value.
func (c DatabaseConfig) WithMaxDataFileSize(maxSize uint64) DatabaseConfig {
	c.MaxDataFileSize = maxSize
	return c
}

// WithMaxDataFiles returns a copy of the config with MaxDataFiles set to the given value.
func (c DatabaseConfig) WithMaxDataFiles(maxFiles int) DatabaseConfig {
	c.MaxDataFiles = maxFiles
	return c
}

// WithCheckpointInterval returns a copy of the config with CheckpointInterval set to the given value.
func (c DatabaseConfig) WithCheckpointInterval(interval uint64) DatabaseConfig {
	c.CheckpointInterval = interval
	return c
}

// Validate checks that the config is valid.
func (c DatabaseConfig) Validate() error {
	if c.IndexDir == "" {
		return errors.New("IndexDir must be provided")
	}
	if c.DataDir == "" {
		return errors.New("DataDir must be provided")
	}
	if c.CheckpointInterval == 0 {
		return errors.New("CheckpointInterval cannot be 0")
	}
	if c.MaxDataFiles <= 0 {
		return errors.New("MaxDataFiles must be positive")
	}
	if c.MaxDataFileSize == 0 {
		return errors.New("MaxDataFileSize must be positive")
	}
	return nil
}
```