This is the official documentation for the KV Block Manager, a high-performance system for managing key-value (KV) cache blocks in Large Language Model (LLM) inference.
The KV Block Manager provides:
- Multi-tier Storage: Support for GPU memory, CPU memory, local NVMe, and remote storage
- Block Reuse: Intelligent caching and reuse of KV blocks to reduce memory footprint
- Distributed Support: Built-in support for distributed inference across multiple workers
- Python Integration: Native Python bindings with DLPack support
- vLLM Compatibility: Direct integration with vLLM for production deployments
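To make the Block Reuse item more concrete, here is a minimal sketch of one common way to reuse KV blocks: key each full block by a hash of the token prefix that produced it, and hand out the existing block when the same prefix appears again. This is an illustration under assumptions, not the KV Block Manager's actual mechanism. `BlockPool`, `acquire`, and the hashing scheme are hypothetical names invented for this example and are not part of the `dynamo_llm` API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockPool:
    """Toy pool (hypothetical) that deduplicates full KV blocks by prefix hash."""
    page_size: int
    next_id: int = 0
    by_hash: dict = field(default_factory=dict)   # prefix hash -> block id
    refcount: dict = field(default_factory=dict)  # block id -> number of users

    def _hash(self, parent: str, tokens: tuple) -> str:
        # Chain the parent hash so a block is reused only when the entire
        # prefix leading up to it matches, not just its own tokens.
        return hashlib.sha256(f"{parent}:{tokens}".encode()).hexdigest()

    def acquire(self, token_ids: list) -> list:
        """Return block ids covering every full page of token_ids."""
        ids, parent = [], ""
        full = len(token_ids) - len(token_ids) % self.page_size
        for start in range(0, full, self.page_size):
            chunk = tuple(token_ids[start:start + self.page_size])
            parent = self._hash(parent, chunk)
            block_id = self.by_hash.get(parent)
            if block_id is None:
                # First time this prefix is seen: allocate a new block.
                block_id = self.next_id
                self.next_id += 1
                self.by_hash[parent] = block_id
            self.refcount[block_id] = self.refcount.get(block_id, 0) + 1
            ids.append(block_id)
        return ids


pool = BlockPool(page_size=4)
a = pool.acquire([1, 2, 3, 4, 5, 6, 7, 8])     # two new blocks
b = pool.acquire([1, 2, 3, 4, 9, 9, 9, 9, 5])  # shares only the first block
```

In this sketch the two requests share the block for their common four-token prefix, so three blocks are allocated instead of four. The trailing partial page of the second request is ignored, since only full pages are hashed.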
use dynamo_llm::block_manager::{
    KvBlockManager, KvBlockManagerConfig, KvManagerModelConfig, KvManagerRuntimeConfig
};

// Create configuration
let config = KvBlockManagerConfig::builder()
    .runtime(KvManagerRuntimeConfig::builder()
        .worker_id(0)
        .build())
    .model(KvManagerModelConfig::builder()
        .num_layers(32)
        .page_size(16)
        .inner_dim(4096)
        .build())
    .build()?;

// Create block manager
let block_manager = KvBlockManager::new(config).await?;

import dynamo_llm
# Create block manager
block_manager = dynamo_llm.BlockManager(
    num_layers=32,
    page_size=16,
    inner_dim=4096
)

# Allocate blocks
blocks = block_manager.allocate_blocks(4)

- Overview - High-level architecture overview
- Block Manager - Main orchestrator component
- Configuration - Configuration system
- Storage System - Storage backends and management
- Block Pool - Block lifecycle management
- Block Data - Block data structures and views
- Layout Management - Data layout strategies
- Offloading - Block movement between tiers
- Distributed Management - Distributed operations
- Events and Metrics - Monitoring and observability
- Python API Overview - Python interface overview
- Block Interface - Block and Layer classes
- Layer Interface - Layer data access
- DLPack Integration - Tensor interoperability
- vLLM Integration - Production deployment
- Block Lists - Block collection management
- Performance Optimization - Performance tuning
- Memory Management - Memory optimization
- Error Handling - Error handling patterns
- Best Practices - Development best practices
- Rust API Reference - Complete Rust API documentation
- Python API Reference - Complete Python API documentation
- Configuration Reference - Configuration options
- Basic Usage - Basic usage examples
- Advanced Usage - Advanced patterns
- vLLM Integration - Production examples
This documentation is built using mdBook.
# Install mdBook
cargo install mdbook

# Build the documentation
mdbook build

# Serve the documentation locally
mdbook serve

The built documentation will be available in the book/ directory.
To contribute to the documentation:
- Edit the markdown files in the src/ directory
- Test your changes locally with mdbook serve
- Submit a pull request with your changes
This documentation is licensed under the Apache License, Version 2.0.
For questions and support: