KV Block Manager Documentation

This is the official documentation for the KV Block Manager, a high-performance system for managing key-value (KV) cache blocks in Large Language Model (LLM) inference.

Overview

The KV Block Manager provides:

Multi-tier Storage: Support for GPU memory, CPU memory, local NVMe, and remote storage
Block Reuse: Intelligent caching and reuse of KV blocks to reduce memory footprint
Distributed Support: Built-in support for distributed inference across multiple workers
Python Integration: Native Python bindings with DLPack support
vLLM Compatibility: Direct integration with vLLM for production deployments
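
To make the block reuse idea concrete, here is a minimal Python sketch of one common way to share KV blocks between requests: each full page of tokens is keyed by a hash that chains in the hash of the preceding page, so two requests with the same prefix resolve to the same block. The `ReusePool` and `Block` classes are hypothetical names used for illustration only; they are not part of the KV Block Manager API, and the actual implementation may use a different scheme.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    ref_count: int = 0


class ReusePool:
    """Hypothetical pool that shares blocks between requests with equal prefixes."""

    def __init__(self, page_size):
        self.page_size = page_size
        self._by_hash = {}   # chained page hash -> Block
        self._next_id = 0

    def _hash(self, parent, tokens):
        # Chain the parent hash so a block's identity covers its whole prefix.
        payload = parent + ":" + ",".join(map(str, tokens))
        return hashlib.sha256(payload.encode()).hexdigest()

    def acquire(self, tokens):
        """Return one block per full page, reusing blocks that are already cached."""
        blocks, parent = [], ""
        full = len(tokens) - len(tokens) % self.page_size
        for start in range(0, full, self.page_size):
            parent = self._hash(parent, tokens[start:start + self.page_size])
            block = self._by_hash.get(parent)
            if block is None:
                # Cache miss: register a new block under this chained hash.
                block = Block(self._next_id)
                self._next_id += 1
                self._by_hash[parent] = block
            block.ref_count += 1
            blocks.append(block)
        return blocks


pool = ReusePool(page_size=4)
a = pool.acquire([1, 2, 3, 4, 5, 6, 7, 8])
b = pool.acquire([1, 2, 3, 4, 9, 9, 9, 9])
shared = a[0] is b[0]        # the common first page maps to one block
distinct = a[1] is not b[1]  # the diverging second pages get separate blocks
```

In this sketch the two requests hold three blocks in total instead of four, because the shared first page is stored once and reference-counted.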

Quick Start

Rust Usage

use dynamo_llm::block_manager::{
    KvBlockManager, KvBlockManagerConfig, KvManagerModelConfig, KvManagerRuntimeConfig
};

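// Create configuration.
// This snippet uses `?` and `.await`, so it must run inside an async
// function that returns a Result.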
let config = KvBlockManagerConfig::builder()
    .runtime(KvManagerRuntimeConfig::builder()
        .worker_id(0)
        .build())
    .model(KvManagerModelConfig::builder()
        .num_layers(32)
        .page_size(16)
        .inner_dim(4096)
        .build())
    .build()?;

// Create block manager
let block_manager = KvBlockManager::new(config).await?;

Python Usage

import dynamo_llm

# Create block manager
block_manager = dynamo_llm.BlockManager(
    num_layers=32,
    page_size=16,
    inner_dim=4096
)

# Allocate 4 blocks
blocks = block_manager.allocate_blocks(4)
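
As a rough guide to what these parameters imply for memory, the following sketch estimates the size of one block. It assumes 2-byte elements (for example fp16) and that each block stores both keys and values for every layer (an outer dimension of 2). Both are assumptions made for illustration, and `block_bytes` is a hypothetical helper, not a function provided by `dynamo_llm`; the real footprint depends on the configured dtype and layout.

```python
def block_bytes(num_layers, page_size, inner_dim, dtype_bytes=2, outer_dim=2):
    """Estimated bytes for one KV block under the assumptions stated above."""
    return num_layers * outer_dim * page_size * inner_dim * dtype_bytes


# Same values as the Quick Start configuration.
per_block = block_bytes(num_layers=32, page_size=16, inner_dim=4096)
total = 4 * per_block  # four blocks, as in allocate_blocks(4)

print(per_block // 2**20, "MiB per block")
print(total // 2**20, "MiB for 4 blocks")
```

Under these assumptions one block is 32 × 2 × 16 × 4096 × 2 bytes = 8 MiB, so four blocks occupy 32 MiB.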

Documentation Structure

Core Architecture

Overview - High-level architecture overview
Block Manager - Main orchestrator component
Configuration - Configuration system
Storage System - Storage backends and management
Block Pool - Block lifecycle management
Block Data - Block data structures and views
Layout Management - Data layout strategies
Offloading - Block movement between tiers
Distributed Management - Distributed operations
Events and Metrics - Monitoring and observability
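
For readers new to tiered caching, the sketch below illustrates the general idea behind offloading: when the fast tier is full, the least recently used block moves to a slower tier, and it is brought back on the next access. `TwoTierCache` is a hypothetical class written for this README; it uses plain Python containers to stand in for device and host memory and does not reflect the actual offloading policy or API of the KV Block Manager.

```python
from collections import OrderedDict


class TwoTierCache:
    """Hypothetical two-tier cache: a small 'device' tier backed by a 'host' tier."""

    def __init__(self, device_capacity):
        self.device_capacity = device_capacity
        self.device = OrderedDict()  # kept in LRU order, oldest entry first
        self.host = {}

    def put(self, block_id, data):
        self.device[block_id] = data
        self.device.move_to_end(block_id)  # mark as most recently used
        while len(self.device) > self.device_capacity:
            # Offload the least recently used block to the host tier.
            victim_id, victim = self.device.popitem(last=False)
            self.host[victim_id] = victim

    def get(self, block_id):
        if block_id in self.device:
            self.device.move_to_end(block_id)
            return self.device[block_id]
        if block_id in self.host:
            # Onboard: move the block back to the device tier.
            data = self.host.pop(block_id)
            self.put(block_id, data)
            return data
        return None


cache = TwoTierCache(device_capacity=2)
for i in range(3):
    cache.put(i, bytes([i]))

after_puts = (list(cache.device), list(cache.host))  # block 0 was offloaded
fetched = cache.get(0)                               # onboards 0, offloads 1
after_get = (list(cache.device), list(cache.host))
```

A real system would perform these moves as asynchronous memory copies and might add further tiers such as local NVMe or remote storage; the sketch only shows the bookkeeping.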

Python Bindings

Python API Overview - Python interface overview
Block Interface - Block and Layer classes
Layer Interface - Layer data access
DLPack Integration - Tensor interoperability
vLLM Integration - Production deployment
Block Lists - Block collection management

Advanced Topics

Performance Optimization - Performance tuning
Memory Management - Memory optimization
Error Handling - Error handling patterns
Best Practices - Development best practices

API Reference

Rust API Reference - Complete Rust API documentation
Python API Reference - Complete Python API documentation
Configuration Reference - Configuration options

Examples

Basic Usage - Basic usage examples
Advanced Usage - Advanced patterns
vLLM Integration - Production examples

Building the Documentation

This documentation is built using mdBook.

Prerequisites

# Install mdBook
cargo install mdbook

Building

# Build the documentation
mdbook build

# Serve the documentation locally
mdbook serve

The built documentation will be available in the book/ directory.

Contributing

To contribute to the documentation:

Edit the markdown files in the src/ directory
Test your changes locally with mdbook serve
Submit a pull request with your changes

License

This documentation is licensed under the Apache License, Version 2.0.

Support

For questions and support:
