ziqifan617/ziqif-nv.github.io

KV Block Manager Documentation

This is the official documentation for the KV Block Manager, a high-performance system for managing key-value (KV) cache blocks in Large Language Model (LLM) inference.

Overview

The KV Block Manager provides:

  • Multi-tier Storage: Support for GPU memory, CPU memory, local NVMe, and remote storage
  • Block Reuse: Intelligent caching and reuse of KV blocks to reduce memory footprint
  • Distributed Support: Built-in support for distributed inference across multiple workers
  • Python Integration: Native Python bindings with DLPack support
  • vLLM Compatibility: Direct integration with vLLM for production deployments
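To make the multi-tier storage and block reuse ideas above concrete, here is a minimal pure-Python sketch. It does not use the KV Block Manager API: the `TieredBlockCache` class, its methods, and the least-recently-used demotion policy are all hypothetical and chosen only for illustration. It assumes blocks are identified by a hash, that tier 0 is the fastest tier (for example GPU memory), and that a block evicted from one tier is demoted to the next slower tier.

```python
from collections import OrderedDict
from typing import List, Optional


class TieredBlockCache:
    """Toy multi-tier block cache (hypothetical, not the library's API).

    Tier 0 is the fastest tier; higher indices are slower tiers.
    """

    def __init__(self, capacities: List[int]) -> None:
        # One LRU-ordered map per tier; keys are block hashes.
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in capacities]

    def put(self, block_hash: int, data: bytes, tier: int = 0) -> None:
        if tier >= len(self.tiers):
            return  # evicted past the slowest tier: the block is dropped
        store = self.tiers[tier]
        store[block_hash] = data
        store.move_to_end(block_hash)  # mark as most recently used
        if len(store) > self.capacities[tier]:
            # Demote the least recently used block to the next tier.
            old_hash, old_data = store.popitem(last=False)
            self.put(old_hash, old_data, tier + 1)

    def get(self, block_hash: int) -> Optional[bytes]:
        for store in self.tiers:
            if block_hash in store:
                data = store.pop(block_hash)
                # Reuse: promote the block back to the fastest tier.
                self.put(block_hash, data, 0)
                return data
        return None  # miss: the caller would have to recompute the block


cache = TieredBlockCache(capacities=[1, 1])
cache.put(1, b"block-1")
cache.put(2, b"block-2")  # block 1 is demoted to tier 1
reused = cache.get(1)     # found in tier 1, promoted back to tier 0
```

A real manager would move tensor memory between devices rather than Python bytes, and may use a different eviction policy; the sketch only shows the lookup, demote, and promote flow.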

Quick Start

Rust Usage

use dynamo_llm::block_manager::{
    KvBlockManager, KvBlockManagerConfig, KvManagerModelConfig, KvManagerRuntimeConfig
};

// Note: `?` and `.await` only compile inside an async function that returns a `Result`.
// Create configuration
let config = KvBlockManagerConfig::builder()
    .runtime(KvManagerRuntimeConfig::builder()
        .worker_id(0)
        .build())
    .model(KvManagerModelConfig::builder()
        .num_layers(32)
        .page_size(16)
        .inner_dim(4096)
        .build())
    .build()?;

// Create block manager
let block_manager = KvBlockManager::new(config).await?;

Python Usage

import dynamo_llm

# Create block manager
block_manager = dynamo_llm.BlockManager(
    num_layers=32,
    page_size=16,
    inner_dim=4096
)

# Allocate blocks
blocks = block_manager.allocate_blocks(4)
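As a rough guide to what these parameters imply, the following sketch estimates how many blocks a sequence needs and how large one block is. It rests on assumptions that the text above does not confirm: that `page_size` is the number of tokens per block, that `inner_dim` is the number of elements stored per token per layer, that each block holds both keys and values (a factor of 2), and that elements are 2 bytes wide (for example fp16). The helper names `blocks_needed` and `block_bytes` are hypothetical and are not part of `dynamo_llm`.

```python
def blocks_needed(num_tokens: int, page_size: int) -> int:
    """Blocks required for num_tokens, assuming page_size tokens per block."""
    # Integer ceiling division: a partially filled block still occupies a block.
    return (num_tokens + page_size - 1) // page_size


def block_bytes(
    num_layers: int,
    page_size: int,
    inner_dim: int,
    dtype_bytes: int = 2,  # assumption: 2-byte elements such as fp16
    kv_factor: int = 2,    # assumption: keys and values are both stored
) -> int:
    """Estimated size of one block in bytes under the assumptions above."""
    return num_layers * kv_factor * page_size * inner_dim * dtype_bytes


# Same values as the constructor call above.
n_blocks = blocks_needed(num_tokens=50, page_size=16)
size = block_bytes(num_layers=32, page_size=16, inner_dim=4096)
print(n_blocks, size // (1024 * 1024), "MiB per block")
```

Under these assumptions a 50-token sequence needs 4 blocks, and each block occupies 8 MiB, so the actual layout and data type used by the library should be checked before relying on these figures for capacity planning.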

Documentation Structure

Core Architecture

Python Bindings

Advanced Topics

API Reference

Examples

Building the Documentation

This documentation is built using mdBook.

Prerequisites

# Install mdBook (requires a Rust toolchain with cargo)
cargo install mdbook

Building

# Build the documentation
mdbook build

# Serve the documentation locally
mdbook serve

mdbook build writes the built documentation to the book/ directory; mdbook serve serves it locally at http://localhost:3000 by default.

Contributing

To contribute to the documentation:

  1. Edit the markdown files in the src/ directory
  2. Test your changes locally with mdbook serve
  3. Submit a pull request with your changes

License

This documentation is licensed under the Apache License, Version 2.0.

Support

For questions and support:
