Markdown Vector Database (mdvdb)

(pronounced /ˌɛm di ˌvi di ˈbi/)

A filesystem-native vector database built around Markdown files. Zero infrastructure — no servers, no containers, everything lives on disk.

Three search modes out of the box: hybrid (semantic + lexical, fused via RRF), semantic (embedding similarity), and lexical (BM25 full-text). Designed for AI agents that need fast, flexible search over local knowledge bases.

Built For

| Use Case | What mdvdb gives you | Workflow |
| --- | --- | --- |
| Knowledge Bases | Hybrid search across docs, wikis, and runbooks. Section-level results link straight to the relevant heading, not just the file. Frontmatter filters let you scope queries by tag, status, or any custom field. | `mdvdb ingest` → `mdvdb search "deploy" --filter status=published` |
| Agent Memory | Two-layer memory model: daily logs (append-only) + curated topic files, connected by wikilinks and standard links. Frontmatter fields (`type`, `tags`, `status`, `confidence`, `source`) make memories filterable. Link graph lets agents traverse context chains via `links` / `backlinks`, and `orphans` surfaces disconnected notes. `--boost-links` re-ranks results using the agent's own cross-references. `--decay` applies exponential recency weighting: old logs fade naturally while actively maintained notes stay prominent, with no manual archiving needed. Single-file ingest (`--file`) for near-instant indexing after writes. See the full Agent Memory Graph guide. | `mdvdb ingest` → `mdvdb search "auth" --decay --boost-links --filter type=topic` → `mdvdb links topics/auth.md` → `mdvdb orphans` |
| Documentation Sites | Index your docs repo and expose search via the library API. Auto-clustering surfaces topic groups without manual tagging. File watching keeps the index current as writers push changes. | `mdvdb watch` → `mdvdb clusters --json` → `mdvdb search "getting started"` |
| Personal Zettelkasten | Search your slip-box by meaning instead of exact keywords. Works with Obsidian vaults, Logseq graphs, or plain folders: anything that's `.md` files on disk. Non-destructive: never touches your notes. | `mdvdb ingest` → `mdvdb search "emergence in complex systems"` → `mdvdb links slip/note.md` |
| RAG Pipelines | Drop-in retrieval layer for retrieval-augmented generation. JSON output (`--json`) pipes directly into your LLM toolchain. Pluggable embeddings let you match the same model your generator uses. Switch to `--lexical` when you don't need embedding overhead. | `mdvdb ingest` → `mdvdb search "context" --json \| jq` |
| Research & Literature Notes | Filter by frontmatter fields like `author`, `year`, or `topic` while searching semantically. Clusters reveal thematic groupings across hundreds of papers or reading notes without manual curation. | `mdvdb ingest` → `mdvdb search "attention mechanism" --filter year=2024` → `mdvdb clusters` |

Guides: Agent Memory · Agent Memory Graph

Claude Code Skills

Context-expanding skills for Claude Code. Once installed, Claude automatically picks the right skill based on what you ask — search your vault, explore topics, check document health, or write new files optimized for indexing.

```bash
git clone https://github.com/geckse/markdown-vdb-skills.git skills
```

| Skill | What it does |
| --- | --- |
| search-docs | Semantic, lexical, or hybrid search across the vault |
| search-and-summarize | Search, read top matches in full, produce a cited synthesis |
| explore-topic | Deep research combining search with graph expansion and linked context |
| find-related | Find related content via semantic edges, links, backlinks, and multi-hop traversal |
| index-vault | Ingest or re-index markdown files into the vector database |
| vault-overview | Quick situational awareness: index status, clusters, and file tree |
| vault-health | Diagnostic checks: doctor, orphan detection, and schema analysis |
| check-document | Validate a file against the vault schema, check structure and link connectivity |
| enhance-document | Improve a file for better indexing: add frontmatter, restructure headings, add links |
| write-document | Create a new markdown file optimized for indexing with proper frontmatter and links |
| graph-visualize | Export and summarize the vault's knowledge graph for visualization |

View skills repo → · Benchmark Suite

Features

- Three search modes: hybrid (semantic + BM25 via RRF fusion), semantic, and lexical. Switch with `--mode` or the `--semantic`/`--lexical` flags.
- Section-level results: returns the specific heading/section that matched, not just the file.
- Pluggable embeddings: OpenAI, Ollama, or any custom endpoint.
- Single index file: portable, memory-mapped, sub-100ms queries.
- Link graph: wikilinks and standard markdown links are tracked in the index; the `links`, `backlinks`, and `orphans` commands query it, and `--boost-links` re-ranks search results.
- Time decay: `--decay` applies exponential recency weighting with a configurable half-life.
- File watching: automatic re-indexing on changes.
- Metadata filtering: combine any search mode with frontmatter filters.
- Auto-clustering: K-means topic clusters with TF-IDF keyword labels.
- Path-scoped search: `--path` restricts results to a directory subtree.
- File tree: `mdvdb tree` shows the sync status of every file at a glance.
- Diagnostics: `mdvdb doctor` checks config, provider connectivity, and index health.
- Preview mode: `mdvdb ingest --preview` shows what would change without touching the index.
- Non-destructive: never modifies your markdown files.
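
The time-decay feature is described here only as exponential recency weighting with a configurable half-life. The sketch below illustrates that idea under one common assumption: the score is multiplied by 0.5 raised to `age / half_life`. The function name `decayed_score` and the exact formula are illustrative and may not match what mdvdb does internally.

```rust
/// Hypothetical helper (not part of the mdvdb API): scale a relevance score
/// by exponential recency decay. After `half_life_days` the multiplier is 0.5,
/// after twice that it is 0.25, and so on.
fn decayed_score(score: f64, age_days: f64, half_life_days: f64) -> f64 {
    // A non-positive half-life makes no sense, so leave the score untouched.
    if half_life_days <= 0.0 {
        return score;
    }
    // Treat timestamps in the future as "just modified".
    let age = age_days.max(0.0);
    score * 0.5_f64.powf(age / half_life_days)
}
```

With a 30-day half-life, a chunk last modified 30 days ago keeps half of its score, which is why stale daily logs sink while recently edited notes stay near the top.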

Quick Start

```bash
# Initialize config
mdvdb init

# Index your markdown files
mdvdb ingest

# Hybrid search (default: semantic + lexical fused via RRF)
mdvdb search "how to deploy to production"

# Semantic only
mdvdb search "how to deploy to production" --semantic

# Lexical only (no embedding API call needed)
mdvdb search "deploy" --lexical

# Search with filters and path scope
mdvdb search "authentication" --filter status=published --path docs/ --limit 5

# Time-decayed search (favor recent files)
mdvdb search "auth" --decay --decay-half-life 30

# Check index health
mdvdb doctor

# Watch for changes
mdvdb watch
```

Installation

```bash
# Build from source
git clone https://github.com/geckse/markdown-vdb.git
cd markdown-vdb
cargo install --path .
```

Requires Rust 1.70+.

Configuration

Create a .markdownvdb file in your project root (or run mdvdb init):

```
# Embedding provider: openai, ollama, custom
MDVDB_EMBEDDING_PROVIDER=openai
MDVDB_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# Directories to index (comma-separated)
MDVDB_SOURCE_DIRS=docs,notes,wiki

# Chunking
MDVDB_CHUNK_MAX_TOKENS=512

# Search defaults
MDVDB_SEARCH_MODE=hybrid
MDVDB_TIME_DECAY=false
MDVDB_DECAY_HALF_LIFE_DAYS=30
```

Shared credentials can go in ~/.mdvdb/config so your API key works across all projects without repeating it in each .markdownvdb file.

Config resolution order: shell env → .markdownvdb/.config → .markdownvdb → .env → ~/.mdvdb/config → defaults.
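
To make the resolution order concrete, here is a small sketch of layered lookup. It assumes the order above runs from highest to lowest precedence, so the first source that defines a key wins. The `Layer` type and the `resolve` and `layer` helpers are hypothetical names for illustration, not mdvdb's actual loader.

```rust
use std::collections::HashMap;

/// One configuration source, e.g. the shell environment or a `.markdownvdb` file.
type Layer = HashMap<String, String>;

/// Hypothetical resolver: `layers` is ordered from highest to lowest precedence,
/// and the first layer that defines `key` wins.
fn resolve<'a>(layers: &'a [Layer], key: &str) -> Option<&'a str> {
    layers
        .iter()
        .find_map(|layer| layer.get(key).map(String::as_str))
}

/// Convenience constructor used to build example layers from key/value pairs.
fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

For example, if the shell environment sets `MDVDB_SEARCH_MODE=lexical` and `.markdownvdb` sets it to `hybrid`, `resolve` returns `lexical`, while keys defined only in the file still come from the file.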

See PROJECT.md for the full configuration reference.

CLI Commands

| Command | Description |
| --- | --- |
| `mdvdb search <query>` | Search with hybrid, semantic, or lexical mode |
| `mdvdb ingest` | Index or re-index markdown files |
| `mdvdb status` | Show index health and stats |
| `mdvdb schema` | List available metadata fields and types |
| `mdvdb clusters` | Browse auto-generated topic clusters |
| `mdvdb tree` | Show file tree with sync status indicators |
| `mdvdb get <path>` | Retrieve a specific document's metadata and frontmatter |
| `mdvdb links <path>` | Show outgoing links from a file |
| `mdvdb backlinks <path>` | Show incoming links pointing to a file |
| `mdvdb orphans` | Find files with no inbound or outbound links |
| `mdvdb doctor` | Run diagnostic checks on config, provider, and index |
| `mdvdb watch` | Watch for file changes and re-index automatically |
| `mdvdb config` | Show resolved configuration |
| `mdvdb init` | Create a default config file |

All commands support --json for machine-readable output.
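
The `backlinks` and `orphans` commands in the table are both simple queries over a set of (source, target) link pairs. The following sketch shows one way to express them; the function names and the flat edge-list representation are assumptions made for illustration, and mdvdb's internal link graph may be stored differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical backlink lookup: every file that links to `target`.
fn backlinks<'a>(target: &str, links: &[(&'a str, &str)]) -> Vec<&'a str> {
    links
        .iter()
        .filter(|(_, dst)| *dst == target)
        .map(|(src, _)| *src)
        .collect()
}

/// Hypothetical orphan detection: a file is an orphan when it has no
/// inbound and no outbound links, i.e. it appears in no (source, target) pair.
fn find_orphans<'a>(files: &[&'a str], links: &[(&str, &str)]) -> Vec<&'a str> {
    let mut linked: BTreeSet<&str> = BTreeSet::new();
    for (src, dst) in links {
        linked.insert(*src);
        linked.insert(*dst);
    }
    files
        .iter()
        .copied()
        .filter(|file| !linked.contains(*file))
        .collect()
}
```

Given files `a.md` through `d.md` and links `a.md → b.md` and `b.md → c.md`, the only orphan is `d.md`, and the backlinks of `b.md` are `a.md`.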

Library Usage

```rust
use mdvdb::{MarkdownVdb, SearchMode, SearchQuery};

// Assumption: a Tokio runtime drives the async API; any async executor should work.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the index for the project in the current directory.
    let vdb = MarkdownVdb::open(".").await?;

    // Hybrid search (default)
    let results = vdb.search(
        SearchQuery::new("deployment guide")
            .with_limit(10)
            .with_min_score(0.7)
    ).await?;

    // Lexical-only search (no embedding call)
    let results = vdb.search(
        SearchQuery::new("deploy")
            .with_mode(SearchMode::Lexical)
    ).await?;

    // Semantic search with time decay and link boosting
    let results = vdb.search(
        SearchQuery::new("authentication")
            .with_mode(SearchMode::Semantic)
            .with_decay(true)
            .with_boost_links(true)
            .with_filter("status", "published")
    ).await?;

    // Print each hit with its score and the heading path of the matching section.
    for result in results {
        println!("{} (score: {:.2})", result.file.path, result.score);
        println!("  {}", result.chunk.heading_hierarchy.join(" > "));
    }

    Ok(())
}
```

How It Works

1. Scan: recursively find `.md` files (respects `.gitignore`).
2. Parse: extract frontmatter, headings, and body content.
3. Chunk: split by headings, with a token-limit size guard.
4. Embed: generate vectors via OpenAI/Ollama (batched, concurrent, with content-hash skip).
5. Index: store in a single memory-mapped file (usearch HNSW + rkyv metadata) plus Tantivy BM25 segments.
6. Search: hybrid (embed query → HNSW + BM25 → RRF fusion), semantic (HNSW only), or lexical (BM25 only), followed by metadata filter → link boost → time decay → ranked results.
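
The chunking step can be sketched as follows. This is a simplified illustration, not mdvdb's chunker: it approximates tokens with whitespace-separated words (the project lists tiktoken-rs for real tokenization), recognizes only ATX headings (`#` through `######`), and does not skip `#` lines inside fenced code blocks. The `Chunk` struct and `chunk_markdown` function are hypothetical names.

```rust
/// A piece of body text together with the heading path it sits under.
#[derive(Debug, PartialEq)]
struct Chunk {
    heading_hierarchy: Vec<String>,
    text: String,
}

/// Hypothetical chunker: start a new chunk at every ATX heading and split any
/// section longer than `max_tokens` into several chunks. Tokens are approximated
/// by whitespace-separated words.
fn chunk_markdown(source: &str, max_tokens: usize) -> Vec<Chunk> {
    // Emit the buffered words as one or more chunks under the current heading path.
    fn flush(path: &[String], words: &mut Vec<&str>, max_tokens: usize, out: &mut Vec<Chunk>) {
        for piece in words.chunks(max_tokens.max(1)) {
            out.push(Chunk {
                heading_hierarchy: path.to_vec(),
                text: piece.join(" "),
            });
        }
        words.clear();
    }

    let mut chunks = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut words: Vec<&str> = Vec::new();

    for line in source.lines() {
        // Count leading '#' characters; a heading needs 1..=6 of them plus a space.
        let level = line.chars().take_while(|c| *c == '#').count();
        let is_heading = (1..=6).contains(&level) && line[level..].starts_with(' ');
        if is_heading {
            flush(&path, &mut words, max_tokens, &mut chunks);
            // Keep only ancestors shallower than this heading, then push it.
            path.truncate(level - 1);
            path.push(line[level..].trim().to_string());
        } else {
            words.extend(line.split_whitespace());
        }
    }
    flush(&path, &mut words, max_tokens, &mut chunks);
    chunks
}
```

Each chunk carries its heading path, which is what allows a search result to point at a specific section rather than only at the file.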

Architecture

```text
Markdown files → Discovery → Parsing → Chunking → Embedding → Index (HNSW + BM25)
                                                                       ↓
                        Query → [Embed] → HNSW ──┐
                                          BM25 ──┤→ RRF Fusion → Filter → Decay → Results
                                     Link Graph ─┘
```
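
The RRF Fusion stage in the diagram merges the HNSW and BM25 rankings. A minimal sketch of Reciprocal Rank Fusion in its standard form is shown below: each document scores the sum of `1 / (k + rank)` over every list it appears in. The function name `rrf_fuse` is hypothetical, and `k = 60` is the conventional constant rather than a value confirmed for mdvdb.

```rust
use std::collections::HashMap;

/// Hypothetical RRF implementation: each input list is ordered best-first,
/// and a document at 1-based rank `r` contributes 1 / (k + r) to its fused score.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest fused score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF uses only ranks, it can combine cosine similarities and BM25 scores without normalizing them onto a common scale. A document ranked well in both lists beats one ranked first in only one of them.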

Index format: [64B header][rkyv metadata][usearch HNSW] (memory-mapped) + fts/ directory (Tantivy BM25 segments).

Key dependencies: usearch (HNSW vectors), tantivy (BM25 lexical search), rkyv (zero-copy serde), memmap2 (memory mapping), tiktoken-rs (tokenization), pulldown-cmark (markdown parsing), linfa (K-means clustering).

Comparison

How mdvdb compares to other tools in the Markdown knowledge-base space:

| Dimension | mdvdb | Obsidian | qmd (Markdown DB) |
| --- | --- | --- | --- |
| Core role | CLI-first filesystem-native vector DB over Markdown; retrieval/memory layer for agents and RAG. Includes a decoupled desktop app. | Desktop editor and PKM app for local Markdown; search is for humans, not a standalone DB. | CLI + library that turns Markdown into a searchable BM25/vector DB on SQLite. |
| Storage model | Markdown on disk as source of truth; single memory-mapped index + BM25 segments, non-destructive. | Plain .md vault; no native vector index, only via plugins. | Collections in SQLite; Markdown content indexed into BM25 + vector tables. |
| Retrieval granularity | Section-level chunks by headings with token limits; results return heading hierarchy. | Mostly file-level; heading/block granularity only via plugins. | Primarily document-level; chunking depends on collection config. |
| Search modes | Built-in hybrid (RRF), semantic, and lexical modes via flags/API. | Keyword search + links; semantic only through plugins. | BM25, semantic, hybrid with LLM rerank. |
| Agent/LLM friendliness | Explicit agent focus: JSON everywhere, link-boosting, time-decay memory, Rust library API. | GUI-centric; agent use via ad-hoc plugins/integrations. | Built for CLI/skills; good for agents needing local semantic search. |
| Links & structure | Parses wikilinks/links into a graph; links, backlinks, orphans, link-aware re-ranking. | Strong backlinks/graph UI for humans; ranking not link-aware by default. | Treats files as docs; link structure not a primary concern. |
| Recency handling | Built-in exponential time decay with configurable half-life flags. | No formal time-decay; recency mostly via daily notes/sorting. | Quality via BM25/vector; no dedicated time-decay memory model. |
| Interface style | CLI-first and decoupled; desktop app possible but not required. | Desktop app (Win/macOS/Linux) with plugins. | Pure CLI and programmatic use. |
| Index lifecycle | mdvdb ingest/watch, preview, diagnostics; one portable index file per knowledge base. | Internal to app; vector index only if plugins add it. | qmd manages collections and embeddings locally, no extra infra. |

Project Status

In development. Wait for the 0.2.0 release before relying on mdvdb for real work.

License

MIT
