jrollin/cartog

Cartog


Map your codebase. Navigate by graph, not grep.

Semantic search that returns named symbols, not text chunks — ranked, reranked, and budget-aware.

~280 tokens per query vs ~1,700 for grep+read · 97% recall · 8 µs–20 ms structural-query latency · 15 languages + 4 frameworks.

Cartog pre-computes a code graph — symbols, and the calls, imports, and inheritance between them — so you can query structure instantly instead of grepping for text. Ask "who calls this?", "what breaks if I change it?", or "find the auth logic" and get a ranked, structured answer: an exact function with its signature and span, not a file-and-line guess you have to open and read.

Use it from the CLI for day-to-day navigation, or as an MCP server so AI agents query the graph instead of flooding their context with raw file dumps — at a fraction of the token cost. One static binary, one SQLite file. No Python, no pip, no Docker, no cloud: 100% local by default.

Documentation site

Quick Start

curl -fsSL https://www.cartog.dev/install.sh | sh   # or: cargo install cartog
cd your-project
cartog init                   # 1. scaffold .cartog.toml
cartog index                  # 2. build the code graph

That's it for CLI use. Two commands.

If you want MCP wired into your editor (Claude Code, Cursor, VS Code, Claude Desktop, Codex CLI, Gemini CLI, OpenCode, Windsurf, Zed, Antigravity, Kiro, Hermes Agent), add one more:

cartog ide                    # optional — only if you want editor integration

All three commands are idempotent.

Now query:

cartog search validate        # find symbols by name         (sub-ms)
cartog refs validate_token    # who calls this?              (< 500 µs)
cartog impact validate_token  # what breaks if I change it?  (< 20 ms)
cartog outline src/auth.py    # file structure, no cat       (< 15 µs)

Teach your agent to use cartog

Wiring MCP is half the job. The other half is telling the agent when to prefer cartog over grep + read. Drop the snippet from docs/agent-snippet.md into your project's AGENTS.md, CLAUDE.md, .cursor/rules/, or equivalent, and the agent will route "where is X?" / "who calls X?" / "what breaks if I change X?" through cartog's 16 MCP tools instead of flooding context with raw text.

Why Cartog

Every code navigation tool makes you choose: fast but shallow (grep), or precise but slow (language servers). Cartog gives you both.

| | grep / cat / find | Language servers | Cartog |
|---|---|---|---|
| Query speed | depends on codebase size | seconds to start | 8-450 µs |
| Transitive analysis | impossible | partial | impact --depth 5 |
| Setup | none | per-language config | one binary, zero config |
| Languages | all (text) | one per server | 15 languages, one tool |
| Token cost (LLM context) | ~1,700 tokens/query | n/a | ~280 tokens/query |
| Recall (completeness) | 78% | ~100% | 97% * |
| Privacy | local | local | 100% local |

Measured across 13 scenarios, 10 languages (benchmark suite).

* 97% recall requires a matching language server on PATH. The default build ships LSP support; heuristic-only resolution (no server found, or --no-lsp) lands around 25–37%, with specifics varying by language.
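
To make the token figures concrete, here is a small worked calculation using only the two per-query averages quoted in the table (~1,700 and ~280 tokens). The function name `tokens_saved` is illustrative, and the result assumes those averages hold for your own queries.

```python
# Approximate per-query averages quoted in the comparison table.
GREP_READ_TOKENS = 1_700
CARTOG_TOKENS = 280


def tokens_saved(queries: int) -> int:
    """Tokens saved over `queries` lookups, assuming the averages hold."""
    return queries * (GREP_READ_TOKENS - CARTOG_TOKENS)


# Fraction of context saved on a single query.
reduction = 1 - CARTOG_TOKENS / GREP_READ_TOKENS

print(f"per-query reduction: {reduction:.1%}")
print(f"saved over 100 queries: {tokens_saved(100):,} tokens")
```

Under these averages the reduction works out to roughly 83.5% per query.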

What You Get

Fast structural queries

Pre-computed graph means no re-reading files, no multi-step discovery.

cartog search parse              # symbol name lookup (sub-ms)
cartog refs UserService          # all callers, importers, inheritors
cartog callees authenticate      # what does this function call?
cartog impact SessionManager     # blast radius — callers-of-callers, depth N
cartog hierarchy BaseService     # inheritance tree
cartog deps src/routes/auth.py   # file-level imports
cartog changes --commits 5       # symbols affected by recent git commits
cartog map --tokens 4000         # codebase overview, ranked by centrality

Semantic search (optional, still fully local)

cartog rag setup                 # download models (~1.2 GB, one-time)
cartog rag index .               # embed symbols + docs into sqlite-vec
cartog rag search "authentication token validation"

Three-tier hybrid pipeline: FTS5 keyword + vector KNN + cross-encoder re-ranking. Indexes both code (functions, classes, methods) and Markdown documents. Models run locally via ONNX Runtime — no API keys, no network calls.

Prefer Ollama or a hosted endpoint? Set provider = "ollama" or provider = "openai" (any OpenAI-compatible /v1 endpoint) in .cartog.toml. See Configuration.

Live index

cartog watch .                   # auto re-index on file changes
cartog watch . --rag             # also re-embed (deferred, non-blocking)

MCP server for AI agents

cartog serve                     # 16 tools over stdio
cartog serve --watch --rag       # with live re-indexing + semantic search

Works with Claude Code, Cursor, Windsurf, Zed, OpenCode — any MCP client. Tool output is compact by default (locations + signatures + snippet previews, not full bodies) so it stays within an agent's context budget; set CARTOG_MCP_COMPACT=0 to restore full bodies. On the CLI, cartog --json --compact … does the same — ~60% smaller payloads that still carry symbols, signatures, and scores.

LSP precision, built in

Cartog auto-detects language servers on PATH (rust-analyzer, pyright, typescript-language-server, gopls, ruby-lsp, solargraph, jdtls, intelephense, dart, sourcekit-lsp, kotlin-language-server, vue-language-server, svelteserver, astro-ls) and uses them to boost edge resolution from ~25% to up to 81%. Enabled by default; results persist in SQLite — pay the cost once. Disable at runtime with --no-lsp, or omit at build time with cargo install cartog --no-default-features.

Install

Install script (macOS / Linux, no Rust required)

curl -fsSL https://www.cartog.dev/install.sh | sh

Detects your OS + architecture, downloads the matching binary from the latest GitHub Release, verifies its SHA-256, and installs to /usr/local/bin (or ~/.local/bin if non-root). Override with CARTOG_INSTALL_DIR; pin a version with CARTOG_VERSION=<version> (e.g. the tag from Releases). Audit the script: site/public/install.sh.

From crates.io (Rust toolchain required)

cargo install cartog                                  # default: LSP + S3 sync + Ollama + OpenAI providers
cargo install cartog --no-default-features            # minimal: drops LSP, S3, Ollama, OpenAI
cargo install cartog --no-default-features --features lsp  # LSP only

Pre-built binaries (manual)

# macOS (Apple Silicon)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-aarch64-apple-darwin.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# Linux (ARM64)
curl -L https://github.com/jrollin/cartog/releases/latest/download/cartog-aarch64-unknown-linux-gnu.tar.gz | tar xz
sudo mv cartog /usr/local/bin/

# Windows (x86_64) — download .zip from releases page

Upgrade

Once cartog is on your PATH:

cartog self update           # upgrade in place to the latest stable
cartog self update --check   # report whether an update exists; exit 1 if outdated
cartog self version          # show installed version + last-check timestamp
cartog self rollback         # restore the previous binary

Cargo-installed binaries upgrade with cargo install cartog --force. See docs/updates.md for env vars, exit codes, and the state file location.
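
If you want to act on the `--check` exit code from a script, a minimal Python sketch could look like the following. Only "exit 1 if outdated" comes from the text above; treating exit 0 as "up to date" and any other code as an error is an assumption (docs/updates.md is the authority on exit codes). The `run` parameter is a hypothetical hook so the logic can be exercised without the binary installed.

```python
import subprocess
from typing import Callable, Optional, Sequence

CHECK_CMD = ["cartog", "self", "update", "--check"]


def _run(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def update_available(run: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """True if the check reports an outdated install (exit code 1)."""
    code = (run or _run)(CHECK_CMD)
    if code == 0:  # assumption: 0 means already up to date
        return False
    if code == 1:  # documented: exit 1 if outdated
        return True
    raise RuntimeError(f"unexpected exit code {code} from {' '.join(CHECK_CMD)}")
```

Calling `update_available()` with no argument shells out to the real binary; passing a stub such as `lambda cmd: 1` keeps it side-effect free.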

Agent integration: which path?

Three setup paths for agents and editors. Pick the one that matches your stack — they are alternatives, not steps.

| Path | Use it when | What you get |
|---|---|---|
| cartog ide | You want MCP wired into one or more editors (Claude Code, Cursor, VS Code, Codex CLI, Gemini CLI, Claude Desktop, OpenCode, Windsurf, Zed, Antigravity, Kiro, Hermes Agent). | MCP entries written to the right files; interactive picker if you run it without flags. |
| Claude Code plugin | You are on Claude Code and want install + skill + MCP wired in one step. | Bundled: binary install, behavioural skill, MCP server, all preconfigured. |
| Agent skill | You use an agent that follows the skills protocol (Cursor, Copilot, others) and only need the behavioural rules, not MCP. | Skill files installed into the agent's skill directory; works alongside any install method. |

Claude Code plugin

Run these two commands one at a time in Claude Code:

/plugin marketplace add jrollin/cartog
/plugin install cartog@cartog-plugins

First session: if the cartog binary is not already on your PATH, the plugin starts a background install and prints a one-line notice. The cartog MCP server cannot start in this first session because the binary lands after Claude Code has already tried to spawn it.

Second session: restart Claude Code. The MCP server starts, code-graph tools become available, and the SessionStart hook keeps the index fresh on every subsequent session.

Repair or upgrade: type /cartog-install at any time to install the binary synchronously (e.g. to retry a failed background install), or to upgrade an existing install to match the plugin's pinned version. The skill at skills/cartog-install/ handles both cases.

Offline / vetted install: the manual fallback is the same script used by the curl one-liner in the Install section. Download site/public/install.sh (served at https://www.cartog.dev/install.sh), inspect it, then run it.

Agent Skill (Cursor, Copilot, others)

npx skills add jrollin/cartog

Why Not...

grep/ripgrep? Great for string literals and config values. But grep can't trace call chains, can't do transitive impact analysis, and floods your context with raw text. Cartog returns structured, ranked, deduplicated results — one refs call replaces 6+ discovery steps.

A language server? LSPs give perfect precision but require per-language setup, take seconds to start, and only cover one language at a time. Cartog covers 15 languages with one binary and answers in microseconds. When you need LSP precision, cartog can use it as an optional layer.

Python-based graph tools? They solve a similar problem but require a Python runtime, pip dependencies, and virtual environments. Cartog is a single static binary — download and run. It also queries 10-100x faster thanks to compiled Rust + SQLite.

An embedding-search tool (chunk + vector)? Tools that chunk files and embed them find code by concept, but return file-and-line windows — you still open and read to learn what matched. Cartog returns the named symbol (function/class/method) with its kind, signature, and exact span, then sharpens ranking with a cross-encoder re-ranker and centrality that chunk-only tools lack. It also embeds in-process (local ONNX, zero external server — no Ollama daemon to keep running) and ships LSP-precise call/import edges, so the same query that finds code can also trace it. Add --compact and the JSON stays symbol-level while dropping ~60% of the bytes.

MCP Server Setup

Fastest path: let cartog write the right config for your editor.

cartog ide                                  # all installed clients, all scopes
cartog ide --client cursor                  # one client
cartog ide --client claude-desktop --dry-run  # preview without writing

Idempotent. Existing servers in each file are preserved.
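
What "idempotent, existing servers preserved" means for a JSON config can be sketched in a few lines of Python. This is an illustration of the property, not cartog's actual implementation; `upsert_server` and the `other` server entry are hypothetical.

```python
import copy
import json


def upsert_server(config: dict, name: str, entry: dict, key: str = "mcpServers") -> dict:
    """Return a copy of `config` with `entry` stored under config[key][name].

    Other servers under `key` are left untouched, and applying the same
    call twice yields the same result as applying it once.
    """
    merged = copy.deepcopy(config)
    merged.setdefault(key, {})[name] = entry
    return merged


# Hypothetical pre-existing config with an unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
cartog_entry = {"command": "cartog", "args": ["serve"]}

once = upsert_server(existing, "cartog", cartog_entry)
twice = upsert_server(once, "cartog", cartog_entry)
print(json.dumps(twice, indent=2))
```

The `key` parameter reflects that clients use different top-level names, for example `servers` for VS Code and `context_servers` for Zed.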

Prefer the brew/npm shape? cartog install takes editors as positional args and is always non-interactive — safer than cartog ide for scripts and agents:

cartog install cursor                 # one editor
cartog install cursor vscode codex    # several at once
cartog install                        # all detected editors
cartog install cursor --dry-run       # preview

Prefer to wire it yourself? Pick your client below.

Claude Code — project-scoped .mcp.json or user settings

One-shot:

cartog ide --client claude-code             # writes .mcp.json + user settings
claude mcp add cartog -- cartog serve --watch       # user scope
claude mcp add --scope project cartog -- cartog serve --watch

Manual (<repo>/.mcp.json):

{
  "mcpServers": {
    "cartog": { "command": "cartog", "args": ["serve", "--watch"] }
  }
}

Only Claude Code gets --watch by default — the others ship plain ["serve"]. Agent-driven flows churn files faster than human-driven editor flows, so the in-process file watcher pays off. Drop --watch with cartog ide --no-watch if you don't want it.

Cursor — project .cursor/mcp.json or user settings

One-shot:

cartog ide --client cursor

Manual:

{
  "mcpServers": {
    "cartog": { "command": "cartog", "args": ["serve"] }
  }
}
Codex CLI — user-only TOML at ~/.codex/config.toml

One-shot:

cartog ide --client codex

Manual:

[mcp_servers.cartog]
command = "cartog"
args = ["serve"]

Codex is user-global only. If you use cartog on multiple projects, cartog ide auto-names each section cartog-<slug>-<hash8> so they coexist.

Windsurf: ~/.codeium/windsurf/mcp_config.json

cartog ide --client windsurf

{
  "mcpServers": {
    "cartog": { "command": "cartog", "args": ["serve"] }
  }
}
VS Code (GitHub Copilot) — project .vscode/mcp.json
cartog ide --client vscode

Note: VS Code's top-level key is servers (no mcp prefix):

{
  "servers": {
    "cartog": { "type": "stdio", "command": "cartog", "args": ["serve"] }
  }
}

Zed: ~/.config/zed/settings.json

cartog ide --client zed

{
  "context_servers": {
    "cartog": { "command": "cartog", "args": ["serve"] }
  }
}

OpenCode: ~/.config/opencode/opencode.json

cartog ide --client opencode

{
  "mcp": {
    "cartog": {
      "type": "local",
      "command": ["cartog", "serve"],
      "enabled": true
    }
  }
}

Gemini CLI: ~/.gemini/settings.json

cartog ide --client gemini

{
  "mcpServers": {
    "cartog": { "command": "cartog", "args": ["serve"] }
  }
}

Claude Desktop: claude_desktop_config.json

cartog ide --client claude-desktop

Manual (macOS: ~/Library/Application Support/Claude/; Windows: %APPDATA%\Claude\):

{
  "mcpServers": {
    "cartog": { "command": "cartog", "args": ["serve"] }
  }
}

Restart Claude Desktop after editing.

See docs/mcp-setup.md for the canonical long-form reference, including the path-naming scheme for Codex's multi-project setup, and docs/usage.md for all cartog ide flags.

Commands

# Index
cartog index .                              # build the graph (with LSP if available)
cartog index . --no-lsp                     # heuristic-only (~1-4s)
cartog index . --force                      # re-index all files

# Search
cartog search validate                      # partial name match (sub-ms)
cartog search validate --kind function      # filter by kind
cartog rag search "token validation"        # semantic search (natural language)

# Navigate
cartog outline src/auth/tokens.py           # file structure without reading it
cartog refs validate_token                  # who references this?
cartog refs validate_token --kind calls     # only call sites
cartog callees authenticate                 # what does this call?
cartog impact SessionManager --depth 3      # what breaks if I change this?
cartog hierarchy BaseService                # inheritance tree
cartog hierarchy BaseService --mermaid      # paste-into-PR diagram
cartog deps src/routes/auth.py              # file-level imports
cartog deps src/routes/auth.py --mermaid    # graph LR with file as root

# Inspect
cartog stats                                # index summary
cartog savings                              # tokens saved vs grep+read baseline
cartog map --tokens 4000                    # codebase overview by centrality
cartog map --mermaid                        # codebase map as graph TD
cartog changes --commits 5                  # recently changed symbols
cartog doctor                               # environment health check

# Watch & Serve
cartog watch .                              # auto re-index on save
cartog serve --watch --rag                  # MCP server with live index

All commands support --json for structured output and --tokens N for budget-aware output.
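
One way to picture budget-aware output is a greedy cut over already-ranked results: keep items in rank order until the next one would exceed the budget. The sketch below is an assumption about the general idea, not cartog's algorithm; `fit_to_budget` is a hypothetical name, and the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from typing import Callable, List


def fit_to_budget(
    ranked_lines: List[str],
    budget_tokens: int,
    estimate: Callable[[str], int] = lambda s: max(1, len(s) // 4),
) -> List[str]:
    """Keep the highest-ranked lines whose estimated cost fits the budget."""
    kept: List[str] = []
    used = 0
    for line in ranked_lines:
        cost = estimate(line)
        if used + cost > budget_tokens:
            break  # stop at the first line that would overflow the budget
        kept.append(line)
        used += cost
    return kept


results = [
    "function  validate_token    auth/tokens.py:30",
    "function  validate_session  auth/tokens.py:68",
    "function  validate_user     services/user.py:12",
]
print(fit_to_budget(results, budget_tokens=25))
```

Stopping at the first overflow (rather than skipping ahead to smaller items) keeps the output a strict prefix of the ranking, so the most relevant results always survive.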

Example outputs

outline

$ cartog outline auth/tokens.py
from datetime import datetime, timedelta  L3
from typing import Optional  L4
import hashlib  L5
class TokenError  L11-14
class ExpiredTokenError  L17-20
function generate_token(user: User, expires_in: int = 3600) -> str  L23-27
function validate_token(token: str) -> Optional[User]  L30-44
function lookup_session(token: str) -> Optional[Session]  L47-49
function refresh_token(old_token: str) -> str  L52-56
function revoke_token(token: str) -> bool  L59-65

search

$ cartog search validate
function  validate_token    auth/tokens.py:30
function  validate_session  auth/tokens.py:68
function  validate_user     services/user.py:12

impact

$ cartog impact validate_token --depth 3
  calls  get_current_user  auth/service.py:40
  calls  refresh_token  auth/tokens.py:54
    calls  impersonate  auth/service.py:52

refs

$ cartog refs UserService
imports  ./service  routes/auth.py:3
calls    login  routes/auth.py:15
inherits AdminService  auth/service.py:47
references  process  routes/auth.py:22

Supported Languages

| Language | Extensions | Symbols | Edges |
|---|---|---|---|
| Python | .py, .pyi | functions, classes, methods, imports, variables | calls, imports, inherits, raises, type refs |
| TypeScript | .ts, .tsx | functions, classes, methods, imports, variables | calls, imports, inherits, type refs, new, JSX component usage |
| JavaScript | .js, .jsx, .mjs, .cjs | functions, classes, methods, imports, variables | calls, imports, inherits, new, JSX component usage |
| Rust | .rs | functions, structs, traits, impls, imports | calls, imports, inherits (trait impl), type refs |
| Go | .go | functions, structs, interfaces, imports | calls, imports, type refs |
| Ruby | .rb | functions, classes, modules, imports | calls, imports, inherits, raises, rescue types |
| Java | .java | classes, interfaces, enums, methods, imports | calls, imports, inherits, raises, type refs, new |
| PHP | .php | classes, interfaces, traits, methods, functions | calls, inherits, implements, references (traits, new) |
| Dart | .dart | classes, mixins, extensions, enums, methods, functions, typedefs | calls, imports, inherits, implements, type refs |
| Swift | .swift | classes, structs, actors, protocols, enums, extensions, methods, functions, typealiases | calls, imports, inherits, implements, type refs |
| Kotlin | .kt, .kts | classes, data/sealed classes, interfaces, enums, objects, methods, functions, typealiases | calls, imports, inherits, implements, type refs |
| Vue | .vue | <script> symbols (functions, classes, variables, imports) | calls, imports, inherits, type refs, JSX component usage |
| Svelte | .svelte | <script> symbols (functions, classes, variables, imports) | calls, imports, inherits, type refs, JSX component usage |
| Astro | .astro | frontmatter + <script> symbols (functions, classes, variables, imports) | calls, imports, inherits, type refs, JSX component usage |
| Markdown | .md | document sections (chunked by heading) | |

How It Works

graph LR
    A["Source files<br/>(py, ts, rs, go, rb, java, php, dart, md)"] -->|parse| B["Symbols + Edges"]
    B -->|write| C[".cartog/db.sqlite<br/>(SQLite)"]
    C -->|query| D["search / refs / impact<br/>outline / callees / hierarchy"]
    C -->|embed locally| E["ONNX embeddings<br/>(sqlite-vec)"]
    E -->|query| F["rag search<br/>(FTS5 + vector KNN + reranker)"]
  1. Index — tree-sitter parses your code, extracts symbols (functions, classes, methods) and edges (calls, imports, inherits, type refs). Markdown is chunked by heading.
  2. Store — everything goes into a local .cartog/db.sqlite SQLite file.
  3. Resolve (heuristic) — links edges by name with scope-aware matching.
  4. Resolve (LSP, optional) — sends unresolved edges to language servers for compiler-grade precision. Results persist.
  5. Embed (optional) — generates vector embeddings via local ONNX or Ollama, stored in sqlite-vec.
  6. Query — instant lookups against the pre-computed graph. Hybrid FTS5 + vector search with RRF merge and cross-encoder re-ranking.
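
The "RRF merge" in step 6 refers to reciprocal rank fusion, which combines several ranked lists by summing 1 / (k + rank) for each item. A minimal Python sketch follows; the constant k = 60 is the value conventionally used for RRF and is an assumption here, not a confirmed cartog setting, and the two input lists are made-up sample rankings.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from the keyword (FTS5) and vector (KNN) stages.
keyword_hits = ["validate_token", "validate_session", "revoke_token"]
vector_hits = ["refresh_token", "validate_token", "validate_session"]

merged = rrf_merge([keyword_hits, vector_hits])
print(merged)
```

Items that appear in both lists accumulate two terms, so they rise above items that rank first in only one list. A cross-encoder re-ranker would then rescore the top of this fused list.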

Re-indexing is incremental: git diff + SHA-256 skips unchanged files, Merkle-tree diffing updates only modified symbols. cartog watch automates this on file save.
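
The content-hash part of that skip logic can be sketched as follows. This covers only the SHA-256 comparison; the git diff pre-filter and the Merkle-tree symbol diffing are not modeled, deleted files are ignored, and `changed_files` along with its in-memory inputs are hypothetical stand-ins for reading files and the SQLite store.

```python
import hashlib
from typing import Dict, List, Tuple


def changed_files(
    current: Dict[str, bytes], stored: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Compare current file contents against digests from the last index.

    Returns the sorted paths whose SHA-256 differs (or is new) and the
    fresh digest map to persist for the next run.
    """
    changed: List[str] = []
    fresh: Dict[str, str] = {}
    for path, content in current.items():
        digest = hashlib.sha256(content).hexdigest()
        fresh[path] = digest
        if stored.get(path) != digest:
            changed.append(path)
    return sorted(changed), fresh


# First run: nothing stored, so every file needs indexing.
files = {"auth/tokens.py": b"def validate_token(): ...", "auth/service.py": b"x = 1"}
first_changed, digests = changed_files(files, {})

# Second run: only one file was edited.
files["auth/service.py"] = b"x = 2"
second_changed, digests = changed_files(files, digests)
print(first_changed, second_changed)
```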

Performance

Indexing: 69 files / 4k LOC in 95ms (incremental re-index skips unchanged files).

| Query | Latency |
|---|---|
| outline | 8-14 µs |
| hierarchy | 8-9 µs |
| deps | 25 µs |
| stats | 32 µs |
| search | 81-102 µs |
| callees | 177-180 µs |
| refs | 258-471 µs |
| impact (depth 3) | 2.7-17 ms |

Edge Resolution

| Project | Language | Heuristic | With LSP | LSP time |
|---|---|---|---|---|
| TS microservice (230 files) | TypeScript | 37% | 81% | 13s |
| Vue.js SPA (739 files) | Vue/TS/JS | 31% | 72% | 25s |
| Rust CLI (358 files) | Rust | 25% | 44% | 72s |
| PHP webapp fixture (25 files) | PHP | 82% | 84% | 22s |

Unresolved edges are mostly calls to external libraries outside the project boundary. The PHP row uses the self-contained webapp_php benchmark fixture (no composer install), so LSP gains are modest; real PHP projects with vendor/ populated typically see larger lifts from intelephense.

Configuration

Database path is resolved automatically — no config needed for standard use:

  1. --db flag / CARTOG_DB env var (highest priority)
  2. .cartog.toml at git root
  3. Auto git-root detection (.cartog/db.sqlite; legacy .cartog.db still read with a warning)
  4. .cartog/db.sqlite in current directory
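
The precedence above can be expressed as a small pure function. This Python sketch is illustrative only: `resolve_db_path` and its parameters are hypothetical, checking the flag before the env var within step 1 is an assumption (the list groups them together), and the legacy `.cartog.db` fallback is not modeled.

```python
from pathlib import Path
from typing import Optional


def resolve_db_path(
    cli_db: Optional[str],    # value of --db, if given
    env_db: Optional[str],    # value of CARTOG_DB, if set
    toml_db: Optional[str],   # [database] path from .cartog.toml, if present
    git_root: Optional[str],  # detected git root, if any
    cwd: str,                 # current working directory
) -> Path:
    """Pick the database path using the documented priority order."""
    if cli_db:  # 1a. explicit flag (assumed to beat the env var)
        return Path(cli_db)
    if env_db:  # 1b. environment variable
        return Path(env_db)
    if toml_db:  # 2. config file at the git root
        return Path(toml_db).expanduser()
    if git_root:  # 3. default location under the git root
        return Path(git_root) / ".cartog" / "db.sqlite"
    return Path(cwd) / ".cartog" / "db.sqlite"  # 4. fall back to cwd


print(resolve_db_path(None, None, None, "/repo", "/repo/src"))
```

Keeping the resolver free of I/O makes each precedence rule easy to check in isolation.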

.cartog.toml (optional):

[database]
path = "~/.local/share/cartog/myproject.db"

[embedding]
provider = "ollama"          # "local" (default), "ollama", or "openai"
model = "nomic-embed-text"
# auto_embed = true          # watcher auto-embed; unset = auto-detect (embed if the
                             # repo already has embeddings). CARTOG_WATCH_RAG > this > --rag

[embedding.ollama]
base_url = "http://localhost:11434"

# Or an OpenAI-compatible /v1 endpoint (OpenAI, Mistral, Voyage, Jina, OVHcloud,
# or a local server like Ollama /v1, LM Studio, vLLM):
# [embedding]
# provider = "openai"
# model    = "text-embedding-3-small"
# [embedding.openai]
# base_url    = "https://api.openai.com/v1"  # swap base_url to change vendor
# api_key_env = "OPENAI_API_KEY"             # env var NAME; never the key itself

[reranker]
provider = "none"            # "local" (default) or "none"
# model  = "BAAI/bge-reranker-base"   # default: jinaai/jina-reranker-v1-turbo-en (~150MB)

# [index]
# Extra repo-root-relative globs to skip (on top of node_modules/target/vendor/...).
# Matched directories are pruned. Useful for vendored or generated trees.
# exclude = ["vendor/**", "**/*.generated.*"]

Privacy

  • Parsing: tree-sitter runs in-process
  • Storage: SQLite file in your project directory
  • Embeddings: local ONNX, Ollama on localhost, or an OpenAI-compatible endpoint you configure (API key from env, never from config)
  • Re-ranking: cross-encoder runs locally via ONNX
  • MCP server: stdio only, no network sockets
  • No telemetry, no analytics, no phone-home

Your code never leaves your machine — unless you explicitly opt in to S3-compatible index sync (cartog push / cartog pull), which is inert until you configure a remote.

Troubleshooting

| Symptom | Fix |
|---|---|
| "not initialized" / no results | Run cartog init then cartog index . in the repo first. |
| cartog index seems to hang | Cold index of a large repo takes a few seconds; re-run with RUST_LOG=info cartog index . if nothing after 60s. |
| MCP "Connection closed" on a 2nd editor window | Expected: single-writer election makes the 2nd instance read-only (14 of 16 tools). Ensure cartog --version ≥ 0.17 and CARTOG_SINGLE_WRITER is unset. |
| .cartog.toml ignored | Cartog walks up to the git root; with no .git, put it in the cwd or pass --db. cartog config prints the resolved paths. |
| Missing symbols / recall lower than expected | Wait for the watcher (or run cartog index), check the file's language is supported and not .gitignored. Install a language server on PATH to lift edge resolution from ~25% to up to ~81%. |
| Anything else | cartog doctor checks git, config, DB, and models. |

Full list with detailed fixes: docs/troubleshooting.md.

Documentation

Full index: docs/README.md. Highlights:

Articles

A series on how cartog works, from the code graph to semantic search (English · Français):

  1. Tree-sitter and code graphs: navigating code better than grep — parsing code into a SQLite symbol graph, cutting the tokens an agent burns versus grep's line-by-line scans. 🇬🇧 EN · 🇫🇷 FR
  2. Semantic code search with RAG and ONNX — local BGE-small embeddings, FTS5 + vector KNN fused with reciprocal rank fusion, then cross-encoder re-ranking, all on CPU. 🇬🇧 EN · 🇫🇷 FR
  3. Language Server Protocol to sharpen a code graph — tapping language servers like rust-analyzer and pyright to lift edge-resolution precision from 25–37% to 44–81%. 🇬🇧 EN · 🇫🇷 FR
  4. Incremental indexing with a Merkle tree — Merkle hashing plus multi-level file filtering to re-index only changed symbols, dropping re-index time from seconds to milliseconds. 🇬🇧 EN · 🇫🇷 FR
  5. Exposing a code graph as an MCP server — wiring cartog's 16 tools into the Model Context Protocol so AI agents query structured graph results instead of flooding context with raw files. 🇬🇧 EN · 🇫🇷 FR

Contributors

Thanks to everyone who has contributed to cartog.

See CONTRIBUTING.md for setup, commit style, and how to add a new language extractor.

License

MIT

About

Pre-indexed code knowledge graph via tree-sitter, enhanced queries with RAG, MCP server, private and local storage. Agent ready.
