A local, LLM-powered knowledge base that ingests from Zotero, academic databases, and the web.
Pull papers from your Zotero library. Batch-search PubMed, arXiv, Scholar, Consensus. Compile a cross-referenced wiki. Query your knowledge. Browse in Obsidian.
Built on ideas from Andrej Karpathy's LLM knowledge base approach, the agno-agi/pal architecture, and the farzaa/wiki pattern.
Knowledge Vault is a Claude Code plugin that turns any project directory into a structured knowledge base. It batch-ingests from your Zotero library and from academic databases (PubMed, arXiv, Scholar Gateway, Consensus, Paper Search) via MCP servers — then compiles the raw sources into a cross-referenced wiki you can query.
```mermaid
flowchart LR
    subgraph collect ["Academic Collection"]
        P["PubMed"]
        X["arXiv"]
        S["Scholar Gateway"]
        CO["Consensus"]
        PS["Paper Search\n(14 databases)"]
    end
    Z["Zotero\nLibrary"]
    collect -->|"/knowledge-vault:collect"| R["raw/\n.manifest.json\nsources.json"]
    Z -->|"/knowledge-vault:ingest-zotero"| R
    A["URLs, files,\nnotes, clips"] -->|"/knowledge-vault:ingest"| R
    CL["Obsidian\nWeb Clipper"] -->|auto| CLP["Clippings/"]
    CLP -->|"/knowledge-vault:process"| R
    R -->|"/knowledge-vault:compile"| W["wiki/\nsummaries/\nconcepts/\nindex.md"]
    W -->|"/knowledge-vault:query"| ANS["Grounded\nAnswers"]
    W -->|"/knowledge-vault:lint"| H["Health\nReport"]
    W -->|browse| O["Obsidian\nGraph View"]
```
Claude maintains all wiki content. You browse and query -- never edit directly.
Step 1 — Add the marketplace (one time only):

```
/plugin marketplace add psypeal/claude-knowledge-vault
```

Step 2 — Install the plugin:

```
/plugin install knowledge-vault@claude-knowledge-vault
```

Step 3 — Reload:

```
/reload-plugins
```

No config, no dependencies, no API keys.
When a new version is released, refresh the marketplace to pull the latest:
```
/plugin marketplace update claude-knowledge-vault
```

Then reload so the new commands, scripts, and fixes take effect:

```
/reload-plugins
```

If auto-update is enabled for this marketplace, the plugin updates automatically during the marketplace refresh. Otherwise, toggle it via `/plugin` → Marketplaces → select `claude-knowledge-vault` → Enable auto-update.
```
/plugin uninstall knowledge-vault@claude-knowledge-vault
```

To uninstall from a specific scope, use the `/plugin` Installed tab — select the plugin and choose Uninstall.
Existing vaults are untouched — the .vault/ directory format is unchanged.
```
# 1. Remove the old skill
rm -rf ~/.claude/skills/knowledge-vault
```

Then in Claude Code:

```
/plugin marketplace add psypeal/claude-knowledge-vault
/plugin install knowledge-vault@claude-knowledge-vault
/reload-plugins
```

Done — your existing `.vault/` directories work as-is. See Migration for full details.
```
> /knowledge-vault:init

Vault initialized at .vault/
Let me configure your vault preferences.

What domain is this vault for?
> Neuroimaging and neurodegeneration research

Preferences saved to .vault/preferences.md
Tip: Run /knowledge-vault:setup-sources to configure academic databases.
```
```
> /knowledge-vault:setup-sources

Detected:
  PubMed (Claude.ai built-in)           active
  Scholar Gateway (Claude.ai built-in)  active

Available to add:
  Consensus     claude mcp add --transport http consensus https://mcp.consensus.app/mcp
  arXiv         claude mcp add arxiv-mcp-server -- uvx arxiv-mcp-server ...
  Paper Search  claude mcp add paper-search -- npx -y paper-search-mcp-nodejs

Which servers would you like to add?
> Consensus and arXiv

Added 2 servers. Sources saved to .vault/sources.json
```
```
> /knowledge-vault:collect tau PET imaging neurodegeneration --since 2023

Searching PubMed, Scholar Gateway, Consensus, arXiv...

| # | Title                                           | Source    | Date | Type   |
|---|-------------------------------------------------|-----------|------|--------|
| 1 | Tau PET imaging in early Alzheimer's disease    | PubMed    | 2024 | paper  |
| 2 | Longitudinal tau accumulation in subcortical... | Consensus | 2023 | paper  |
| 3 | Second-generation tau tracers: a review         | arXiv     | 2024 | review |

Which to ingest? (all / 1,3 / none)
> all

Ingested 3 sources. 3 pending compilation.
```
```
> /knowledge-vault:compile

Compiled 3 sources. Extracted 7 concepts:
  tau-pet-imaging, neurodegeneration, alzheimers-disease, tau-tracers,
  subcortical-tau, longitudinal-imaging, amyloid-tau-interaction
```
```
> /knowledge-vault:query What is the current evidence for second-generation tau tracers?

Based on the vault: Second-generation tau PET tracers (e.g., [18F]MK-6240,
[18F]PI-2620) show improved off-target binding profiles compared to
first-generation [18F]AV-1451. Three vault sources report higher specificity
for neurofibrillary tau in Braak stages III-IV...

Sources: [[tau-tracers]], [[tau-pet-imaging]]
```
| Command | Description |
|---|---|
| `/knowledge-vault:init` | Initialize a `.vault/` knowledge base in the current project |
| `/knowledge-vault:ingest <source>` | Add a raw source -- URL, pasted text, or file path |
| `/knowledge-vault:ingest-zotero <collection>` | Batch ingest papers from a Zotero collection (metadata, fulltext, annotations) |
| `/knowledge-vault:collect <query>` | Batch search academic databases and selectively ingest results |
| `/knowledge-vault:setup-sources` | Configure research MCP servers for academic collection |
| `/knowledge-vault:compile` | Compile pending sources into wiki summaries and concept articles |
| `/knowledge-vault:lint` | Run 8 health checks on the wiki |
| `/knowledge-vault:cleanup` | Audit and actively fix article quality issues |
| `/knowledge-vault:query <question>` | Ask a question grounded in your vault's knowledge |
| `/knowledge-vault:process` | Batch: ingest all web clips + compile everything |
| `/knowledge-vault:status` | Print a quick status summary |
| `/knowledge-vault:agent-reset` | Clear learned retrieval patterns and start fresh |
The headline feature of v2. /knowledge-vault:collect searches multiple academic databases in parallel and lets you cherry-pick which results to ingest.
| Server | Type | Setup | Databases |
|---|---|---|---|
| PubMed | Claude.ai built-in | No setup needed | PubMed, PMC |
| Scholar Gateway | Claude.ai built-in | No setup needed | Broad academic literature |
| Consensus | HTTP MCP | `claude mcp add --transport http consensus https://mcp.consensus.app/mcp` | Research consensus engine |
| arXiv | stdio MCP | `claude mcp add arxiv-mcp-server -- uvx arxiv-mcp-server --storage-path .vault/raw/arxiv-papers` | arXiv preprints |
| Paper Search | stdio MCP | `claude mcp add paper-search -- npx -y paper-search-mcp-nodejs` | 14 databases: arXiv, PubMed, Semantic Scholar, bioRxiv, medRxiv, Crossref, CORE, OpenAlex, DOAJ, Europe PMC, Internet Archive Scholar, Fatcat, BASE, DBLP |
| Zotero | stdio MCP | `uv tool install zotero-mcp-server && zotero-mcp setup` | Your local Zotero library — collections, metadata, PDF fulltext, annotations |
- `/knowledge-vault:setup-sources` detects what you already have configured and shows what else is available. You approve each server individually.
- `/knowledge-vault:collect <query>` searches all enabled servers in parallel, deduplicates results, and presents a numbered table.
- You pick which results to ingest -- `all`, specific numbers (`1,3,5`), or filters (only 2024+).
- Selected papers are ingested to `raw/` with full metadata and available text.
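Deduplication across servers is one of collect's jobs. Here is a minimal sketch of how that merge could work -- the `doi` and `title` field names are illustrative assumptions, not the plugin's actual result schema:

```python
def dedupe(results):
    """Collapse hits that several databases return for the same paper.

    Keyed on DOI when present, else a normalized title -- an assumed
    heuristic for illustration; the first source to report a paper wins.
    """
    seen, unique = set(), []
    for r in results:
        key = r.get("doi") or r["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

Because order is preserved, whichever server answered first keeps its metadata for a given paper.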
The system is elastic and user-controlled. No server is added without your approval. No paper is ingested without your selection.
The 5 servers above are pre-configured suggestions, but you can add any MCP server as a research source. Just two steps:
1. Add the server using `claude mcp add`:

   ```
   claude mcp add my-server -- npx -y my-mcp-package
   # or for HTTP servers:
   claude mcp add --transport http my-server https://example.com/mcp
   ```

2. Register it in your vault by editing `.vault/sources.json`:

   ```json
   {
     "id": "my-server",
     "name": "My Custom Server",
     "type": "stdio",
     "enabled": true,
     "tools": ["mcp__my-server__search"]
   }
   ```
Once registered, /knowledge-vault:collect will include your custom server in batch searches alongside the built-in ones.
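To picture what step 2 amounts to, here is a hedged sketch of adding an entry to the parsed server list. It models only the list of server objects -- the exact on-disk shape of `sources.json` is not specified here:

```python
def register_server(servers, entry):
    """Add one server entry to the parsed list, skipping duplicate ids.

    `servers` is a list of dicts like the JSON object shown above; this
    is an illustrative model, not the plugin's own code.
    """
    if any(s["id"] == entry["id"] for s in servers):
        return servers          # already registered; leave unchanged
    return servers + [entry]
```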
```
/knowledge-vault:collect transformers attention mechanisms            # basic search
/knowledge-vault:collect tau PET imaging --since 2023                 # papers from 2023 onward
/knowledge-vault:collect CRISPR delivery --count 5                    # max 5 results per source
/knowledge-vault:collect meta-analysis sleep cognition --type review  # filter by type
```
/knowledge-vault:ingest-zotero <collection> batch-imports papers from your local Zotero library — metadata, PDF fulltext, and your highlighted annotations — and drops them into the vault's raw/ directory ready for compilation.
Install 54yyyu/zotero-mcp once:

```
uv tool install zotero-mcp-server && zotero-mcp setup
```

Then make sure Zotero is running with the local API enabled (Zotero 7+: Settings → Advanced → Allow other applications on this computer to communicate with Zotero).
```
Zotero collection ──▶ list items ──▶ you pick which to ingest
                                              │
                                              ▼
                                     For each paper:
                                     - metadata (title, authors, year, DOI,
                                       BetterBibTeX citekey, Zotero key)
                                     - PDF fulltext (if attached)
                                     - your highlighted annotations (if any)
                                              │
                                              ▼
        Structured extraction ──▶ raw/<slug>.md ──▶ /knowledge-vault:compile
        (~800-1200 words, not the full PDF — the full paper stays in Zotero)
```
```
> /knowledge-vault:ingest-zotero hippocampus-review-2024

Found collection: "Hippocampus Review 2024" (12 items)

| # | Title                                      | Authors         | Year | Type   |
|---|--------------------------------------------|-----------------|------|--------|
| 1 | Place cell remapping in CA1 after sleep    | Tanaka et al.   | 2024 | paper  |
| 2 | Entorhinal grid cells and path integration | Rowland & Moser | 2023 | paper  |
| 3 | Hippocampal theta rhythm: a 50-year review | Buzsáki         | 2024 | review |
...

Ingest which? (e.g., 1,3,5 or all)
> all

Ingested 12 papers. Run /knowledge-vault:compile now?
```
Each raw file gains extra Zotero-specific frontmatter fields so you can trace back to the source:
```yaml
---
title: "Place cell remapping in CA1 after sleep"
source: "https://doi.org/10.1038/s41593-024-xxxxx"
type: paper
zotero_key: "ABCD1234"
citekey: "tanaka2024place"
doi: "10.1038/s41593-024-xxxxx"
year: 2024
authors: ["Tanaka K", "Moser EI", "..."]
compiled: false
---
```

Re-running the command is safe — existing slugs are skipped, so you can incrementally pull new items as you add them to Zotero.
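The skip-existing-slugs behavior can be pictured with a small sketch. Both the slug rule and the item shape here are illustrative assumptions, not the plugin's actual implementation:

```python
import re
from pathlib import Path

def slugify(title):
    """Illustrative slug rule -- the plugin's real slugging may differ."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def items_to_ingest(items, raw_dir):
    """Return only the Zotero items with no raw/<slug>.md file yet."""
    existing = {p.stem for p in Path(raw_dir).glob("*.md")}
    return [it for it in items if slugify(it["title"]) not in existing]
```

Re-running over a grown collection then only touches the new items.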
After /knowledge-vault:init and /knowledge-vault:setup-sources:
```
your-project/
└── .vault/
    ├── preferences.md       User preferences (interview-generated)
    ├── agent.md             Learned retrieval intelligence (auto-maintained)
    ├── sources.json         Configured research MCP servers
    ├── Clippings/           Obsidian Web Clipper default folder
    ├── raw/                 Ingested sources with YAML frontmatter
    │   ├── .manifest.json   Source registry
    │   └── arxiv-papers/    arXiv PDFs (if arXiv server configured)
    ├── wiki/
    │   ├── index.md         Master routing index
    │   ├── _backlinks.json  Reverse link index
    │   ├── concepts/        One article per topic
    │   ├── summaries/       One summary per source
    │   ├── outputs/         Query results and lint reports
    │   └── .state.json      Compilation and lint state
    └── templates/           Frontmatter skeletons
```
During /knowledge-vault:init, Claude interviews you about your vault's domain and priorities:
```
> /knowledge-vault:init

Vault initialized at .vault/
Let me configure your vault preferences.

What domain is this vault for?
> Biomedical research -- neuroimaging and neurodegeneration

What sources will you mainly use?
> Papers from PubMed, review articles, and meeting notes

Any priority rules for sources?
> Peer-reviewed > preprints > blog posts. Prioritize longitudinal studies.

How granular should concepts be?
> Balanced -- not too broad, not too narrow

Any special compilation instructions?
> Always extract methodology and sample size. Note statistical methods used.

Preferences saved to .vault/preferences.md
```
This creates .vault/preferences.md -- Claude reads it at the start of every vault operation. It shapes how sources are summarized, which concepts are extracted, and how queries are answered.
You can edit preferences.md manually anytime. Claude always picks up the latest version.
Queries stay efficient at any vault size. Claude never loads everything -- it reads the index, picks what's relevant, and drills down only when needed.
```
Tier 1 ───── wiki/index.md             Always read first (one line per entry)
                │
Tier 2 ───── summaries/ + concepts/    Read relevant matches (200-500 words each)
                │
Tier 3 ───── raw/                      Full source text (only when depth needed)
```
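The Tier 1 → Tier 2 hop can be sketched as a scan of the one-line index. The `- [[slug]] -- gloss` line format here is an illustrative assumption, not necessarily the plugin's exact index layout:

```python
import re

def pick_articles(index_text, keywords, limit=4):
    """Scan the one-line index (Tier 1) and return up to `limit`
    article slugs worth reading in full (Tier 2)."""
    slugs = []
    for line in index_text.splitlines():
        if any(k.lower() in line.lower() for k in keywords):
            m = re.search(r"\[\[([^\]]+)\]\]", line)  # extract the wikilink slug
            if m and m.group(1) not in slugs:
                slugs.append(m.group(1))
    return slugs[:limit]
```

Only the returned handful of articles is then opened; `raw/` (Tier 3) stays untouched unless those articles lack the needed depth.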
Queries answer questions from the vault. When an answer is particularly valuable, you can choose to save it back into the vault, enriching future queries.
- Query: Claude reads `wiki/index.md`, picks 2-4 relevant articles, and answers your question with `[[wikilinks]]` to sources.
- File it: If the answer is worth keeping, say "file it". Claude saves it to `wiki/outputs/` and updates the index. Filed answers become available to future queries.
- Leave it: Most queries just return an answer and nothing is saved. Simple lookups pass through without adding noise.
Filing is always user-initiated -- Claude does not automatically classify or save answers.
When you file an answer that connects multiple concepts, the connection gets a strength rating:
| Strength | Criteria | Graph impact |
|---|---|---|
| Strong | Supported by 2+ independent sources with direct evidence | Added to concept graph |
| Moderate | Supported by 1 source with clear evidence | Added to concept graph with note |
| Weak | Logically inferred but not directly stated in sources | Recorded in output only -- not added to graph until confirmed by a future source |
When filing an answer:
- Deduplication: Checks whether an existing output already covers the same question or connection
- Graph density cap: Max 8 `related` entries per concept -- new connections only replace weaker ones
- Weak connections quarantined: Speculative links stay in outputs, not in the concept graph, until confirmed
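The three safeguards can be sketched together. This is an illustrative model, not the plugin's code -- `related` is assumed to be a list of `(slug, strength)` pairs:

```python
MAX_RELATED = 8
RANK = {"moderate": 1, "strong": 2}   # weak links never enter the graph

def add_connection(related, slug, strength):
    """Apply the filing safeguards: quarantine weak links, deduplicate,
    and past the 8-entry cap admit a new link only by displacing a
    strictly weaker one."""
    if strength == "weak":
        return related                     # stays in outputs/, not the graph
    if any(s == slug for s, _ in related):
        return related                     # already linked; deduplicated
    if len(related) < MAX_RELATED:
        return related + [(slug, strength)]
    weakest = min(related, key=lambda e: RANK[e[1]])
    if RANK[strength] > RANK[weakest[1]]:
        return [e for e in related if e != weakest] + [(slug, strength)]
    return related                         # cap holds; no weaker entry to displace
```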
The vault includes a learning retrieval agent (.vault/agent.md) that gradually improves article routing based on your query history.
The agent does not activate on every query. It kicks in after a few queries and improves gradually:
- Pre-routing (reading agent.md before the index) activates only after 5+ total queries in the vault.
- Agent updates (writing back to agent.md) happen only after 3+ queries in the same session.
- Most queries -- especially early ones -- never touch agent.md at all.
```mermaid
flowchart LR
    Q["/knowledge-vault:query"] --> A["agent.md\nsuggests articles"]
    A --> R["Claude reads\npriority articles"]
    R --> ANS["Answer"]
    ANS --> E["Evaluate:\nwhat was useful?"]
    E --> U["Update agent.md\nreinforce/expand/decay"]
    U -.->|next query| A
```
Note: The loop above activates after a few queries and improves gradually -- it does not run on every query from the start.
| Section | Max | What it tracks |
|---|---|---|
| Concept Clusters | 8 | Groups of concepts frequently queried together |
| Query Patterns | 10 | Maps question types to the specific articles that answer them |
| Source Signals | 15 | Which sources are most frequently useful and for what |
| Corrections | 5 | Retrieval mistakes to avoid repeating |
Without the agent, every query scans the full index and reads 6-8 candidate articles. With the agent, Claude jumps directly to the 2-3 articles that matter.
| Vault size | Agent cost | Savings per query | Net savings |
|---|---|---|---|
| 3 sources | ~225 tokens | ~500 tokens | ~275 tokens |
| 8 sources | ~600 tokens | ~2,500 tokens | ~1,900 tokens |
| 15 sources | ~1,000 tokens | ~4,450 tokens | ~3,450 tokens |
- Bounded: 6,000 character hard ceiling (~1,000 tokens max read cost)
- Advisory only: Never overrides `index.md` -- only prioritizes which articles to read first
- Cold start threshold: Not activated until 3+ queries or 5+ compiled sources
- Exponential decay: Every 20 queries, hit counts halve -- recent patterns outweigh old ones
- Self-cleaning: `/knowledge-vault:lint` detects and removes stale references
- Reset: `/knowledge-vault:agent-reset` clears all learned patterns if needed
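The decay rule is simple arithmetic. A sketch under the assumption that agent.md's bookkeeping reduces to a pattern → hit-count map (the real file is prose-structured markdown):

```python
DECAY_EVERY = 20

def decay_hits(patterns, total_queries):
    """Halve every hit count each time another 20 queries have landed,
    dropping patterns that decay to zero -- so recent patterns outweigh
    old ones and the file stays bounded."""
    if total_queries % DECAY_EVERY != 0:
        return patterns
    return {k: v // 2 for k, v in patterns.items() if v // 2 > 0}
```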
/knowledge-vault:lint runs 8 health checks to keep your knowledge base consistent:
| Check | What it catches | Severity |
|---|---|---|
| Contradictions | Conflicting claims across different sources | Critical |
| Stale articles | Concepts not updated after new sources added | Warning |
| Missing concepts | Referenced via [[wikilink]] but no article exists | Warning |
| Orphaned articles | Concept articles with no sources linked | Warning |
| Thin articles | Concept articles under 100 words | Suggestion |
| Duplicates | Overlapping concept coverage | Warning |
| Gap analysis | Missing topics that would strengthen the knowledge graph | Suggestion |
| Agent staleness | agent.md references deleted concepts or sources | Warning |
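Two of the checks are easy to picture in miniature -- a hedged sketch, with the 100-word threshold taken from the table above and an assumed wikilink pattern:

```python
import re

def lint_article(slug, text, existing_slugs):
    """Sketch of two of the eight checks: thin articles (suggestion) and
    missing concepts (warning) from dangling [[wikilinks]]."""
    findings = []
    if len(text.split()) < 100:
        findings.append(("suggestion", slug + ": thin article (<100 words)"))
    for link in set(re.findall(r"\[\[([^\]|#]+)", text)):
        if link not in existing_slugs:
            findings.append(("warning", slug + ": missing concept [[" + link + "]]"))
    return findings
```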
Articles are written to a strict standard -- factual, precise, no fluff.
Rules:
- Tone: Flat, factual, Wikipedia-style. Let data imply significance.
- Avoid: Peacock words ("groundbreaking", "revolutionary"), editorial voice ("interestingly"), rhetorical questions
- Do: One claim per sentence. Short sentences. Replace adjectives with specifics (numbers, dates, methods).
- Max 2 direct quotes per article -- choose the most impactful
Quality safeguards during compilation:
- Anti-cramming: If a concept article develops 3+ distinct sub-topics, split into separate articles
- Anti-thinning: Every article must have real substance -- stubs with 2 vague sentences are failures
- Quality checkpoints: Every 5 compiled sources, audit the 3 most-updated articles for coherence
/knowledge-vault:cleanup: Dedicated command to audit and fix all articles -- restructure diary-style articles into thematic ones, split bloated articles, enrich stubs, fix broken links
Open .vault/ as an Obsidian vault. Zero configuration needed.
| Feature | What it gives you |
|---|---|
| Graph View | Visualize concept connections via [[wikilinks]] |
| Backlinks | See every article referencing a concept |
| Search | Full-text search across all articles |
| Tags | Browse by YAML tags across all sources |
| Web Clipper | Clip from browser → auto-lands in Clippings/ → /knowledge-vault:process |
| Feature | v1 (skill) | v2 (plugin) |
|---|---|---|
| Architecture | Claude Code skill | Claude Code plugin with commands, skills, agents, hooks, and scripts |
| Invocation | Natural language (`vault compile`) | Slash commands (`/knowledge-vault:compile`) |
| Academic collection | Manual URL ingestion only | Batch search across 5 research servers via MCP |
| Source management | None | /knowledge-vault:setup-sources + sources.json config |
| Research agent | None | Dedicated vault-collector agent for parallel database search |
| Session hooks | None | Auto-detects .vault/ on session start |
| Vault format | `.vault/` directory | Same -- fully backward compatible |
- Plugin architecture: Commands are now registered slash commands, not natural-language triggers. Skills and agents are separate modules.
- `/knowledge-vault:collect`: New command. Searches PubMed, arXiv, Scholar Gateway, Consensus, and Paper Search in parallel. Presents results for selective ingestion. Deduplicates across sources.
- `/knowledge-vault:setup-sources`: New command. Detects installed MCP servers, shows available servers with ready-to-run install commands, writes `sources.json`.
- Session hook: On session start, detects if the project has a `.vault/` directory and loads vault context automatically.
- `sources.json`: New config file tracking which research servers are configured per vault.
Upgrading from v1 (skill) to v2 (plugin):
Step 1 -- Remove the old skill:
```
rm -rf ~/.claude/skills/knowledge-vault
```

Step 2 -- Install the plugin (in Claude Code):

```
/plugin marketplace add psypeal/claude-knowledge-vault
/plugin install knowledge-vault@claude-knowledge-vault
/reload-plugins
```

Step 3 -- Verify (in any project with an existing vault):
> /knowledge-vault:status
That's it. Your existing .vault/ directories are fully compatible. No data migration needed.
Optional -- Configure academic sources for an existing vault:
> /knowledge-vault:setup-sources
| | Knowledge Vault v2 | agno-agi/pal |
|---|---|---|
| Runtime | Claude Code plugin (your terminal) | FastAPI + Docker |
| Storage | Markdown + JSON | PostgreSQL + files |
| Setup | `git clone` one folder | Docker Compose + API keys |
| Scope | Per-project | Global personal agent |
| Dependencies | None (optional: `uv`, `npx` for MCP servers) | PostgreSQL, OpenAI API |
| Academic search | 5 MCP servers, elastic config | Custom API integrations |
| Invocation | `/knowledge-vault:*` slash commands | Chat interface |
| Browsing | Obsidian | Custom web UI |
- Claude Code v2.0+
- `python3` (for JSON updates in helper scripts)
- `uv` (optional, for arXiv and Zotero MCP servers)
- `npx` / Node.js (optional, for Paper Search MCP server)
- Zotero 7+ (optional, for `/knowledge-vault:ingest-zotero`)
- Obsidian (optional, for browsing)
- Andrej Karpathy -- LLM knowledge base compilation concept
- agno-agi/pal -- manifest tracking, YAML schemas, linting architecture
- farzaa/wiki -- wiki-as-knowledge-base pattern
- blazickjp/arxiv-mcp-server -- arXiv MCP server
- 54yyyu/zotero-mcp -- Zotero MCP server powering `/knowledge-vault:ingest-zotero`
- Galaxy-Dawn/claude-scholar -- inspiration for the Zotero → knowledge-base workflow