A local, LLM-powered knowledge base that ingests from Zotero, academic databases, and the web.
Pull papers from your Zotero library. Batch-search PubMed, arXiv, Scholar, Consensus. Compile a cross-referenced wiki. Query your knowledge. Browse in Obsidian.
Built on ideas from Andrej Karpathy's LLM knowledge base approach, the agno-agi/pal architecture, and the farzaa/wiki pattern.
Knowledge Vault is a Claude Code plugin that turns any project directory into a structured knowledge base. It batch-ingests from your Zotero library and from academic databases (PubMed, arXiv, Scholar Gateway, Consensus, Paper Search) via MCP servers — then compiles the raw sources into a cross-referenced wiki you can query.
```mermaid
flowchart LR
    subgraph collect ["Academic Collection"]
        P["PubMed"]
        X["arXiv"]
        S["Scholar Gateway"]
        CO["Consensus"]
        PS["Paper Search\n(14 databases)"]
    end
    Z["Zotero\nLibrary"]
    collect -->|"/knowledge-vault:collect"| R["raw/\n.manifest.json\nsources.json"]
    Z -->|"/knowledge-vault:ingest-zotero"| R
    A["URLs, files,\nnotes, clips"] -->|"/knowledge-vault:ingest"| R
    CL["Obsidian\nWeb Clipper"] -->|auto| CLP["Clippings/"]
    CLP -->|"/knowledge-vault:process"| R
    R -->|"/knowledge-vault:compile"| W["wiki/\nsummaries/\nconcepts/\nindex.md"]
    W -->|"/knowledge-vault:query"| ANS["Grounded\nAnswers"]
    W -->|"/knowledge-vault:lint"| H["Health\nReport"]
    W -->|browse| O["Obsidian\nGraph View"]
```
Claude maintains all wiki content. You browse and query -- never edit directly.
Step 1 — Add the marketplace (one time only):

```
/plugin marketplace add psypeal/claude-knowledge-vault
```

Step 2 — Install the plugin:

```
/plugin install knowledge-vault@claude-knowledge-vault
```

Step 3 — Reload:

```
/reload-plugins
```

No config, no dependencies, no API keys.
When a new version is released, refresh the marketplace to pull the latest:
```
/plugin marketplace update claude-knowledge-vault
```

Then reload so the new commands, scripts, and fixes take effect:

```
/reload-plugins
```

If auto-update is enabled for this marketplace, the plugin updates automatically during the marketplace refresh. Otherwise, toggle it via `/plugin` → Marketplaces → select `claude-knowledge-vault` → Enable auto-update.
```
/plugin uninstall knowledge-vault@claude-knowledge-vault
```

To uninstall from a specific scope, use the `/plugin` Installed tab — select the plugin and choose Uninstall.
Existing vaults are untouched — the .vault/ directory format is unchanged.
```
# 1. Remove the old skill
rm -rf ~/.claude/skills/knowledge-vault
```

Then in Claude Code:

```
/plugin marketplace add psypeal/claude-knowledge-vault
/plugin install knowledge-vault@claude-knowledge-vault
/reload-plugins
```

Done — your existing `.vault/` directories work as-is. See Migration for full details.
```
> /knowledge-vault:init

Vault initialized at .vault/
Let me configure your vault preferences.

What domain is this vault for?
> Neuroimaging and neurodegeneration research

Preferences saved to .vault/preferences.md
Tip: Run /knowledge-vault:setup-sources to configure academic databases.
```
```
> /knowledge-vault:setup-sources

Detected:
  PubMed (Claude.ai built-in)           active
  Scholar Gateway (Claude.ai built-in)  active

Available to add:
  Consensus     claude mcp add --transport http consensus https://mcp.consensus.app/mcp
  arXiv         claude mcp add arxiv-mcp-server -- uvx arxiv-mcp-server ...
  Paper Search  claude mcp add paper-search -- npx -y paper-search-mcp-nodejs

Which servers would you like to add?
> Consensus and arXiv

Added 2 servers. Sources saved to .vault/sources.json
```
```
> /knowledge-vault:collect tau PET imaging neurodegeneration --since 2023

Searching PubMed, Scholar Gateway, Consensus, arXiv...

| # | Title                                           | Source    | Date | Type   |
|---|-------------------------------------------------|-----------|------|--------|
| 1 | Tau PET imaging in early Alzheimer's disease    | PubMed    | 2024 | paper  |
| 2 | Longitudinal tau accumulation in subcortical... | Consensus | 2023 | paper  |
| 3 | Second-generation tau tracers: a review         | arXiv     | 2024 | review |

Which to ingest? (all / 1,3 / none)
> all

Ingested 3 sources. 3 pending compilation.
```
```
> /knowledge-vault:compile

Compiled 3 sources. Extracted 7 concepts:
  tau-pet-imaging, neurodegeneration, alzheimers-disease, tau-tracers,
  subcortical-tau, longitudinal-imaging, amyloid-tau-interaction
```
```
> /knowledge-vault:query What is the current evidence for second-generation tau tracers?

Based on the vault: Second-generation tau PET tracers (e.g., [18F]MK-6240,
[18F]PI-2620) show improved off-target binding profiles compared to
first-generation [18F]AV-1451. Three vault sources report higher specificity
for neurofibrillary tau in Braak stages III-IV...

Sources: [[tau-tracers]], [[tau-pet-imaging]]
```
| Command | Description |
|---|---|
| `/knowledge-vault:init` | Initialize a `.vault/` knowledge base in the current project |
| `/knowledge-vault:ingest <source>` | Add a raw source -- URL, pasted text, or file path |
| `/knowledge-vault:ingest-zotero <collection>` | Batch ingest papers from a Zotero collection (metadata, fulltext, annotations) |
| `/knowledge-vault:collect <query>` | Batch search academic databases and selectively ingest results |
| `/knowledge-vault:setup-sources` | Configure research MCP servers for academic collection |
| `/knowledge-vault:compile` | Compile pending sources into wiki summaries and concept articles |
| `/knowledge-vault:lint` | Run 8 health checks on the wiki |
| `/knowledge-vault:cleanup` | Audit and actively fix article quality issues |
| `/knowledge-vault:query <question>` | Ask a question grounded in your vault's knowledge |
| `/knowledge-vault:process` | Batch: ingest all web clips + compile everything |
| `/knowledge-vault:status` | Print a quick status summary |
| `/knowledge-vault:agent-reset` | Clear learned retrieval patterns and start fresh |
The headline feature of v2. /knowledge-vault:collect searches multiple academic databases in parallel and lets you cherry-pick which results to ingest.
| Server | Type | Setup | Databases |
|---|---|---|---|
| PubMed | Claude.ai built-in | No setup needed | PubMed, PMC |
| Scholar Gateway | Claude.ai built-in | No setup needed | Broad academic literature |
| Consensus | HTTP MCP | `claude mcp add --transport http consensus https://mcp.consensus.app/mcp` | Research consensus engine |
| arXiv | stdio MCP | `claude mcp add arxiv-mcp-server -- uvx arxiv-mcp-server --storage-path .vault/raw/arxiv-papers` | arXiv preprints |
| Paper Search | stdio MCP | `claude mcp add paper-search -- npx -y paper-search-mcp-nodejs` | 14 databases: arXiv, PubMed, Semantic Scholar, bioRxiv, medRxiv, Crossref, CORE, OpenAlex, DOAJ, Europe PMC, Internet Archive Scholar, Fatcat, BASE, DBLP |
| Zotero | stdio MCP | `uv tool install zotero-mcp-server && zotero-mcp setup` | Your local Zotero library — collections, metadata, PDF fulltext, annotations |
- `/knowledge-vault:setup-sources` detects what you already have configured and shows what else is available. You approve each server individually.
- `/knowledge-vault:collect <query>` searches all enabled servers in parallel, deduplicates results, and presents a numbered table.
- You pick which results to ingest -- `all`, specific numbers (`1,3,5`), or filters (only 2024+).
- Selected papers are ingested to `raw/` with full metadata and available text.
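Deduplication across servers is one of collect's jobs. Here is a minimal sketch of how that merge could work -- the `doi` and `title` field names are illustrative assumptions, not the plugin's actual result schema:

```python
def dedupe(results):
    """Collapse hits that several databases return for the same paper.

    Keyed on DOI when present, else a normalized title -- an assumed
    heuristic for illustration; the first source to report a paper wins.
    """
    seen, unique = set(), []
    for r in results:
        key = r.get("doi") or r["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

Because order is preserved, whichever server answered first keeps its metadata for a given paper.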
The system is elastic and user-controlled. No server is added without your approval. No paper is ingested without your selection.
The 5 servers above are pre-configured suggestions, but you can add any MCP server as a research source. Just two steps:
1. Add the server using `claude mcp add`:

   ```
   claude mcp add my-server -- npx -y my-mcp-package
   # or for HTTP servers:
   claude mcp add --transport http my-server https://example.com/mcp
   ```

2. Register it in your vault by editing `.vault/sources.json`:

   ```json
   {
     "id": "my-server",
     "name": "My Custom Server",
     "type": "stdio",
     "enabled": true,
     "tools": ["mcp__my-server__search"]
   }
   ```
Once registered, /knowledge-vault:collect will include your custom server in batch searches alongside the built-in ones.
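To picture what step 2 amounts to, here is a hedged sketch of adding an entry to the parsed server list. It models only the list of server objects -- the exact on-disk shape of `sources.json` is not specified here:

```python
def register_server(servers, entry):
    """Add one server entry to the parsed list, skipping duplicate ids.

    `servers` is a list of dicts like the JSON object shown above; this
    is an illustrative model, not the plugin's own code.
    """
    if any(s["id"] == entry["id"] for s in servers):
        return servers          # already registered; leave unchanged
    return servers + [entry]
```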
```
/knowledge-vault:collect transformers attention mechanisms            # basic search
/knowledge-vault:collect tau PET imaging --since 2023                 # papers from 2023 onward
/knowledge-vault:collect CRISPR delivery --count 5                    # max 5 results per source
/knowledge-vault:collect meta-analysis sleep cognition --type review  # filter by type
```
/knowledge-vault:ingest-zotero <collection> batch-imports papers from your local Zotero library — metadata, PDF fulltext, and your highlighted annotations — and drops them into the vault's raw/ directory ready for compilation.
Install 54yyyu/zotero-mcp once:

```
uv tool install zotero-mcp-server && zotero-mcp setup
```

Then make sure Zotero is running with the local API enabled (Zotero 7+: Settings → Advanced → Allow other applications on this computer to communicate with Zotero).
```
Zotero collection ──▶ list items ──▶ you pick which to ingest
                                              │
                                              ▼
                                     For each paper:
                                     - metadata (title, authors, year, DOI,
                                       BetterBibTeX citekey, Zotero key)
                                     - PDF fulltext (if attached)
                                     - your highlighted annotations (if any)
                                              │
                                              ▼
        Structured extraction ──▶ raw/<slug>.md ──▶ /knowledge-vault:compile
        (~800-1200 words, not the full PDF — the full paper stays in Zotero)
```
```
> /knowledge-vault:ingest-zotero hippocampus-review-2024

Found collection: "Hippocampus Review 2024" (12 items)

| # | Title                                      | Authors         | Year | Type   |
|---|--------------------------------------------|-----------------|------|--------|
| 1 | Place cell remapping in CA1 after sleep    | Tanaka et al.   | 2024 | paper  |
| 2 | Entorhinal grid cells and path integration | Rowland & Moser | 2023 | paper  |
| 3 | Hippocampal theta rhythm: a 50-year review | Buzsáki         | 2024 | review |
...

Ingest which? (e.g., 1,3,5 or all)
> all

Ingested 12 papers. Run /knowledge-vault:compile now?
```
Each raw file gains extra Zotero-specific frontmatter fields so you can trace back to the source:
```yaml
---
title: "Place cell remapping in CA1 after sleep"
source: "https://doi.org/10.1038/s41593-024-xxxxx"
type: paper
zotero_key: "ABCD1234"
citekey: "tanaka2024place"
doi: "10.1038/s41593-024-xxxxx"
year: 2024
authors: ["Tanaka K", "Moser EI", "..."]
compiled: false
---
```

Re-running the command is safe — existing slugs are skipped, so you can incrementally pull new items as you add them to Zotero.
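The skip-existing-slugs behavior can be pictured with a small sketch. Both the slug rule and the item shape here are illustrative assumptions, not the plugin's actual implementation:

```python
import re
from pathlib import Path

def slugify(title):
    """Illustrative slug rule -- the plugin's real slugging may differ."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def items_to_ingest(items, raw_dir):
    """Return only the Zotero items with no raw/<slug>.md file yet."""
    existing = {p.stem for p in Path(raw_dir).glob("*.md")}
    return [it for it in items if slugify(it["title"]) not in existing]
```

Re-running over a grown collection then only touches the new items.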
After /knowledge-vault:init and /knowledge-vault:setup-sources:
```
your-project/
└── .vault/
    ├── preferences.md       User preferences (interview-generated)
    ├── agent.md             Learned retrieval intelligence (auto-maintained)
    ├── sources.json         Configured research MCP servers
    ├── Clippings/           Obsidian Web Clipper default folder
    ├── raw/                 Ingested sources with YAML frontmatter
    │   ├── .manifest.json   Source registry
    │   └── arxiv-papers/    arXiv PDFs (if arXiv server configured)
    ├── wiki/
    │   ├── index.md         Master routing index
    │   ├── _backlinks.json  Reverse link index
    │   ├── concepts/        One article per topic
    │   ├── summaries/       One summary per source
    │   ├── outputs/         Query results and lint reports
    │   └── .state.json      Compilation and lint state
    └── templates/           Frontmatter skeletons
```
During /knowledge-vault:init, Claude interviews you about your vault's domain and priorities:
```
> /knowledge-vault:init

Vault initialized at .vault/
Let me configure your vault preferences.

What domain is this vault for?
> Biomedical research -- neuroimaging and neurodegeneration

What sources will you mainly use?
> Papers from PubMed, review articles, and meeting notes

Any priority rules for sources?
> Peer-reviewed > preprints > blog posts. Prioritize longitudinal studies.

How granular should concepts be?
> Balanced -- not too broad, not too narrow

Any special compilation instructions?
> Always extract methodology and sample size. Note statistical methods used.

Preferences saved to .vault/preferences.md
```
This creates .vault/preferences.md -- Claude reads it at the start of every vault operation. It shapes how sources are summarized, which concepts are extracted, and how queries are answered.
You can edit preferences.md manually anytime. Claude always picks up the latest version.
Queries stay efficient at any vault size. Claude never loads everything -- it reads the index, picks what's relevant, and drills down only when needed.
```
Tier 1 ───── wiki/index.md             Always read first (one line per entry)
                │
Tier 2 ───── summaries/ + concepts/    Read relevant matches (200-500 words each)
                │
Tier 3 ───── raw/                      Full source text (only when depth needed)
```
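The Tier 1 → Tier 2 hop can be sketched as a scan of the one-line index. The `- [[slug]] -- gloss` line format here is an illustrative assumption, not necessarily the plugin's exact index layout:

```python
import re

def pick_articles(index_text, keywords, limit=4):
    """Scan the one-line index (Tier 1) and return up to `limit`
    article slugs worth reading in full (Tier 2)."""
    slugs = []
    for line in index_text.splitlines():
        if any(k.lower() in line.lower() for k in keywords):
            m = re.search(r"\[\[([^\]]+)\]\]", line)  # extract the wikilink slug
            if m and m.group(1) not in slugs:
                slugs.append(m.group(1))
    return slugs[:limit]
```

Only the returned handful of articles is then opened; `raw/` (Tier 3) stays untouched unless those articles lack the needed depth.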
Queries answer questions from the vault. When an answer is particularly valuable, you can choose to save it back into the vault, enriching future queries.
- Query: Claude reads `wiki/index.md`, picks 2-4 relevant articles, and answers your question with `[[wikilinks]]` to sources.
- File it: If the answer is worth keeping, say "file it". Claude saves it to `wiki/outputs/` and updates the index. Filed answers become available to future queries.
- Leave it: Most queries just return an answer and nothing is saved. Simple lookups pass through without adding noise.
Filing is always user-initiated -- Claude does not automatically classify or save answers.
When you file an answer that connects multiple concepts, the connection gets a strength rating:
| Strength | Criteria | Graph impact |
|---|---|---|
| Strong | Supported by 2+ independent sources with direct evidence | Added to concept graph |
| Moderate | Supported by 1 source with clear evidence | Added to concept graph with note |
| Weak | Logically inferred but not directly stated in sources | Recorded in output only -- not added to graph until confirmed by a future source |
When filing an answer:
- Deduplication: Checks whether an existing output already covers the same question or connection
- Graph density cap: Max 8 `related` entries per concept -- new connections only replace weaker ones
- Weak connections quarantined: Speculative links stay in outputs, not in the concept graph, until confirmed
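The three safeguards can be sketched together. This is an illustrative model, not the plugin's code -- `related` is assumed to be a list of `(slug, strength)` pairs:

```python
MAX_RELATED = 8
RANK = {"moderate": 1, "strong": 2}   # weak links never enter the graph

def add_connection(related, slug, strength):
    """Apply the filing safeguards: quarantine weak links, deduplicate,
    and past the 8-entry cap admit a new link only by displacing a
    strictly weaker one."""
    if strength == "weak":
        return related                     # stays in outputs/, not the graph
    if any(s == slug for s, _ in related):
        return related                     # already linked; deduplicated
    if len(related) < MAX_RELATED:
        return related + [(slug, strength)]
    weakest = min(related, key=lambda e: RANK[e[1]])
    if RANK[strength] > RANK[weakest[1]]:
        return [e for e in related if e != weakest] + [(slug, strength)]
    return related                         # cap holds; no weaker entry to displace
```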
The vault includes a learning retrieval agent (.vault/agent.md) that gradually improves article routing based on your query history.
The agent does not activate on every query. It kicks in after a few queries and improves gradually:
- Pre-routing (reading agent.md before the index) activates only after 5+ total queries in the vault.
- Agent updates (writing back to agent.md) happen only after 3+ queries in the same session.
- Most queries -- especially early ones -- never touch agent.md at all.
```mermaid
flowchart LR
    Q["/knowledge-vault:query"] --> A["agent.md\nsuggests articles"]
    A --> R["Claude reads\npriority articles"]
    R --> ANS["Answer"]
    ANS --> E["Evaluate:\nwhat was useful?"]
    E --> U["Update agent.md\nreinforce/expand/decay"]
    U -.->|next query| A
```
Note: The loop above activates after a few queries and improves gradually -- it does not run on every query from the start.
| Section | Max | What it tracks |
|---|---|---|
| Concept Clusters | 8 | Groups of concepts frequently queried together |
| Query Patterns | 10 | Maps question types to the specific articles that answer them |
| Source Signals | 15 | Which sources are most frequently useful and for what |
| Corrections | 5 | Retrieval mistakes to avoid repeating |
Without the agent, every query scans the full index and reads 6-8 candidate articles. With the agent, Claude jumps directly to the 2-3 articles that matter.
| Vault size | Agent cost | Savings per query | Net savings |
|---|---|---|---|
| 3 sources | ~225 tokens | ~500 tokens | ~275 tokens |
| 8 sources | ~600 tokens | ~2,500 tokens | ~1,900 tokens |
| 15 sources | ~1,000 tokens | ~4,450 tokens | ~3,450 tokens |
- Bounded: 6,000 character hard ceiling (~1,000 tokens max read cost)
- Advisory only: Never overrides `index.md` -- only prioritizes which articles to read first
- Cold start threshold: Not activated until 3+ queries or 5+ compiled sources
- Exponential decay: Every 20 queries, hit counts halve -- recent patterns outweigh old ones
- Self-cleaning: `/knowledge-vault:lint` detects and removes stale references
- Reset: `/knowledge-vault:agent-reset` clears all learned patterns if needed
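The decay rule is simple arithmetic. A sketch under the assumption that agent.md's bookkeeping reduces to a pattern → hit-count map (the real file is prose-structured markdown):

```python
DECAY_EVERY = 20

def decay_hits(patterns, total_queries):
    """Halve every hit count each time another 20 queries have landed,
    dropping patterns that decay to zero -- so recent patterns outweigh
    old ones and the file stays bounded."""
    if total_queries % DECAY_EVERY != 0:
        return patterns
    return {k: v // 2 for k, v in patterns.items() if v // 2 > 0}
```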
/knowledge-vault:lint runs 8 health checks to keep your knowledge base consistent:
| Check | What it catches | Severity |
|---|---|---|
| Contradictions | Conflicting claims across different sources | Critical |
| Stale articles | Concepts not updated after new sources added | Warning |
| Missing concepts | Referenced via [[wikilink]] but no article exists | Warning |
| Orphaned articles | Concept articles with no sources linked | Warning |
| Thin articles | Concept articles under 100 words | Suggestion |
| Duplicates | Overlapping concept coverage | Warning |
| Gap analysis | Missing topics that would strengthen the knowledge graph | Suggestion |
| Agent staleness | agent.md references deleted concepts or sources | Warning |
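Two of the checks are easy to picture in miniature -- a hedged sketch, with the 100-word threshold taken from the table above and an assumed wikilink pattern:

```python
import re

def lint_article(slug, text, existing_slugs):
    """Sketch of two of the eight checks: thin articles (suggestion) and
    missing concepts (warning) from dangling [[wikilinks]]."""
    findings = []
    if len(text.split()) < 100:
        findings.append(("suggestion", slug + ": thin article (<100 words)"))
    for link in set(re.findall(r"\[\[([^\]|#]+)", text)):
        if link not in existing_slugs:
            findings.append(("warning", slug + ": missing concept [[" + link + "]]"))
    return findings
```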
Articles are written to a strict standard -- factual, precise, no fluff.
Rules:
- Tone: Flat, factual, Wikipedia-style. Let data imply significance.
- Avoid: Peacock words ("groundbreaking", "revolutionary"), editorial voice ("interestingly"), rhetorical questions
- Do: One claim per sentence. Short sentences. Replace adjectives with specifics (numbers, dates, methods).
- Max 2 direct quotes per article -- choose the most impactful
Quality safeguards during compilation:
- Anti-cramming: If a concept article develops 3+ distinct sub-topics, split into separate articles
- Anti-thinning: Every article must have real substance -- stubs with 2 vague sentences are failures
- Quality checkpoints: Every 5 compiled sources, audit the 3 most-updated articles for coherence
/knowledge-vault:cleanup: Dedicated command to audit and fix all articles -- restructure diary-style articles into thematic ones, split bloated articles, enrich stubs, fix broken links
Open .vault/ as an Obsidian vault. Zero configuration needed.
| Feature | What it gives you |
|---|---|
| Graph View | Visualize concept connections via [[wikilinks]] |
| Backlinks | See every article referencing a concept |
| Search | Full-text search across all articles |
| Tags | Browse by YAML tags across all sources |
| Web Clipper | Clip from browser → auto-lands in Clippings/ → /knowledge-vault:process |
| Feature | v1 (skill) | v2 (plugin) |
|---|---|---|
| Architecture | Claude Code skill | Claude Code plugin with commands, skills, agents, hooks, and scripts |
| Invocation | Natural language (`vault compile`) | Slash commands (`/knowledge-vault:compile`) |
| Academic collection | Manual URL ingestion only | Batch search across 5 research servers via MCP |
| Source management | None | /knowledge-vault:setup-sources + sources.json config |
| Research agent | None | Dedicated vault-collector agent for parallel database search |
| Session hooks | None | Auto-detects .vault/ on session start |
| Vault format | `.vault/` directory | Same -- fully backward compatible |
- Plugin architecture: Commands are now registered slash commands, not natural-language triggers. Skills and agents are separate modules.
- `/knowledge-vault:collect`: New command. Searches PubMed, arXiv, Scholar Gateway, Consensus, and Paper Search in parallel. Presents results for selective ingestion. Deduplicates across sources.
- `/knowledge-vault:setup-sources`: New command. Detects installed MCP servers, shows available servers with ready-to-run install commands, writes `sources.json`.
- Session hook: On session start, detects if the project has a `.vault/` directory and loads vault context automatically.
- `sources.json`: New config file tracking which research servers are configured per vault.
Upgrading from v1 (skill) to v2 (plugin):
Step 1 -- Remove the old skill:
```
rm -rf ~/.claude/skills/knowledge-vault
```

Step 2 -- Install the plugin (in Claude Code):

```
/plugin marketplace add psypeal/claude-knowledge-vault
/plugin install knowledge-vault@claude-knowledge-vault
/reload-plugins
```

Step 3 -- Verify (in any project with an existing vault):
> /knowledge-vault:status
That's it. Your existing .vault/ directories are fully compatible. No data migration needed.
Optional -- Configure academic sources for an existing vault:
> /knowledge-vault:setup-sources
| | Knowledge Vault v2 | agno-agi/pal |
|---|---|---|
| Runtime | Claude Code plugin (your terminal) | FastAPI + Docker |
| Storage | Markdown + JSON | PostgreSQL + files |
| Setup | `git clone` one folder | Docker Compose + API keys |
| Scope | Per-project | Global personal agent |
| Dependencies | None (optional: `uv`, `npx` for MCP servers) | PostgreSQL, OpenAI API |
| Academic search | 5 MCP servers, elastic config | Custom API integrations |
| Invocation | `/knowledge-vault:*` slash commands | Chat interface |
| Browsing | Obsidian | Custom web UI |
- Claude Code v2.0+
- `python3` (for JSON updates in helper scripts)
- `uv` (optional, for arXiv and Zotero MCP servers)
- `npx` / Node.js (optional, for Paper Search MCP server)
- Zotero 7+ (optional, for `/knowledge-vault:ingest-zotero`)
- Obsidian (optional, for browsing)
- Andrej Karpathy -- LLM knowledge base compilation concept
- agno-agi/pal -- manifest tracking, YAML schemas, linting architecture
- farzaa/wiki -- wiki-as-knowledge-base pattern
- blazickjp/arxiv-mcp-server -- arXiv MCP server
- 54yyyu/zotero-mcp -- Zotero MCP server powering `/knowledge-vault:ingest-zotero`
- Galaxy-Dawn/claude-scholar -- inspiration for the Zotero → knowledge-base workflow