# Knowledge Vault

A local, LLM-powered knowledge base that ingests from Zotero, academic databases, and the web.
Pull papers from your Zotero library. Batch-search PubMed, arXiv, Scholar, Consensus. Compile a cross-referenced wiki. Query your knowledge. Browse in Obsidian.



Built on ideas from Andrej Karpathy's LLM knowledge base approach, the agno-agi/pal architecture, and the farzaa/wiki pattern.


## What It Does

Knowledge Vault is a Claude Code plugin that turns any project directory into a structured knowledge base. It batch-ingests from your Zotero library and from academic databases (PubMed, arXiv, Scholar Gateway, Consensus, Paper Search) via MCP servers — then compiles the raw sources into a cross-referenced wiki you can query.

```mermaid
flowchart LR
    subgraph collect ["Academic Collection"]
        P["PubMed"]
        X["arXiv"]
        S["Scholar Gateway"]
        CO["Consensus"]
        PS["Paper Search\n(14 databases)"]
    end
    Z["Zotero\nLibrary"]
    collect -->|"/knowledge-vault:collect"| R["raw/\n.manifest.json\nsources.json"]
    Z -->|"/knowledge-vault:ingest-zotero"| R
    A["URLs, files,\nnotes, clips"] -->|"/knowledge-vault:ingest"| R
    CL["Obsidian\nWeb Clipper"] -->|auto| CLP["Clippings/"]
    CLP -->|"/knowledge-vault:process"| R
    R -->|"/knowledge-vault:compile"| W["wiki/\nsummaries/\nconcepts/\nindex.md"]
    W -->|"/knowledge-vault:query"| ANS["Grounded\nAnswers"]
    W -->|"/knowledge-vault:lint"| H["Health\nReport"]
    W -->|browse| O["Obsidian\nGraph View"]
```

Claude maintains all wiki content. You browse and query -- never edit directly.


## Install

### New install

Step 1 — Add the marketplace (one time only):

```
/plugin marketplace add psypeal/claude-knowledge-vault
```

Step 2 — Install the plugin:

```
/plugin install knowledge-vault@claude-knowledge-vault
```

Step 3 — Reload:

```
/reload-plugins
```

No config, no dependencies, no API keys.

### Update

When a new version is released, refresh the marketplace to pull the latest:

```
/plugin marketplace update claude-knowledge-vault
```

Then reload so the new commands, scripts, and fixes take effect:

```
/reload-plugins
```

If auto-update is enabled for this marketplace, the plugin updates automatically during the marketplace refresh. Otherwise, toggle it via /plugin → Marketplaces → select claude-knowledge-vault → Enable auto-update.

### Uninstall

```
/plugin uninstall knowledge-vault@claude-knowledge-vault
```

To uninstall from a specific scope, use the /plugin → Installed tab — select the plugin and choose Uninstall.

### Migrating from v1 (skill)

Existing vaults are untouched — the .vault/ directory format is unchanged.

Step 1 — Remove the old skill:

```
rm -rf ~/.claude/skills/knowledge-vault
```

Step 2 — Then in Claude Code:

```
/plugin marketplace add psypeal/claude-knowledge-vault
/plugin install knowledge-vault@claude-knowledge-vault
/reload-plugins
```

Step 3 — Done: your existing .vault/ directories work as-is.

See Migration for full details.


## Quick Start

```
> /knowledge-vault:init
  Vault initialized at .vault/

  Let me configure your vault preferences.

  What domain is this vault for?
> Neuroimaging and neurodegeneration research

  Preferences saved to .vault/preferences.md

  Tip: Run /knowledge-vault:setup-sources to configure academic databases.

> /knowledge-vault:setup-sources
  Detected:
    PubMed (Claude.ai built-in)          active
    Scholar Gateway (Claude.ai built-in) active

  Available to add:
    Consensus       claude mcp add --transport http consensus https://mcp.consensus.app/mcp
    arXiv           claude mcp add arxiv-mcp-server -- uvx arxiv-mcp-server ...
    Paper Search    claude mcp add paper-search -- npx -y paper-search-mcp-nodejs

  Which servers would you like to add?
> Consensus and arXiv

  Added 2 servers. Sources saved to .vault/sources.json

> /knowledge-vault:collect tau PET imaging neurodegeneration --since 2023
  Searching PubMed, Scholar Gateway, Consensus, arXiv...

  | # | Title                                           | Source    | Date | Type   |
  |---|-------------------------------------------------|-----------|------|--------|
  | 1 | Tau PET imaging in early Alzheimer's disease    | PubMed    | 2024 | paper  |
  | 2 | Longitudinal tau accumulation in subcortical... | Consensus | 2023 | paper  |
  | 3 | Second-generation tau tracers: a review         | arXiv     | 2024 | review |

  Which to ingest? (all / 1,3 / none)
> all

  Ingested 3 sources. 3 pending compilation.

> /knowledge-vault:compile
  Compiled 3 sources. Extracted 7 concepts:
  tau-pet-imaging, neurodegeneration, alzheimers-disease, tau-tracers,
  subcortical-tau, longitudinal-imaging, amyloid-tau-interaction

> /knowledge-vault:query What is the current evidence for second-generation tau tracers?
  Based on the vault: Second-generation tau PET tracers (e.g., [18F]MK-6240,
  [18F]PI-2620) show improved off-target binding profiles compared to
  first-generation [18F]AV-1451. Three vault sources report higher specificity
  for neurofibrillary tau in Braak stages III-IV...
  Sources: [[tau-tracers]], [[tau-pet-imaging]]
```

## Commands

| Command | Description |
|---------|-------------|
| `/knowledge-vault:init` | Initialize a `.vault/` knowledge base in the current project |
| `/knowledge-vault:ingest <source>` | Add a raw source -- URL, pasted text, or file path |
| `/knowledge-vault:ingest-zotero <collection>` | Batch ingest papers from a Zotero collection (metadata, fulltext, annotations) |
| `/knowledge-vault:collect <query>` | Batch search academic databases and selectively ingest results |
| `/knowledge-vault:setup-sources` | Configure research MCP servers for academic collection |
| `/knowledge-vault:compile` | Compile pending sources into wiki summaries and concept articles |
| `/knowledge-vault:lint` | Run 8 health checks on the wiki |
| `/knowledge-vault:cleanup` | Audit and actively fix article quality issues |
| `/knowledge-vault:query <question>` | Ask a question grounded in your vault's knowledge |
| `/knowledge-vault:process` | Batch: ingest all web clips + compile everything |
| `/knowledge-vault:status` | Print a quick status summary |
| `/knowledge-vault:agent-reset` | Clear learned retrieval patterns and start fresh |

## Academic Collection

The headline feature of v2. /knowledge-vault:collect searches multiple academic databases in parallel and lets you cherry-pick which results to ingest.

### Supported servers

| Server | Type | Setup | Databases |
|--------|------|-------|-----------|
| PubMed | Claude.ai built-in | No setup needed | PubMed, PMC |
| Scholar Gateway | Claude.ai built-in | No setup needed | Broad academic literature |
| Consensus | HTTP MCP | `claude mcp add --transport http consensus https://mcp.consensus.app/mcp` | Research consensus engine |
| arXiv | stdio MCP | `claude mcp add arxiv-mcp-server -- uvx arxiv-mcp-server --storage-path .vault/raw/arxiv-papers` | arXiv preprints |
| Paper Search | stdio MCP | `claude mcp add paper-search -- npx -y paper-search-mcp-nodejs` | 14 databases: arXiv, PubMed, Semantic Scholar, bioRxiv, medRxiv, Crossref, CORE, OpenAlex, DOAJ, Europe PMC, Internet Archive Scholar, Fatcat, BASE, DBLP |
| Zotero | stdio MCP | `uv tool install zotero-mcp-server && zotero-mcp setup` | Your local Zotero library — collections, metadata, PDF fulltext, annotations |

### How it works

  1. /knowledge-vault:setup-sources detects what you already have configured and shows what else is available. You approve each server individually.
  2. /knowledge-vault:collect <query> searches all enabled servers in parallel, deduplicates results, and presents a numbered table.
  3. You pick which results to ingest -- all, specific numbers (1,3,5), or filters (only 2024+).
  4. Selected papers are ingested to raw/ with full metadata and available text.

The system is elastic and user-controlled. No server is added without your approval. No paper is ingested without your selection.
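The cross-source deduplication in step 2 can be sketched in a few lines. This is an illustrative helper, not the plugin's actual code: result batches from multiple servers are merged, keyed on DOI when one is present and on a normalized title otherwise.

```python
import re

def dedup_key(result: dict) -> str:
    """Prefer the DOI; fall back to a lowercased, punctuation-free title."""
    doi = (result.get("doi") or "").strip().lower()
    if doi:
        return f"doi:{doi}"
    title = re.sub(r"[^a-z0-9 ]", "", result.get("title", "").lower())
    return "title:" + " ".join(title.split())

def merge_results(*batches: list) -> list:
    """Keep the first hit for each key; later servers add only new papers."""
    seen, merged = set(), []
    for batch in batches:
        for result in batch:
            key = dedup_key(result)
            if key not in seen:
                seen.add(key)
                merged.append(result)
    return merged

# Dummy results standing in for two servers' responses
pubmed = [{"title": "Tau PET imaging in early AD", "doi": "10.1000/x1"}]
consensus = [
    {"title": "Tau PET Imaging in Early AD.", "doi": "10.1000/x1"},  # same DOI: dropped
    {"title": "Second-generation tau tracers", "doi": ""},           # new: kept
]
```

DOI matching catches the common case of the same paper surfacing from two databases; the title fallback is deliberately loose (case- and punctuation-insensitive) for preprints without a DOI.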

### Add your own MCP servers

The servers above are pre-configured suggestions, but you can add any MCP server as a research source. Just two steps:

  1. Add the server using `claude mcp add`:

     ```
     claude mcp add my-server -- npx -y my-mcp-package
     # or for HTTP servers:
     claude mcp add --transport http my-server https://example.com/mcp
     ```

  2. Register it in your vault by editing `.vault/sources.json`:

     ```json
     {
       "id": "my-server",
       "name": "My Custom Server",
       "type": "stdio",
       "enabled": true,
       "tools": ["mcp__my-server__search"]
     }
     ```

Once registered, /knowledge-vault:collect will include your custom server in batch searches alongside the built-in ones.
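For scripted setups, the registration step can also be done programmatically. The sketch below assumes sources.json keeps its entries in a top-level `"servers"` array — a guess about the file's shape, so check your vault's actual file before relying on it:

```python
import json
import tempfile
from pathlib import Path

def register_server(path: Path, entry: dict) -> bool:
    """Append a server entry to sources.json unless its id already exists."""
    config = json.loads(path.read_text()) if path.exists() else {"servers": []}
    if any(s["id"] == entry["id"] for s in config["servers"]):
        return False  # already registered: leave the file untouched
    config["servers"].append(entry)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return True

# Demo against a throwaway file rather than a real vault
cfg = Path(tempfile.mkdtemp()) / "sources.json"
added = register_server(cfg, {
    "id": "my-server",
    "name": "My Custom Server",
    "type": "stdio",
    "enabled": True,
    "tools": ["mcp__my-server__search"],
})
```

The id check makes the operation idempotent, matching how the plugin's own commands avoid duplicate entries.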

### Collect options

```
/knowledge-vault:collect transformers attention mechanisms             # basic search
/knowledge-vault:collect tau PET imaging --since 2023                  # papers from 2023 onward
/knowledge-vault:collect CRISPR delivery --count 5                     # max 5 results per source
/knowledge-vault:collect meta-analysis sleep cognition --type review   # filter by type
```

## Zotero Integration

/knowledge-vault:ingest-zotero <collection> batch-imports papers from your local Zotero library — metadata, PDF fulltext, and your highlighted annotations — and drops them into the vault's raw/ directory ready for compilation.

### Setup

Install 54yyyu/zotero-mcp once:

```
uv tool install zotero-mcp-server && zotero-mcp setup
```

Then make sure Zotero is running with the local API enabled (Zotero 7+: Settings → Advanced → Allow other applications on this computer to communicate with Zotero).

### How it works

```
Zotero collection  ──▶  list items  ──▶  you pick which to ingest
       │
       ▼
  For each paper:
    - metadata (title, authors, year, DOI, BetterBibTeX citekey, Zotero key)
    - PDF fulltext (if attached)
    - your highlighted annotations (if any)
       │
       ▼
  Structured extraction  ──▶  raw/<slug>.md  ──▶  /knowledge-vault:compile
  (~800-1200 words, not the full PDF — the full paper stays in Zotero)
```

### Usage

```
> /knowledge-vault:ingest-zotero hippocampus-review-2024

  Found collection: "Hippocampus Review 2024" (12 items)

  | # | Title                                      | Authors         | Year | Type   |
  |---|--------------------------------------------|-----------------|------|--------|
  | 1 | Place cell remapping in CA1 after sleep    | Tanaka et al.   | 2024 | paper  |
  | 2 | Entorhinal grid cells and path integration | Rowland & Moser | 2023 | paper  |
  | 3 | Hippocampal theta rhythm: a 50-year review | Buzsáki         | 2024 | review |
  ...

  Ingest which? (e.g., 1,3,5 or all)
> all

  Ingested 12 papers. Run /knowledge-vault:compile now?
```

### What gets preserved

Each raw file gains extra Zotero-specific frontmatter fields so you can trace back to the source:

```yaml
---
title: "Place cell remapping in CA1 after sleep"
source: "https://doi.org/10.1038/s41593-024-xxxxx"
type: paper
zotero_key: "ABCD1234"
citekey: "tanaka2024place"
doi: "10.1038/s41593-024-xxxxx"
year: 2024
authors: ["Tanaka K", "Moser EI", "..."]
compiled: false
---
```

Re-running the command is safe — existing slugs are skipped, so you can incrementally pull new items as you add them to Zotero.
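The skip-existing behavior boils down to a slug check before writing. A minimal sketch with illustrative names (`slugify` and `ingest` are not the plugin's actual internals):

```python
import re
import tempfile
from pathlib import Path

def slugify(title: str) -> str:
    """'Place cell remapping in CA1' -> 'place-cell-remapping-in-ca1'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def ingest(raw_dir: Path, title: str, body: str) -> bool:
    """Write raw/<slug>.md unless that slug exists; return True if written."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / f"{slugify(title)}.md"
    if target.exists():
        return False  # existing slug: skipped, so re-runs are safe
    target.write_text(body)
    return True

# Demo in a temp directory: second ingest of the same title is a no-op
raw = Path(tempfile.mkdtemp()) / "raw"
first = ingest(raw, "Place cell remapping in CA1 after sleep", "...")
second = ingest(raw, "Place cell remapping in CA1 after sleep", "...")
```

Because the slug is derived from the title alone, retitled Zotero items would get a new slug; the real command also carries the Zotero key in frontmatter, which is a stabler identifier for tracing duplicates.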


## Project Structure

After /knowledge-vault:init and /knowledge-vault:setup-sources:

```
your-project/
  .vault/
  ├── preferences.md       User preferences (interview-generated)
  ├── agent.md             Learned retrieval intelligence (auto-maintained)
  ├── sources.json         Configured research MCP servers
  ├── Clippings/           Obsidian Web Clipper default folder
  ├── raw/                 Ingested sources with YAML frontmatter
  │   ├── .manifest.json   Source registry
  │   └── arxiv-papers/    arXiv PDFs (if arXiv server configured)
  ├── wiki/
  │   ├── index.md         Master routing index
  │   ├── _backlinks.json  Reverse link index
  │   ├── concepts/        One article per topic
  │   ├── summaries/       One summary per source
  │   ├── outputs/         Query results and lint reports
  │   └── .state.json      Compilation and lint state
  └── templates/           Frontmatter skeletons
```
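A quick way to sanity-check this layout from a script is to probe for the expected paths. This is a hypothetical health check modeled on the tree above, not the plugin's actual /knowledge-vault:status implementation:

```python
import tempfile
from pathlib import Path

# Paths taken from the tree above; the list is illustrative, not exhaustive.
EXPECTED = [
    "preferences.md",
    "sources.json",
    "raw/.manifest.json",
    "wiki/index.md",
    "wiki/concepts",
    "wiki/summaries",
]

def vault_health(root: Path) -> dict:
    """Map each expected .vault/ path to whether it exists."""
    vault = root / ".vault"
    return {rel: (vault / rel).exists() for rel in EXPECTED}

# Demo: a half-built vault in a temp directory
root = Path(tempfile.mkdtemp())
(root / ".vault" / "wiki" / "concepts").mkdir(parents=True)
(root / ".vault" / "preferences.md").write_text("# preferences\n")
report = vault_health(root)
```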

## Personalized Preferences

During /knowledge-vault:init, Claude interviews you about your vault's domain and priorities:

```
> /knowledge-vault:init
  Vault initialized at .vault/

  Let me configure your vault preferences.

  What domain is this vault for?
> Biomedical research -- neuroimaging and neurodegeneration

  What sources will you mainly use?
> Papers from PubMed, review articles, and meeting notes

  Any priority rules for sources?
> Peer-reviewed > preprints > blog posts. Prioritize longitudinal studies.

  How granular should concepts be?
> Balanced -- not too broad, not too narrow

  Any special compilation instructions?
> Always extract methodology and sample size. Note statistical methods used.

  Preferences saved to .vault/preferences.md
```

This creates .vault/preferences.md -- Claude reads it at the start of every vault operation. It shapes how sources are summarized, which concepts are extracted, and how queries are answered.

You can edit preferences.md manually anytime. Claude always picks up the latest version.


## 3-Tier Query Routing

Queries stay efficient at any vault size. Claude never loads everything -- it reads the index, picks what's relevant, and drills down only when needed.

```
Tier 1  ─────  wiki/index.md           Always read first (one line per entry)
                    │
Tier 2  ─────  summaries/ + concepts/  Read relevant matches (200-500 words each)
                    │
Tier 3  ─────  raw/                    Full source text (only when depth needed)
```

## Compounding Knowledge

Queries answer questions from the vault. When an answer is particularly valuable, you can choose to save it back into the vault, enriching future queries.

### How it works

  1. Query: Claude reads wiki/index.md, picks 2-4 relevant articles, and answers your question with [[wikilinks]] to sources.
  2. File it: If the answer is worth keeping, say "file it". Claude saves it to wiki/outputs/ and updates the index. Filed answers become available to future queries.
  3. Leave it: Most queries just return an answer and nothing is saved. Simple lookups pass through without adding noise.

Filing is always user-initiated -- Claude does not automatically classify or save answers.

### Connection strength

When you file an answer that connects multiple concepts, the connection gets a strength rating:

| Strength | Criteria | Graph impact |
|----------|----------|--------------|
| Strong | Supported by 2+ independent sources with direct evidence | Added to concept graph |
| Moderate | Supported by 1 source with clear evidence | Added to concept graph with note |
| Weak | Logically inferred but not directly stated in sources | Recorded in output only -- not added to graph until confirmed by a future source |

### Safeguards

When filing an answer:

  • Deduplication: Checks if an existing output already covers the same question or connection
  • Graph density cap: Max 8 related entries per concept -- new connections only replace weaker ones
  • Weak connections quarantined: Speculative links stay in outputs, not in the concept graph, until confirmed
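The density cap and weak-link quarantine can be expressed as a small filter. A sketch under the assumption that each connection carries the strength rating from the table above (the data shapes are illustrative):

```python
RANK = {"strong": 2, "moderate": 1, "weak": 0}
MAX_RELATED = 8  # the graph density cap described above

def add_related(related: list, candidate: dict) -> list:
    """Add a connection; past the cap it may only replace a weaker one."""
    if candidate["strength"] == "weak":
        return related  # quarantined: weak links stay in outputs/, not the graph
    if len(related) < MAX_RELATED:
        return related + [candidate]
    weakest = min(related, key=lambda r: RANK[r["strength"]])
    if RANK[candidate["strength"]] > RANK[weakest["strength"]]:
        return [r for r in related if r is not weakest] + [candidate]
    return related  # at the cap with nothing weaker to replace

# Demo: a concept already at the cap accepts a strong link by eviction
full = [{"to": f"concept-{i}", "strength": "moderate"} for i in range(MAX_RELATED)]
after = add_related(full, {"to": "new-link", "strength": "strong"})
```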

## Smart Agent

The vault includes a learning retrieval agent (.vault/agent.md) that gradually improves article routing based on your query history.

The agent does not activate on every query. It kicks in after a few queries and improves gradually:

  • Pre-routing (reading agent.md before the index) activates only after 5+ total queries in the vault.
  • Agent updates (writing back to agent.md) happen only after 3+ queries in the same session.
  • Most queries -- especially early ones -- never touch agent.md at all.

```mermaid
flowchart LR
    Q["/knowledge-vault:query"] --> A["agent.md\nsuggests articles"]
    A --> R["Claude reads\npriority articles"]
    R --> ANS["Answer"]
    ANS --> E["Evaluate:\nwhat was useful?"]
    E --> U["Update agent.md\nreinforce/expand/decay"]
    U -.->|next query| A
```

Note: The loop above activates after a few queries and improves gradually -- it does not run on every query from the start.

### What it learns

| Section | Max | What it tracks |
|---------|-----|----------------|
| Concept Clusters | 8 | Groups of concepts frequently queried together |
| Query Patterns | 10 | Maps question types to the specific articles that answer them |
| Source Signals | 15 | Which sources are most frequently useful and for what |
| Corrections | 5 | Retrieval mistakes to avoid repeating |

### How it saves tokens

Without the agent, every query scans the full index and reads 6-8 candidate articles. With the agent, Claude jumps directly to the 2-3 articles that matter.

| Vault size | Agent cost | Savings per query | Net savings |
|------------|------------|-------------------|-------------|
| 3 sources | ~225 tokens | ~500 tokens | ~275 tokens |
| 8 sources | ~600 tokens | ~2,500 tokens | ~1,900 tokens |
| 15 sources | ~1,000 tokens | ~4,450 tokens | ~3,450 tokens |

### Safeguards

  • Bounded: 6,000 character hard ceiling (~1,000 tokens max read cost)
  • Advisory only: Never overrides index.md -- only prioritizes which articles to read first
  • Cold start threshold: Not activated until 3+ queries or 5+ compiled sources
  • Exponential decay: Every 20 queries, hit counts halve -- recent patterns outweigh old ones
  • Self-cleaning: /knowledge-vault:lint detects and removes stale references
  • Reset: /knowledge-vault:agent-reset clears all learned patterns if needed
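The decay rule is simple enough to show directly. A sketch assuming patterns are stored as name-to-hit-count pairs (the real agent.md format may differ):

```python
DECAY_INTERVAL = 20  # queries between halvings, per the safeguard above

def maybe_decay(patterns: dict, query_count: int) -> dict:
    """Halve every hit count on each 20th query; drop entries that reach zero."""
    if query_count % DECAY_INTERVAL != 0:
        return patterns
    halved = {name: hits // 2 for name, hits in patterns.items()}
    return {name: hits for name, hits in halved.items() if hits > 0}

patterns = {"tau-tracers": 9, "neurodegeneration": 4, "stale-topic": 1}
patterns = maybe_decay(patterns, query_count=20)  # stale-topic decays out
```

Halving on a fixed interval gives an exponential half-life in query count, so a pattern must keep earning hits to stay prominent.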

## Lint Checks

/knowledge-vault:lint runs 8 health checks to keep your knowledge base consistent:

| Check | What it catches | Severity |
|-------|-----------------|----------|
| Contradictions | Conflicting claims across different sources | Critical |
| Stale articles | Concepts not updated after new sources added | Warning |
| Missing concepts | Referenced via `[[wikilink]]` but no article exists | Warning |
| Orphaned articles | Concept articles with no sources linked | Warning |
| Thin articles | Concept articles under 100 words | Suggestion |
| Duplicates | Overlapping concept coverage | Warning |
| Gap analysis | Missing topics that would strengthen the knowledge graph | Suggestion |
| Agent staleness | agent.md references deleted concepts or sources | Warning |
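As an example of what one of these checks involves, here is a sketch of the thin-article check. Paths and helper names are assumptions based on the project structure above, not the plugin's code:

```python
import tempfile
from pathlib import Path

THIN_THRESHOLD = 100  # words, per the lint table above

def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML frontmatter block delimited by '---' lines."""
    if text.startswith("---"):
        parts = text.split("---", 2)
        if len(parts) == 3:
            return parts[2]
    return text

def thin_articles(concepts_dir: Path) -> list:
    """Return slugs of concept articles whose body is under the threshold."""
    flagged = []
    for path in sorted(concepts_dir.glob("*.md")):
        words = len(strip_frontmatter(path.read_text()).split())
        if words < THIN_THRESHOLD:
            flagged.append(path.stem)
    return flagged

# Demo: one healthy article, one stub
concepts = Path(tempfile.mkdtemp())
(concepts / "tau-tracers.md").write_text("---\ntitle: x\n---\n" + "word " * 150)
(concepts / "stub.md").write_text("---\ntitle: y\n---\nTwo vague sentences.")
```

Stripping the frontmatter first matters: a stub padded with metadata would otherwise pass the word count.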

## Writing Quality

Articles are written to a strict standard -- factual, precise, no fluff.

Rules:

  • Tone: Flat, factual, Wikipedia-style. Let data imply significance.
  • Avoid: Peacock words ("groundbreaking", "revolutionary"), editorial voice ("interestingly"), rhetorical questions
  • Do: One claim per sentence. Short sentences. Replace adjectives with specifics (numbers, dates, methods).
  • Max 2 direct quotes per article -- choose the most impactful

Quality safeguards during compilation:

  • Anti-cramming: If a concept article develops 3+ distinct sub-topics, split into separate articles
  • Anti-thinning: Every article must have real substance -- stubs with 2 vague sentences are failures
  • Quality checkpoints: Every 5 compiled sources, audit the 3 most-updated articles for coherence
  • /knowledge-vault:cleanup: Dedicated command to audit and fix all articles -- restructure diary-style articles into thematic ones, split bloated articles, enrich stubs, fix broken links

## Obsidian Frontend

Open .vault/ as an Obsidian vault. Zero configuration needed.

| Feature | What you get |
|---------|--------------|
| Graph View | Visualize concept connections via `[[wikilinks]]` |
| Backlinks | See every article referencing a concept |
| Search | Full-text search across all articles |
| Tags | Browse by YAML tags across all sources |
| Web Clipper | Clip from browser → auto-lands in `Clippings/` → picked up by `/knowledge-vault:process` |

## What's New in v2

| Feature | v1 (skill) | v2 (plugin) |
|---------|------------|-------------|
| Architecture | Claude Code skill | Claude Code plugin with commands, skills, agents, hooks, and scripts |
| Invocation | Natural language ("vault compile") | Slash commands (`/knowledge-vault:compile`) |
| Academic collection | Manual URL ingestion only | Batch search across 5 research servers via MCP |
| Source management | None | `/knowledge-vault:setup-sources` + sources.json config |
| Research agent | None | Dedicated vault-collector agent for parallel database search |
| Session hooks | None | Auto-detects `.vault/` on session start |
| Vault format | `.vault/` directory | Same -- fully backward compatible |

### Summary of changes

  • Plugin architecture: Commands are now registered slash commands, not natural-language triggers. Skills and agents are separate modules.
  • /knowledge-vault:collect: New command. Searches PubMed, arXiv, Scholar Gateway, Consensus, and Paper Search in parallel. Presents results for selective ingestion. Deduplicates across sources.
  • /knowledge-vault:setup-sources: New command. Detects installed MCP servers, shows available servers with ready-to-run install commands, writes sources.json.
  • Session hook: On session start, detects if the project has a .vault/ directory and loads vault context automatically.
  • sources.json: New config file tracking which research servers are configured per vault.

## Migration

Upgrading from v1 (skill) to v2 (plugin):

Step 1 -- Remove the old skill:

```
rm -rf ~/.claude/skills/knowledge-vault
```

Step 2 -- Install the plugin (in Claude Code):

```
/plugin marketplace add psypeal/claude-knowledge-vault
/plugin install knowledge-vault@claude-knowledge-vault
/reload-plugins
```

Step 3 -- Verify (in any project with an existing vault):

```
> /knowledge-vault:status
```

That's it. Your existing .vault/ directories are fully compatible. No data migration needed.

Optional -- Configure academic sources for an existing vault:

```
> /knowledge-vault:setup-sources
```

## Comparison

| | Knowledge Vault v2 | agno-agi/pal |
|---|--------------------|--------------|
| Runtime | Claude Code plugin (your terminal) | FastAPI + Docker |
| Storage | Markdown + JSON | PostgreSQL + files |
| Setup | `git clone` one folder | Docker Compose + API keys |
| Scope | Per-project | Global personal agent |
| Dependencies | None (optional: uv, npx for MCP servers) | PostgreSQL, OpenAI API |
| Academic search | 5 MCP servers, elastic config | Custom API integrations |
| Invocation | `/knowledge-vault:*` slash commands | Chat interface |
| Browsing | Obsidian | Custom web UI |

## Requirements

  • Claude Code v2.0+
  • python3 (for JSON updates in helper scripts)
  • uv (optional, for arXiv and Zotero MCP servers)
  • npx / Node.js (optional, for Paper Search MCP server)
  • Zotero 7+ (optional, for /knowledge-vault:ingest-zotero)
  • Obsidian (optional, for browsing)


## License

MIT
