This guide explains how the AI Knowledge system works, including the compilation architecture, CI/CD pipeline, and RAG indexing.
The AI Knowledge system uses a three-part architecture:
| Operation | Where | When |
|---|---|---|
| Knowledge concatenation (all-knowledge.md) | CI/CD (GitHub Actions) | Every push to source/ |
| Instruction compilation | Local (ragbot compile) | When instructions change (rare) |
| RAG indexing | Local (ragbot index) | When content changes + RAG needed |
Key principle: Edit source/ files directly. That is the authoritative content.
The output repo determines what content is included—not who runs the compiler.
Anyone with write access to a repo can compile into it. The content included depends solely on the output repo's position in the inheritance tree.
AI Knowledge repos follow a hierarchy:
ai-knowledge-{templates} <- Public templates (root)
|
ai-knowledge-{person} <- Personal identity
|
ai-knowledge-{company} <- Company knowledge
|
ai-knowledge-{client} <- Client-specific content
Each repo contains:
- source/: Human-edited content (authoritative)
- compiled/: Auto-generated output (instructions only)
- all-knowledge.md: Auto-generated by CI/CD (at repo root)
- compile-config.yaml: Compilation settings
| Output | How | Location |
|---|---|---|
| compiled/instructions/ | Local: ragbot compile --project {name} | compiled/{project}/instructions/ |
| all-knowledge.md | CI/CD: GitHub Actions | Repo root |
Removed outputs (no longer generated):
- compiled/knowledge/: Individual flat files were never consumed by anything
- compiled/vectors/: RAG reads source directly, not intermediate chunks
Knowledge concatenation runs automatically via GitHub Actions on every push to source/. The composite action lives at ai-knowledge-ragbot/.github/actions/concatenate-knowledge/action.yml and is called by each repo's concatenate.yml workflow.
The CI/CD pipeline:
- Discovers .md files in source/, excluding instructions/, contexts/, and README.md
- Generates a header with repo name, description, timestamp, and file count
- Concatenates files with ## {relative-path} headers and --- separators
- Commits and pushes all-knowledge.md if changed
No manual action needed. Edit source files, push, and all-knowledge.md updates automatically.
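The discovery and concatenation steps above can be sketched in Python. This is a minimal illustration, not the composite action's actual code: the function names, the exact header layout, and the choice to exclude only top-level instructions/ and contexts/ directories (and README.md at any depth) are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path

# Exclusions named in the pipeline description; matching rules are assumptions.
EXCLUDED_TOP_LEVEL_DIRS = {"instructions", "contexts"}
EXCLUDED_FILE_NAMES = {"README.md"}


def discover_sources(source_dir: Path) -> list[Path]:
    """Return the .md files under source_dir that would be concatenated."""
    found = []
    for path in sorted(source_dir.rglob("*.md")):
        rel = path.relative_to(source_dir)
        if rel.name in EXCLUDED_FILE_NAMES:
            continue
        if len(rel.parts) > 1 and rel.parts[0] in EXCLUDED_TOP_LEVEL_DIRS:
            continue
        found.append(path)
    return found


def concatenate(source_dir: Path, repo_name: str, description: str) -> str:
    """Build the text of all-knowledge.md (header layout is hypothetical)."""
    files = discover_sources(source_dir)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    header = (
        f"# {repo_name}\n\n{description}\n\n"
        f"Generated: {stamp}\nFiles: {len(files)}\n"
    )
    sections = [header]
    for path in files:
        rel = path.relative_to(source_dir).as_posix()
        body = path.read_text(encoding="utf-8").strip()
        # Each file gets a "## {relative-path}" header, as described above.
        sections.append(f"## {rel}\n\n{body}\n")
    # Sections are joined with "---" separators.
    return "\n---\n\n".join(sections)
```

Calling `concatenate(Path("source"), "ai-knowledge-example", "Example repo")` returns the full text; the real pipeline additionally commits and pushes the file only when its content has changed.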
Instruction compilation transforms source instructions into LLM-specific formats. This is rare — only needed when instructions change.
# Compile instructions for a project
ragbot compile --project {name}
# Without LLM API calls (just assemble)
ragbot compile --project {name} --no-llm
# Target specific LLM
ragbot compile --project {name} --llm claude
# Force recompilation
ragbot compile --project {name} --force
# Verbose output
ragbot compile --project {name} --verbose

Output: compiled/{project}/instructions/
- claude.md: For Anthropic models
- chatgpt.md: For OpenAI models
- gemini.md: For Google models
RAG indexing reads source files directly and indexes them in pgvector. No intermediate files needed.
ragbot index --workspace {name}

The RAG pipeline:
- Reads .md files from source/ (respecting inheritance)
- Chunks content using sentence-transformers
- Generates embeddings with all-MiniLM-L6-v2
- Upserts to the pgvector store (PostgreSQL with the pgvector extension)
Inheritance relationships are defined in my-projects.yaml in your personal repo:
# my-projects.yaml
version: 1
base_path: ~/projects/ai-knowledge
projects:
  templates:
    local_path: ~/projects/ai-knowledge/ai-knowledge-templates
    inherits_from: []
    description: Public templates (root)
  personal:
    local_path: ~/projects/ai-knowledge/ai-knowledge-personal
    inherits_from:
      - templates
    description: Personal identity
  company:
    local_path: ~/projects/ai-knowledge/ai-knowledge-company
    inherits_from:
      - personal
    description: Company knowledge
  client-a:
    local_path: ~/projects/ai-knowledge/ai-knowledge-client-a
    inherits_from:
      - company
    description: Client A project

Content included depends on the output repo's position in the inheritance tree:
| Output Repo | Compiling client-a | Content Included |
|---|---|---|
| ai-knowledge-personal | compiled/client-a/ | templates + personal + company + client-a |
| ai-knowledge-company | compiled/client-a/ | templates + company + client-a |
| ai-knowledge-client-a | compiled/client-a/ | templates + client-a |
Private content never leaks. Each repo only contains compilations with content appropriate for that repo's access level.
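One rule that reproduces the three rows of the table above is: keep the public root, then keep every project from the output repo down to the project being compiled. The sketch below encodes that inferred rule; the function name and the root-first list representation are assumptions, and the compiler's real logic may differ.

```python
def included_content(chain: list[str], output_repo: str) -> list[str]:
    """Return the projects whose content lands in output_repo.

    `chain` is ordered root-first and ends with the project being compiled,
    e.g. ["templates", "personal", "company", "client-a"].
    The table does not cover compiling into the public root itself, so that
    case is not specified here.
    """
    if output_repo not in chain:
        raise ValueError(f"{output_repo!r} is not in the inheritance chain")
    root = chain[0]
    # Everything from the output repo down to the compiled project.
    tail = chain[chain.index(output_repo):]
    # Projects above the output repo are dropped, except the public root.
    return tail if tail[0] == root else [root, *tail]


chain = ["templates", "personal", "company", "client-a"]
print(included_content(chain, "company"))
# -> ['templates', 'company', 'client-a']
```

Under this rule, projects that sit above the output repo (other than the public templates) are never written into it, which matches the guarantee that private content does not leak downward.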
The compiler generates separate instruction files for each major LLM platform. Each file is optimized for that platform's capabilities and conventions.
When using Ragbot (CLI or Web UI), the correct instruction file is automatically loaded based on the model being used:
| Model Type | Instruction File |
|---|---|
| Anthropic models (Claude) | instructions/claude.md |
| OpenAI models (GPT-5.x) | instructions/chatgpt.md |
| Google models (Gemini) | instructions/gemini.md |
When users switch models mid-conversation in the Web UI, the system automatically loads the appropriate instructions for the new model. This happens transparently on each request.
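The per-request lookup described above could look roughly like this. The file paths come from the table; the prefix-based detection, the dictionary names, and the function name are hypothetical, since this guide does not show how Ragbot identifies a model's provider.

```python
# Instruction files per provider, taken from the table above.
INSTRUCTION_FILES = {
    "anthropic": "instructions/claude.md",
    "openai": "instructions/chatgpt.md",
    "google": "instructions/gemini.md",
}

# Hypothetical model-name prefixes used only for this illustration.
MODEL_PREFIXES = {
    "claude": "anthropic",
    "gpt": "openai",
    "gemini": "google",
}


def instruction_file_for(model: str) -> str:
    """Pick the instruction file for a model name (prefix match is an assumption)."""
    name = model.lower()
    for prefix, provider in MODEL_PREFIXES.items():
        if name.startswith(prefix):
            return INSTRUCTION_FILES[provider]
    raise ValueError(f"no instruction file known for model {model!r}")
```

Because the lookup is stateless and cheap, running it on every request is enough to handle a mid-conversation model switch without any extra bookkeeping.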
The RAG system respects the inheritance configuration from my-projects.yaml. When you select a workspace in the UI or CLI, the RAG system:
- Loads inheritance from centralized config: Per ADR-006, inheritance configuration lives ONLY in my-projects.yaml in the personal repo
- Resolves the full inheritance chain: For example, example-client inherits from example-company -> personal -> ragbot
- Indexes content from all ancestors: The vector index includes chunks from the workspace AND all inherited workspaces
- Enables cross-workspace queries: You can ask about "ragbot" while in a client workspace because that content is inherited
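Resolving the full inheritance chain amounts to walking inherits_from upward until the root is reached. The sketch below assumes the projects mapping has already been loaded from my-projects.yaml into a plain dict (for example with a YAML parser); the function name and the cycle check are illustrative additions, not Ragbot's actual implementation.

```python
def resolve_chain(projects: dict, name: str) -> list[str]:
    """Return the ancestors of `name` root-first, ending with `name` itself."""
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(project: str) -> None:
        if project in ordered:
            return  # already resolved through another parent
        if project in visiting:
            raise ValueError(f"inheritance cycle at {project!r}")
        if project not in projects:
            raise KeyError(f"unknown project {project!r}")
        visiting.add(project)
        # Parents are resolved first so the root ends up at the front.
        for parent in projects[project].get("inherits_from") or []:
            visit(parent)
        visiting.discard(project)
        ordered.append(project)

    visit(name)
    return ordered


# Same shape as the projects section of my-projects.yaml.
projects = {
    "templates": {"inherits_from": []},
    "personal": {"inherits_from": ["templates"]},
    "company": {"inherits_from": ["personal"]},
    "client-a": {"inherits_from": ["company"]},
}
print(resolve_chain(projects, "client-a"))
# -> ['templates', 'personal', 'company', 'client-a']
```

Indexing every project in the returned list is what makes inherited content searchable from a child workspace.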
Ragbot implements a production-grade, multi-stage RAG pipeline:
| Phase | Description | Techniques |
|---|---|---|
| Phase 1 | Foundation | Query preprocessing, full document retrieval, 16K context |
| Phase 2 | Query Intelligence | LLM planner, multi-query expansion, HyDE |
| Phase 3 | Hybrid Retrieval | BM25 + Vector search, RRF, LLM reranking |
| Phase 4 | Verification | Hallucination detection, confidence scoring, CRAG |
For complete technical details, see RAG Architecture.
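As an illustration of the RRF step in Phase 3, the sketch below shows the standard Reciprocal Rank Fusion formula, where each document scores the sum of 1 / (k + rank) over the rankings it appears in. The function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions; this guide does not state Ragbot's actual parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents contribute a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical keyword-search order
vector_hits = ["b", "c", "a"]  # hypothetical vector-search order
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# -> ['b', 'a', 'c']
```

RRF needs only rank positions, not raw scores, which is why it is a convenient way to merge BM25 and vector results whose score scales are not comparable.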
- Claude Code: Reads source/ files directly — no compilation needed
- Claude Projects / ChatGPT / Gemini: Upload all-knowledge.md (one per repo in chain), copy compiled instructions
- Ragbot.AI: Index with ragbot index --workspace {name}
The inheritance config must exist in your personal repo. Create it with:
version: 1
base_path: ~/projects/ai-knowledge
projects:
  # ... your projects

Check that local_path in my-projects.yaml points to existing directories.
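The local_path check can be automated with a few lines of Python. This sketch assumes the config has already been parsed into a dict with the same shape as my-projects.yaml; the function name is hypothetical and not part of the ragbot CLI.

```python
from pathlib import Path


def missing_project_paths(config: dict) -> dict[str, Path]:
    """Map project name to its expanded local_path when that path is not a directory."""
    missing = {}
    for name, project in config.get("projects", {}).items():
        # expanduser() resolves the leading "~" used in my-projects.yaml.
        path = Path(project["local_path"]).expanduser()
        if not path.is_dir():
            missing[name] = path
    return missing
```

An empty result means every configured project points at an existing directory; otherwise the returned paths show which clones are missing or misnamed.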
- Check the inheritance chain in my-projects.yaml
- Verify the source repo has content in source/
- Check compile-config.yaml include/exclude patterns
- RAG Architecture — Complete RAG pipeline documentation
- Data Organization Philosophy — Why separate code from data
- Project Documentation Convention — Project folder structure