AI Knowledge Compilation Guide

This guide explains how the AI Knowledge system works, including the compilation architecture, CI/CD pipeline, and RAG indexing.

Architecture Overview

The AI Knowledge system splits its work into three operations, each run in a different place at a different time:

| Operation | Where | When |
| --- | --- | --- |
| Knowledge concatenation (`all-knowledge.md`) | CI/CD (GitHub Actions) | Every push to `source/` |
| Instruction compilation | Local (`ragbot compile`) | When instructions change (rare) |
| RAG indexing | Local (`ragbot index`) | When content changes and RAG is needed |

Key principle: edit the files in `source/` directly. They are the authoritative content.

Core Concept: Output Repo Determines Content

The output repo determines what content is included—not who runs the compiler.

Anyone with write access to a repo can compile into it. The content included depends solely on the output repo's position in the inheritance tree.

Repository Types

AI Knowledge repos follow a hierarchy:

```text
ai-knowledge-{templates}     <- Public templates (root)
    |
ai-knowledge-{person}        <- Personal identity
    |
ai-knowledge-{company}       <- Company knowledge
    |
ai-knowledge-{client}        <- Client-specific content
```

Each repo contains:

source/ — Human-edited content (authoritative)
compiled/ — Auto-generated output (instructions only)
all-knowledge.md — Auto-generated by CI/CD (at repo root)
compile-config.yaml — Compilation settings

What Gets Generated

| Output | How | Location |
| --- | --- | --- |
| Compiled instructions | Local: `ragbot compile --project {name}` | `compiled/{project}/instructions/` |
| `all-knowledge.md` | CI/CD: GitHub Actions | Repo root |

Removed outputs (no longer generated):

compiled/knowledge/ — Individual flat files were never consumed by anything
compiled/vectors/ — RAG reads source directly, not intermediate chunks

Knowledge Concatenation (CI/CD)

Knowledge concatenation runs automatically via GitHub Actions on every push to source/. The composite action lives at ai-knowledge-ragbot/.github/actions/concatenate-knowledge/action.yml and is called by each repo's concatenate.yml workflow.

The CI/CD pipeline:

Discovers .md files in source/, excluding instructions/, contexts/, and README.md
Generates a header with repo name, description, timestamp, and file count
Concatenates files with ## {relative-path} headers and --- separators
Commits and pushes all-knowledge.md if changed
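To make the discovery and formatting steps concrete, here is a minimal local approximation in Python. It is not the composite action's code: the header layout (`# {repo}`, the description, then a `Generated: ... | Files: N` line), excluding `README.md` at any depth, and the names `discover` and `concatenate` are assumptions. Committing and pushing are left to CI.

```python
"""Local approximation of the all-knowledge.md concatenation (hypothetical sketch).

The real logic lives in the concatenate-knowledge composite action; the header
layout and README.md handling below are assumptions.
"""
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path

EXCLUDED_DIRS = {"instructions", "contexts"}  # top-level folders under source/ (assumed)
EXCLUDED_NAMES = {"README.md"}                # excluded at any depth (assumed)


def discover(source: Path) -> list[Path]:
    """Return the .md files under source/ in sorted order, minus exclusions."""
    found = []
    for path in sorted(source.rglob("*.md")):
        rel = path.relative_to(source)
        if rel.parts[0] in EXCLUDED_DIRS or path.name in EXCLUDED_NAMES:
            continue
        found.append(path)
    return found


def concatenate(repo: Path, description: str, now: datetime | None = None) -> str:
    """Build the all-knowledge.md text: header, then one '## path' section per file."""
    source = repo / "source"
    files = discover(source)
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"# {repo.name}", "", description, "", f"Generated: {stamp} | Files: {len(files)}"]
    for path in files:
        rel = path.relative_to(source).as_posix()
        body = path.read_text(encoding="utf-8").strip()
        lines += ["", "---", "", f"## {rel}", "", body]  # '---' separates files
    return "\n".join(lines) + "\n"
```

Called as `concatenate(Path("ai-knowledge-company").resolve(), "Company knowledge")`, it returns the text that would be written to the repo-root `all-knowledge.md`. Passing `now` explicitly keeps the output reproducible when testing.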

No manual action needed. Edit source files, push, and all-knowledge.md updates automatically.

Instruction Compilation (Local)

Instruction compilation transforms source instructions into LLM-specific formats. This is rare — only needed when instructions change.

```bash
# Compile instructions for a project
ragbot compile --project {name}

# Without LLM API calls (just assemble)
ragbot compile --project {name} --no-llm

# Target specific LLM
ragbot compile --project {name} --llm claude

# Force recompilation
ragbot compile --project {name} --force

# Verbose output
ragbot compile --project {name} --verbose
```

Output: compiled/{project}/instructions/

claude.md — For Anthropic models
chatgpt.md — For OpenAI models
gemini.md — For Google models

RAG Indexing (Local)

RAG indexing reads source files directly and indexes them in pgvector; no intermediate files are needed.

```bash
ragbot index --workspace {name}
```

The RAG pipeline:

Reads .md files from source/ (respecting inheritance)
Chunks content using sentence-transformers
Generates embeddings with all-MiniLM-L6-v2
Upserts to the pgvector store (PostgreSQL with the pgvector extension)
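The chunking step can be pictured with a small stdlib-only stand-in. This is not Ragbot's chunker (the list above says chunking uses sentence-transformers); it is a hypothetical heading-aware packer, its `max_chars` budget is an assumption, and it does not special-case fenced code blocks. Each chunk's text would then be embedded, for example with `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)`, which yields 384-dimensional vectors, so a matching pgvector column would be `vector(384)`.

```python
"""Hypothetical heading-aware chunker for source/*.md (not Ragbot's code)."""
from __future__ import annotations

import re
from dataclasses import dataclass

HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # ATX-style markdown heading line


@dataclass
class Chunk:
    path: str     # file path relative to source/
    heading: str  # nearest preceding heading, "" before the first one
    text: str


def chunk_markdown(path: str, text: str, max_chars: int = 800) -> list[Chunk]:
    """Split on headings, then pack whole paragraphs into chunks of <= max_chars.

    A single paragraph longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: list[Chunk] = []
    heading = ""
    current = ""          # paragraphs packed so far under `heading`
    para: list[str] = []  # lines of the paragraph being read

    def flush() -> None:
        nonlocal current
        if current:
            chunks.append(Chunk(path, heading, current))
            current = ""

    def add_paragraph(p: str) -> None:
        nonlocal current
        if current and len(current) + 2 + len(p) > max_chars:
            flush()
        current = f"{current}\n\n{p}" if current else p

    for line in text.splitlines() + [""]:  # trailing "" ends the last paragraph
        match = HEADING.match(line)
        if match or not line.strip():
            if para:
                add_paragraph("\n".join(para))
                para = []
            if match:
                flush()  # a new heading always starts a new chunk
                heading = match.group(1).strip()
            continue
        para.append(line)
    flush()
    return chunks
```

Keeping the heading on each chunk lets a retriever show where a passage came from; packing whole paragraphs avoids cutting sentences mid-way.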

Inheritance Configuration

Inheritance relationships are defined in my-projects.yaml in your personal repo:

```yaml
# my-projects.yaml
version: 1
base_path: ~/projects/ai-knowledge

projects:
  templates:
    local_path: ~/projects/ai-knowledge/ai-knowledge-templates
    inherits_from: []
    description: Public templates (root)

  personal:
    local_path: ~/projects/ai-knowledge/ai-knowledge-personal
    inherits_from:
      - templates
    description: Personal identity

  company:
    local_path: ~/projects/ai-knowledge/ai-knowledge-company
    inherits_from:
      - personal
    description: Company knowledge

  client-a:
    local_path: ~/projects/ai-knowledge/ai-knowledge-client-a
    inherits_from:
      - company
    description: Client A project
```

Privacy Model

Content included depends on the output repo's position in the inheritance tree:

| Output repo | Output path for client-a | Content included |
| --- | --- | --- |
| ai-knowledge-personal | `compiled/client-a/` | templates + personal + company + client-a |
| ai-knowledge-company | `compiled/client-a/` | templates + company + client-a |
| ai-knowledge-client-a | `compiled/client-a/` | templates + client-a |

Private content never leaks into a less private repo: each repo only contains compilations whose content is appropriate for that repo's access level.
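The three rows of the table follow one rule, sketched below: the public root is always included, plus every project in the target's chain from the output repo downward. The rule is inferred from the table, not taken from the compiler, and `PARENT`, `chain`, and `included` are hypothetical names; the single-parent map mirrors the `my-projects.yaml` example.

```python
"""Content-inclusion rule inferred from the Privacy Model table (hypothetical)."""
from __future__ import annotations

# Single-parent map mirroring the my-projects.yaml example.
PARENT: dict[str, str | None] = {
    "templates": None,
    "personal": "templates",
    "company": "personal",
    "client-a": "company",
}


def chain(project: str) -> list[str]:
    """Ancestors of `project`, root first, ending with the project itself."""
    out = []
    node: str | None = project
    while node is not None:
        out.append(node)
        node = PARENT[node]
    return out[::-1]


def included(output_repo: str, target: str) -> list[str]:
    """Projects whose content lands in <output_repo>/compiled/<target>/."""
    full = chain(target)
    if output_repo not in full:
        raise ValueError(f"{output_repo!r} is not in the inheritance chain of {target!r}")
    root = full[0]                           # public templates are always included
    below = full[full.index(output_repo):]   # output repo down to the target
    return [root] + [p for p in below if p != root]
```

For example, `included("company", "client-a")` returns `["templates", "company", "client-a"]`: the personal repo's content is skipped because the company repo sits below it.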

LLM-Specific Instructions

The compiler generates separate instruction files for each major LLM platform. Each file is optimized for that platform's capabilities and conventions.

Automatic Instruction Selection

When using Ragbot (CLI or Web UI), the correct instruction file is automatically loaded based on the model being used:

| Model type | Instruction file |
| --- | --- |
| Anthropic models (Claude) | `instructions/claude.md` |
| OpenAI models (GPT-5.x) | `instructions/chatgpt.md` |
| Google models (Gemini) | `instructions/gemini.md` |
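A per-request lookup like the following is one way to implement the table above. The provider prefixes and the function name `instruction_file` are assumptions, not Ragbot's actual matching rules. Because the lookup runs on every request, a model switch mid-conversation simply resolves to a different file.

```python
"""Hypothetical model-to-instruction-file lookup (prefix rules are assumptions)."""
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed model-name prefixes per provider; Ragbot may resolve providers differently.
PREFIXES: dict[str, tuple[str, ...]] = {
    "claude.md": ("claude", "anthropic/"),
    "chatgpt.md": ("gpt-", "openai/"),
    "gemini.md": ("gemini", "google/"),
}


def instruction_file(project: str, model: str) -> PurePosixPath:
    """Return compiled/<project>/instructions/<file> for the given model name."""
    name = model.lower()
    for filename, prefixes in PREFIXES.items():
        if name.startswith(prefixes):  # str.startswith accepts a tuple
            return PurePosixPath("compiled", project, "instructions", filename)
    raise ValueError(f"no instruction file mapped for model {model!r}")
```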

Mid-Conversation Model Switching

When users switch models mid-conversation in the Web UI, the system automatically loads the appropriate instructions for the new model. This happens transparently on each request.

RAG and Inheritance

The RAG system respects the inheritance configuration from my-projects.yaml. When you select a workspace in the UI or CLI, the RAG system:

Loads inheritance from centralized config — Per ADR-006, inheritance configuration lives ONLY in my-projects.yaml in the personal repo
Resolves the full inheritance chain — For example, example-client inherits from example-company -> personal -> ragbot
Indexes content from all ancestors — The vector index includes chunks from the workspace AND all inherited workspaces
Enables cross-workspace queries — You can ask about "ragbot" while in a client workspace because that content is inherited
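Resolving the chain amounts to a depth-first walk over `inherits_from` that emits ancestors before descendants. This sketch assumes `projects` is the mapping under the `projects:` key after parsing (for example with PyYAML's `yaml.safe_load`); `resolve_chain` is a hypothetical name, and Ragbot's own resolver may order or validate differently. Because `inherits_from` is a list, the walk also handles multiple parents and rejects cycles.

```python
"""Resolve a workspace's inheritance chain from parsed my-projects.yaml data (sketch)."""
from __future__ import annotations


def resolve_chain(projects: dict[str, dict], name: str) -> list[str]:
    """Return ancestors root-first, then `name`; each project appears once."""
    order: list[str] = []
    done: set[str] = set()
    visiting: set[str] = set()

    def visit(node: str) -> None:
        if node in done:
            return  # shared ancestor already emitted (diamond inheritance)
        if node in visiting:
            raise ValueError(f"inheritance cycle involving {node!r}")
        if node not in projects:
            raise KeyError(f"unknown project {node!r}")
        visiting.add(node)
        for parent in projects[node].get("inherits_from") or []:
            visit(parent)  # ancestors are emitted before the node itself
        visiting.discard(node)
        done.add(node)
        order.append(node)

    visit(name)
    return order
```

With the example config, `resolve_chain(projects, "client-a")` yields `["templates", "personal", "company", "client-a"]`; an indexer would then read `source/` from each project's `local_path` in that order.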

RAG Pipeline Architecture

Ragbot implements a production-grade, multi-stage RAG pipeline:

| Phase | Focus | Techniques |
| --- | --- | --- |
| Phase 1 | Foundation | Query preprocessing, full document retrieval, 16K context |
| Phase 2 | Query Intelligence | LLM planner, multi-query expansion, HyDE |
| Phase 3 | Hybrid Retrieval | BM25 + Vector search, RRF, LLM reranking |
| Phase 4 | Verification | Hallucination detection, confidence scoring, CRAG |
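Phase 3 names Reciprocal Rank Fusion (RRF) for merging the BM25 and vector result lists. A generic version of the technique is below; `k = 60` is a commonly used default, not a confirmed Ragbot setting, and the fused list would then feed the LLM reranking step.

```python
"""Reciprocal Rank Fusion of several ranked lists (generic illustration)."""
from __future__ import annotations

from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

RRF only uses ranks, so BM25 scores and cosine similarities never need to be normalized onto a common scale. Documents found by both retrievers (here, `b` and `c`) rise to the top.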

For complete technical details, see RAG Architecture.

Usage

Claude Code: Reads source/ files directly — no compilation needed
Claude Projects / ChatGPT / Gemini: Upload `all-knowledge.md` from each repo in the inheritance chain, then copy in the compiled instruction file for that platform (`claude.md`, `chatgpt.md`, or `gemini.md`)
Ragbot.AI: Index with ragbot index --workspace {name}

Troubleshooting

"No my-projects.yaml found"

The inheritance config must exist in your personal repo. Create it with:

```yaml
version: 1
base_path: ~/projects/ai-knowledge
projects:
  # ... your projects
```

"Repository not found"

Check that local_path in my-projects.yaml points to existing directories.

Content not appearing in output

Check the inheritance chain in my-projects.yaml
Verify the source repo has content in source/
Check compile-config.yaml include/exclude patterns
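The checks in this section can be automated with a small pre-flight script. This is a hypothetical helper, not a `ragbot` command: it assumes `projects` is the parsed `projects:` mapping from `my-projects.yaml` and reports missing `local_path` directories, repos with no `.md` files under `source/`, and `inherits_from` entries that name unknown projects. It does not evaluate `compile-config.yaml` patterns.

```python
"""Hypothetical pre-flight check for my-projects.yaml (not a Ragbot command)."""
from __future__ import annotations

from pathlib import Path


def check_projects(projects: dict[str, dict]) -> list[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems: list[str] = []
    for name, cfg in projects.items():
        raw = cfg.get("local_path")
        if not raw:
            problems.append(f"{name}: local_path is missing")
        else:
            repo = Path(raw).expanduser()  # expand ~ as in the example config
            source = repo / "source"
            if not repo.is_dir():
                problems.append(f"{name}: local_path {repo} does not exist")
            elif not source.is_dir() or not any(source.rglob("*.md")):
                problems.append(f"{name}: no .md files under {source}")
        for parent in cfg.get("inherits_from") or []:
            if parent not in projects:
                problems.append(f"{name}: inherits_from unknown project {parent!r}")
    return problems
```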
