AI Knowledge Compilation Guide

This guide explains how the AI Knowledge system works, including the compilation architecture, CI/CD pipeline, and RAG indexing.

Architecture Overview

The AI Knowledge system uses a three-part architecture:

| Operation | Where | When |
| --- | --- | --- |
| Knowledge concatenation (all-knowledge.md) | CI/CD (GitHub Actions) | Every push to source/ |
| Instruction compilation | Local (ragbot compile) | When instructions change (rare) |
| RAG indexing | Local (ragbot index) | When content changes + RAG needed |

Key principle: Edit source/ files directly; they are the authoritative content.

Core Concept: Output Repo Determines Content

The output repo determines what content is included—not who runs the compiler.

Anyone with write access to a repo can compile into it. The content included depends solely on the output repo's position in the inheritance tree.

Repository Types

AI Knowledge repos follow a hierarchy:

ai-knowledge-{templates}     <- Public templates (root)
    |
ai-knowledge-{person}        <- Personal identity
    |
ai-knowledge-{company}       <- Company knowledge
    |
ai-knowledge-{client}        <- Client-specific content

Each repo contains:

  • source/ — Human-edited content (authoritative)
  • compiled/ — Auto-generated output (instructions only)
  • all-knowledge.md — Auto-generated by CI/CD (at repo root)
  • compile-config.yaml — Compilation settings

What Gets Generated

| Output | How | Location |
| --- | --- | --- |
| compiled/instructions/ | Local: ragbot compile --project {name} | compiled/{project}/instructions/ |
| all-knowledge.md | CI/CD: GitHub Actions | Repo root |

Removed outputs (no longer generated):

  • compiled/knowledge/ — Individual flat files were never consumed by anything
  • compiled/vectors/ — RAG reads source directly, not intermediate chunks

Knowledge Concatenation (CI/CD)

Knowledge concatenation runs automatically via GitHub Actions on every push to source/. The composite action lives at ai-knowledge-ragbot/.github/actions/concatenate-knowledge/action.yml and is called by each repo's concatenate.yml workflow.

The CI/CD pipeline:

  1. Discovers .md files in source/, excluding instructions/, contexts/, and README.md
  2. Generates a header with repo name, description, timestamp, and file count
  3. Concatenates files with ## {relative-path} headers and --- separators
  4. Commits and pushes all-knowledge.md if changed

No manual action needed. Edit source files, push, and all-knowledge.md updates automatically.
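
The four pipeline steps can be approximated locally for debugging. The following is a minimal Python sketch, not the composite action itself: the function names, the exact header layout, and the choice to skip README.md at any depth while excluding instructions/ and contexts/ only at the top level of source/ are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: these directories are excluded only at the top level of source/.
EXCLUDED_DIRS = {"instructions", "contexts"}
# Assumption: README.md is skipped wherever it appears.
EXCLUDED_FILES = {"README.md"}


def discover_sources(source_dir: Path) -> list[Path]:
    """Step 1: find .md files under source_dir, applying the exclusions."""
    found = []
    for path in sorted(source_dir.rglob("*.md")):
        rel = path.relative_to(source_dir)
        if rel.name in EXCLUDED_FILES:
            continue
        if len(rel.parts) > 1 and rel.parts[0] in EXCLUDED_DIRS:
            continue
        found.append(path)
    return found


def concatenate(source_dir: Path, repo_name: str, description: str) -> str:
    """Steps 2 and 3: build a header, then join files with '---' separators."""
    files = discover_sources(source_dir)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    # Hypothetical header layout; the real action's wording may differ.
    parts = [
        f"# {repo_name}\n\n{description}\n\nGenerated: {stamp}\nFiles: {len(files)}\n"
    ]
    for path in files:
        rel = path.relative_to(source_dir).as_posix()
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"## {rel}\n\n{body}\n")
    return "\n---\n\n".join(parts)
```

Step 4 (commit and push if changed) is left to the workflow; locally, comparing the returned string with the existing all-knowledge.md is enough to see whether a push would produce a change.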

Instruction Compilation (Local)

Instruction compilation transforms source instructions into LLM-specific formats. This is rare — only needed when instructions change.

# Compile instructions for a project
ragbot compile --project {name}

# Without LLM API calls (just assemble)
ragbot compile --project {name} --no-llm

# Target specific LLM
ragbot compile --project {name} --llm claude

# Force recompilation
ragbot compile --project {name} --force

# Verbose output
ragbot compile --project {name} --verbose

Output: compiled/{project}/instructions/

  • claude.md — For Anthropic models
  • chatgpt.md — For OpenAI models
  • gemini.md — For Google models

RAG Indexing (Local)

RAG indexing reads source files directly and indexes them in pgvector. No intermediate files needed.

ragbot index --workspace {name}

The RAG pipeline:

  1. Reads .md files from source/ (respecting inheritance)
  2. Chunks content using sentence-transformers
  3. Generates embeddings with all-MiniLM-L6-v2
  4. Upserts to the pgvector store (PostgreSQL with the pgvector extension)
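
To make the data flow concrete, here is a rough Python sketch of the four steps. It is not Ragbot's implementation: the paragraph-packing chunker is a naive stand-in, and `embed` and `upsert` are hypothetical callables that stand in for the all-MiniLM-L6-v2 model and the pgvector store.

```python
from pathlib import Path
from typing import Callable, Iterable, Sequence


def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    """Pack paragraphs into chunks of at most max_chars.

    Naive stand-in chunker: a single paragraph longer than max_chars
    is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def index_sources(
    source_dirs: Iterable[Path],
    embed: Callable[[list[str]], Sequence[Sequence[float]]],
    upsert: Callable[[str, str, Sequence[float]], None],
) -> int:
    """Read, chunk, embed, and upsert every .md file; return the chunk count.

    source_dirs is the workspace's source/ directory plus those of its
    ancestors, in whatever order the inheritance chain resolves to.
    """
    count = 0
    for source_dir in source_dirs:
        for path in sorted(source_dir.rglob("*.md")):
            chunks = chunk_text(path.read_text(encoding="utf-8"))
            if not chunks:
                continue
            vectors = embed(chunks)
            for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
                # Hypothetical key scheme: file path plus chunk position.
                upsert(f"{path.as_posix()}#{i}", chunk, vector)
                count += 1
    return count
```

With the sentence-transformers package installed, `SentenceTransformer("all-MiniLM-L6-v2").encode` could serve as `embed`; the `upsert` callable would wrap whatever INSERT ... ON CONFLICT statement the pgvector table requires.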

Inheritance Configuration

Inheritance relationships are defined in my-projects.yaml in your personal repo:

# my-projects.yaml
version: 1
base_path: ~/projects/ai-knowledge

projects:
  templates:
    local_path: ~/projects/ai-knowledge/ai-knowledge-templates
    inherits_from: []
    description: Public templates (root)

  personal:
    local_path: ~/projects/ai-knowledge/ai-knowledge-personal
    inherits_from:
      - templates
    description: Personal identity

  company:
    local_path: ~/projects/ai-knowledge/ai-knowledge-company
    inherits_from:
      - personal
    description: Company knowledge

  client-a:
    local_path: ~/projects/ai-knowledge/ai-knowledge-client-a
    inherits_from:
      - company
    description: Client A project
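
Resolving a project's full chain from this file amounts to a depth-first walk over inherits_from. The sketch below assumes the YAML has already been loaded into a plain dict (mirrored inline here to stay dependency-free); `resolve_chain` is a hypothetical helper name, not a Ragbot function.

```python
# Mirrors the projects section of my-projects.yaml shown above.
PROJECTS = {
    "templates": {"inherits_from": []},
    "personal": {"inherits_from": ["templates"]},
    "company": {"inherits_from": ["personal"]},
    "client-a": {"inherits_from": ["company"]},
}


def resolve_chain(name: str, projects: dict) -> list[str]:
    """Return the inheritance chain, ancestors first, ending with name.

    Raises KeyError for an unknown project and ValueError on a cycle.
    """
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(project: str) -> None:
        if project in ordered:
            return  # already resolved via another parent
        if project not in projects:
            raise KeyError(f"unknown project: {project}")
        if project in visiting:
            raise ValueError(f"inheritance cycle at: {project}")
        visiting.add(project)
        for parent in projects[project].get("inherits_from", []):
            visit(parent)
        visiting.discard(project)
        ordered.append(project)

    visit(name)
    return ordered


print(resolve_chain("client-a", PROJECTS))
# ['templates', 'personal', 'company', 'client-a']
```

Listing ancestors first means later entries can override or extend earlier ones, which matches the root-to-leaf direction of the hierarchy diagram.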

Privacy Model

Content included depends on the output repo's position in the inheritance tree:

| Output Repo | Compiling client-a | Content Included |
| --- | --- | --- |
| ai-knowledge-personal | compiled/client-a/ | templates + personal + company + client-a |
| ai-knowledge-company | compiled/client-a/ | templates + company + client-a |
| ai-knowledge-client-a | compiled/client-a/ | templates + client-a |

Private content never leaks. Each repo only contains compilations with content appropriate for that repo's access level.
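
One way to read the table is as a filter over the target's inheritance chain: keep the public root, plus everything from the output repo's own level down to the target. The Python sketch below encodes that reading. The rule is inferred from the three rows above, not taken from the compiler source, and `content_for_output` is a hypothetical name.

```python
def content_for_output(
    output_repo: str,
    chain: list[str],
    public: tuple[str, ...] = ("templates",),
) -> list[str]:
    """Return the projects whose content may be compiled into output_repo.

    chain is the target's inheritance chain, ancestors first, ending with
    the project being compiled. Assumed rule: include public projects plus
    every project from output_repo's position to the end of the chain.
    """
    if output_repo not in chain:
        raise ValueError(f"{output_repo} is not in the chain {chain}")
    start = chain.index(output_repo)
    visible = set(chain[start:]) | (set(public) & set(chain))
    return [project for project in chain if project in visible]


chain = ["templates", "personal", "company", "client-a"]
print(content_for_output("personal", chain))
# ['templates', 'personal', 'company', 'client-a']
print(content_for_output("company", chain))
# ['templates', 'company', 'client-a']
print(content_for_output("client-a", chain))
# ['templates', 'client-a']
```

Under this reading, nothing above the output repo in the chain is ever written into it, apart from the public templates.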

LLM-Specific Instructions

The compiler generates separate instruction files for each major LLM platform. Each file is optimized for that platform's capabilities and conventions.

Automatic Instruction Selection

When using Ragbot (CLI or Web UI), the correct instruction file is automatically loaded based on the model being used:

| Model Type | Instruction File |
| --- | --- |
| Anthropic models (Claude) | instructions/claude.md |
| OpenAI models (GPT-5.x) | instructions/chatgpt.md |
| Google models (Gemini) | instructions/gemini.md |
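
The mapping in the table could be implemented as a simple lookup keyed on the model identifier. The sketch below is illustrative only: matching on name prefixes such as "claude", "gpt", and "gemini", and stripping an optional "provider/" prefix, are assumptions about how model ids look, not Ragbot's actual selection logic.

```python
# File names come from the table above; the provider keys are illustrative.
INSTRUCTION_FILES = {
    "anthropic": "instructions/claude.md",
    "openai": "instructions/chatgpt.md",
    "google": "instructions/gemini.md",
}

# Assumption: model ids start with one of these family prefixes.
MODEL_PREFIXES = (
    ("claude", "anthropic"),
    ("gpt", "openai"),
    ("gemini", "google"),
)


def instruction_file_for(model: str) -> str:
    """Pick the instruction file for a model id such as 'claude-...'."""
    # Drop an optional 'provider/' prefix, e.g. 'google/gemini-...'.
    name = model.lower().rsplit("/", 1)[-1]
    for prefix, provider in MODEL_PREFIXES:
        if name.startswith(prefix):
            return INSTRUCTION_FILES[provider]
    raise ValueError(f"no instruction file known for model: {model}")
```

Because the function is stateless, calling it on every request is enough to pick up a model change between two turns of the same conversation.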

Mid-Conversation Model Switching

When users switch models mid-conversation in the Web UI, the system automatically loads the appropriate instructions for the new model. This happens transparently on each request.

RAG and Inheritance

The RAG system respects the inheritance configuration from my-projects.yaml. When you select a workspace in the UI or CLI, the RAG system:

  1. Loads inheritance from centralized config — Per ADR-006, inheritance configuration lives ONLY in my-projects.yaml in the personal repo
  2. Resolves the full inheritance chain — For example, example-client inherits from example-company -> personal -> ragbot
  3. Indexes content from all ancestors — The vector index includes chunks from the workspace AND all inherited workspaces
  4. Enables cross-workspace queries — You can ask about "ragbot" while in a client workspace because that content is inherited

RAG Pipeline Architecture

Ragbot implements a production-grade, multi-stage RAG pipeline:

| Phase | Description | Techniques |
| --- | --- | --- |
| Phase 1 | Foundation | Query preprocessing, full document retrieval, 16K context |
| Phase 2 | Query Intelligence | LLM planner, multi-query expansion, HyDE |
| Phase 3 | Hybrid Retrieval | BM25 + Vector search, RRF, LLM reranking |
| Phase 4 | Verification | Hallucination detection, confidence scoring, CRAG |

For complete technical details, see RAG Architecture.
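
As an illustration of one Phase 3 technique, reciprocal rank fusion (RRF) merges the BM25 and vector result lists by summing 1 / (k + rank) for each document across the lists. The Python sketch below is a generic version of the technique; the constant k = 60 is the value commonly used in the literature, and whether Ragbot uses the same constant or tie-breaking is not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['b', 'c', 'a', 'd']
```

Documents found by both retrievers ("b" and "c") rise above those found by only one, which is the property that makes RRF useful for combining keyword and semantic search without calibrating their raw scores against each other.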

Usage

  • Claude Code: Reads source/ files directly — no compilation needed
  • Claude Projects / ChatGPT / Gemini: Upload all-knowledge.md (one per repo in the chain) and copy in the compiled instructions
  • Ragbot.AI: Index with ragbot index --workspace {name}

Troubleshooting

"No my-projects.yaml found"

The inheritance config must exist in your personal repo. Create it with:

version: 1
base_path: ~/projects/ai-knowledge
projects:
  # ... your projects

"Repository not found"

Check that local_path in my-projects.yaml points to existing directories.
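
A quick way to run that check is to expand each local_path and test whether it is a directory. The Python sketch below assumes the projects section has already been loaded into a dict; `missing_local_paths` is a hypothetical helper, not part of the ragbot CLI.

```python
from pathlib import Path


def missing_local_paths(projects: dict) -> dict[str, Path]:
    """Return {project name: expanded path} for every local_path that is
    not an existing directory. An empty result means all paths resolve."""
    missing = {}
    for name, cfg in projects.items():
        path = Path(cfg["local_path"]).expanduser()  # handles the leading ~
        if not path.is_dir():
            missing[name] = path
    return missing


if __name__ == "__main__":
    # Example entry copied from the configuration shown earlier.
    projects = {
        "templates": {
            "local_path": "~/projects/ai-knowledge/ai-knowledge-templates"
        },
    }
    for name, path in missing_local_paths(projects).items():
        print(f"{name}: {path} does not exist")
```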

Content not appearing in output

  1. Check the inheritance chain in my-projects.yaml
  2. Verify the source repo has content in source/
  3. Check compile-config.yaml include/exclude patterns

Further Reading