Data Organization Philosophy

Why Ragbot Separates Code from Data

Ragbot follows a fundamental software engineering principle: separation of concerns. Your personal data should never be mixed with application code. This is more than a best practice; it is essential for privacy, security, and flexibility.

The AI Knowledge Architecture

Ragbot uses a multi-repository architecture: the application code lives in one public repository, and each context (personal, company, client) has its own private content repository:

ragbot/                          # Application code (public)
├── src/ragbot/                  # Core library
├── web/                         # React frontend
└── api/                         # FastAPI backend

ai-knowledge-{name}/             # Content repositories (private)
├── source/                      # Human-edited content (authoritative)
│   ├── instructions/            # WHO - identity and persona
│   ├── runbooks/                # HOW - task procedures
│   └── datasets/                # WHAT - reference knowledge
├── compiled/                    # Auto-generated output
│   └── {project}/
│       └── instructions/        # LLM-specific instructions (local compilation)
├── all-knowledge.md             # Concatenated knowledge (CI/CD via GitHub Actions)
└── compile-config.yaml          # Compilation settings

The WHO/HOW/WHAT Framework

AI knowledge is organized into three conceptual categories:

| Folder | Purpose | Question Answered |
|--------|---------|-------------------|
| `instructions/` | Identity and persona | WHO is the agent? |
| `runbooks/` | Task procedures | HOW does the agent do things? |
| `datasets/` | Reference knowledge | WHAT does the agent know? |

instructions/ — WHO

System-level behavioral guidance that defines the agent's identity:

Communication style and tone
Core principles and values
Response preferences

runbooks/ — HOW

Task-specific procedures for autonomous AI execution:

Content creation guidelines
Automation workflows
Prompting techniques

datasets/ — WHAT

Reference knowledge organized by category:

Personal information
Professional background
Domain expertise
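
To make the three categories concrete, here is a minimal Python sketch that walks a content repo's source/ directory and groups its files by WHO/HOW/WHAT. The function name `group_by_category` and the `CATEGORIES` mapping are illustrative assumptions, not part of Ragbot's API; only the folder names come from the layout described above.

```python
from pathlib import Path

# Folder name -> question it answers, following the WHO/HOW/WHAT framework.
CATEGORIES = {
    "instructions": "WHO",
    "runbooks": "HOW",
    "datasets": "WHAT",
}


def group_by_category(repo_root: Path) -> dict[str, list[Path]]:
    """Group every file under source/ by its WHO/HOW/WHAT category.

    Hypothetical helper for illustration; Ragbot itself may organize
    this differently.
    """
    grouped: dict[str, list[Path]] = {label: [] for label in CATEGORIES.values()}
    for folder, label in CATEGORIES.items():
        category_dir = repo_root / "source" / folder
        if category_dir.is_dir():
            # rglob("*") also yields directories, so keep regular files only.
            grouped[label] = sorted(p for p in category_dir.rglob("*") if p.is_file())
    return grouped
```

A category whose folder is missing simply maps to an empty list, which makes gaps in a new repo easy to spot.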

Repository Inheritance

AI Knowledge repos follow a hierarchy with inheritance:

ai-knowledge-{templates}     ← Public templates (root)
    ↓
ai-knowledge-{person}        ← Personal identity
    ↓
ai-knowledge-{company}       ← Company knowledge
    ↓
ai-knowledge-{client}        ← Client-specific content

Each child repo inherits content from its parent. This enables:

Layered identity: Personal context + company context + client context
Privacy control: Each repo only contains content appropriate for its access level
Flexible compilation: Compile with or without inheritance
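
The parent-to-child layering can be sketched as a small graph walk. The example below is a hedged illustration, not Ragbot's actual resolver: `resolve_chain` and the `PARENTS` mapping are hypothetical names, and the project keys simply mirror the placeholders in the diagram above.

```python
def resolve_chain(project: str, parents: dict[str, list[str]]) -> list[str]:
    """Return `project` and all of its ancestors, root-most ancestor first."""
    ordered: list[str] = []
    in_progress: set[str] = set()

    def visit(name: str) -> None:
        if name in ordered:
            return  # already placed via another inheritance path
        if name in in_progress:
            raise ValueError(f"inheritance cycle involving {name!r}")
        if name not in parents:
            raise KeyError(f"unknown project {name!r}")
        in_progress.add(name)
        for parent in parents[name]:
            visit(parent)  # parents are emitted before their children
        in_progress.discard(name)
        ordered.append(name)

    visit(project)
    return ordered


# Hypothetical hierarchy mirroring the diagram: templates -> person -> company -> client.
PARENTS = {
    "templates": [],
    "person": ["templates"],
    "company": ["person"],
    "client": ["company"],
}
print(resolve_chain("client", PARENTS))
```

Run as written, this prints `['templates', 'person', 'company', 'client']`. Emitting ancestors first means more specific repos come last, and compiling without inheritance corresponds to using only the final entry.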

The Philosophy

Think of Ragbot Like Your Operating System

Just as your operating system (macOS, Linux, Windows) separates:

System files (the OS itself) from user files (your documents)
Applications (software) from data (what you create)
Configuration (settings) from secrets (passwords)

Ragbot separates:

Application code (ragbot/) from your knowledge (ai-knowledge-*/)
The AI engine from your context and identity
Generic examples from personal information

Real-World Analogies

The Library Analogy:

Ragbot is the librarian (constant, helpful, knowledgeable about systems)
Your ai-knowledge repos are the books on the shelves (your unique knowledge)
Instructions are how you want the librarian to help you (your preferences)

The Assistant Analogy:

Ragbot is your assistant (the person with skills and tools)
Your ai-knowledge content is the briefing materials (context about your life/work)
Instructions are the working relationship (how you collaborate)

Benefits of This Approach

1. Privacy and Security

What stays private:

Your personal information (in private ai-knowledge repos)
Work/client data (in separate repos per client)
Your AI instructions (your "secret sauce")

What's public:

The application code (open source)
Generic examples and templates
Prompting techniques and frameworks

2. Flexibility

Multiple Contexts:

Personal repo for personal use
Company repo for work projects
Client repos for client-specific work
Each compiled independently or with inheritance

Version Control:

Update Ragbot code without affecting your knowledge
Roll back knowledge changes independently
Branch knowledge for experiments
Share repos selectively

3. Portability

Your knowledge travels with you:

Same repos work on any machine
Easy backup (git push to remote)
Migrate to new machine (git clone)
Share setup without sharing content

4. Collaboration

You can share:

The application (public ragbot repo)
Generic templates (examples directory)
Anonymized techniques

You keep private:

Personal data
Client information
Your customizations

Comparison with Other Approaches

Approach 1: Everything in One Repo

ragbot/
├── src/
├── my-personal-data/    # DANGER: Easy to accidentally commit
├── my-instructions/
└── client-secrets/      # DANGER: Might leak

Problems:

High risk of committing sensitive data
Can't share code without exposing data
One .gitignore mistake = privacy breach

Approach 2: Application Only (Generic AI)

ragbot/
└── src/    # No customization capability

Problems:

The AI has no context about you
You repeat yourself in every conversation
Responses are generic, not personalized

Approach 3: Multi-Repo Architecture (Ragbot's Approach)

ragbot/ (public)                ai-knowledge-*/ (private)
├── src/                        ├── source/
├── web/                        │   ├── instructions/
├── api/                        │   ├── runbooks/
└── examples/                   │   └── datasets/
                                ├── compiled/
                                └── compile-config.yaml

Benefits:

Clear separation of concerns
Privacy by design
Flexible inheritance model
Easy to share the application without sharing data

Implementation Guide

For New Users

Quick Start:

# 1. Clone Ragbot
git clone https://github.com/synthesisengineering/ragbot.git
cd ragbot
pip install -e .

# 2. Create your personal ai-knowledge repo
mkdir -p ~/ai-knowledge/ai-knowledge-personal
cd ~/ai-knowledge/ai-knowledge-personal
git init
mkdir -p source/instructions source/runbooks source/datasets

# 3. Add your content
# Edit files in source/

# 4. Compile
ragbot compile --repo ~/ai-knowledge/ai-knowledge-personal

# 5. Chat
ragbot chat --workspace personal

For Advanced Users

Multiple Contexts with Inheritance:

# my-projects.yaml in your personal repo
version: 1
projects:
  personal:
    local_path: ~/ai-knowledge/ai-knowledge-personal
    inherits_from: []

  company:
    local_path: ~/ai-knowledge/ai-knowledge-company
    inherits_from:
      - personal

  client-a:
    local_path: ~/ai-knowledge/ai-knowledge-client-a
    inherits_from:
      - company

Compile with inheritance:

ragbot compile --project client-a --with-inheritance
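
Conceptually, compiling with inheritance means combining content from each repo in the chain, parents first. The sketch below shows one plausible way to do that in Python. It is an assumption for illustration only: the source does not describe how `ragbot compile` works internally, and `concatenate_sources` is a hypothetical helper that expects the repo paths to be supplied already in parent-first order.

```python
from pathlib import Path


def concatenate_sources(repo_paths: list[Path]) -> str:
    """Join every Markdown file under each repo's source/ into one document.

    `repo_paths` must already be ordered parent-first, so inherited
    content appears before the content that builds on it.
    """
    sections: list[str] = []
    for repo in repo_paths:
        source_dir = repo / "source"
        if not source_dir.is_dir():
            continue  # a repo without source/ contributes nothing
        for md_file in sorted(source_dir.rglob("*.md")):
            relative = md_file.relative_to(repo)
            body = md_file.read_text(encoding="utf-8").strip()
            # Label each section with its origin so provenance stays visible.
            sections.append(f"<!-- {repo.name}/{relative.as_posix()} -->\n{body}")
    return "\n\n".join(sections) + "\n"
```

Tagging each section with its originating repo and path keeps it clear which layer a given piece of knowledge came from when personal, company, and client content end up in one file.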

Best Practices

DO

Keep repos separate - One repo per context (personal, company, client)
Use inheritance wisely - Personal → Company → Client hierarchy
Version control your knowledge - Git provides history and backup

Use the WHO/HOW/WHAT structure

source/
├── instructions/       # WHO
├── runbooks/           # HOW
└── datasets/           # WHAT

Edit source/ directly - Knowledge concatenation is automatic via CI/CD

DON'T

Don't commit secrets - No API keys in content files
Don't mix public and private - Keep ragbot/ and ai-knowledge-*/ separate
Don't skip CI/CD - all-knowledge.md is auto-generated by GitHub Actions
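
The "don't commit secrets" rule can be backed by a simple check before committing. The following Python sketch scans content files for lines that look like credentials. The patterns and the `find_suspect_lines` helper are illustrative assumptions, not a Ragbot feature; real key formats vary by provider, so treat this as a starting point rather than a complete scanner.

```python
import re
from pathlib import Path

# Illustrative patterns only; real key formats vary by provider.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|secret|token)\b\s*[:=]\s*\S{12,}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_suspect_lines(source_dir: Path) -> list[tuple[Path, int]]:
    """Return (file, line number) pairs whose text matches a secret pattern."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" lets the scan continue past non-UTF-8 bytes.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in SECRET_PATTERNS):
                hits.append((path, number))
    return hits
```

Calling `find_suspect_lines(Path("source"))` from a content repo returns an empty list when nothing matches, so it could be wired into a pre-commit hook that fails on any hit.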

Summary

Ragbot's separation of code and data follows proven patterns from:

Unix dotfiles
Infrastructure as Code
Twelve-Factor App methodology
Security best practices

The result:

Privacy by design
Flexibility for multiple contexts
Inheritance for layered identity
Easy to update and maintain

Bottom line: Your knowledge is yours. The application is shared. This separation serves both.
