Production-ready Retrieval-Augmented Generation with ChromaDB, OpenAI embeddings, and hybrid search.
This is a standalone extraction from my production portfolio site. See it in action at danmonteiro.com.
You're building a RAG system but:
- Chunking is an afterthought — splitting on token count loses semantic meaning
- Pure vector search isn't enough — it misses obvious keyword matches
- No query understanding — "ML" doesn't find "machine learning" content
- Scattered implementation — embedding, storage, and retrieval are disconnected
RAG Pipeline provides:
- Semantic chunking — split documents by meaning, not arbitrary limits
- Hybrid search — vector similarity + keyword boosting
- Query expansion — automatic synonym handling (ML ↔ machine learning)
- Unified pipeline — embed, store, and retrieve in one clean API
```ts
import { Pipeline } from 'rag-pipeline';

const pipeline = new Pipeline();
await pipeline.initialize();

// Index documents
await pipeline.indexDocuments([
  { id: '1', title: 'ML Guide', content: '...', source: 'docs' }
]);

// Query with hybrid search
const result = await pipeline.query({
  question: "How does machine learning work?"
});

console.log(result.context); // Formatted context for LLM
console.log(result.chunks);  // Retrieved chunks with scores
```

From production usage:
| Metric | Vector Only | Hybrid Search |
|---|---|---|
| Recall@5 | 72% | 89% |
| Exact match boost | 0% | +20% score |
| Synonym coverage | None | Automatic |
Pure vector search has blind spots:

- **Exact matches matter**: If someone asks about "ChromaDB" and you have a doc titled "ChromaDB Setup Guide", that should rank higher—even if its embedding similarity is close to that of other database docs.
- **Acronyms are tricky**: "ML" and "machine learning" have different embeddings but mean the same thing. Query expansion catches this.
- **Precision vs recall tradeoff**: Vector search optimizes for semantic similarity; keyword boosting adds precision for obvious matches (sketched below).
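A minimal sketch of how a hybrid score can combine the two signals. The 0.2 weight mirrors the +20% exact-match boost in the table above, but the weighting and substring matching here are assumptions, not the library's internals:

```ts
// Illustrative hybrid scorer: vector similarity plus a keyword boost.
// The 0.2 weight and whole-string matching are assumptions, not the
// library's actual implementation.
function hybridScore(
  vectorSimilarity: number, // e.g. cosine similarity from ChromaDB
  queryTerms: string[],     // query terms after expansion (ml -> machine learning)
  chunkText: string,
): number {
  const text = chunkText.toLowerCase();
  const matched = queryTerms.filter((t) => text.includes(t.toLowerCase()));
  const keywordBoost = 0.2 * (matched.length / Math.max(queryTerms.length, 1));
  return vectorSimilarity + keywordBoost;
}

// A doc titled "ChromaDB Setup Guide" gets the full boost for this query
console.log(hybridScore(0.71, ['chromadb', 'setup'], 'ChromaDB Setup Guide'));
// -> 0.91
```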
The chunking flow:

```
Document → Paragraphs → Sentences (if needed) → Chunks
              ↓
  Respect semantic boundaries
              ↓
  Configurable overlap
```
The chunking service:
- Splits on paragraph boundaries first (semantic units)
- Only breaks paragraphs if they exceed max size
- Preserves metadata for source attribution
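A simplified sketch of that paragraph-first strategy (illustrative only; it omits the overlap, metadata, and sentence-level splitting the real service provides):

```ts
// Paragraph-first chunking: accumulate whole paragraphs until the word
// budget would be exceeded, then start a new chunk. Simplified sketch:
// no overlap, and oversized paragraphs are kept whole here rather than
// split by sentence as the real service does.
function chunkByParagraphs(content: string, maxWords = 500): string[] {
  const paragraphs = content.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current: string[] = [];
  let count = 0;

  for (const para of paragraphs) {
    const words = para.trim().split(/\s+/).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join('\n\n'));
      current = [];
      count = 0;
    }
    current.push(para.trim());
    count += words;
  }
  if (current.length > 0) chunks.push(current.join('\n\n'));
  return chunks;
}
```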
The overall architecture:

```
┌─────────────────────────────────────────────────────────┐
│                        Pipeline                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │  Chunking   │  │ Embeddings  │  │   Vector DB     │  │
│  │  Service    │──│  Service    │──│   (ChromaDB)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
│         ↓                ↓                  ↓           │
│   ┌─────────────────────────────────────────────────┐   │
│   │                 RAG Service                     │   │
│   │  • Query expansion • Hybrid search • Context    │   │
│   └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```
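In code, the top row of the diagram (Chunking → Embeddings → Vector DB) composes like this. A sketch using the service APIs documented below; the `content` field on chunks is an assumption, and `Pipeline.indexDocuments()` normally does this wiring for you:

```ts
import { ChunkingService, EmbeddingService, VectorDatabase } from 'rag-pipeline';

const chunker = new ChunkingService();
const embedder = new EmbeddingService({ model: 'text-embedding-3-large' });
const db = new VectorDatabase({ host: 'http://localhost:8000' });
await db.initialize();

// Chunking → Embeddings → Vector DB, mirroring the diagram
const chunks = chunker.chunkDocument({
  id: '1',
  title: 'ML Guide',
  content: '...',
  source: 'docs',
});
const vectors = await embedder.embedBatch(chunks.map((c) => c.content));
await db.upsertChunks(chunks, vectors);
```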
Install the package:

```bash
npm install rag-pipeline
```

Start ChromaDB:

```bash
# Using Docker
docker run -p 8000:8000 chromadb/chroma

# Or install locally
pip install chromadb
chroma run
```

Set the environment:

```bash
export OPENAI_API_KEY="sk-..."
export CHROMA_URL="http://localhost:8000"  # Optional, default
```

Then index and query:

```ts
import { Pipeline } from 'rag-pipeline';
const pipeline = new Pipeline();
await pipeline.initialize();

// Index your documents
await pipeline.indexDocuments([
  {
    id: 'doc-1',
    title: 'Introduction to RAG',
    content: 'Retrieval-Augmented Generation combines...',
    source: 'tutorials',
    tags: ['rag', 'llm', 'ai'],
  },
]);

// Query
const result = await pipeline.query({
  question: 'What is RAG?',
});

// Use the context with your LLM
const prompt = `Context:\n${result.context}\n\nQuestion: What is RAG?`;
```

`Pipeline` is the main entry point, combining all components.
```ts
const pipeline = new Pipeline({
  vectorDB: { host: 'http://localhost:8000' },
  embeddings: { model: 'text-embedding-3-large' },
  chunking: { maxWordsPerChunk: 500 },
  rag: { topK: 5, threshold: 0.5 },
});
```

| Method | Description |
|---|---|
| `initialize()` | Connect to ChromaDB |
| `indexDocument(doc)` | Index a single document |
| `indexDocuments(docs)` | Index multiple documents |
| `query(query)` | Retrieve relevant chunks |
| `deleteBySource(source)` | Delete all docs from a source |
| `clearAll()` | Clear all indexed data |
| `getStats()` | Get collection statistics |
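For example, re-indexing a source can combine `deleteBySource()` with `indexDocuments()`. The shape of the object returned by `getStats()` isn't documented here, so the sketch just logs it:

```ts
// Replace everything previously indexed from the 'tutorials' source
await pipeline.deleteBySource('tutorials');
await pipeline.indexDocuments([
  { id: 'doc-1', title: 'Updated RAG Guide', content: '...', source: 'tutorials' },
]);

// Inspect the collection
const stats = await pipeline.getStats();
console.log(stats);
```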
`ChunkingService` performs semantic document chunking.

```ts
import { ChunkingService } from 'rag-pipeline';

const chunker = new ChunkingService({
  maxWordsPerChunk: 500,
  minWordsPerChunk: 50,
  overlapWords: 50,
});

const chunks = chunker.chunkDocument({
  id: '1',
  title: 'My Doc',
  content: '...',
  source: 'docs',
});
```

`EmbeddingService` provides OpenAI embeddings with optional dimension reduction.
```ts
import { EmbeddingService } from 'rag-pipeline';

const embeddings = new EmbeddingService({
  model: 'text-embedding-3-large',
  dimensions: 1024, // Optional reduction
});

const vector = await embeddings.embed('Hello world');
const vectors = await embeddings.embedBatch(['Hello', 'World']);
```

`VectorDatabase` is a ChromaDB wrapper with a clean interface.
```ts
import { VectorDatabase } from 'rag-pipeline';

const db = new VectorDatabase({
  host: 'http://localhost:8000',
  collectionName: 'my_collection',
  distanceMetric: 'cosine',
});

await db.initialize();
await db.upsertChunks(chunks, embeddings);
const results = await db.similaritySearch(queryEmbedding, 5);
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | - | OpenAI API key |
| `CHROMA_URL` | No | `http://localhost:8000` | ChromaDB URL |
| `CHROMA_COLLECTION` | No | `knowledge_base` | Collection name |
| `EMBEDDING_MODEL` | No | `text-embedding-3-large` | Embedding model |
| `EMBEDDING_DIMENSIONS` | No | Native | Reduced dimensions |
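Constructor options cover the same settings as these variables; the sketch below assumes the environment supplies defaults and explicit options take precedence (an assumption, since precedence isn't spelled out here):

```ts
import { Pipeline } from 'rag-pipeline';

// Uses OPENAI_API_KEY plus the table's defaults for everything else
const fromEnv = new Pipeline();

// Explicit options, assumed to take precedence over the environment
const explicit = new Pipeline({
  vectorDB: { host: 'http://chroma.internal:8000' }, // hypothetical host
  embeddings: { model: 'text-embedding-3-large' },
});
```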
```ts
const pipeline = new Pipeline({
  rag: {
    topK: 5,                     // Chunks to retrieve
    threshold: 0.5,              // Minimum similarity
    enableQueryExpansion: true,  // Synonym handling
    enableHybridSearch: true,    // Keyword boosting
    synonymMap: {                // Custom synonyms
      'k8s': ['kubernetes', 'Kubernetes'],
    },
  },
});
```

```
rag-pipeline/
├── src/
│   ├── index.ts        # Main exports + Pipeline class
│   ├── vector-db.ts    # ChromaDB integration
│   ├── embeddings.ts   # OpenAI embeddings
│   ├── chunking.ts     # Semantic chunking
│   └── rag.ts          # RAG service (retrieval logic)
├── examples/
│   └── basic-usage.ts
├── docs/
│   └── architecture.md
└── README.md
```
```ts
import { ChunkingService, DocumentInput } from 'rag-pipeline';

const chunker = new ChunkingService();

// Chunk by markdown sections
const chunks = chunker.chunkBySections({
  id: '1',
  title: 'Guide',
  content: '## Section 1\n...\n## Section 2\n...',
  source: 'docs',
});

// Create an overview chunk for a document
const doc: DocumentInput = {
  id: '2',
  title: 'Guide',
  content: '...',
  source: 'docs',
  category: 'AI', // field assumed to exist on DocumentInput
  tags: ['rag', 'llm'],
};
const overview = chunker.createOverviewChunk(doc, [
  { value: doc.title },
  { label: 'Category', value: doc.category },
  { label: 'Tags', value: doc.tags?.join(', ') },
]);
```

```ts
const pipeline = new Pipeline();
await pipeline.initialize();

// Get individual services
const embeddings = pipeline.getEmbeddingService();
const vectorDB = pipeline.getVectorDB();
const chunking = pipeline.getChunkingService();
const rag = pipeline.getRAGService();

// Use directly
const vector = await embeddings.embed('test');
```

```ts
// Filter by source
const bySource = await pipeline.query({
  question: 'How to deploy?',
  filters: { source: 'tutorials' },
});

// Filter by category
const byCategory = await pipeline.query({
  question: 'What is RAG?',
  filters: { category: 'AI' },
});
```

This repo is one layer in a broader approach to context continuity — giving AI systems the right context at the right time.
| Layer | Role | This Repo |
|---|---|---|
| Intra-session | Short-term memory | — |
| Document-scoped | Injected content | — |
| Retrieved | Long-term semantic memory | rag-pipeline |
| Progressive | Staged responses | — |
RAG provides the persistent memory layer — documents indexed once, retrieved on demand. Combined with session caching and document injection, it creates seamless context continuity for users.
Related repos:
- chatbot-widget — Session cache, Research Mode, conversation export
- mcp-rag-server — This pipeline as MCP tools
- ai-orchestrator — Complexity-based model routing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feat/add-new-feature`)
- Make changes with semantic commits
- Open a PR with a clear description
MIT License - see LICENSE for details.
Built with Claude Code.
Co-Authored-By: Claude <noreply@anthropic.com>