GenRAGrs 🦀

A modular Rust library for Retrieval-Augmented Generation (RAG) with Ollama integration

GenRAGrs is a production-ready, modular RAG library designed for building intelligent applications that can understand and query your documents. It combines vector similarity search with large language models to provide contextually accurate responses based on your data.

Features

  • 🚀 Ollama Integration: Seamless integration with local Ollama models
  • 🧩 Modular Architecture: Pluggable components for embeddings, storage, retrieval, and chat
  • 📚 Multi-format Support: Text files, Markdown, and code files
  • 💬 Smart Chunking: Intelligent document splitting with configurable overlap
  • 🔍 Advanced Retrieval: Semantic search with optional reranking and hybrid retrieval
  • 🎯 Flexible Prompting: Customizable prompt templates for different use cases
  • ⚡ Async Performance: Built with Tokio for high-performance async operations
  • 🛠️ CLI Tool: Ready-to-use command-line interface

Quick Start

Prerequisites

You need a recent stable Rust toolchain and a local Ollama installation with the embedding and chat models pulled. The Ubuntu steps below cover both.

Installation on Ubuntu

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull nomic-embed-text  # For embeddings
ollama pull qwen3:0.6b       # For chat (default model)

# Clone and build GenRAGrs
git clone https://github.com/aryalaadi/genragrs.git
cd genragrs
cargo build --release

Running the Example

# Process documents and start interactive chat
cargo run --example rag_cli -- --rag-files ./README.md --rag-files ./src --recursive

# With specific file types
cargo run --example rag_cli -- --rag-files ./docs --recursive --extensions md,txt,rs

# Using different models
cargo run --example rag_cli -- --rag-files ./docs --chat-model llama2:7b --embed-model nomic-embed-text

# Show help
cargo run --example rag_cli -- --help

Example Chat Session

🚀 Welcome to RAG Chat!
💬 Ask questions about your documents
📋 Commands: 'exit'/'quit' to exit, 'help' for help, 'context <question>' to see sources
======================================================================

💬 You: What is GenRAGrs?
🤖 Assistant: GenRAGrs is a modular Rust library for Retrieval-Augmented Generation (RAG) with Ollama integration. It's designed to be production-ready and allows you to build intelligent applications that can understand and query your documents...

💬 You: context What is GenRAGrs?
📚 Context for: What is GenRAGrs?
==================================================
📄 Source 1: README.md
🎯 Score: 0.923
📝 Content: GenRAGrs is a modular Rust library for Retrieval-Augmented Generation (RAG) with Ollama integration. It combines vector similarity search with large language models...
------------------------------

💬 You: exit
👋 Goodbye!

Library Documentation

Core Architecture

GenRAGrs follows a modular design with these key components:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Documents     │───▶│   Embeddings    │───▶│  Vector Store   │
│                 │    │                 │    │                 │
│ • TextLoader    │    │ • OllamaEmbedder│    │ • InMemoryStore │
│ • MarkdownLoader│    │ • Batch Support │    │ • Cosine Sim    │
│ • TextSplitter  │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌──────────────────┐            │
│      Chat       │◀───│    Retriever     │◀───────────┘
│                 │    │                  │
│ • SimpleChat    │    │ • SimpleRetriever│
│ • Orchestrator  │    │ • HybridRetriever│
│ • Sessions      │    │ • Reranking      │
└─────────────────┘    └──────────────────┘
         │
         ▼
┌─────────────────┐    ┌─────────────────┐
│    Models       │    │    Prompts      │
│                 │    │                 │
│ • OllamaModel   │    │ • Templates     │
│ • ModelConfig   │    │ • QA, Code, etc │
│ • MockModel     │    │ • Custom vars   │
└─────────────────┘    └─────────────────┘
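
The RagSystem facade shown below wires these components for you, but they can also be composed by hand. The following is a rough sketch that uses only the constructors listed in the API reference; exact ownership of the embedder and store passed to SimpleRetriever::new (owned values vs. shared handles) is an assumption.

use std::sync::Arc;
use genragrs::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Embedding and storage layer
    let embedder = OllamaEmbedder::new(None, Some("nomic-embed-text".to_string()));
    let store = InMemoryVectorStore::new();

    // Retrieval layer with default settings
    let retriever = SimpleRetriever::new(embedder, store, None);

    // Language model and chat front end
    let model = OllamaModel::new(None, None).with_model("qwen3:0.6b".to_string());
    let chat = SimpleChat::new(Arc::new(retriever), Arc::new(model), None);

    let answer = chat.ask("What is GenRAGrs?").await?;
    println!("{}", answer);
    Ok(())
}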

Basic Usage

Simple RAG System

use genragrs::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize with default configuration
    let mut rag = RagSystem::default().await?;
    
    // Add documents
    rag.add_text_document("Rust is a systems programming language.").await?;
    rag.add_text_file("path/to/document.txt").await?;
    rag.add_markdown_file("README.md").await?;
    
    // Query the system
    let response = rag.query("What is Rust?").await?;
    println!("Answer: {}", response);
    
    // Start a conversation
    let response = rag.chat("Tell me more about systems programming").await?;
    println!("Response: {}", response);
    
    Ok(())
}

Custom Configuration

use genragrs::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = RagConfig::new()
        .with_ollama_url("http://localhost:11434".to_string())
        .with_chat_model("llama2:7b".to_string())
        .with_embedding_model("nomic-embed-text".to_string())
        .with_chunk_settings(512, 100)
        .with_prompt_template(PromptTemplates::code_assistant_template());
    
    let mut rag = RagSystem::new(config).await?;
    
    // Rest of your code...
    Ok(())
}

API Reference

Core Types

RagSystem

The main entry point for the RAG functionality.

impl RagSystem {
    // Construction
    pub async fn new(config: RagConfig) -> Result<Self>;
    pub async fn default() -> Result<Self>;
    
    // Document management
    pub async fn add_text_document(&mut self, content: &str) -> Result<()>;
    pub async fn add_text_file(&mut self, file_path: &str) -> Result<()>;
    pub async fn add_markdown_file(&mut self, file_path: &str) -> Result<()>;
    pub async fn clear(&mut self) -> Result<()>;
    pub async fn has_documents(&self) -> bool;
    
    // Querying
    pub async fn query(&self, question: &str) -> Result<String>;
    pub async fn chat(&self, message: &str) -> Result<String>;
    pub async fn get_context(&self, query: &str) -> Result<Vec<SearchResult>>;
}

RagConfig

Configuration for the RAG system.

#[derive(Debug, Clone)]
pub struct RagConfig {
    pub ollama_base_url: String,
    pub embedding_model: String,
    pub chat_model: String,
    pub chunk_size: usize,
    pub chunk_overlap: usize,
    pub retrieval_config: RetrievalConfig,
    pub model_config: ModelConfig,
    pub prompt_template: PromptTemplate,
}

impl RagConfig {
    pub fn new() -> Self;
    pub fn with_ollama_url(self, url: String) -> Self;
    pub fn with_embedding_model(self, model: String) -> Self;
    pub fn with_chat_model(self, model: String) -> Self;
    pub fn with_chunk_settings(self, size: usize, overlap: usize) -> Self;
    pub fn with_prompt_template(self, template: PromptTemplate) -> Self;
    pub fn with_retrieval_config(self, config: RetrievalConfig) -> Self;
}

Document Processing

TextSplitter

Intelligent text chunking with overlap support.

impl TextSplitter {
    pub fn new(chunk_size: usize, chunk_overlap: usize) -> Self;
    pub fn with_separators(self, separators: Vec<String>) -> Self;
    pub fn split_text(&self, text: &str) -> Vec<String>;
}
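
A minimal usage sketch; long_document_text is a hypothetical variable holding the raw document contents, and the separator list and sizes are illustrative:

// Hypothetical sizes; tune chunk_size and overlap for your content
let splitter = TextSplitter::new(1000, 200)
    .with_separators(vec!["\n\n".to_string(), "\n".to_string()]);

// long_document_text is assumed to be a String with the raw document text
let chunks: Vec<String> = splitter.split_text(&long_document_text);
println!("Produced {} chunks", chunks.len());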

Document Loaders
// Text files
let loader = TextFileLoader::new(Some(text_splitter));
let documents = loader.load("path/to/file.txt").await?;

// Markdown files with header extraction
let loader = MarkdownLoader::new(Some(text_splitter));
let documents = loader.load("path/to/README.md").await?;

Embeddings

OllamaEmbedder
impl OllamaEmbedder {
    pub fn new(base_url: Option<String>, model: Option<String>) -> Self;
    pub fn with_model(self, model: String) -> Self;
}

#[async_trait]
impl Embedder for OllamaEmbedder {
    async fn embed(&self, text: &str) -> Result<Embedding>;
    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Embedding>>;
}
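
A short sketch of both trait methods, assuming it runs inside an async function that returns a Result:

let embedder = OllamaEmbedder::new(
    Some("http://localhost:11434".to_string()),
    Some("nomic-embed-text".to_string()),
);

// Embed a single text
let single: Embedding = embedder.embed("Rust is a systems programming language.").await?;

// Embed a batch of chunks in one call
let batch: Vec<Embedding> = embedder
    .embed_batch(&["first chunk".to_string(), "second chunk".to_string()])
    .await?;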

Vector Storage

InMemoryVectorStore
impl InMemoryVectorStore {
    pub fn new() -> Self;
    pub fn is_empty(&self) -> bool;
    pub fn len(&self) -> usize;
}

#[async_trait]
impl VectorStore for InMemoryVectorStore {
    async fn add_document(&mut self, document: Document) -> Result<()>;
    async fn add_documents(&mut self, documents: Vec<Document>) -> Result<()>;
    async fn search(&self, query_embedding: &Embedding, top_k: usize) -> Result<Vec<SearchResult>>;
    async fn get_document(&self, id: &str) -> Result<Option<Document>>;
    async fn delete_document(&mut self, id: &str) -> Result<bool>;
    async fn clear(&mut self) -> Result<()>;
}

Retrieval

RetrievalConfig
#[derive(Debug, Clone)]
pub struct RetrievalConfig {
    pub top_k: usize,        // Number of documents to retrieve (default: 5)
    pub score_threshold: Option<f32>,  // Minimum similarity score
    pub rerank: bool,        // Enable reranking (default: false)
}

Retrievers
// Simple semantic retrieval
let retriever = SimpleRetriever::new(embedder, vector_store, Some(config));

// Hybrid retrieval (semantic + keyword)
let hybrid_retriever = HybridRetriever::new(semantic_retriever, Some(0.3));
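
Both retrievers implement the Retriever trait. A hedged sketch of querying one, inside an async context and reusing the retriever built above:

// Retrieve with the retriever's configured defaults
let results: Vec<SearchResult> = retriever.retrieve("How does chunking work?").await?;

// Or override the settings for a single call
let config = RetrievalConfig { top_k: 3, score_threshold: Some(0.7), rerank: false };
let top_three = retriever.retrieve_with_config("How does chunking work?", &config).await?;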

Prompt Templates

Built-in Templates
// Q&A template (default)
let template = PromptTemplates::qa_template();

// Code assistant
let template = PromptTemplates::code_assistant_template();

// Research assistant
let template = PromptTemplates::research_template();

// Summarization
let template = PromptTemplates::summarization_template();

Custom Templates
let template = PromptTemplate::new()
    .with_system_prompt("You are a helpful assistant.".to_string())
    .with_user_template("Context: {context}\n\nQuestion: {question}\n\nAnswer:".to_string())
    .with_context_template("Source: {source}\n{content}".to_string());

Prompt Builder with Variables
let builder = PromptBuilder::new(template)
    .set_variable("domain".to_string(), "medical".to_string())
    .set_variable("style".to_string(), "formal".to_string());

let messages = builder.build("What is diabetes?", &search_results)?;

Language Models

OllamaModel
impl OllamaModel {
    pub fn new(base_url: Option<String>, default_config: Option<ModelConfig>) -> Self;
    pub fn with_model(self, model_name: String) -> Self;
}

#[async_trait]
impl LanguageModel for OllamaModel {
    async fn generate(&self, messages: ChatMessages) -> Result<ModelResponse>;
    async fn generate_with_config(&self, messages: ChatMessages, config: &ModelConfig) -> Result<ModelResponse>;
}

ModelConfig
#[derive(Debug, Clone)]
pub struct ModelConfig {
    pub model_name: String,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub top_p: Option<f32>,
    pub stream: bool,
}

impl ModelConfig {
    pub fn new(model_name: String) -> Self;
    pub fn with_temperature(self, temperature: f32) -> Self;
    pub fn with_max_tokens(self, max_tokens: u32) -> Self;
    pub fn with_streaming(self, stream: bool) -> Self;
}
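
A small builder sketch (the values are illustrative, not recommendations):

let config = ModelConfig::new("llama2:7b".to_string())
    .with_temperature(0.2)
    .with_max_tokens(512)
    .with_streaming(false);

// Use it as the default configuration for an Ollama-backed model
let model = OllamaModel::new(None, Some(config));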

Chat Management

SimpleChat
impl SimpleChat {
    pub fn new(retriever: Arc<dyn Retriever>, model: Arc<dyn LanguageModel>, config: Option<ChatConfig>) -> Self;
    pub async fn ask(&self, question: &str) -> Result<String>;
    pub async fn chat(&self, message: &str) -> Result<String>;
    pub async fn get_context(&self, query: &str) -> Result<Vec<SearchResult>>;
}

ChatOrchestrator

For advanced chat management with sessions:

impl ChatOrchestrator {
    pub fn new(retriever: Arc<dyn Retriever>, model: Arc<dyn LanguageModel>, config: Option<ChatConfig>) -> Self;
    pub async fn create_session(&self) -> String;
    pub async fn chat(&self, session_id: &str, message: &str) -> Result<ModelResponse>;
    pub async fn query(&self, question: &str) -> Result<ModelResponse>;
    pub async fn get_session(&self, session_id: &str) -> Option<ChatSession>;
    pub async fn delete_session(&self, session_id: &str) -> bool;
}
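
A hedged sketch of a session-based exchange, assuming retriever and model are Arc-wrapped trait objects as in the constructor signature and that this runs inside an async function returning a Result:

let orchestrator = ChatOrchestrator::new(retriever, model, None);

// Each session keeps its own conversation history
let session_id = orchestrator.create_session().await;
let reply = orchestrator.chat(&session_id, "Summarize the README").await?;
let follow_up = orchestrator.chat(&session_id, "Now list its key features").await?;

// Clean up when the conversation is done
orchestrator.delete_session(&session_id).await;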

Advanced Usage

Custom Retriever

use async_trait::async_trait;

struct CustomRetriever {
    // Your custom fields
}

#[async_trait]
impl Retriever for CustomRetriever {
    async fn retrieve(&self, query: &str) -> Result<Vec<SearchResult>> {
        // Your custom retrieval logic
    }
    
    async fn retrieve_with_config(&self, query: &str, config: &RetrievalConfig) -> Result<Vec<SearchResult>> {
        // Your custom retrieval with config
    }
}

Custom Vector Store

use async_trait::async_trait;

struct CustomVectorStore {
    // Your custom storage implementation
}

#[async_trait]
impl VectorStore for CustomVectorStore {
    async fn add_document(&mut self, document: Document) -> Result<()> {
        // Your storage logic
    }
    
    async fn search(&self, query_embedding: &Embedding, top_k: usize) -> Result<Vec<SearchResult>> {
        // Your search logic
    }
    
    // ... implement other required methods
}

Batch Processing

use genragrs::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rag = RagSystem::default().await?;
    
    // Process multiple files
    let files = vec!["doc1.txt", "doc2.md", "doc3.py"];
    for file in files {
        if file.ends_with(".md") {
            rag.add_markdown_file(file).await?;
        } else {
            rag.add_text_file(file).await?;
        }
    }
    
    // Batch queries
    let questions = vec![
        "What is the main topic?",
        "How does this work?",
        "What are the key features?",
    ];
    
    for question in questions {
        let answer = rag.query(question).await?;
        println!("Q: {}\nA: {}\n", question, answer);
    }
    
    Ok(())
}

CLI Reference

Commands

# Basic usage - process files and start chat
rag-cli --rag-files <PATH> [OPTIONS]

# Process only without chat
rag-cli --rag-files <PATH> process-only

# Show configuration
rag-cli config --show

Options

Options:
  -r, --rag-files <PATH>         RAG files or directories to process
  -R, --recursive                Recursively process directories
  -e, --extensions <LIST>        File extensions to include (comma-separated)
      --ollama-url <URL>         Ollama base URL [default: http://localhost:11434]
      --embed-model <MODEL>      Embedding model [default: nomic-embed-text]
      --chat-model <MODEL>       Chat model [default: qwen3:0.6b]
  -v, --verbose                  Enable verbose logging
  -h, --help                     Print help

Examples

# Process current directory with common file types
rag-cli --rag-files . --recursive

# Process specific files
rag-cli --rag-files README.md --rag-files src/lib.rs

# Process with custom extensions
rag-cli --rag-files ./docs --recursive --extensions md,txt,rst

# Use different models
rag-cli --rag-files ./code --chat-model llama2:7b --embed-model nomic-embed-text

# Verbose logging
rag-cli --rag-files ./docs --verbose

Chat Commands

When in interactive chat mode:

  • exit or quit - Exit chat
  • help - Show help
  • context <question> - Show source documents for a question
  • Any other text - Send message to assistant

Supported File Types

Extension      Description   Processor        Features
.txt           Plain text    TextFileLoader   Basic chunking
.md            Markdown      MarkdownLoader   Header extraction, metadata
.py            Python        TextFileLoader   Syntax-aware chunking
.rs            Rust          TextFileLoader   Syntax-aware chunking
.js            JavaScript    TextFileLoader   Syntax-aware chunking
.ts            TypeScript    TextFileLoader   Syntax-aware chunking
.java          Java          TextFileLoader   Syntax-aware chunking
.cpp, .c, .h   C/C++         TextFileLoader   Syntax-aware chunking

Performance Tuning

Chunking Strategy

// For code files - smaller chunks for precise retrieval
let config = RagConfig::new().with_chunk_settings(512, 100);

// For documentation - larger chunks for context
let config = RagConfig::new().with_chunk_settings(1500, 300);

// For mixed content - balanced approach
let config = RagConfig::new().with_chunk_settings(1000, 200); // default

Retrieval Tuning

let retrieval_config = RetrievalConfig {
    top_k: 10,                    // More documents for complex queries
    score_threshold: Some(0.7),   // Filter low-relevance results
    rerank: true,                 // Enable reranking for better quality
};
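
These settings take effect once they are passed into the system configuration; a short sketch, inside an async context:

// Plug the tuned retrieval settings into the overall RAG configuration
let config = RagConfig::new().with_retrieval_config(retrieval_config);
let mut rag = RagSystem::new(config).await?;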

Model Selection

Fast Models (Low Resource)

  • qwen3:0.6b (default) - Fast, good for simple Q&A
  • phi:2.7b - Balanced speed and quality

Quality Models (More Resources)

  • llama2:7b - High quality responses
  • mistral:7b - Good for technical content
  • codellama:7b - Specialized for code

Embedding Models

  • nomic-embed-text (default) - General purpose
  • all-minilm - Fast, smaller embeddings

Error Handling

use genragrs::prelude::*;

match rag.query("test").await {
    Ok(response) => println!("Response: {}", response),
    Err(RagError::Http(e)) => eprintln!("Network error: {}", e),
    Err(RagError::Embedding(e)) => eprintln!("Embedding error: {}", e),
    Err(RagError::Model(e)) => eprintln!("Model error: {}", e),
    Err(e) => eprintln!("Other error: {}", e),
}

Troubleshooting

Common Issues

"Failed to initialize RAG system"

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

"Embedding error"

# Pull the embedding model
ollama pull nomic-embed-text

# Check available models
ollama list

"Model error"

# Pull the chat model
ollama pull qwen3:0.6b

# Use a different model
rag-cli --chat-model llama2:7b --rag-files ./docs

Low Quality Responses

  • Enable reranking (rerank: true in RetrievalConfig) and use --verbose to inspect retrieval scores
  • Adjust chunk size for your content type
  • Use a higher quality model like llama2:7b
  • Increase top_k for complex queries

Debug Mode

# Enable debug logging
RUST_LOG=debug cargo run --example rag_cli -- --rag-files ./docs --verbose

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/new-feature
  3. Make your changes and add tests
  4. Run tests: cargo test
  5. Run clippy: cargo clippy
  6. Format code: cargo fmt
  7. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Ollama for local LLM inference
  • Tokio for async runtime
  • The Rust community for excellent crates and tools
