A modular Rust library for Retrieval-Augmented Generation (RAG) with Ollama integration
GenRAGrs is a production-ready, modular RAG library designed for building intelligent applications that can understand and query your documents. It combines vector similarity search with large language models to provide contextually accurate responses based on your data.
- 🚀 Ollama Integration: Seamless integration with local Ollama models
- 🧩 Modular Architecture: Pluggable components for embeddings, storage, retrieval, and chat
- 📚 Multi-format Support: Text files, Markdown, and code files
- 💬 Smart Chunking: Intelligent document splitting with configurable overlap
- 🔍 Advanced Retrieval: Semantic search with optional reranking and hybrid retrieval
- 🎯 Flexible Prompting: Customizable prompt templates for different use cases
- ⚡ Async Performance: Built with Tokio for high-performance async operations
- 🛠️ CLI Tool: Ready-to-use command-line interface
```bash
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull required models
ollama pull nomic-embed-text # For embeddings
ollama pull qwen3:0.6b # For chat (default model)
# Clone and build GenRAGrs
git clone <your-repo-url>
cd genragrs
cargo build --release
```

```bash
# Process documents and start interactive chat
cargo run --example rag_cli -- --rag-files ./README.md --rag-files ./src --recursive
# With specific file types
cargo run --example rag_cli -- --rag-files ./docs --recursive --extensions md,txt,rs
# Using different models
cargo run --example rag_cli -- --rag-files ./docs --chat-model llama2:7b --embed-model nomic-embed-text
# Show help
cargo run --example rag_cli -- --help
```

```text
🚀 Welcome to RAG Chat!
💬 Ask questions about your documents
📋 Commands: 'exit'/'quit' to exit, 'help' for help, 'context <question>' to see sources
======================================================================
💬 You: What is GenRAGrs?
🤖 Assistant: GenRAGrs is a modular Rust library for Retrieval-Augmented Generation (RAG) with Ollama integration. It's designed to be production-ready and allows you to build intelligent applications that can understand and query your documents...
💬 You: context What is GenRAGrs?
📚 Context for: What is GenRAGrs?
==================================================
📄 Source 1: README.md
🎯 Score: 0.923
📝 Content: GenRAGrs is a modular Rust library for Retrieval-Augmented Generation (RAG) with Ollama integration. It combines vector similarity search with large language models...
------------------------------
💬 You: exit
👋 Goodbye!
```
GenRAGrs follows a modular design with these key components:
```text
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Documents │───▶│ Embeddings │───▶│ Vector Store │
│ │ │ │ │ │
│ • TextLoader │ │ • OllamaEmbedder│ │ • InMemoryStore │
│ • MarkdownLoader│ │ • Batch Support │ │ • Cosine Sim │
│ • TextSplitter │ │ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Chat │◀───│ Retriever │◀───│ │
│ │ │ │ │ │
│ • SimpleChat │ │ • SimpleRetriever│ │ │
│ • Orchestrator │ │ • HybridRetriever│ │ │
│ • Sessions │ │ • Reranking │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ Models │ │ Prompts │
│ │ │ │
│ • OllamaModel │ │ • Templates │
│ • ModelConfig │ │ • QA, Code, etc │
│ • MockModel │ │ • Custom vars │
└─────────────────┘ └─────────────────┘
```
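For finer control than the `RagSystem` facade shown in the next example, the same components can be wired together by hand. The following is a minimal sketch based on the constructor signatures listed in the API reference further down; exact ownership and trait-object wrapping (for example, whether `SimpleRetriever::new` takes owned values or `Arc`s) may differ in the actual crate.

```rust
use std::sync::Arc;
use genragrs::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Embedder and vector store (constructors as documented below)
    let embedder = OllamaEmbedder::new(None, Some("nomic-embed-text".to_string()));
    let store = InMemoryVectorStore::new();

    // Semantic retriever over the store; `None` keeps the default RetrievalConfig
    let retriever = SimpleRetriever::new(embedder, store, None);

    // Ollama chat model with an explicit model name
    let model = OllamaModel::new(None, None).with_model("qwen3:0.6b".to_string());

    // SimpleChat takes trait objects for the retriever and the model
    let chat = SimpleChat::new(Arc::new(retriever), Arc::new(model), None);

    // No documents are indexed yet; see the loader and vector store sections below
    let answer = chat.ask("What does the architecture look like?").await?;
    println!("{}", answer);

    Ok(())
}
```

Manual wiring like this is mainly useful when swapping in custom `Retriever` or `VectorStore` implementations (see the extension traits later in this document); for most applications the `RagSystem` facade is enough.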
```rust
use genragrs::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize with default configuration
    let mut rag = RagSystem::default().await?;

    // Add documents
    rag.add_text_document("Rust is a systems programming language.").await?;
    rag.add_text_file("path/to/document.txt").await?;
    rag.add_markdown_file("README.md").await?;

    // Query the system
    let response = rag.query("What is Rust?").await?;
    println!("Answer: {}", response);

    // Start a conversation
    let response = rag.chat("Tell me more about systems programming").await?;
    println!("Response: {}", response);

    Ok(())
}
```

```rust
use genragrs::prelude::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = RagConfig::new()
        .with_ollama_url("http://localhost:11434".to_string())
        .with_chat_model("llama2:7b".to_string())
        .with_embedding_model("nomic-embed-text".to_string())
        .with_chunk_settings(512, 100)
        .with_prompt_template(PromptTemplates::code_assistant_template());

    let mut rag = RagSystem::new(config).await?;
    // Rest of your code...
    Ok(())
}
```

`RagSystem` is the main entry point for the RAG functionality:

```rust
impl RagSystem {
    // Construction
    pub async fn new(config: RagConfig) -> Result<Self>;
    pub async fn default() -> Result<Self>;

    // Document management
    pub async fn add_text_document(&mut self, content: &str) -> Result<()>;
    pub async fn add_text_file(&mut self, file_path: &str) -> Result<()>;
    pub async fn add_markdown_file(&mut self, file_path: &str) -> Result<()>;
    pub async fn clear(&mut self) -> Result<()>;
    pub async fn has_documents(&self) -> bool;

    // Querying
    pub async fn query(&self, question: &str) -> Result<String>;
    pub async fn chat(&self, message: &str) -> Result<String>;
    pub async fn get_context(&self, query: &str) -> Result<Vec<SearchResult>>;
}
```
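Beyond `query` and `chat`, the inspection helpers above can be used to check what is indexed and to look at the retrieved sources directly. A small sketch, assuming `rag` is a mutable, initialized `RagSystem` with documents added; since the fields of `SearchResult` are not spelled out here, only the result count is printed:

```rust
if rag.has_documents().await {
    // Retrieve the raw context that would back an answer
    let sources = rag.get_context("What is GenRAGrs?").await?;
    println!("Retrieved {} source chunks", sources.len());
}

// Drop everything from the vector store and start over
rag.clear().await?;
```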
Configuration for the RAG system:

```rust
#[derive(Debug, Clone)]
pub struct RagConfig {
    pub ollama_base_url: String,
    pub embedding_model: String,
    pub chat_model: String,
    pub chunk_size: usize,
    pub chunk_overlap: usize,
    pub retrieval_config: RetrievalConfig,
    pub model_config: ModelConfig,
    pub prompt_template: PromptTemplate,
}

impl RagConfig {
    pub fn new() -> Self;
    pub fn with_ollama_url(self, url: String) -> Self;
    pub fn with_embedding_model(self, model: String) -> Self;
    pub fn with_chat_model(self, model: String) -> Self;
    pub fn with_chunk_settings(self, size: usize, overlap: usize) -> Self;
    pub fn with_prompt_template(self, template: PromptTemplate) -> Self;
    pub fn with_retrieval_config(self, config: RetrievalConfig) -> Self;
}
```

`TextSplitter` provides intelligent text chunking with overlap support:

```rust
impl TextSplitter {
    pub fn new(chunk_size: usize, chunk_overlap: usize) -> Self;
    pub fn with_separators(self, separators: Vec<String>) -> Self;
    pub fn split_text(&self, text: &str) -> Vec<String>;
}
```
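A quick sketch of the splitter on its own; the separator list here is illustrative, not necessarily the crate's default:

```rust
// 1000-character chunks with 200 characters of overlap
let splitter = TextSplitter::new(1000, 200)
    .with_separators(vec!["\n\n".to_string(), "\n".to_string(), " ".to_string()]);

let long_document_text = "GenRAGrs splits documents into overlapping chunks. ".repeat(100);
let chunks = splitter.split_text(&long_document_text);
println!("Produced {} chunks", chunks.len());
```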
```rust
// Text files
let loader = TextFileLoader::new(Some(text_splitter));
let documents = loader.load("path/to/file.txt").await?;

// Markdown files with header extraction
let loader = MarkdownLoader::new(Some(text_splitter));
let documents = loader.load("path/to/README.md").await?;
```

```rust
impl OllamaEmbedder {
    pub fn new(base_url: Option<String>, model: Option<String>) -> Self;
    pub fn with_model(self, model: String) -> Self;
}

#[async_trait]
impl Embedder for OllamaEmbedder {
    async fn embed(&self, text: &str) -> Result<Embedding>;
    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Embedding>>;
}
```
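A minimal sketch of single and batch embedding using the trait methods above (run inside an async context):

```rust
let embedder = OllamaEmbedder::new(None, None).with_model("nomic-embed-text".to_string());

// Single text
let _embedding = embedder.embed("Rust is a systems programming language.").await?;

// Batch of texts in one call
let texts = vec!["first chunk".to_string(), "second chunk".to_string()];
let embeddings = embedder.embed_batch(&texts).await?;
assert_eq!(embeddings.len(), texts.len());
```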
The built-in in-memory vector store implements the `VectorStore` trait:

```rust
impl InMemoryVectorStore {
    pub fn new() -> Self;
    pub fn is_empty(&self) -> bool;
    pub fn len(&self) -> usize;
}

#[async_trait]
impl VectorStore for InMemoryVectorStore {
    async fn add_document(&mut self, document: Document) -> Result<()>;
    async fn add_documents(&mut self, documents: Vec<Document>) -> Result<()>;
    async fn search(&self, query_embedding: &Embedding, top_k: usize) -> Result<Vec<SearchResult>>;
    async fn get_document(&self, id: &str) -> Result<Option<Document>>;
    async fn delete_document(&mut self, id: &str) -> Result<bool>;
    async fn clear(&mut self) -> Result<()>;
}
```

```rust
#[derive(Debug, Clone)]
pub struct RetrievalConfig {
    pub top_k: usize,                 // Number of documents to retrieve (default: 5)
    pub score_threshold: Option<f32>, // Minimum similarity score
    pub rerank: bool,                 // Enable reranking (default: false)
}
```
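Retrieval behaviour is tuned by building a `RetrievalConfig` and handing it to the system via `RagConfig::with_retrieval_config`; a short sketch:

```rust
let retrieval_config = RetrievalConfig {
    top_k: 8,
    score_threshold: Some(0.6),
    rerank: true,
};

let config = RagConfig::new().with_retrieval_config(retrieval_config);
let mut rag = RagSystem::new(config).await?;
```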
```rust
// Simple semantic retrieval
let retriever = SimpleRetriever::new(embedder, vector_store, Some(config));

// Hybrid retrieval (semantic + keyword)
let hybrid_retriever = HybridRetriever::new(semantic_retriever, Some(0.3));
```

```rust
// Q&A template (default)
let template = PromptTemplates::qa_template();
// Code assistant
let template = PromptTemplates::code_assistant_template();
// Research assistant
let template = PromptTemplates::research_template();
// Summarization
let template = PromptTemplates::summarization_template();
```

```rust
let template = PromptTemplate::new()
    .with_system_prompt("You are a helpful assistant.".to_string())
    .with_user_template("Context: {context}\n\nQuestion: {question}\n\nAnswer:".to_string())
    .with_context_template("Source: {source}\n{content}".to_string());
```

```rust
let builder = PromptBuilder::new(template)
    .set_variable("domain".to_string(), "medical".to_string())
    .set_variable("style".to_string(), "formal".to_string());

let messages = builder.build("What is diabetes?", &search_results)?;
```

```rust
impl OllamaModel {
    pub fn new(base_url: Option<String>, default_config: Option<ModelConfig>) -> Self;
    pub fn with_model(self, model_name: String) -> Self;
}

#[async_trait]
impl LanguageModel for OllamaModel {
    async fn generate(&self, messages: ChatMessages) -> Result<ModelResponse>;
    async fn generate_with_config(&self, messages: ChatMessages, config: &ModelConfig) -> Result<ModelResponse>;
}
```
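Putting the prompt and model pieces together, a hedged sketch that assumes the `messages` value built by the `PromptBuilder` example above is the same `ChatMessages` type that the model accepts; the fields of `ModelResponse` are not documented here, so the response is only bound:

```rust
let model = OllamaModel::new(None, None).with_model("qwen3:0.6b".to_string());

// Override generation settings for this single call
let call_config = ModelConfig::new("qwen3:0.6b".to_string()).with_temperature(0.2);
let _response = model.generate_with_config(messages, &call_config).await?;
```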
```rust
#[derive(Debug, Clone)]
pub struct ModelConfig {
    pub model_name: String,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub top_p: Option<f32>,
    pub stream: bool,
}

impl ModelConfig {
    pub fn new(model_name: String) -> Self;
    pub fn with_temperature(self, temperature: f32) -> Self;
    pub fn with_max_tokens(self, max_tokens: u32) -> Self;
    pub fn with_streaming(self, stream: bool) -> Self;
}
```
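A short sketch of building a `ModelConfig` with the builder methods above and installing it as the model's default at construction time, per the `OllamaModel::new` signature earlier:

```rust
let default_config = ModelConfig::new("llama2:7b".to_string())
    .with_temperature(0.3)
    .with_max_tokens(1024)
    .with_streaming(false);

// Used as the default for every generate() call on this model
let model = OllamaModel::new(None, Some(default_config));
```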
```rust
impl SimpleChat {
    pub fn new(retriever: Arc<dyn Retriever>, model: Arc<dyn LanguageModel>, config: Option<ChatConfig>) -> Self;
    pub async fn ask(&self, question: &str) -> Result<String>;
    pub async fn chat(&self, message: &str) -> Result<String>;
    pub async fn get_context(&self, query: &str) -> Result<Vec<SearchResult>>;
}
```

For advanced chat management with sessions:

```rust
impl ChatOrchestrator {
    pub fn new(retriever: Arc<dyn Retriever>, model: Arc<dyn LanguageModel>, config: Option<ChatConfig>) -> Self;
    pub async fn create_session(&self) -> String;
    pub async fn chat(&self, session_id: &str, message: &str) -> Result<ModelResponse>;
    pub async fn query(&self, question: &str) -> Result<ModelResponse>;
    pub async fn get_session(&self, session_id: &str) -> Option<ChatSession>;
    pub async fn delete_session(&self, session_id: &str) -> bool;
}
```
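A sketch of session-based chat, assuming `retriever` and `model` are already wrapped in `Arc`s as in the `SimpleChat` signature above; the fields of the returned `ModelResponse` are not documented here, so the replies are only bound:

```rust
let orchestrator = ChatOrchestrator::new(retriever, model, None);

// Each session keeps its own conversation history
let session_id = orchestrator.create_session().await;
let _first = orchestrator.chat(&session_id, "What does GenRAGrs do?").await?;
let _follow_up = orchestrator.chat(&session_id, "How is chunking configured?").await?;

// One-off question without a session
let _answer = orchestrator.query("List the supported file types").await?;

// Clean up when the conversation is over
orchestrator.delete_session(&session_id).await;
```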
To plug in your own retrieval logic, implement the `Retriever` trait:

```rust
use async_trait::async_trait;

struct CustomRetriever {
    // Your custom fields
}

#[async_trait]
impl Retriever for CustomRetriever {
    async fn retrieve(&self, query: &str) -> Result<Vec<SearchResult>> {
        // Your custom retrieval logic
        todo!()
    }

    async fn retrieve_with_config(&self, query: &str, config: &RetrievalConfig) -> Result<Vec<SearchResult>> {
        // Your custom retrieval with config
        todo!()
    }
}
```

```rust
use async_trait::async_trait;
struct CustomVectorStore {
    // Your custom storage implementation
}

#[async_trait]
impl VectorStore for CustomVectorStore {
    async fn add_document(&mut self, document: Document) -> Result<()> {
        // Your storage logic
        todo!()
    }

    async fn search(&self, query_embedding: &Embedding, top_k: usize) -> Result<Vec<SearchResult>> {
        // Your search logic
        todo!()
    }

    // ... implement other required methods
}
```

```rust
use genragrs::prelude::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rag = RagSystem::default().await?;

    // Process multiple files
    let files = vec!["doc1.txt", "doc2.md", "doc3.py"];
    for file in files {
        if file.ends_with(".md") {
            rag.add_markdown_file(file).await?;
        } else {
            rag.add_text_file(file).await?;
        }
    }

    // Batch queries
    let questions = vec![
        "What is the main topic?",
        "How does this work?",
        "What are the key features?",
    ];
    for question in questions {
        let answer = rag.query(question).await?;
        println!("Q: {}\nA: {}\n", question, answer);
    }

    Ok(())
}
```

```bash
# Basic usage - process files and start chat
rag-cli --rag-files <PATH> [OPTIONS]
# Process only without chat
rag-cli --rag-files <PATH> process-only
# Show configuration
rag-cli config --show
```

Options:

```text
  -r, --rag-files <PATH>     RAG files or directories to process
  -R, --recursive            Recursively process directories
  -e, --extensions <LIST>    File extensions to include (comma-separated)
      --ollama-url <URL>     Ollama base URL [default: http://localhost:11434]
      --embed-model <MODEL>  Embedding model [default: nomic-embed-text]
      --chat-model <MODEL>   Chat model [default: qwen3:0.6b]
  -v, --verbose              Enable verbose logging
  -h, --help                 Print help
```

```bash
# Process current directory with common file types
rag-cli --rag-files . --recursive
# Process specific files
rag-cli --rag-files README.md --rag-files src/lib.rs
# Process with custom extensions
rag-cli --rag-files ./docs --recursive --extensions md,txt,rst
# Use different models
rag-cli --rag-files ./code --chat-model llama2:7b --embed-model nomic-embed-text
# Verbose logging
rag-cli --rag-files ./docs --verbose
```

When in interactive chat mode:
- `exit` or `quit` - Exit chat
- `help` - Show help
- `context <question>` - Show source documents for a question
- Any other text - Send message to the assistant
| Extension | Description | Processor | Features |
|---|---|---|---|
| `.txt` | Plain text | TextFileLoader | Basic chunking |
| `.md` | Markdown | MarkdownLoader | Header extraction, metadata |
| `.py` | Python | TextFileLoader | Syntax-aware chunking |
| `.rs` | Rust | TextFileLoader | Syntax-aware chunking |
| `.js` | JavaScript | TextFileLoader | Syntax-aware chunking |
| `.ts` | TypeScript | TextFileLoader | Syntax-aware chunking |
| `.java` | Java | TextFileLoader | Syntax-aware chunking |
| `.cpp`, `.c`, `.h` | C/C++ | TextFileLoader | Syntax-aware chunking |
```rust
// For code files - smaller chunks for precise retrieval
let config = RagConfig::new().with_chunk_settings(512, 100);
// For documentation - larger chunks for context
let config = RagConfig::new().with_chunk_settings(1500, 300);
// For mixed content - balanced approach
let config = RagConfig::new().with_chunk_settings(1000, 200); // default
```

```rust
let retrieval_config = RetrievalConfig {
    top_k: 10,                  // More documents for complex queries
    score_threshold: Some(0.7), // Filter low-relevance results
    rerank: true,               // Enable reranking for better quality
};
```

Fast Models (Low Resource)
- `qwen3:0.6b` (default) - Fast, good for simple Q&A
- `phi:2.7b` - Balanced speed and quality
Quality Models (More Resources)
- `llama2:7b` - High quality responses
- `mistral:7b` - Good for technical content
- `codellama:7b` - Specialized for code
Embedding Models
- `nomic-embed-text` (default) - General purpose
- `all-minilm` - Fast, smaller embeddings
```rust
use genragrs::prelude::*;

match rag.query("test").await {
    Ok(response) => println!("Response: {}", response),
    Err(RagError::Http(e)) => eprintln!("Network error: {}", e),
    Err(RagError::Embedding(e)) => eprintln!("Embedding error: {}", e),
    Err(RagError::Model(e)) => eprintln!("Model error: {}", e),
    Err(e) => eprintln!("Other error: {}", e),
}
```

"Failed to initialize RAG system"

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Start Ollama if needed
ollama serve
```

"Embedding error"

```bash
# Pull the embedding model
ollama pull nomic-embed-text
# Check available models
ollama list
```

"Model error"

```bash
# Pull the chat model
ollama pull qwen3:0.6b
# Use a different model
rag-cli --chat-model llama2:7b --rag-files ./docs
```

Low Quality Responses

- Try enabling reranking, and use `--verbose` to see retrieval scores
- Adjust chunk size for your content type
- Use a higher quality model such as `llama2:7b`
- Increase `top_k` for complex queries
```bash
# Enable debug logging
RUST_LOG=debug cargo run --example rag_cli -- --rag-files ./docs --verbose
```

- Fork the repository
- Create a feature branch: `git checkout -b feature/new-feature`
- Make your changes and add tests
- Run tests: `cargo test`
- Run clippy: `cargo clippy`
- Format code: `cargo fmt`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.