License: MIT

Intellecta RAG System

A production-grade, offline-capable Retrieval-Augmented Generation (RAG) system designed for secure, on-premise document analysis and intelligent question-answering. It is built entirely on open-source technologies, so organizations can deploy AI-powered document intelligence without cloud dependencies.


📋 Table of Contents

  1. Executive Summary
  2. System Architecture
  3. Technology Stack
  4. Features
  5. Installation
  6. Running the Application
  7. Document Ingestion Pipeline
  8. Chunking Strategy
  9. Embedding Generation
  10. Vector Storage & Retrieval
  11. LLM Reasoning & Response Generation
  12. Security Framework
  13. Multi-Language Support
  14. Evaluation Metrics
  15. API Documentation
  16. Performance Optimization
  17. Project Structure
  18. Troubleshooting

Executive Summary

Intellecta is a production-grade, offline-capable Retrieval-Augmented Generation (RAG) system designed for secure, on-premise document analysis and intelligent question-answering. Built entirely on open-source technologies, it enables organizations to deploy AI-powered document intelligence without cloud dependencies, ensuring data sovereignty and compliance with air-gapped security requirements.

Key Capabilities

| Capability | Description |
|---|---|
| Document Intelligence | Process PDF, DOCX, Excel, CSV, TXT, Markdown, and JSON files |
| Semantic Search | Find relevant information using AI embeddings |
| AI-Powered Q&A | Get intelligent answers grounded in your documents |
| Security Controls | 5-level security clearance system |
| Multi-Language | English, Korean, and Vietnamese support |
| Offline Operation | No cloud dependencies, air-gapped ready |

System Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              USER INTERFACE                                 │
│                         (React + TypeScript + Vite)                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │  Dashboard  │  │Query/Response│ │Doc Ingestion│  │  History & Logs     │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              REST API LAYER                                 │
│                            (FastAPI + Python)                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │ /query      │  │ /ingest     │  │ /documents  │  │/security/auto-detect│ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                    ┌─────────────────┼─────────────────┐
                    ▼                 ▼                 ▼
┌───────────────────────┐ ┌───────────────────┐ ┌───────────────────────────┐
│   INGESTION PIPELINE  │ │  RAG ORCHESTRATOR │ │   SECURITY FRAMEWORK      │
│  ┌─────────────────┐  │ │ ┌───────────────┐ │ │  ┌─────────────────────┐  │
│  │ Document Parser │  │ │ │Query Embedding│ │ │  │ Pattern Detection   │  │
│  │ (PDF,CSV,DOCX)  │  │ │ └───────────────┘ │ │  │ (SSN, Salary, etc.) │  │
│  └─────────────────┘  │ │        │          │ │  └─────────────────────┘  │
│          │            │ │        ▼          │ │            │              │
│          ▼            │ │ ┌───────────────┐ │ │            ▼              │
│  ┌─────────────────┐  │ │ │Vector Search  │ │ │  ┌─────────────────────┐  │
│  │ Text Chunking   │  │ │ │  (pgvector)   │ │ │  │ Clearance Levels    │  │
│  │ (512 tokens)    │  │ │ └───────────────┘ │ │  │ (PUBLIC→TOP_SECRET) │  │
│  └─────────────────┘  │ │        │          │ │  └─────────────────────┘  │
│          │            │ │        ▼          │ │                           │
│          ▼            │ │ ┌───────────────┐ │ └───────────────────────────┘
│  ┌─────────────────┐  │ │ │Context Build  │ │
│  │ E5 Embedding    │  │ │ └───────────────┘ │
│  │ (1024-dim)      │  │ │        │          │
│  └─────────────────┘  │ │        ▼          │
│          │            │ │ ┌───────────────┐ │
│          ▼            │ │ │ LLM Reasoning │ │
│  ┌─────────────────┐  │ │ │ (LLaMA 3 8B)  │ │
│  │ Vector Storage  │  │ │ └───────────────┘ │
│  │   (pgvector)    │  │ │        │          │
│  └─────────────────┘  │ │        ▼          │
└───────────────────────┘ │ ┌───────────────┐ │
                          │ │ Translation   │ │
                          │ │ (Mistral 7B)  │ │
                          │ └───────────────┘ │
                          └───────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              DATA LAYER                                     │
│  ┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐  │
│  │  PostgreSQL +       │  │  Document Registry  │  │   Query History     │  │
│  │  pgvector           │  │  (JSON)             │  │   (JSON)            │  │
│  └─────────────────────┘  └─────────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           LLM INFERENCE LAYER                               │
│                              (Ollama Runtime)                               │
│  ┌─────────────────────────────┐  ┌─────────────────────────────────────┐   │
│  │  LLaMA 3 8B (4.6 GB)        │  │  Mistral 7B (4.1 GB)                │   │
│  │  - Reasoning                │  │  - Translation (Quality Mode)       │   │
│  │  - Answer Generation        │  │  - Refinement                       │   │
│  └─────────────────────────────┘  └─────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow

User Query → Embedding → Vector Search → Context Assembly → LLM Reasoning → Response
     │            │            │                │                │            │
     └── Security Check ───────┴──── Chunk Filtering ────────────┴── Translation

Technology Stack

Backend Technologies

| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | FastAPI | 0.104+ | REST API, async support |
| Language | Python | 3.11+ | Core programming |
| Database | PostgreSQL | 15+ | Relational storage |
| Vector DB | pgvector | 0.5+ | Similarity search |
| LLM Runtime | Ollama | 0.1+ | Local model inference |
| Embeddings | sentence-transformers | 2.2+ | Text embeddings |

Frontend Technologies

| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 18+ | UI components |
| Build Tool | Vite | 5+ | Fast development |
| Language | TypeScript | 5+ | Type safety |
| Styling | Tailwind CSS | 3+ | Utility-first CSS |
| Components | shadcn/ui | latest | UI component library |
| Charts | Recharts | 2+ | Data visualization |

AI/ML Models

| Model | Parameters | Size | License | Purpose |
|---|---|---|---|---|
| LLaMA 3 8B | 8 Billion | 4.6 GB | Meta Llama 3 Community License | Reasoning, Generation |
| Mistral 7B | 7 Billion | 4.1 GB | Apache 2.0 | Translation, Refinement |
| E5-large-v2 | 335 Million | 1.3 GB | MIT | Text Embeddings |

Features

🔄 Dual LLM Mode Switcher

Toggle between Fast and Quality modes directly from the UI:

  • ⚡ Fast Mode: Uses LLaMA 3 8B for all tasks (~30-60s per query)
  • 🔬 Quality Mode: Uses LLaMA 3 8B + Mistral 7B for better translations (~40-90s per query)

🔐 Dual Security Checking

Security is enforced at two levels:

  • Query Analysis: Scans query text for sensitive keywords
  • Document Analysis: Scans retrieved content for sensitive patterns
  • Effective Level: Uses the HIGHER of query or document security

🌐 Multi-Language Support

  • English 🇺🇸 - Native support
  • Korean 🇰🇷 - Full translation pipeline
  • Vietnamese 🇻🇳 - Full translation pipeline

📊 Real-Time Metrics

  • Accuracy, Precision, Efficiency, Throughput scores
  • High-quality chunk ratio
  • Retrieval and generation timing

📄 Document Selection

  • Filter queries to specific documents
  • Multi-select document picker
  • Auto-detect security level from content

📜 Query History

  • Persistent history with timestamps
  • Replay previous queries
  • Delete individual entries

📈 Dashboard Analytics

  • System status monitoring
  • Performance charts
  • Document statistics
  • Downloadable reports (Markdown format)

Installation

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • PostgreSQL 15+ with pgvector extension
  • Ollama for local LLM inference

1. Clone Repository

git clone https://github.com/Mansoryq/Capestone.git
cd Capestone

2. Install Ollama and Models

# Install Ollama (macOS)
brew install ollama

# Start Ollama service
ollama serve

# Pull required models (in another terminal)
ollama pull llama3:8b
ollama pull mistral:latest

3. Setup PostgreSQL with pgvector

# Using Docker (recommended)
docker run -d --name pgvector \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=energy_ai \
  -p 5432:5432 \
  ankane/pgvector

# Create extension
psql -h localhost -U postgres -d energy_ai -c "CREATE EXTENSION IF NOT EXISTS vector;"

4. Setup Backend

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

5. Setup Frontend

cd Global_Capstone_Frontend

# Install dependencies
npm install
# or
bun install

Running the Application

Option 1: Fast Mode (Recommended for Development)

# Terminal 1: Backend
cd backend
./start_fast.sh
# or manually:
# FAST_MODE=true python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Frontend
cd Global_Capstone_Frontend
npm run dev -- --port 8082

Option 2: Quality Mode (Better Translations)

# Terminal 1: Backend
cd backend
./start_quality.sh

# Terminal 2: Frontend
cd Global_Capstone_Frontend
npm run dev -- --port 8082

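Access Points

  • Frontend: http://localhost:8082 (the port passed to npm run dev above)
  • Backend API: http://localhost:8000 (the port passed to uvicorn above)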

Document Ingestion Pipeline

Supported File Formats

| Format | Extension | Parser | Features |
|---|---|---|---|
| PDF | .pdf | PyMuPDF (fitz) | Text, tables, images, OCR |
| Word | .docx | python-docx | Text, tables, formatting |
| Excel | .xlsx | openpyxl | Sheets, formulas, data |
| CSV | .csv | pandas | Structured data |
| Text | .txt | native | Plain text |
| Markdown | .md | native | Formatted text |
| JSON | .json | native | Structured data |

Ingestion Process

  1. File Validation - Check file extension and size
  2. Content Extraction - Parse text from document
  3. Text Preprocessing - Normalize and clean text
  4. Chunking - Split into 512-token segments
  5. Embedding Generation - Create 1024-dim vectors
  6. Vector Storage - Store in PostgreSQL with pgvector
  7. Metadata Registration - Track document info
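The first step of the process above, file validation, could look roughly like the sketch below. The extension set comes from the supported-formats table; the function name `validate_upload` and the 50 MB limit are assumptions for illustration, since the README does not state a maximum upload size.

```python
from pathlib import Path

# Extensions taken from the supported-formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".csv", ".txt", ".md", ".json"}

# Hypothetical limit: the README does not specify a maximum upload size.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate upload."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return False, f"unsupported extension: {extension or '(none)'}"
    if size_bytes <= 0:
        return False, "file is empty"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds size limit"
    return True, "ok"
```

Rejecting a file before parsing keeps the more expensive extraction, chunking, and embedding steps from running on input that cannot be ingested anyway.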

Chunking Strategy

Configuration

| Parameter | Value | Rationale |
|---|---|---|
| Chunk Size | 512 tokens | Optimal for E5 model context |
| Chunk Overlap | 50 tokens | Preserves context at boundaries |
| Min Chunk Size | 100 tokens | Avoids fragmentary chunks |
| Separator | Sentence boundaries | Semantic coherence |
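A minimal sketch of a chunker that follows the configuration above. It is not the project's implementation: tokens are approximated by whitespace-separated words (a real pipeline would count tokens with the E5 tokenizer), and merging a short trailing chunk into its predecessor is one assumed way to honor the minimum chunk size.

```python
import re

CHUNK_SIZE = 512      # tokens per chunk
CHUNK_OVERLAP = 50    # tokens repeated at the start of the next chunk
MIN_CHUNK_SIZE = 100  # a shorter trailing chunk is merged into the previous one


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP, min_size=MIN_CHUNK_SIZE):
    """Split text into overlapping chunks that end on sentence boundaries.

    Assumption: tokens are approximated by whitespace-separated words.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []   # finished chunks, each a list of tokens
    current = []  # tokens of the chunk being built
    carried = 0   # leading tokens of `current` copied from the previous chunk

    def close():
        nonlocal current, carried
        chunks.append(current)
        tail = current[-overlap:] if overlap > 0 else []
        current, carried = list(tail), len(tail)

    for sentence in sentences:
        words = sentence.split()
        # Close the chunk if it has new content and this sentence would overflow it.
        if len(current) > carried and len(current) + len(words) > size:
            close()
        current.extend(words)
        # A single sentence longer than `size` is split at the size limit.
        while len(current) > size:
            rest = current[size:]
            current = current[:size]
            close()
            current.extend(rest)

    if len(current) > carried:
        if chunks and len(current) < min_size:
            # Merge only the new tokens; the result may slightly exceed `size`.
            chunks[-1].extend(current[carried:])
        else:
            chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]
```

Closing chunks only between sentences is what keeps each chunk semantically coherent; the hard split is a fallback for unusually long sentences.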

Quality Metrics

| Metric | Target | Measurement |
|---|---|---|
| Avg Chunk Size | 450-512 tokens | Mean token count |
| Size Variance | < 20% | Standard deviation |
| Semantic Coherence | > 0.7 | Sentence boundary alignment |

Embedding Generation

Model: intfloat/e5-large-v2

| Attribute | Value |
|---|---|
| Dimensions | 1024 |
| Max Sequence | 512 tokens |
| Parameters | 335M |
| License | MIT |
| Benchmark (MTEB) | 63.3% avg |

E5 Prefix Convention

# For documents/passages
prefixed_text = f"passage: {text}"

# For queries
prefixed_query = f"query: {text}"
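The prefix convention can be wrapped in a small helper so that queries and passages are never embedded without their prefix. This is a sketch, not the contents of `embed_e5.py`; the function names are hypothetical. `SentenceTransformer.encode` with `normalize_embeddings=True` is the sentence-transformers call assumed for producing unit-length vectors.

```python
def add_e5_prefix(texts, kind):
    """Prefix each text with 'query: ' or 'passage: ' as E5 expects."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text.strip()}" for text in texts]


def embed(texts, kind, model=None):
    """Embed texts with E5, loading the model only if none is supplied."""
    if model is None:
        # Imported lazily so the prefix helper works without the dependency.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("intfloat/e5-large-v2")
    # Unit-length vectors pair naturally with cosine distance in pgvector.
    return model.encode(add_e5_prefix(texts, kind), normalize_embeddings=True)
```

Passing an already loaded model to `embed` avoids reloading the 1.3 GB weights on every call.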

Vector Storage & Retrieval

PostgreSQL + pgvector

-- Documents table with vector column
CREATE TABLE public.documents (
    id SERIAL PRIMARY KEY,
    text TEXT NOT NULL,
    embedding vector(1024),
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- IVFFlat index for fast similarity search
CREATE INDEX ON public.documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
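A query against this table might be built as in the sketch below. `<=>` is pgvector's cosine distance operator, which matches the `vector_cosine_ops` index above. The helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver; this is not taken from `retrieve_pgvector.py`.

```python
def to_vector_literal(embedding):
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_similarity_query(embedding, top_k=5):
    """Return (sql, params) for a cosine-distance search on public.documents."""
    sql = (
        "SELECT id, text, metadata, embedding <=> %s::vector AS distance "
        "FROM public.documents "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    # The same literal is bound twice: once for the column, once for ordering.
    return sql, (literal, literal, top_k)
```

With a driver cursor this would be run as `cursor.execute(sql, params)`. Ordering directly by the distance expression is the form the IVFFlat index can serve.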

Retrieval Quality Thresholds

| Quality Tier | Distance Range | Classification |
|---|---|---|
| Excellent | < 0.15 | Highly relevant |
| Good | 0.15 - 0.25 | Relevant |
| Acceptable | 0.25 - 0.35 | Marginally relevant |
| Filtered | > 0.35 | Excluded |
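The tiers above translate into a simple classifier. The table leaves the shared boundary at 0.25 ambiguous, so this sketch assumes it belongs to the higher-distance tier; 0.35 itself stays acceptable because only distances above 0.35 are excluded. The function names and the `distance` key are hypothetical.

```python
def quality_tier(distance):
    """Map a cosine distance to the quality tier from the table."""
    if distance < 0.15:
        return "excellent"
    if distance < 0.25:
        return "good"
    if distance <= 0.35:
        return "acceptable"
    return "filtered"


def keep_relevant(chunks):
    """Drop chunks whose distance falls in the 'filtered' tier.

    Each chunk is assumed to be a dict with a 'distance' key.
    """
    return [c for c in chunks if quality_tier(c["distance"]) != "filtered"]
```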

LLM Reasoning & Response Generation

Dual-Mode Architecture

| Mode | Reasoning | Translation | Avg Latency |
|---|---|---|---|
| ⚡ Fast | LLaMA 3 8B | LLaMA 3 8B | 30-60s |
| 🔬 Quality | LLaMA 3 8B | Mistral 7B | 40-90s |

RAG Pipeline Steps

  1. Security Analysis - Check query and document sensitivity
  2. Vector Retrieval - Find relevant chunks
  3. Chunk Filtering - Apply security and quality filters
  4. Context Assembly - Build prompt with sources
  5. LLM Reasoning - Generate answer
  6. Translation - Convert to target language (if needed)
  7. Metrics Calculation - Compute quality scores
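Step 4, context assembly, is the point where retrieved chunks become a prompt. The sketch below shows one plausible shape; the prompt wording, the function name `build_prompt`, and the chunk dictionary keys (`text`, `source`) are assumptions rather than the project's actual template.

```python
def build_prompt(question, chunks, max_chunks=5):
    """Assemble a grounded prompt from retrieved chunks.

    Each chunk is assumed to be a dict with 'text' and 'source' keys.
    """
    lines = [
        "Answer the question using only the numbered sources below.",
        "If the sources do not contain the answer, say so.",
        "",
    ]
    # Number the sources so the answer can cite them.
    for number, chunk in enumerate(chunks[:max_chunks], start=1):
        lines.append(f"[{number}] ({chunk['source']}) {chunk['text']}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)
```

Numbering the sources and instructing the model to stay within them is a common way to keep answers grounded in the retrieved documents.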

Security Framework

Security Levels

| Level | Value | Description | Example Content |
|---|---|---|---|
| PUBLIC | 1 | Open access | General documentation |
| INTERNAL | 2 | Organization only | Internal processes |
| CONFIDENTIAL | 3 | Restricted | Financial data |
| RESTRICTED | 4 | Highly restricted | Personal data (SSN) |
| TOP_SECRET | 5 | Maximum security | Critical infrastructure |

Dual Security Checking

Query Analysis → Document Analysis → Effective Level = MAX(query, document)

If user clearance < effective level → Access Denied
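The rule above can be expressed with an ordered enum. This is a sketch using the level names and values from the table; the function names are hypothetical and it is not the code in `security_mapping.py`.

```python
from enum import IntEnum


class SecurityLevel(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4
    TOP_SECRET = 5


def effective_level(query_level, document_levels):
    """Return the higher of the query level and any retrieved document level."""
    return max([query_level, *document_levels])


def access_allowed(user_clearance, query_level, document_levels):
    """Grant access only if the clearance is at least the effective level."""
    return user_clearance >= effective_level(query_level, document_levels)
```

For example, a user with CONFIDENTIAL clearance asking a PUBLIC-level question is still denied if any retrieved chunk is classified RESTRICTED.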


Multi-Language Support

Supported Languages

| Language | Code | Translation | Response |
|---|---|---|---|
| English | en | Not needed | Native |
| Korean | ko | Query → EN, Response → KO | Full support |
| Vietnamese | vi | Query → EN, Response → VI | Full support |
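Combined with the Fast and Quality modes, the table implies a per-query plan of translation and reasoning steps. The sketch below makes that plan explicit; the model tags are the ones pulled in the installation section, while the function name and the step labels are hypothetical.

```python
REASONING_MODEL = "llama3:8b"
SUPPORTED_LANGUAGES = {"en", "ko", "vi"}


def translation_plan(language, fast_mode=True):
    """Return the ordered (step, model) pairs for one query."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    # Fast mode reuses the reasoning model; Quality mode translates with Mistral.
    translator = "llama3:8b" if fast_mode else "mistral:latest"
    steps = []
    if language != "en":
        steps.append((f"translate query {language}->en", translator))
    steps.append(("reason", REASONING_MODEL))
    if language != "en":
        steps.append((f"translate response en->{language}", translator))
    return steps
```

English queries skip both translation steps, which is why they sit at the low end of the latency ranges quoted for each mode.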

Evaluation Metrics

Retrieval Metrics

| Metric | Formula | Target | Description |
|---|---|---|---|
| Accuracy | 100 - (avg_distance × 40) | > 90% | How close chunks are to query |
| Precision | 85 + weighted_quality | > 90% | Quality tier distribution |
| Efficiency | 100 - (time/3.0 × 10) | > 90% | Retrieval speed |
| Throughput | 90 + (chunks/sec × 2) | > 90% | Processing rate |
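The four formulas are simple enough to write down directly. In this sketch, clamping the result to the 0-100 range is an assumption (the table does not say how out-of-range values are handled), `time` is taken to be the retrieval time in seconds, and `weighted_quality` is treated as an input because its definition is not given here.

```python
def clamp(value, low=0.0, high=100.0):
    """Keep a score inside the 0-100 range (assumed, not stated in the table)."""
    return max(low, min(high, value))


def accuracy(avg_distance):
    return clamp(100 - avg_distance * 40)


def precision(weighted_quality):
    return clamp(85 + weighted_quality)


def efficiency(retrieval_seconds):
    return clamp(100 - (retrieval_seconds / 3.0) * 10)


def throughput(chunks_per_second):
    return clamp(90 + chunks_per_second * 2)
```

As a worked example, an average distance of 0.1875 gives an accuracy of 100 - 0.1875 × 40 = 92.5, and a retrieval time of 3.0 seconds gives an efficiency of exactly 90.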

Latency Breakdown

| Stage | Target |
|---|---|
| Query Embedding | < 100ms |
| Vector Search | < 500ms |
| Security Check | < 50ms |
| LLM Reasoning | < 60s |
| Translation | < 30s |
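These targets can be checked against measured timings with a small helper. The stage keys and the function name `slow_stages` are hypothetical; only the budget values come from the table.

```python
# Budgets from the latency table, converted to milliseconds.
LATENCY_TARGETS_MS = {
    "query_embedding": 100,
    "vector_search": 500,
    "security_check": 50,
    "llm_reasoning": 60_000,
    "translation": 30_000,
}


def slow_stages(timings_ms):
    """Return the stages whose measured time does not meet the target.

    Targets are strict upper bounds, so a time equal to the target counts as slow.
    Stages without a known target are ignored.
    """
    return [
        stage
        for stage, elapsed in timings_ms.items()
        if stage in LATENCY_TARGETS_MS and elapsed >= LATENCY_TARGETS_MS[stage]
    ]
```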

API Documentation

Endpoints Overview

| Method | Endpoint | Description |
|---|---|---|
| GET | /status | System health status |
| GET | /config | System configuration |
| POST | /query | Submit RAG query |
| POST | /ingest | Upload document |
| GET | /documents | List all documents |
| DELETE | /documents/{id} | Delete document |
| GET | /query/history | Get query history |
| POST | /security/auto-detect | Detect document security |
| GET | /stats | Data statistics |

Query Endpoint

Request:

POST /query
{
  "query": "What is the power plant capacity?",
  "language": "en",
  "security_clearance": "CONFIDENTIAL",
  "document_ids": ["doc_123"],
  "fast_mode": true
}

Response:

{
  "answer": "The power plant has a capacity of 500 MW...",
  "sources": ["power_plant_data.pdf"],
  "retrieval_time_ms": 245,
  "generation_time_ms": 32000,
  "fast_mode": true,
  "model_used": "llama3:8b",
  "security": {
    "level": "CONFIDENTIAL",
    "access_allowed": true
  },
  "chunks_used": 5,
  "metrics": {
    "accuracy": 92.5,
    "precision": 95.0
  }
}
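A client call to this endpoint might look like the following sketch, which uses only the standard library. The field names come from the request example above; the helper names, the default base URL, the treatment of `document_ids` as optional, and the 120-second timeout are assumptions.

```python
import json
import urllib.request


def build_query_request(query, language="en", security_clearance="PUBLIC",
                        document_ids=None, fast_mode=True,
                        base_url="http://localhost:8000"):
    """Build a POST /query request with a JSON body."""
    payload = {
        "query": query,
        "language": language,
        "security_clearance": security_clearance,
        "fast_mode": fast_mode,
    }
    if document_ids:
        # Assumed optional: restrict the query to specific documents.
        payload["document_ids"] = list(document_ids)
    return urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, timeout=120, **kwargs):
    """Send the query and return the decoded JSON response."""
    request = build_query_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The generous timeout reflects the 30-90 second generation times quoted for CPU inference; a shorter client timeout would abort most queries.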

Performance Optimization

Model Warmup

Models are pre-loaded at startup for faster first query:

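import requests

def warmup_models():
    """Pre-load the model at startup so the first query does not pay the load cost."""
    # A one-token, non-streaming generation is enough to load the model into memory.
    requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3:8b",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_predict": 1}
    }, timeout=120)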

Database Indexes

CREATE INDEX idx_documents_doc_id ON public.documents ((metadata->>'doc_id'));
CREATE INDEX idx_documents_source ON public.documents ((metadata->>'source'));

Project Structure

capestone/
├── backend/
│   ├── main.py                 # FastAPI application
│   ├── mistral_rag.py          # RAG orchestrator
│   ├── document_ingest.py      # Document processing
│   ├── embed_e5.py             # Embedding generation
│   ├── retrieve_pgvector.py    # Vector retrieval
│   ├── security_mapping.py     # Security framework
│   ├── requirements.txt        # Python dependencies
│   ├── start_fast.sh           # Fast mode startup
│   ├── start_quality.sh        # Quality mode startup
│   └── data/
│       ├── documents_registry.json
│       ├── query_history.json
│       └── uploads/
│
├── Global_Capstone_Frontend/
│   ├── src/
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── QueryResponse.tsx
│   │   │   └── DocumentIngestion.tsx
│   │   ├── components/
│   │   ├── services/
│   │   │   └── api.ts
│   │   └── lib/
│   ├── package.json
│   └── vite.config.ts
│
├── README.md
├── FEATURES.md
└── COMPLIANCE.md

Troubleshooting

| Issue | Solution |
|---|---|
| "No relevant information found" | Raise the max_distance threshold (a lower value filters out more chunks) and check that the document was ingested |
| Slow response times | Use Fast mode, reduce top_k, check CPU load |
| Security access denied | Increase user clearance, check document security |
| Model not responding | Restart Ollama, check that the model is pulled |
| Database connection error | Verify PostgreSQL is running |
| Frontend not loading | Check that the backend is running on port 8000 |

Common Commands

# Check Ollama models
ollama list

# Check PostgreSQL connection
psql -h localhost -U postgres -d energy_ai -c "SELECT COUNT(*) FROM documents;"

# Restart backend
# (";" instead of "&&" so the restart still runs when no backend process was found)
cd backend && pkill -f "uvicorn main:app"; ./start_fast.sh

# Clear query history
curl -X DELETE http://localhost:8000/query/history

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is built entirely on open-source technologies. See COMPLIANCE.md for full license details.


👥 Authors

  • Abylay Turganbekov (Co-Leader)
  • Harishik Dev Singh (Team Leader)
  • Aikanym Baisalova
  • Zhangali Otegaliev
  • Alvin.K
  • 오민혁

Document Version: 1.0.0
Last Updated: January 2026


Note: This system is designed for CPU inference. For faster responses, consider running Ollama on a CUDA-capable GPU.
