A high-performance, production-ready document search and question-answering system powered by Retrieval-Augmented Generation (RAG). It features multimodal vision processing, source citations, conversation history, and a modern dashboard UI.
This project implements a multimodal document question-answering system using Retrieval-Augmented Generation (RAG). Users can upload documents (PDF, TXT, MD, HTML), which are processed, chunked, and indexed on upload. In addition to extracting text, the engine detects and extracts images (charts, tables, diagrams) from PDF pages, generates semantic descriptions of them using Groq's Llama 4 Scout Vision model, and indexes those descriptions alongside the textual data.
When you ask a question, the system retrieves the most relevant textual and visual context to synthesize an accurate answer with exact page-level citations.
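The retrieve-then-cite flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's implementation: the real system uses FAISS and HuggingFace embeddings, whereas `Chunk`, `cosine`, and `top_k` are hypothetical names and the vectors are hand-written toy values. The `source_file` and `page` metadata keys follow the `/ask` response shape shown later in this README.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical record for one indexed chunk (text or image description)."""
    text: str
    vector: list[float]
    source_file: str
    page: int


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 2) -> list[dict]:
    """Return the k most similar chunks, each with its citation metadata."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return [
        {"content": c.text, "metadata": {"source_file": c.source_file, "page": c.page}}
        for c in ranked[:k]
    ]


# Toy 3-dimensional vectors stand in for real embeddings.
index = [
    Chunk("Invoice Summary: Consultancy $450.00", [0.9, 0.1, 0.0], "invoice.pdf", 1),
    Chunk("Meeting notes from Monday", [0.0, 0.2, 0.9], "notes.txt", 1),
    Chunk("Chart: monthly charges by service", [0.7, 0.6, 0.1], "invoice.pdf", 2),
]
hits = top_k([1.0, 0.0, 0.0], index, k=2)
```

Each hit carries the document name and page number next to the snippet, which is what lets an answer point back to its source.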
- Multimodal Vision Indexing: Automatically extracts embedded images from PDFs, generates descriptive summaries using `meta-llama/llama-4-scout-17b-16e-instruct` on Groq, and indexes them alongside the text, so you can search for data inside charts, tables, and diagrams.
- Multiple Document Support: Upload and manage multiple files simultaneously. Search across your entire catalog or query specific documents.
- Verifiable Citations: Every response includes exact references (source document name, page number, and content snippet), so answers can be checked against the source, which helps reduce hallucinations.
- Premium Glassmorphic Interface: An interactive, responsive web dashboard with a clean sidebar, file dropzone, audio transcriber interface, and modal context viewer.
- API-Based Hybrid Embeddings: Fast, lightweight deployment under 200 MB using the HuggingFace Inference API (`all-MiniLM-L6-v2`), avoiding heavy local model downloads.
- Dual LLM Integrations: Hot-swap between Groq (`llama-3.3-70b-versatile` for high-speed generation) and OpenAI (`gpt-3.5-turbo` or newer) via environment variables.
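Switching providers through environment variables could look roughly like the sketch below. The variable names (`LLM_PROVIDER`, `GROQ_API_KEY`, `OPENAI_API_KEY`) and model names come from this README. The `PROVIDERS` table, the `resolve_llm` helper, and the fallback to Groq when `LLM_PROVIDER` is unset are assumptions made for illustration, so the actual backend may behave differently.

```python
import os

# Model names are taken from the feature list; the dictionary layout is an assumption.
PROVIDERS = {
    "groq": {"model": "llama-3.3-70b-versatile", "key_var": "GROQ_API_KEY"},
    "openai": {"model": "gpt-3.5-turbo", "key_var": "OPENAI_API_KEY"},
}


def resolve_llm(env=None) -> dict:
    """Pick provider, model, and API key from environment-style settings."""
    env = os.environ if env is None else env
    # Assumption: default to Groq when LLM_PROVIDER is not set.
    provider = env.get("LLM_PROVIDER", "groq").strip().lower()
    if provider not in PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    spec = PROVIDERS[provider]
    api_key = env.get(spec["key_var"], "")
    if not api_key:
        raise ValueError(f"{spec['key_var']} must be set when LLM_PROVIDER={provider}")
    return {"provider": provider, "model": spec["model"], "api_key": api_key}


# Passing a plain dict keeps the example independent of the real environment.
config = resolve_llm({"LLM_PROVIDER": "openai", "OPENAI_API_KEY": "sk_example"})
```

Failing early on a missing key surfaces configuration mistakes at startup rather than on the first question.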
graph TB
subgraph Client["Client Layer (Frontend)"]
UI[Web Interface/API Client]
end
subgraph API["FastAPI Application (Backend)"]
Upload[Upload Endpoint]
Ask[Ask Endpoint]
Docs[Documents Endpoint]
History[History Endpoint]
end
subgraph Processing["Document Pipeline"]
Ingest[Document Ingest & Parsing]
Vision[PyMuPDF Image Extractor]
LlamaVision[Groq Llama 4 Scout Vision]
Chunk[Recursive Text Splitter]
Embed[HF Inference Embeddings]
end
subgraph Storage["Storage Layer"]
Files[(Local File Uploads)]
Vector[(FAISS Vector Database)]
Memory[(In-Memory History Store)]
end
subgraph AI["Generative AI Layer"]
Retriever[Semantic Context Retriever]
LLM[Groq Llama 3.3 70B Engine]
end
UI --> |Upload PDFs| Upload
UI --> |Ask Questions| Ask
UI --> |Manage Docs| Docs
UI --> |View History| History
Upload --> Ingest
Ingest --> Chunk
Ingest --> Vision
Vision --> |Extract Raw Images| LlamaVision
LlamaVision --> |Visual Context| Chunk
Chunk --> Embed
Embed --> Vector
Upload --> Files
Ask --> Retriever
Retriever --> Vector
Retriever --> LLM
LLM --> |Response + Page Citations| Ask
Ask --> Memory
Docs --> Files
History --> Memory
style UI fill:#e1f5ff,stroke:#005571,stroke-width:2px
style LLM fill:#fff4e1,stroke:#ffa500,stroke-width:2px
style Vector fill:#f0e1ff,stroke:#8a2be2,stroke-width:2px
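The chunking stage of the pipeline in the diagram, where page text and image descriptions converge before embedding, can be approximated as follows. This is a simplified fixed-size sliding window, not LangChain's recursive splitter. The names `chunk_text` and `build_records`, the `kind` metadata field, and the default sizes are all hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def build_records(page_text: str, image_descriptions: list[str],
                  source_file: str, page: int) -> list[dict]:
    """Merge text chunks and image descriptions into one list of indexable records."""
    records = []
    for piece in chunk_text(page_text):
        records.append({"content": piece,
                        "metadata": {"source_file": source_file, "page": page, "kind": "text"}})
    for description in image_descriptions:
        for piece in chunk_text(description):
            records.append({"content": piece,
                            "metadata": {"source_file": source_file, "page": page, "kind": "image"}})
    return records


records = build_records("a" * 50, ["Bar chart of revenue"], "report.pdf", 3)
```

Because image descriptions carry the same `source_file` and `page` metadata as text chunks, a later retrieval step can cite a chart the same way it cites a paragraph.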
RAG_Search_Engine/
├── backend/                 # Backend application directory
│   ├── main.py              # FastAPI application & REST routing
│   ├── rag.py               # LangChain & RAG chain implementation
│   ├── vision.py            # Image extraction & Llama Scout processing
│   ├── ingest.py            # Document ingest & vector-store compilation
│   ├── loaders.py           # Custom document loaders
│   ├── requirements.txt     # Python backend packages
│   ├── .env                 # Environment secrets (GROQ, HuggingFace keys)
│   ├── uploads/             # Raw document storage directory
│   └── data/
│       ├── faiss_index/     # Saved FAISS index binaries
│       └── images/          # Extracted image assets
│
├── frontend/                # Frontend interface files
│   └── index.html           # Unified glassmorphic client application
│
├── Dockerfile               # Multi-stage production container build
├── render.yaml              # Deployment blueprint configuration
└── README.md                # Project documentation
- Python 3.11+
- Groq API Key (Sign up for a free tier key at Groq Console)
- HuggingFace API Key (Create an API Token at HuggingFace Settings)
Clone the repository and set up a virtual environment:
git clone <your-repo-url>
cd RAG_Search_Engine
# Create virtual environment
python -m venv .venv
# Activate environment
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate
Install the backend dependencies:
pip install -r backend/requirements.txt
Create a .env file inside the backend/ directory:
# backend/.env
# LLM Configuration (groq / openai)
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_groq_api_key_here
# Embeddings API Key
HF_API_KEY=hf_your_huggingface_api_key_here
# Optional: OpenAI Settings
# LLM_PROVIDER=openai
# OPENAI_API_KEY=sk_your_openai_api_key_here
Launch the FastAPI backend from the root directory:
uvicorn backend.main:app --host 127.0.0.1 --port 8000 --reload
Once running, access the web client or interactive docs:
- Web Interface UI: http://127.0.0.1:8000/
- Swagger API Docs: http://127.0.0.1:8000/docs
Upload one or more files (PDF, TXT, MD, HTML) to the engine:
curl -X POST "http://127.0.0.1:8000/upload" \
-F "files=@invoice.pdf" \
-F "files=@notes.txt"
Response:
{
"message": "Successfully uploaded 2 file(s)",
"uploaded": [
{"id": "6f4e6f73-5eb8-4cfd-978b-1795efa39967", "filename": "invoice.pdf"},
{"id": "5ac9f8ba-bf48-410e-b0c3-08b648f88672", "filename": "notes.txt"}
],
"total_documents": 2,
"new_chunks_added": 42
}
Query the knowledge base using semantic search:
curl -X POST "http://127.0.0.1:8000/ask" \
-H "Content-Type: application/json" \
-d '{"question": "How much was charged on the invoice?"}'
Response:
{
"answer": "The invoice charges total $450.00 for consultancy services.",
"sources": [
{
"content": "Invoice Summary:\nConsultancy: $450.00...",
"metadata": {
"source_file": "invoice.pdf",
"page": 1
}
}
],
"source_count": 1
}
List indexed files:
curl http://127.0.0.1:8000/documents
Delete a specific file (deletes chunks and rebuilds vector store):
curl -X DELETE "http://127.0.0.1:8000/documents/6f4e6f73-5eb8-4cfd-978b-1795efa39967"

| Method | Endpoint | Payload | Description |
|---|---|---|---|
| POST | `/upload` | Multipart files | Upload and index documents |
| POST | `/ask` | `{"question": "..."}` | Submit query and retrieve answer with sources |
| GET | `/documents` | None | Fetch all active documents |
| DELETE | `/documents/{doc_id}` | Path parameter | Remove a document & rebuild the index |
| GET | `/history` | None | Fetch recent chat conversation list |
| DELETE | `/clear-history` | None | Clear the current user conversation history |
| GET | `/` | None | Serves the web interface dashboard |
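For scripted use, the `/ask` endpoint from the table above can also be called from Python. The sketch below uses only the standard library and assumes the server is reachable at the local address used in the curl examples. The helper names `build_ask_request`, `format_citations`, and `ask` are hypothetical, and the sample response simply reuses the documented shape.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local address used in the curl examples


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_citations(response: dict) -> list[str]:
    """Turn the `sources` list of an /ask response into short citation strings."""
    lines = []
    for src in response.get("sources", []):
        meta = src.get("metadata", {})
        lines.append(f"{meta.get('source_file', 'unknown')}, page {meta.get('page', '?')}")
    return lines


def ask(question: str, base_url: str = BASE_URL) -> dict:
    """Send the question to a running server and decode the JSON reply."""
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline demonstration using the documented response shape (no server needed).
sample = {
    "answer": "The invoice charges total $450.00 for consultancy services.",
    "sources": [
        {"content": "Invoice Summary:\nConsultancy: $450.00...",
         "metadata": {"source_file": "invoice.pdf", "page": 1}}
    ],
    "source_count": 1,
}
citations = format_citations(sample)
```

With the backend running, `ask("How much was charged on the invoice?")` would return the decoded JSON, which can be passed straight to `format_citations`.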
To deploy the system inside a Docker container:
# Build the container image
docker build -t rag-search-engine .
# Run the container
docker run -p 8000:8000 -e GROQ_API_KEY="your_key" -e HF_API_KEY="your_key" rag-search-engine
The Dockerfile uses a multi-stage build and is intended to run on container platforms such as Render, Fly.io, or AWS ECS.
- FastAPI: High-performance Python web framework.
- LangChain: System orchestration and LLM prompt layout.
- FAISS (Facebook AI Similarity Search): Fast similarity lookup for vector indices.
- HuggingFace Inference API: Converts raw text into vectors using `all-MiniLM-L6-v2`.
- Groq Cloud: Low-latency execution of Llama 3.3 (text) and Llama 4 Scout (vision) models.