This project is an end-to-end system for querying PDF documents using a chatbot interface. It combines a modern Next.js frontend with Python microservices for document processing, retrieval, and LLM-based question answering. The system is designed for extensibility and performance, supporting scalable document ingestion and semantic search.
- Server Components in Next.js: Efficient rendering and data fetching (MDN: Server Components).
- TypeScript for Type Safety: Ensures robust code and easier refactoring (MDN: TypeScript).
- Custom Hooks and Utility Functions: Modularizes logic for reusability.
- Python Microservices: Decouples document processing and QA logic for scalability.
- Chroma Vector Database: Enables fast semantic search over document embeddings (ChromaDB).
- PDF Parsing and Embedding: Processes and indexes PDF content for retrieval.
- LLM Integration: Uses language models for natural language question answering.
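The retrieval step behind the semantic search works by comparing vector embeddings. As a toy illustration of what ChromaDB does under the hood (the three-dimensional vectors below are made up for the example; a real system uses model-generated embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" for indexed chunks (illustrative values only).
index = {
    "chunk about contract law": [0.9, 0.1, 0.0],
    "chunk about criminal procedure": [0.1, 0.9, 0.2],
}

def search(query_vec, index, top_k=1):
    """Return the top_k chunk ids ranked by cosine similarity to the query."""
    ranked = sorted(index, key=lambda cid: cosine_similarity(query_vec, index[cid]), reverse=True)
    return ranked[:top_k]

print(search([0.8, 0.2, 0.1], index))  # the contract-law chunk is nearest
```

The retrieved chunks are then passed to the LLM as context for answering the question.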
- Next.js (React framework)
- ChromaDB (vector database)
- LangChain (LLM orchestration)
- FastAPI (Python web framework)
- PyPDF2 (PDF parsing)
- Tailwind CSS (utility-first CSS)
- Vercel (deployment platform)
- PostCSS (CSS processing)
- ESLint (code linting)
- TypeScript (typed JavaScript)
- React (UI library)
```
.
├── pdf-qa-chatbot/
│   ├── public/
│   └── src/
│       ├── app/
│       └── lib/
└── Python_Microservices_Be/
    ├── chroma_db_legal/
    └── uploads/
```
- pdf-qa-chatbot/public/: Contains SVG assets for UI.
- pdf-qa-chatbot/src/app/: Next.js app directory, includes global styles and layout.
- pdf-qa-chatbot/src/lib/: Utility functions for frontend logic.
- Python_Microservices_Be/chroma_db_legal/: ChromaDB vector store and metadata.
- Python_Microservices_Be/uploads/: Uploaded PDF documents for processing.
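Before an uploaded PDF can be indexed, its extracted text is typically split into overlapping chunks. The repository's actual splitter is not shown in this README; a minimal sketch of fixed-size chunking with overlap (the 200/50 sizes are illustrative, not taken from the code) might look like:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` characters, each overlapping the previous by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # advance by the non-overlapping portion
    return chunks

pages = "A" * 450  # stand-in for text extracted from a PDF (e.g. via PyPDF2)
print(len(chunk_text(pages)))  # 450 chars at step 150 -> 3 chunks
```

Overlap preserves context across chunk boundaries, which improves retrieval quality for answers that span two chunks.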
No custom fonts are used; the app relies on system or default web fonts.
- Install dependencies:

  ```bash
  cd pdf-qa-chatbot
  npm install
  ```

- Run the development server:

  ```bash
  npm run dev
  ```

  The app will be available at http://localhost:3000.
- Create a virtual environment:

  ```bash
  python -m venv venv
  ```

- Activate the environment:
  - On Windows:

    ```bash
    venv\Scripts\activate
    ```

  - On macOS/Linux:

    ```bash
    source venv/bin/activate
    ```

- Install dependencies:

  ```bash
  pip install -r Python_Microservices_Be/requirements.txt
  ```
- Install Ollama Desktop:
  - Download and install from Ollama Desktop.
- Download the GGUF model from Hugging Face:
  - Visit Indian-LegalBot-Llama-3.1-8B-GGUF.
  - Download the recommended version: Q4_K_S or Q4_K_M.
- Convert the GGUF model for Ollama compatibility:
  - Create a Modfile in your model directory with the following content:

    ```
    FROM ./Indian-LegalBot-Llama-3.1-8B-Q4_K_S.gguf
    PARAMETER stop "<|eot_id|>"
    ```

  - Replace the filename with your downloaded GGUF file.
  - Build the model for Ollama:

    ```bash
    ollama create indian-legalbot -f Modfile
    ```

  - The model is now available for use with Ollama.
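Once the model is built, you can sanity-check it directly over Ollama's local REST API (it listens on port 11434 by default). A minimal sketch, assuming the model was created as `indian-legalbot` as above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(question: str, model: str = "indian-legalbot") -> dict:
    """Request body for Ollama's /api/generate endpoint (stream=False -> a single JSON reply)."""
    return {"model": model, "prompt": question, "stream": False}

def ask_ollama(question: str) -> str:
    """Send a prompt to the locally running Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# ask_ollama("What is a writ petition?")  # requires the Ollama server to be running
```

This is only a smoke test; in the actual system the microservices talk to Ollama for you.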
- Update the Python microservices configuration:
  - Open Python_Microservices_Be/config.py.
  - Set the model name:

    ```python
    LLM_MODEL_NAME = "indian-legalbot"
    ```

  - This ensures your microservices use the correct Ollama model.
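The full contents of config.py are not shown in this README; a sketch of what such a module might contain follows. Only `LLM_MODEL_NAME` is confirmed above; the other two values are hypothetical placeholders named after the repository's directories.

```python
# Python_Microservices_Be/config.py (sketch; only LLM_MODEL_NAME is confirmed by this README)
LLM_MODEL_NAME = "indian-legalbot"     # Ollama model built in the previous step

# Hypothetical extras a service like this often needs -- adjust to the real file:
CHROMA_PERSIST_DIR = "chroma_db_legal" # named after the repo's vector-store directory
UPLOAD_DIR = "uploads"                 # named after the repo's upload directory
```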
...
- Navigate to the frontend directory:

  ```bash
  cd pdf-qa-chatbot
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

  The application will be available at http://localhost:3000 ...
...
- Activate your virtual environment:
  - On Windows:

    ```bash
    venv\Scripts\activate
    ```

  - On macOS/Linux:

    ```bash
    source venv/bin/activate
    ```

- Start the microservices:

  ```bash
  flask run --port=5001
  ```
BODY Example:

```json
{
  "collection_name": "legal_case_a1b2c3d4e5",
  "message": "File 'sample_case.pdf' processed successfully."
}
```

BODY Example:

```json
{
  "question": "What is the main subject of this document?",
  "collection_name": "legal_case_a1b2c3d4e5"
}
```

BODY Example:

```json
{
  "question": "Explain the concept of 'audi alteram partem' in Indian law."
}
```

...