A university project by Team DEEPCROP that builds a Legal LLM assistant for Sri Lanka. It answers questions about the Companies Act, the Inland Revenue Act, and labor laws, providing accurate citations and context.
We designed a Retrieval-Augmented Generation (RAG) system that extracts, indexes, and retrieves Sri Lankan legal knowledge from official documents. Our system goes beyond a basic RAG pipeline by adopting foundation model best practices for reliability, accuracy, and user experience.
- 📑 **State-of-the-art extraction with Docling:** extracts structured knowledge (titles, sections, content) from large PDFs.
- 🧠 **Vector DB with FAISS:** stores embeddings of Sri Lankan legal documents for fast, semantic retrieval.
- 🎯 **Query optimization:** before retrieval, user queries are rewritten into context-rich, expressive forms, improving accuracy.
- 💬 **Chat memory (LangChain):** keeps conversation history for natural, context-aware dialogues.
- 🤖 **Well-designed prompts:** role assignment, system prompts, and few-shot examples ensure consistent legal answers.
- 🔒 **Domain-bound RAG:** answers strictly within the legal context; if a query is out of scope, the assistant explains why.
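The domain-bound behavior can be approximated with a retrieval-score guard: if no indexed document is sufficiently similar to the query, the assistant refuses instead of answering. The sketch below is a simplified stand-in (keyword-overlap similarity in place of FAISS embedding search; the threshold and refusal message are illustrative assumptions, not the project's actual values):

```python
def overlap_score(query: str, doc: str) -> float:
    """Jaccard word overlap -- a stand-in for embedding similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def answer_or_refuse(query: str, corpus: list[str], threshold: float = 0.1) -> str:
    """Refuse queries whose best retrieval score falls below the threshold."""
    best = max((overlap_score(query, doc) for doc in corpus), default=0.0)
    if best < threshold:
        return ("I can only answer questions about Sri Lankan legal documents "
                "(Companies Act, Inland Revenue Act, labor laws).")
    return "ANSWER_FROM_CONTEXT"  # placeholder for the real RAG generation step

corpus = ["duties of company directors under the companies act",
          "penalties for late filing under the inland revenue act"]
print(answer_or_refuse("What are the duties of company directors?", corpus))
print(answer_or_refuse("Best pasta recipe?", corpus))
```

In the real pipeline the score would come from FAISS distances over embeddings, but the refuse-below-threshold shape is the same.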
```
.
├── extractor/               # PDF → structured knowledge (Docling)
│   ├── companies_act.pdf
│   ├── inland_rev.pdf
│   ├── labor_laws.pdf
│   ├── extract_from_docs.ipynb / .py
│   ├── requirements.txt
│   └── scratch/
│
├── server/                  # FastAPI backend with LangChain + FAISS
│   ├── app/
│   │   ├── api.py           # API endpoints (chat, ingest)
│   │   ├── pipeline.py      # Query rewrite, RAG pipeline
│   │   ├── prompts.py       # System & query prompts
│   │   ├── vectorstore.py   # FAISS index builder/loader
│   │   └── schemas.py
│   ├── docs/                # Extracted legal documents (Markdown)
│   ├── data/faiss_index/    # Vector DB index files
│   ├── main.py              # FastAPI entrypoint
│   └── requirements.txt
│
├── frontend/                # React-based chat interface
│   ├── src/
│   │   ├── App.jsx          # Main frontend logic
│   │   ├── components/      # Chat UI (ChatInput, ChatMessage, etc.)
│   │   └── index.css
│   └── vite.config.js
│
└── README.md
```
- Backend: FastAPI, LangChain, FAISS, Google Generative AI
- Frontend: React + Vite
- Extraction: Docling (PDF → structured text)
- Vector DB: FAISS (semantic search)
- LLM: Google Gemini (via LangChain integration)
1. **Document Ingestion**
- Docling extracts structured knowledge (Acts, Sections, Subsections) from legal PDFs.
- Extracted text is chunked and stored in FAISS Vector DB with embeddings.
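The chunking step can be sketched as an overlapping sliding window over the extracted text; the chunk size and overlap below are illustrative assumptions, not the project's actual settings:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split extracted text into overlapping character windows.

    Overlap keeps context intact across chunk boundaries, which helps
    retrieval when a legal clause spans two chunks.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Section 187: A director of a company must act in good faith. " * 40
chunks = chunk_text(doc, size=200, overlap=50)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be embedded and written to the FAISS index alongside its source metadata (act, section) so citations can be reconstructed at answer time.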
2. **User Query → Optimized Query**
- User inputs a question (e.g., “What are the penalties for late tax filing?”).
- A query rewriting chain expands and optimizes it into more expressive legal queries.
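A rewrite step of this kind is typically a prompt template whose filled-in result is sent to the LLM. The wording below is a hypothetical sketch, not the project's actual prompt:

```python
REWRITE_TEMPLATE = """You are a Sri Lankan legal research assistant.
Rewrite the user's question into a precise, standalone legal query.
- Expand abbreviations and name the relevant Act where obvious.
- Preserve the user's intent; do not answer the question.

User question: {question}
Rewritten query:"""

def build_rewrite_prompt(question: str) -> str:
    """Fill the rewrite template; the result is sent to the LLM."""
    return REWRITE_TEMPLATE.format(question=question)

prompt = build_rewrite_prompt("What are the penalties for late tax filing?")
print(prompt)
```

In a LangChain setup this template would feed a chain whose output replaces the raw query before retrieval.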
3. **Retrieval + Generation**
- Optimized query retrieves relevant chunks from FAISS.
- LLM generates an answer strictly from context, with inline citations.
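Assembling the retrieved chunks into a citation-ready generation prompt could look like the sketch below; the chunk metadata fields (`act`, `section`) and the instruction wording are assumptions for illustration:

```python
def build_rag_prompt(query: str, chunks: list[dict]) -> str:
    """Format retrieved chunks as numbered sources and instruct the model
    to answer only from them, citing sources inline as [n]."""
    context = "\n".join(
        f"[{i}] ({c['act']}, {c['section']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer strictly from the sources below. Cite each claim inline "
        "as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

chunks = [
    {"act": "Inland Revenue Act", "section": "s. 178",
     "text": "A person who fails to file a return on time is liable to a penalty."},
]
prompt = build_rag_prompt("Penalties for late filing?", chunks)
print(prompt)
```

Numbering the sources lets the model's inline `[n]` citations be mapped back to act and section in the UI.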
4. **Chat Memory**
- Session memory allows follow-up questions without losing context.
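Session memory can be as simple as a trimmed turn buffer. LangChain ships ready-made memory classes, so the pure-Python sketch below only illustrates the idea (the max-turn limit is an assumption):

```python
class SessionMemory:
    """Keep the last N user/assistant turns per session for follow-up questions."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turns: list[tuple[str, str]] = []

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))
        self.turns = self.turns[-self.max_turns:]  # drop oldest turns

    def render(self) -> str:
        """Format the history for inclusion in the next prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

mem = SessionMemory(max_turns=2)
mem.add("What is the Companies Act?", "It governs companies in Sri Lanka.")
mem.add("Who enforces it?", "The Registrar of Companies.")
mem.add("What about penalties?", "Penalties are set per section.")
print(mem.render())
```

Trimming keeps the prompt within the model's context window while preserving enough history to resolve pronouns like "it" in follow-up questions.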
```
cd server
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows
pip install -r requirements.txt
uvicorn main:app --reload
```

Server runs on: http://127.0.0.1:8000
```
cd frontend
npm install
npm run dev
```

Frontend runs on: http://127.0.0.1:5173
| Task | Members (Index Numbers) |
|---|---|
| Document Pipeline | 21ug1040, 21ug1287, 21ug1021, 21ug1036, 21ug1066, 21ug1135 |
| Vector Store & Retrieval | 21ug1073, 21ug1287, 21ug1313 |
| LLM Orchestration | 21ug1073, 21ug1287, 21ug0926, 21ug1135 |
| Backend API | 21ug1073, 21ug1287, 21ug0956, 21ug1066 |
| Frontend UX | 21ug1021, 21ug1036, 21ug1073, 21ug1287 |
- “What are the duties of company directors under the Companies Act?”
- “Explain penalties for late filing under the Inland Revenue Act.”
- “What are the minimum wage rules in Sri Lanka?”
- Strictly focused on Sri Lankan business and corporate law.
- Out-of-domain queries are handled gracefully: the assistant explains its scope and suggests related legal questions.
- This is an academic project, not a substitute for professional legal advice.