Skip to content

A university project by Team DEEPCROP that builds a Legal LLM assistant for Sri Lanka. It answers questions related to Companies Act, Inland Revenue Act, and Labor Laws with accurate citations and context.

Notifications You must be signed in to change notification settings

Legal-LLM/deep_crop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Sri Lankan Legal LLM (Team DEEPCROP)

Python FastAPI LangChain FAISS Google Gemini React License

A university project by Team DEEPCROP that builds a Legal LLM assistant for Sri Lanka. It answers questions related to Companies Act, Inland Revenue Act, and Labor Laws with accurate citations and context.


🚀 Project Overview

We designed a Retrieval-Augmented Generation (RAG) system that extracts, indexes, and retrieves Sri Lankan legal knowledge from official documents. Our system goes beyond a basic RAG pipeline by adopting foundation model best practices for reliability, accuracy, and user experience.

Key Features

  • 📑 State-of-the-art extraction with Docling Extracts structured knowledge (titles, sections, content) from large PDFs.
  • 🧠 VectorDB with FAISS Stores embeddings of Sri Lankan legal documents for fast, semantic retrieval.
  • 🎯 Query Optimization Before retrieval, user queries are rewritten into context-rich, expressive forms, improving accuracy.
  • 💬 Chat Memory (LangChain) Keeps conversation history for natural, context-aware dialogues.
  • 🤖 Well-designed prompts Role assignment, system prompts, and few-shot examples ensure consistent legal answers.
  • 🔒 Domain-bound RAG Answers strictly within legal context. If out of scope, the assistant explains why.

📂 Project Structure

.
├── extractor/        # PDF → structured knowledge (Docling)
│   ├── companies_act.pdf
│   ├── inland_rev.pdf
│   ├── labor_laws.pdf
│   ├── extract_from_docs.ipynb / .py
│   ├── requirements.txt
│   └── scratch/
│
├── server/           # FastAPI backend with LangChain + FAISS
│   ├── app/
│   │   ├── api.py           # API endpoints (chat, ingest)
│   │   ├── pipeline.py      # Query rewrite, RAG pipeline
│   │   ├── prompts.py       # System & query prompts
│   │   ├── vectorstore.py   # FAISS index builder/loader
│   │   └── schemas.py
│   ├── docs/                # Extracted legal documents (Markdown)
│   ├── data/faiss_index/    # Vector DB index files
│   ├── main.py              # FastAPI entrypoint
│   └── requirements.txt
│
├── frontend/        # React-based chat interface
│   ├── src/
│   │   ├── App.jsx          # Main frontend logic
│   │   ├── components/      # Chat UI (ChatInput, ChatMessage, etc.)
│   │   └── index.css
│   └── vite.config.js
│
└── README.md

⚙️ Tech Stack

  • Backend: FastAPI, LangChain, FAISS, Google Generative AI
  • Frontend: React + Vite
  • Extraction: Docling (PDF → structured text)
  • Vector DB: FAISS (semantic search)
  • LLM: Google Gemini (via LangChain integration)

🔑 How It Works

  1. Document Ingestion

    • Docling extracts structured knowledge (Acts, Sections, Subsections) from legal PDFs.
    • Extracted text is chunked and stored in FAISS Vector DB with embeddings.
  2. User Query → Optimized Query

    • User inputs a question (e.g., “What are the penalties for late tax filing?”).
    • A query rewriting chain expands and optimizes it into more expressive legal queries.
  3. Retrieval + Generation

    • Optimized query retrieves relevant chunks from FAISS.
    • LLM generates an answer strictly from context, with inline citations.
  4. Chat Memory

    • Session memory allows follow-up questions without losing context.

🖥️ Running the Project

Backend (FastAPI)

cd server
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows
pip install -r requirements.txt
uvicorn main:app --reload

Server runs on: http://127.0.0.1:8000

Frontend (React + Vite)

cd frontend
npm install
npm run dev

Frontend runs on: http://127.0.0.1:5173


👨‍👩‍👧‍👦 Team Contributions

Task Members (Index Numbers)
Document Pipeline 21ug1040, 21ug1287, 21ug1021, 21ug1036, 21ug1066, 21ug1135
Vector Store & Retrieval 21ug1073, 21ug1287, 21ug1313
LLM Orchestration 21ug1073, 21ug1287, 21ug0926, 21ug1135
Backend API 21ug1073, 21ug1287, 21ug0956, 21ug1066
Frontend UX 21ug1021, 21ug1036, 21ug1073, 21ug1287

📌 Example Queries

  • “What are the duties of company directors under the Companies Act?”
  • “Explain penalties for late filing under Inland Revenue Act.”
  • “What are the minimum wage rules in Sri Lanka?”

📖 Notes

  • Strictly focused on Sri Lankan Business & Corporate Law.
  • Out-of-domain queries are handled gracefully (assistant explains and suggests legal ones).
  • This is an academic project; not a substitute for professional legal advice.

About

A university project by Team DEEPCROP that builds a Legal LLM assistant for Sri Lanka. It answers questions related to Companies Act, Inland Revenue Act, and Labor Laws with accurate citations and context.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published