AI-powered math tutoring for Lebanese high school students, built with a single FastAPI application.
```
src/                          # Single consolidated FastAPI application (Port 8000)
├── main.py                   # Unified app with lifespan, middleware, routes
├── config.py                 # Single merged Config class
├── clients/                  # OpenAI client singletons (LLM + Embedding)
├── services/                 # Business logic modules
│   ├── input_processor/      # Text/image processing
│   ├── reformulator/         # Query improvement via vLLM
│   ├── intent_classifier/    # Hybrid rule-based + LLM classification
│   ├── session/              # In-memory session state with TTL cleanup
│   └── vector_cache/         # Qdrant-backed vector storage
├── orchestrators/            # Pipeline orchestration
│   ├── answer_retrieval/     # 4-tier confidence routing
│   ├── data_processing/      # Input processing + reformulation
│   └── tutoring/             # Interactive tutoring flow
└── routes/admin.py           # Health, metrics, logs, tracking

External:
├── qdrant/                   # Vector database (Port 6333)
├── open-webui/               # Chat UI (Port 3000)
├── prometheus/               # Metrics collection (Port 9090)
└── grafana/                  # Dashboard (Port 3001)
```
Pipeline: The app orchestrates two phases:
- Data Processing: Input Processor → Reformulator
- Answer Retrieval: 4-Tier Confidence Routing
  - Tier 1 (>= 0.85): Small LLM validates cached answer or generates a new one
  - Tier 2 (0.70-0.85): Small LLM generates with cache context
  - Tier 3 (0.50-0.70): Fine-tuned model
  - Tier 4 (< 0.50): Large LLM for novel questions
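The tier selection above can be sketched as a simple threshold cascade. This is a minimal illustration only; the function and tier names are assumptions, not the project's actual code:

```python
def route_by_confidence(score: float) -> str:
    """Map a cache-similarity confidence score to a backend tier.

    Thresholds mirror the 4-tier scheme described above.
    Tier names are illustrative, not the real implementation's.
    """
    if score >= 0.85:
        return "tier1_small_llm_validate_cache"  # validate cached answer or regenerate
    if score >= 0.70:
        return "tier2_small_llm_with_context"    # generate using cache as context
    if score >= 0.50:
        return "tier3_finetuned_model"           # fine-tuned model
    return "tier4_large_llm"                     # novel question, large LLM
```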
Tutoring Mode: Interactive step-by-step problem solving using the Session service, Intent Classifier, and Fine-Tuned Model. The `is_new_branch` flag skips the cache on new conversation nodes.
- Python 3.14+, Docker and Docker Compose
- OpenAI API key, RunPod API key
- Copy `.env.example` to `.env` and fill in API keys and RunPod endpoint IDs
- Run `docker compose up --build`
- App API: http://localhost:8000, Open WebUI: http://localhost:3000
LLM Backend: Small LLM, Reformulator, and Fine-Tuned Model use RunPod Serverless vLLM (`runpod/worker-v1-vllm:v2.11.3`) with `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`.
App (http://localhost:8000):

| Endpoint | Description |
|---|---|
| `GET /health` | Health check (Qdrant + session status) |
| `GET /v1/models` | List available models (OpenAI-compatible) |
| `POST /v1/chat/completions` | OpenAI-compatible chat endpoint |
| `POST /tutoring` | Tutoring interaction endpoint |
| `GET /metrics` | Prometheus metrics |
| `GET /track/{request_id}` | Trace request logs |
```
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"math-tutor","messages":[{"role":"user","content":"What is the derivative of x^2?"}]}'
```

```
python3.14 cli.py clean   # isort + black + mypy (run before committing)
pytest -n auto            # Run all tests in parallel
```

For faster iteration without serverless cold starts, spin up a RunPod pod with 3 GPUs, each running its own vLLM instance:
```
python3.14 cli.py pod start   # Create 3-GPU pod, wait for vLLM, generate .env.dev
docker compose up --build     # Services use dev pod instead of serverless
python3.14 cli.py pod stop    # Destroy pod, delete .env.dev (back to serverless)
```

How it works: `pod start` creates a 3-GPU pod where each vLLM instance is pinned to its own GPU via `CUDA_VISIBLE_DEVICES` (ports 8000-8002). It writes `.env.dev`, which overrides the serverless URLs in `.env`. Docker Compose loads `.env.dev` automatically when it exists. `pod stop` destroys the pod and deletes `.env.dev` so services fall back to serverless.
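The override semantics are simple layering: later files shadow earlier ones, so a key in `.env.dev` wins over the same key in `.env`, and a missing `.env.dev` leaves the serverless values untouched. A small sketch of that merge (the helper function and the `LLM_URL` key are hypothetical, chosen only to illustrate the precedence):

```python
import os


def load_env_layers(paths: list[str]) -> dict[str, str]:
    """Merge simple KEY=VALUE env files; later files override earlier ones.

    Files that do not exist (e.g. .env.dev in serverless mode) are skipped,
    so the base .env values remain in effect.
    """
    merged: dict[str, str] = {}
    for path in paths:
        if not os.path.exists(path):
            continue
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                # Skip blanks, comments, and malformed lines
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                merged[key.strip()] = value.strip()
    return merged
```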
GPU fallback: Tries A40 48GB, RTX A6000 48GB in order until available.
```
python3.14 cli.py test                     # All tests
python3.14 cli.py test -- -m unit          # Unit tests (no external deps)
python3.14 cli.py test -- -m integration   # Integration (Docker + RunPod)
python3.14 cli.py test -- -m e2e           # E2E (Docker + RunPod)
```

Integration/E2E tests mock external APIs by default. Use `--use-real-apis` to test against real endpoints. CI runs the full suite against RunPod Serverless.
- Pre-merge (`.github/workflows/pre-merge-checks.yml`): isort + black + mypy on every push/PR
- Full tests (`.github/workflows/run-tests.yml`): All tests against RunPod endpoints
- Prometheus (http://localhost:9090) — metrics collection via the `/metrics` endpoint
- Grafana (http://localhost:3001, admin/admin) — pre-configured dashboard with 13 panels across 5 rows (overview stats, pipeline performance, LLM & cache, sessions & tutoring, app health)
- Structured logging — logs to stdout + `.logs/app/app-YYYY-MM-DD.log` (date-based files, 7-day retention)
- Request tracing — unique request IDs; trace via `GET /track/{request_id}`
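Per-request tracing of this kind is commonly built by generating an ID at the start of a request and threading it through every log record, so `GET /track/{request_id}` can filter the logs later. A minimal stdlib sketch of that pattern (names are illustrative, not the project's code):

```python
import contextvars
import logging
import uuid

# Holds the current request's ID for the duration of its handling;
# contextvars keeps it isolated per async task.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Inject the current request ID into every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def new_request_id() -> str:
    """Generate and activate a unique ID for the current request."""
    rid = uuid.uuid4().hex
    request_id_var.set(rid)
    return rid


logger = logging.getLogger("app")
logger.addFilter(RequestIdFilter())
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

In a FastAPI app this would typically run in a middleware, with the ID returned in a response header so clients can pass it to the tracking endpoint.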
Tools in `data_preprocessing/` for processing the Lebanese math curriculum: PDF splitting, LaTeX conversion, exercise extraction, solution generation.
This project is for educational purposes.