Resilient multi-provider AI orchestration service with automatic fallback, circuit breaker, retry logic, and Prometheus observability. Built with Flask, PostgreSQL, and Docker.
Live API: https://ai-gateway-api-9sm2.onrender.com
- Architecture Overview
- Provider Routing Logic
- Challenge Coverage
- Reliability Features
- Observability
- API Documentation
- How to Run Locally ← Start here if you're new
- How to Run Tests
- Deployment Process
- Environment Variables
┌──────────────────────────────────────────────────────────┐
│ Client / User │
└────────────────────────┬─────────────────────────────────┘
│ POST /ai/task
▼
┌──────────────────────────────────────────────────────────┐
│ Flask API (routes.py) │
│ /ai/task /health /metrics /history /provider/status │
└────────┬──────────────┬──────────────┬───────────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌──────────────┐ ┌────────────────┐
│ Orchestrator│ │ Prometheus │ │ PostgreSQL │
│ │ │ Metrics │ │ (ai_requests │
│ ┌─────────┐│ │ │ │ table) │
│ │Circuit ││ └──────────────┘ └────────────────┘
│ │Breakers ││
│ └─────────┘│
└──┬────┬────┬────┬────┬───────────────────────────────────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌───────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌─────────────┐
│Mistral│ │Gemini│ │OpenAI│ │Claude│ │ HuggingFace │
│(pri 1)│ │(pri 2)│ │(pri 3)│ │(pri 4)│ │ (pri 5) │
└───────┘ └──────┘ └──────┘ └──────┘ └─────────────┘
Components:
| Component | File | Purpose |
|---|---|---|
| App Factory | `app/__init__.py` | Flask app creation, DB init, blueprint registration |
| Config | `app/config.py` | Environment-based configuration |
| Routes | `app/routes.py` | REST API endpoints |
| Orchestrator | `app/orchestrator.py` | Multi-provider routing with retry + fallback |
| Circuit Breaker | `app/circuit_breaker.py` | Per-provider failure tracking and blocking |
| Decision Engine | `app/decision.py` | Rule-based structured output for invoice/document tasks |
| Metrics | `app/metrics.py` | Prometheus counters and histograms |
| Logging | `app/logging_config.py` | Structured JSON logging to console + file |
| Models | `app/models.py` | SQLAlchemy model for request history |
| Providers | `app/providers/` | Mistral, Gemini, OpenAI, Claude, HuggingFace integrations with NVIDIA fallback |
When a request is received with "provider": "auto" (default), the orchestrator tries providers in priority order:
1. Mistral (confidence: 0.88) ──fail──▶ retry once ──fail──▶
2. Gemini (confidence: 0.85) ──fail──▶ retry once ──fail──▶
3. OpenAI (confidence: 0.89) ──fail──▶ retry once ──fail──▶
4. Claude (confidence: 0.90) ──fail──▶ retry once ──fail──▶
5. HuggingFace (confidence: 0.75) ──fail──▶ retry once ──fail──▶ Error 503
- If a specific provider is requested (e.g. `"provider": "claude"`), only that provider is attempted (no silent fallback).
- Each provider's circuit breaker is checked before attempting a call — if a provider has failed 3+ times recently, it is skipped entirely.
- The `provider_used` field in the response tells you which provider actually handled the request.
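The routing behavior described above can be sketched in a few lines of Python. This is illustrative only — `route_task`, the provider `call()` interface, and the breaker methods are assumed names, not the actual `app/orchestrator.py` API:

```python
# Minimal sketch of the auto-routing loop with per-provider circuit
# breakers. Provider clients are assumed to expose call(text); breakers
# are assumed to expose allows_request()/record_success()/record_failure().

PRIORITY = ["mistral", "gemini", "openai", "claude", "huggingface"]
MAX_RETRIES = 1  # one retry per provider before falling back

def route_task(text, providers, breakers):
    for name in PRIORITY:
        breaker = breakers[name]
        if not breaker.allows_request():
            continue  # circuit is OPEN: skip this provider entirely
        for _attempt in range(1 + MAX_RETRIES):
            try:
                result = providers[name].call(text)
                breaker.record_success()
                return {"provider_used": name, "result": result}
            except Exception:
                breaker.record_failure()
    raise RuntimeError("All providers failed")  # surfaced as HTTP 503
```

When every provider in the priority list has exhausted its retries, the error propagates up and the route returns a 503, matching the flow diagram above.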
Provider key behavior:
- `mistral` prefers `NVIDIA_API_KEY` and falls back to `MISTRAL_API_KEY` (NVIDIA endpoint)
- `gemini` prefers `NVIDIA_API_KEY` and falls back to native `GEMINI_API_KEY`
- `openai` uses native `OPENAI_API_KEY` only when it starts with `sk-`; otherwise it uses `NVIDIA_API_KEY`
- `claude` prefers `CLAUDE_API_KEY`; if absent, it uses `NVIDIA_API_KEY`
- `huggingface` uses `NVIDIA_API_KEY` (HF key kept for compatibility/status reporting)
Checklist against the 24-hour challenge:
| Requirement | Status | Notes |
|---|---|---|
| Backend service (Flask + REST APIs) | ✅ Done | /ai/task, /health, /metrics, /history |
| PostgreSQL persistence | ✅ Done | ai_requests model/table with request metadata |
| Multi-provider orchestration + fallback | ✅ Done | Auto mode: mistral -> gemini -> openai -> claude -> huggingface |
| Retry + timeout + circuit breaker | ✅ Done | Configurable retries/timeouts and per-provider breakers |
| Structured decision output | ✅ Done | invoice_check and document_review return PASS/FAIL/NEEDS_INFO |
| Request logging | ✅ Done | JSON logging with timestamp/provider/latency/status |
| Prometheus observability endpoint | ✅ Done | /metrics with request/error/latency/failover metrics |
| Dockerfile + Docker Compose | ✅ Done | App + Postgres + Prometheus + Grafana stack |
| CI pipeline | ✅ Done | Lint + test + Docker build in GitHub Actions |
| Live deployment | ✅ Done | Render URL is active |
| Bonus: rate limiting | ✅ Done | /ai/task 30/min, /history/cleanup 5/min |
| Bonus: request authentication | ✅ Done | X-API-Key via API_KEY env var |
| Bonus: OpenAPI docs | ⚠️ Partial | README API docs exist; no generated OpenAPI spec endpoint |
| Bonus: Grafana dashboard | ✅ Done | Provisioned dashboard in monitoring/grafana/dashboards/ |
| Bonus: horizontal scaling note | ✅ Done | See Horizontal Scaling section |
- On failure, each provider is retried once before moving to the next (configurable via `MAX_RETRIES`).
- Every API call has a 10-second timeout (configurable via `TIMEOUT_SECONDS`).
- If a provider doesn't respond in time, it's treated as a failure.
Each provider has its own circuit breaker with three states:
CLOSED ──(3 failures)──▶ OPEN ──(60s timeout)──▶ HALF_OPEN ──(success)──▶ CLOSED
│
(failure)
│
▼
OPEN
| Parameter | Default | Env Var |
|---|---|---|
| Failure threshold | 3 | CIRCUIT_BREAKER_THRESHOLD |
| Reset timeout | 60s | CIRCUIT_BREAKER_RESET_TIMEOUT |
Behavior:
- CLOSED — normal operation, requests go through
- OPEN — provider is blocked, requests skip to next provider
- HALF_OPEN — after timeout expires, one test request is allowed; success → CLOSED, failure → OPEN
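The three-state machine above can be sketched as follows. This is a hedged illustration of the behavior described, not the actual `app/circuit_breaker.py` implementation; method names are assumptions:

```python
import time

class CircuitBreaker:
    """Per-provider breaker: CLOSED -> OPEN after N failures,
    OPEN -> HALF_OPEN after the reset timeout, then one probe decides."""

    def __init__(self, threshold=3, reset_timeout=60):
        self.threshold = threshold          # CIRCUIT_BREAKER_THRESHOLD
        self.reset_timeout = reset_timeout  # CIRCUIT_BREAKER_RESET_TIMEOUT
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = None

    def allows_request(self):
        if self.state == "OPEN":
            if time.time() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # allow a single probe request
                return True
            return False  # still blocked: caller skips to next provider
        return True

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = time.time()
```

A failed probe in HALF_OPEN immediately reopens the circuit; a successful one resets the failure count and closes it.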
Available at GET /metrics in Prometheus exposition format.
| Metric | Type | Labels | Description |
|---|---|---|---|
| `ai_request_count` | Counter | task, provider, status | Total requests processed |
| `ai_error_count` | Counter | provider | Total provider errors |
| `ai_provider_latency_ms` | Histogram | provider | Provider response latency (ms) |
| `ai_failover_count` | Counter | from_provider, to_provider | How often fallbacks occur |
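With `prometheus-client`, the metrics in the table could be declared roughly as below. This is a sketch; the actual definitions live in `app/metrics.py` and may differ in detail:

```python
from prometheus_client import Counter, Histogram, generate_latest

# Metric names and labels match the table above; the client library
# appends "_total" to counter names in the exposition format.
ai_request_count = Counter(
    "ai_request_count", "Total requests processed",
    ["task", "provider", "status"])
ai_error_count = Counter(
    "ai_error_count", "Total provider errors", ["provider"])
ai_provider_latency_ms = Histogram(
    "ai_provider_latency_ms", "Provider response latency (ms)",
    ["provider"])
ai_failover_count = Counter(
    "ai_failover_count", "How often fallbacks occur",
    ["from_provider", "to_provider"])

# Record one successful request and its latency:
ai_request_count.labels(task="summarize", provider="gemini",
                        status="success").inc()
ai_provider_latency_ms.labels(provider="gemini").observe(230)
```

`GET /metrics` then serves the output of `generate_latest()` as `text/plain`.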
Every log entry is a JSON object with:
```json
{
  "timestamp": "2026-03-11T12:00:00+00:00",
  "level": "INFO",
  "logger": "app.orchestrator",
  "message": "Provider gemini succeeded in 230ms",
  "provider": "gemini",
  "latency_ms": 230,
  "status": "success"
}
```

Logs are written to both console and `logs/app.log`.
### POST /ai/task

Submit a task to the AI orchestration service.
Request:
```json
{
  "task": "summarize",
  "text": "Invoice for services rendered. Total amount: $5,000. Payment due: March 30, 2026.",
  "provider": "auto"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| task | string | Yes | Task type (e.g. summarize, invoice_check, document_review) |
| text | string | Yes | Input text to process |
| provider | string | No | auto (default), mistral, gemini, openai, claude, huggingface |
Response (200):
```json
{
  "provider_used": "gemini",
  "result": "This invoice is for $5,000 in services, with payment due March 30, 2026.",
  "confidence": 0.85,
  "latency_ms": 450
}
```

Response with Decision (for `invoice_check` or `document_review` tasks):
```json
{
  "provider_used": "claude",
  "result": "...",
  "confidence": 0.90,
  "latency_ms": 300,
  "decision": {
    "decision": "PASS",
    "reasons": ["Key financial fields present in document"],
    "evidence": ["amount", "total", "due"]
  }
}
```

Decision values: `PASS` | `FAIL` | `NEEDS_INFO`
Error Responses:
| Code | Condition |
|---|---|
| 400 | Missing task or text, or body is not JSON |
| 503 | All providers failed |
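For programmatic access, the request/response shapes above can be wrapped in a small Python client. The `submit_task` helper and its defaults are illustrative, not part of the service:

```python
import requests

def submit_task(text, task="summarize", provider="auto",
                base_url="http://localhost:5000", api_key=None,
                session=requests):
    """POST to /ai/task and handle the two documented outcomes:
    200 with a result payload, or 503 when all providers failed."""
    headers = {"X-API-Key": api_key} if api_key else {}
    resp = session.post(f"{base_url}/ai/task",
                        json={"task": task, "text": text,
                              "provider": provider},
                        headers=headers, timeout=15)
    if resp.status_code == 503:
        return {"error": "all providers failed"}
    resp.raise_for_status()
    return resp.json()  # includes provider_used, result, latency_ms
```

Passing `session` makes the helper easy to test with a stub; in real use the default `requests` module is fine.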
### GET /health

```json
{
  "status": "healthy",
  "timestamp": "2026-03-11T12:00:00+00:00"
}
```

### GET /metrics

Returns Prometheus-format metrics (text/plain).
### GET /history

Returns the last 50 requests from the database:
```json
[
  {
    "id": "uuid",
    "timestamp": "2026-03-11T12:00:00",
    "task": "summarize",
    "provider": "gemini",
    "latency_ms": 450,
    "status": "success",
    "result_summary": "...",
    "user_id": "anonymous",
    "error_message": null
  }
]
```

### GET /provider/status

Returns key-based provider/fallback status flags for the dashboard.
```json
{
  "nvidia_fallback_enabled": true,
  "openai_native_enabled": false,
  "claude_native_enabled": false,
  "gemini_native_enabled": false,
  "huggingface_native_enabled": false
}
```

### POST /history/cleanup

Deletes rows with `status=error` from request history.
Rate limit: 5 requests/minute.
Request body (optional):
```json
{
  "older_than_minutes": 60
}
```

If `older_than_minutes` is omitted, all error rows are removed.
The frontend includes an API-key helper near the input field:
- A demo hint is displayed for first-time users
- A `Use Demo Key` button can auto-fill the key
- The key is persisted in browser localStorage as `aigw_api_key`
There are two ways to run the project: Option A (plain Python, quickest — no Docker needed) and Option B (Docker Compose, mirrors production exactly).
| Requirement | Version | Check |
|---|---|---|
| Python | 3.10 or newer | python --version |
| pip | any recent | pip --version |
| Git | any | git --version |
No database server required — the app will use a local SQLite file automatically.
```bash
git clone https://github.com/hashimminhas/ai-gateway.git
cd ai-gateway
```

```bash
# macOS / Linux
python -m venv .venv
source .venv/bin/activate

# Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1

# Windows (Command Prompt)
python -m venv .venv
.venv\Scripts\activate.bat
```

You should see `(.venv)` at the start of your terminal prompt.
```bash
pip install -r requirements.txt
```

`psycopg2-binary` may show a warning if PostgreSQL isn't installed — that is fine for SQLite mode.
Copy the example below and save it as .env in the project root:
```bash
# Minimum config — uses SQLite, no API keys needed to boot
DATABASE_URL=sqlite:///aigateway.db

# Optional — add real keys to actually call AI providers
# NVIDIA_API_KEY=nvapi-...   # preferred unified fallback key
# MISTRAL_API_KEY=nvapi-...  # backward-compatible fallback source
# OPENAI_API_KEY=sk-...      # native OpenAI (optional)
# CLAUDE_API_KEY=sk-ant-...  # native Claude (optional)
# GEMINI_API_KEY=AIza...     # native Gemini (optional)
# HF_API_KEY=hf_...          # currently not required by provider implementation

# Optional — protect the API with a key
# API_KEY=my-secret-key
```

Without real API keys the app will start fine and the UI will load, but `/ai/task` calls will return a 503 because no provider is reachable. Add at least one key to see real results.
```bash
# macOS / Linux
DATABASE_URL=sqlite:///aigateway.db python run.py

# Windows PowerShell
$env:DATABASE_URL="sqlite:///aigateway.db"; python run.py

# Or, if you added DATABASE_URL to your .env file (requires python-dotenv loaded):
python run.py
```

You should see:

```
 * Running on http://0.0.0.0:5000
 * Debug mode: on
```
Navigate to http://localhost:5000 in your browser — the AI Gateway dashboard will load.
```bash
# Health check
curl http://localhost:5000/health

# Submit a task (requires at least one API key configured)
curl -X POST http://localhost:5000/ai/task \
  -H "Content-Type: application/json" \
  -d '{"task": "summarize", "text": "The quick brown fox jumps over the lazy dog."}'
```

- Docker Desktop installed and running
```bash
git clone https://github.com/hashimminhas/ai-gateway.git
cd ai-gateway

# Create .env with your keys (PostgreSQL is started by Docker automatically)
cp .env.example .env  # or create .env manually — see table below
```

```bash
docker compose up --build
```

This starts four containers:
- app (Flask on port 5000)
- db (PostgreSQL on port 5432)
- prometheus (metrics backend on port 9090)
- grafana (dashboard UI on port 3000)
Wait for the line Running on http://0.0.0.0:5000 before testing.
Navigate to http://localhost:5000.
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (default login: `admin` / `admin`)
Grafana is pre-provisioned with:
- Data source: Prometheus (`http://prometheus:9090`)
- Dashboard: AI Gateway Overview
- File: `monitoring/grafana/dashboards/ai-gateway-overview.json`

```bash
docker compose down     # stop containers
docker compose down -v  # also delete the database volume
```

| Symptom | Likely cause | Fix |
|---|---|---|
| `ModuleNotFoundError: No module named 'flask'` | venv not activated or deps not installed | Run `pip install -r requirements.txt` inside the activated venv |
| `ModuleNotFoundError: No module named 'psycopg2'` | Using PostgreSQL URL without the driver | Switch to `DATABASE_URL=sqlite:///aigateway.db` for local dev |
| `OperationalError: could not connect to server` | PostgreSQL isn't running | Use SQLite (Option A) or start Docker (Option B) |
| Port 5000 already in use | Another process on port 5000 | Change the port: edit `run.py` and set `port=5001` |
| Grafana shows no data | App not scraped by Prometheus yet | Check http://localhost:9090/targets and verify the `ai-gateway` target is UP |
| Dashboard missing in Grafana | Provisioning files not mounted | Verify `monitoring/grafana/provisioning/` and restart compose |
| `/ai/task` returns 503 | No usable API keys configured | Set `NVIDIA_API_KEY` (recommended) or native keys (`OPENAI_API_KEY`, `CLAUDE_API_KEY`, `GEMINI_API_KEY`) |
| Windows — `Activate.ps1` cannot be loaded | PowerShell execution policy | Run `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned` first |
```bash
# Install dependencies
pip install -r requirements.txt

# Run all tests
pytest tests/ -v
```

Tests use SQLite in-memory and mocked provider calls — no real API keys or PostgreSQL needed.
Test coverage:
- `test_routes.py` — API endpoint validation (8 tests)
- `test_orchestrator.py` — fallback, circuit breaker, error handling (4 tests)
- `test_circuit_breaker.py` — state transitions CLOSED → OPEN → HALF_OPEN → CLOSED (6 tests)
- Push code to GitHub
- Go to render.com → New → Blueprint
- Connect your GitHub repo (`hashimminhas/ai-gateway`)
- `render.yaml` is auto-detected — click Deploy
- Set API keys in the environment variables form
- Wait for the build to complete (~2 minutes)
- Your live URL: https://ai-gateway-api-9sm2.onrender.com
Every push to main triggers:
Lint (flake8) → Test (pytest + PostgreSQL) → Build (Docker image)
If Docker Hub secrets are configured (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN), the pipeline also tags and pushes ai-gateway:latest.
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | Yes | `postgresql://localhost/aigateway` | PostgreSQL connection string |
| `NVIDIA_API_KEY` | No | `""` | Preferred NVIDIA key for unified fallback endpoint |
| `MISTRAL_API_KEY` | No | `""` | Backward-compatible fallback source for `NVIDIA_API_KEY` |
| `OPENAI_API_KEY` | No | `""` | Native OpenAI key (`sk-...`) when available |
| `CLAUDE_API_KEY` | No | `""` | Anthropic Claude API key |
| `GEMINI_API_KEY` | No | `""` | Google Gemini API key |
| `HF_API_KEY` | No | `""` | HuggingFace Inference API token |
| `API_KEY` | No | `""` | Optional auth key for API protection |
| `TIMEOUT_SECONDS` | No | `10` | Provider call timeout |
| `MAX_RETRIES` | No | `1` | Retries per provider before fallback |
| `CIRCUIT_BREAKER_THRESHOLD` | No | `3` | Failures before circuit opens |
| `CIRCUIT_BREAKER_RESET_TIMEOUT` | No | `60` | Seconds before circuit tries again |
Protected endpoints (POST /ai/task, GET /history, GET /provider/status, POST /history/cleanup) require an X-API-Key header when the API_KEY environment variable is set.
```bash
curl -X POST https://ai-gateway-api-9sm2.onrender.com/ai/task \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-key" \
  -d '{"task": "summarize", "text": "Your text here"}'
```

If `API_KEY` is not set, authentication is disabled (open access).
The /ai/task endpoint is rate-limited to 30 requests per minute per IP address using flask-limiter. Exceeding the limit returns 429 Too Many Requests.
The service is designed for horizontal scaling:
- Stateless application tier — no in-memory session state; all request data is stored in PostgreSQL
- Gunicorn workers — the Docker image runs gunicorn with multiple workers (`-w 2`) for concurrent request handling; increase workers based on available CPU cores
- Database-backed persistence — circuit breaker state resets on restart, but request history is durable in PostgreSQL
- Container-ready — deploy multiple replicas behind a load balancer (e.g. Render, Kubernetes, ECS) with the same `DATABASE_URL`
| Layer | Technology |
|---|---|
| Backend | Flask 3.1 (Python 3.11) |
| Database | PostgreSQL 15 |
| ORM | Flask-SQLAlchemy |
| Monitoring | prometheus-client |
| Containerization | Docker + Docker Compose |
| CI/CD | GitHub Actions |
| Hosting | Render (free tier) |
| HTTP Server | Gunicorn |
MIT — see LICENSE for details.