Resilient multi-provider AI orchestration service with automatic fallback, circuit breaker, retry logic, and Prometheus observability. Built with Flask, PostgreSQL, and Docker.
Live API: https://ai-gateway-api-9sm2.onrender.com
- Architecture Overview
- Provider Routing Logic
- Challenge Coverage
- Reliability Features
- Observability
- API Documentation
- How to Run Locally ← Start here if you're new
- How to Run Tests
- Deployment Process
- Environment Variables
┌──────────────────────────────────────────────────────────┐
│ Client / User │
└────────────────────────┬─────────────────────────────────┘
│ POST /ai/task
▼
┌──────────────────────────────────────────────────────────┐
│ Flask API (routes.py) │
│ /ai/task /health /metrics /history /provider/status │
└────────┬──────────────┬──────────────┬───────────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌──────────────┐ ┌────────────────┐
│ Orchestrator│ │ Prometheus │ │ PostgreSQL │
│ │ │ Metrics │ │ (ai_requests │
│ ┌─────────┐│ │ │ │ table) │
│ │Circuit ││ └──────────────┘ └────────────────┘
│ │Breakers ││
│ └─────────┘│
└──┬────┬────┬────┬────┬───────────────────────────────────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌───────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌─────────────┐
│Mistral│ │Gemini│ │OpenAI│ │Claude│ │ HuggingFace │
│(pri 1)│ │(pri 2)│ │(pri 3)│ │(pri 4)│ │ (pri 5) │
└───────┘ └──────┘ └──────┘ └──────┘ └─────────────┘
Components:
| Component | File | Purpose |
|---|---|---|
| App Factory | `app/__init__.py` | Flask app creation, DB init, blueprint registration |
| Config | `app/config.py` | Environment-based configuration |
| Routes | `app/routes.py` | REST API endpoints |
| Orchestrator | `app/orchestrator.py` | Multi-provider routing with retry + fallback |
| Circuit Breaker | `app/circuit_breaker.py` | Per-provider failure tracking and blocking |
| Decision Engine | `app/decision.py` | Rule-based structured output for invoice/document tasks |
| Metrics | `app/metrics.py` | Prometheus counters and histograms |
| Logging | `app/logging_config.py` | Structured JSON logging to console + file |
| Models | `app/models.py` | SQLAlchemy model for request history |
| Providers | `app/providers/` | Mistral, Gemini, OpenAI, Claude, HuggingFace integrations with NVIDIA fallback |
When a request is received with "provider": "auto" (default), the orchestrator tries providers in priority order:
1. Mistral (confidence: 0.88) ──fail──▶ retry once ──fail──▶
2. Gemini (confidence: 0.85) ──fail──▶ retry once ──fail──▶
3. OpenAI (confidence: 0.89) ──fail──▶ retry once ──fail──▶
4. Claude (confidence: 0.90) ──fail──▶ retry once ──fail──▶
5. HuggingFace (confidence: 0.75) ──fail──▶ retry once ──fail──▶ Error 503
- If a specific provider is requested (e.g. `"provider": "claude"`), only that provider is attempted (no silent fallback).
- Each provider's circuit breaker is checked before attempting a call — if a provider has failed 3+ times recently, it is skipped entirely.
- The `provider_used` field in the response tells you which provider actually handled the request.
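The routing behavior described above can be sketched in a few lines of Python. This is illustrative only — `route_task`, the provider `call()` interface, and the breaker methods are assumed names, not the actual `app/orchestrator.py` API:

```python
# Minimal sketch of the auto-routing loop with per-provider circuit
# breakers. Provider clients are assumed to expose call(text); breakers
# are assumed to expose allows_request()/record_success()/record_failure().

PRIORITY = ["mistral", "gemini", "openai", "claude", "huggingface"]
MAX_RETRIES = 1  # one retry per provider before falling back

def route_task(text, providers, breakers):
    for name in PRIORITY:
        breaker = breakers[name]
        if not breaker.allows_request():
            continue  # circuit is OPEN: skip this provider entirely
        for _attempt in range(1 + MAX_RETRIES):
            try:
                result = providers[name].call(text)
                breaker.record_success()
                return {"provider_used": name, "result": result}
            except Exception:
                breaker.record_failure()
    raise RuntimeError("All providers failed")  # surfaced as HTTP 503
```

When every provider in the priority list has exhausted its retries, the error propagates up and the route returns a 503, matching the flow diagram above.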
Provider key behavior:
- `mistral` prefers `NVIDIA_API_KEY` and falls back to `MISTRAL_API_KEY` (NVIDIA endpoint)
- `gemini` prefers `NVIDIA_API_KEY` and falls back to native `GEMINI_API_KEY`
- `openai` uses native `OPENAI_API_KEY` only when it starts with `sk-`; otherwise it uses `NVIDIA_API_KEY`
- `claude` prefers `CLAUDE_API_KEY`; if absent, it uses `NVIDIA_API_KEY`
- `huggingface` uses `NVIDIA_API_KEY` (HF key kept for compatibility/status reporting)
Checklist against the 24-hour challenge:
| Requirement | Status | Notes |
|---|---|---|
| Backend service (Flask + REST APIs) | ✅ Done | /ai/task, /health, /metrics, /history |
| PostgreSQL persistence | ✅ Done | ai_requests model/table with request metadata |
| Multi-provider orchestration + fallback | ✅ Done | Auto mode: mistral -> gemini -> openai -> claude -> huggingface |
| Retry + timeout + circuit breaker | ✅ Done | Configurable retries/timeouts and per-provider breakers |
| Structured decision output | ✅ Done | invoice_check and document_review return PASS/FAIL/NEEDS_INFO |
| Request logging | ✅ Done | JSON logging with timestamp/provider/latency/status |
| Prometheus observability endpoint | ✅ Done | /metrics with request/error/latency/failover metrics |
| Dockerfile + Docker Compose | ✅ Done | App + Postgres + Prometheus + Grafana stack |
| CI pipeline | ✅ Done | Lint + test + Docker build in GitHub Actions |
| Live deployment | ✅ Done | Render URL is active |
| Bonus: rate limiting | ✅ Done | /ai/task 30/min, /history/cleanup 5/min |
| Bonus: request authentication | ✅ Done | X-API-Key via API_KEY env var |
| Bonus: OpenAPI docs | ⚠️ Partial | README API docs exist; no generated OpenAPI spec endpoint |
| Bonus: Grafana dashboard | ✅ Done | Provisioned dashboard in monitoring/grafana/dashboards/ |
| Bonus: horizontal scaling note | ✅ Done | See Horizontal Scaling section |
- On failure, each provider is retried once before moving to the next (configurable via `MAX_RETRIES`).
- Every API call has a 10-second timeout (configurable via `TIMEOUT_SECONDS`).
- If a provider doesn't respond in time, it's treated as a failure.
Each provider has its own circuit breaker with three states:
CLOSED ──(3 failures)──▶ OPEN ──(60s timeout)──▶ HALF_OPEN ──(success)──▶ CLOSED
│
(failure)
│
▼
OPEN
| Parameter | Default | Env Var |
|---|---|---|
| Failure threshold | 3 | CIRCUIT_BREAKER_THRESHOLD |
| Reset timeout | 60s | CIRCUIT_BREAKER_RESET_TIMEOUT |
Behavior:
- CLOSED — normal operation, requests go through
- OPEN — provider is blocked, requests skip to next provider
- HALF_OPEN — after timeout expires, one test request is allowed; success → CLOSED, failure → OPEN
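The three-state machine above can be sketched as follows. This is a hedged illustration of the behavior described, not the actual `app/circuit_breaker.py` implementation; method names are assumptions:

```python
import time

class CircuitBreaker:
    """Per-provider breaker: CLOSED -> OPEN after N failures,
    OPEN -> HALF_OPEN after the reset timeout, then one probe decides."""

    def __init__(self, threshold=3, reset_timeout=60):
        self.threshold = threshold          # CIRCUIT_BREAKER_THRESHOLD
        self.reset_timeout = reset_timeout  # CIRCUIT_BREAKER_RESET_TIMEOUT
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = None

    def allows_request(self):
        if self.state == "OPEN":
            if time.time() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # allow a single probe request
                return True
            return False  # still blocked: caller skips to next provider
        return True

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = time.time()
```

A failed probe in HALF_OPEN immediately reopens the circuit; a successful one resets the failure count and closes it.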
Available at GET /metrics in Prometheus exposition format.
| Metric | Type | Labels | Description |
|---|---|---|---|
| `ai_request_count` | Counter | task, provider, status | Total requests processed |
| `ai_error_count` | Counter | provider | Total provider errors |
| `ai_provider_latency_ms` | Histogram | provider | Provider response latency (ms) |
| `ai_failover_count` | Counter | from_provider, to_provider | How often fallbacks occur |
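With `prometheus-client`, the metrics in the table could be declared roughly as below. This is a sketch; the actual definitions live in `app/metrics.py` and may differ in detail:

```python
from prometheus_client import Counter, Histogram, generate_latest

# Metric names and labels match the table above; the client library
# appends "_total" to counter names in the exposition format.
ai_request_count = Counter(
    "ai_request_count", "Total requests processed",
    ["task", "provider", "status"])
ai_error_count = Counter(
    "ai_error_count", "Total provider errors", ["provider"])
ai_provider_latency_ms = Histogram(
    "ai_provider_latency_ms", "Provider response latency (ms)",
    ["provider"])
ai_failover_count = Counter(
    "ai_failover_count", "How often fallbacks occur",
    ["from_provider", "to_provider"])

# Record one successful request and its latency:
ai_request_count.labels(task="summarize", provider="gemini",
                        status="success").inc()
ai_provider_latency_ms.labels(provider="gemini").observe(230)
```

`GET /metrics` then serves the output of `generate_latest()` as `text/plain`.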
Every log entry is a JSON object with:
```json
{
  "timestamp": "2026-03-11T12:00:00+00:00",
  "level": "INFO",
  "logger": "app.orchestrator",
  "message": "Provider gemini succeeded in 230ms",
  "provider": "gemini",
  "latency_ms": 230,
  "status": "success"
}
```

Logs are written to both console and `logs/app.log`.
### POST /ai/task

Submit a task to the AI orchestration service.
Request:
```json
{
  "task": "summarize",
  "text": "Invoice for services rendered. Total amount: $5,000. Payment due: March 30, 2026.",
  "provider": "auto"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| task | string | Yes | Task type (e.g. summarize, invoice_check, document_review) |
| text | string | Yes | Input text to process |
| provider | string | No | auto (default), mistral, gemini, openai, claude, huggingface |
Response (200):
```json
{
  "provider_used": "gemini",
  "result": "This invoice is for $5,000 in services, with payment due March 30, 2026.",
  "confidence": 0.85,
  "latency_ms": 450
}
```

Response with Decision (for `invoice_check` or `document_review` tasks):
```json
{
  "provider_used": "claude",
  "result": "...",
  "confidence": 0.90,
  "latency_ms": 300,
  "decision": {
    "decision": "PASS",
    "reasons": ["Key financial fields present in document"],
    "evidence": ["amount", "total", "due"]
  }
}
```

Decision values: `PASS` | `FAIL` | `NEEDS_INFO`
Error Responses:
| Code | Condition |
|---|---|
| 400 | Missing task or text, or body is not JSON |
| 503 | All providers failed |
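For programmatic access, the request/response shapes above can be wrapped in a small Python client. The `submit_task` helper and its defaults are illustrative, not part of the service:

```python
import requests

def submit_task(text, task="summarize", provider="auto",
                base_url="http://localhost:5000", api_key=None,
                session=requests):
    """POST to /ai/task and handle the two documented outcomes:
    200 with a result payload, or 503 when all providers failed."""
    headers = {"X-API-Key": api_key} if api_key else {}
    resp = session.post(f"{base_url}/ai/task",
                        json={"task": task, "text": text,
                              "provider": provider},
                        headers=headers, timeout=15)
    if resp.status_code == 503:
        return {"error": "all providers failed"}
    resp.raise_for_status()
    return resp.json()  # includes provider_used, result, latency_ms
```

Passing `session` makes the helper easy to test with a stub; in real use the default `requests` module is fine.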
### GET /health

```json
{
  "status": "healthy",
  "timestamp": "2026-03-11T12:00:00+00:00"
}
```

### GET /metrics

Returns Prometheus-format metrics (text/plain).
### GET /history

Returns the last 50 requests from the database:
```json
[
  {
    "id": "uuid",
    "timestamp": "2026-03-11T12:00:00",
    "task": "summarize",
    "provider": "gemini",
    "latency_ms": 450,
    "status": "success",
    "result_summary": "...",
    "user_id": "anonymous",
    "error_message": null
  }
]
```

### GET /provider/status

Returns key-based provider/fallback status flags for the dashboard.
```json
{
  "nvidia_fallback_enabled": true,
  "openai_native_enabled": false,
  "claude_native_enabled": false,
  "gemini_native_enabled": false,
  "huggingface_native_enabled": false
}
```

### POST /history/cleanup

Deletes rows with `status=error` from request history.
Rate limit: 5 requests/minute.
Request body (optional):
```json
{
  "older_than_minutes": 60
}
```

If `older_than_minutes` is omitted, all error rows are removed.
The frontend includes an API-key helper near the input field:
- A demo hint is displayed for first-time users
- A `Use Demo Key` button can auto-fill the key
- The key is persisted in browser localStorage as `aigw_api_key`
There are two ways to run the project: Option A (plain Python, quickest — no Docker needed) and Option B (Docker Compose, mirrors production exactly).
| Requirement | Version | Check |
|---|---|---|
| Python | 3.10 or newer | python --version |
| pip | any recent | pip --version |
| Git | any | git --version |
No database server required — the app will use a local SQLite file automatically.
```bash
git clone https://github.com/hashimminhas/ai-gateway.git
cd ai-gateway
```

```bash
# macOS / Linux
python -m venv .venv
source .venv/bin/activate

# Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1

# Windows (Command Prompt)
python -m venv .venv
.venv\Scripts\activate.bat
```

You should see `(.venv)` at the start of your terminal prompt.
```bash
pip install -r requirements.txt
```

`psycopg2-binary` may show a warning if PostgreSQL isn't installed — that is fine for SQLite mode.
Copy the example below and save it as .env in the project root:
```bash
# Minimum config — uses SQLite, no API keys needed to boot
DATABASE_URL=sqlite:///aigateway.db

# Optional — add real keys to actually call AI providers
# NVIDIA_API_KEY=nvapi-...   # preferred unified fallback key
# MISTRAL_API_KEY=nvapi-...  # backward-compatible fallback source
# OPENAI_API_KEY=sk-...      # native OpenAI (optional)
# CLAUDE_API_KEY=sk-ant-...  # native Claude (optional)
# GEMINI_API_KEY=AIza...     # native Gemini (optional)
# HF_API_KEY=hf_...          # currently not required by provider implementation

# Optional — protect the API with a key
# API_KEY=my-secret-key
```

Without real API keys the app will start fine and the UI will load, but `/ai/task` calls will return a 503 because no provider is reachable. Add at least one key to see real results.
```bash
# macOS / Linux
DATABASE_URL=sqlite:///aigateway.db python run.py

# Windows PowerShell
$env:DATABASE_URL="sqlite:///aigateway.db"; python run.py

# Or, if you added DATABASE_URL to your .env file (requires python-dotenv loaded):
python run.py
```

You should see:

```
 * Running on http://0.0.0.0:5000
 * Debug mode: on
```
Navigate to http://localhost:5000 in your browser — the AI Gateway dashboard will load.
```bash
# Health check
curl http://localhost:5000/health

# Submit a task (requires at least one API key configured)
curl -X POST http://localhost:5000/ai/task \
  -H "Content-Type: application/json" \
  -d '{"task": "summarize", "text": "The quick brown fox jumps over the lazy dog."}'
```

- Docker Desktop installed and running
```bash
git clone https://github.com/hashimminhas/ai-gateway.git
cd ai-gateway

# Create .env with your keys (PostgreSQL is started by Docker automatically)
cp .env.example .env  # or create .env manually — see table below
```

```bash
docker compose up --build
```

This starts four containers:
- app (Flask on port 5000)
- db (PostgreSQL on port 5432)
- prometheus (metrics backend on port 9090)
- grafana (dashboard UI on port 3000)
Wait for the line Running on http://0.0.0.0:5000 before testing.
Navigate to http://localhost:5000.
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (default login: `admin` / `admin`)
Grafana is pre-provisioned with:
- Data source: Prometheus (`http://prometheus:9090`)
- Dashboard: AI Gateway Overview
- File: `monitoring/grafana/dashboards/ai-gateway-overview.json`

```bash
docker compose down     # stop containers
docker compose down -v  # also delete the database volume
```

| Symptom | Likely cause | Fix |
|---|---|---|
| `ModuleNotFoundError: No module named 'flask'` | venv not activated or deps not installed | Run `pip install -r requirements.txt` inside the activated venv |
| `ModuleNotFoundError: No module named 'psycopg2'` | Using PostgreSQL URL without the driver | Switch to `DATABASE_URL=sqlite:///aigateway.db` for local dev |
| `OperationalError: could not connect to server` | PostgreSQL isn't running | Use SQLite (Option A) or start Docker (Option B) |
| Port 5000 already in use | Another process on port 5000 | Change the port: edit `run.py` and set `port=5001` |
| Grafana shows no data | App not scraped by Prometheus yet | Check http://localhost:9090/targets and verify the `ai-gateway` target is UP |
| Dashboard missing in Grafana | Provisioning files not mounted | Verify `monitoring/grafana/provisioning/` and restart compose |
| `/ai/task` returns 503 | No usable API keys configured | Set `NVIDIA_API_KEY` (recommended) or native keys (`OPENAI_API_KEY`, `CLAUDE_API_KEY`, `GEMINI_API_KEY`) |
| Windows — `Activate.ps1` cannot be loaded | PowerShell execution policy | Run `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned` first |
```bash
# Install dependencies
pip install -r requirements.txt

# Run all tests
pytest tests/ -v
```

Tests use SQLite in-memory and mocked provider calls — no real API keys or PostgreSQL needed.
Test coverage:
- `test_routes.py` — API endpoint validation (8 tests)
- `test_orchestrator.py` — fallback, circuit breaker, error handling (4 tests)
- `test_circuit_breaker.py` — state transitions CLOSED → OPEN → HALF_OPEN → CLOSED (6 tests)
- Push code to GitHub
- Go to render.com → New → Blueprint
- Connect your GitHub repo (`hashimminhas/ai-gateway`)
- `render.yaml` is auto-detected — click Deploy
- Set API keys in the environment variables form
- Wait for the build to complete (~2 minutes)
- Your live URL: https://ai-gateway-api-9sm2.onrender.com
Every push to main triggers:
Lint (flake8) → Test (pytest + PostgreSQL) → Build (Docker image)
If Docker Hub secrets are configured (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN), the pipeline also tags and pushes ai-gateway:latest.
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | Yes | `postgresql://localhost/aigateway` | PostgreSQL connection string |
| `NVIDIA_API_KEY` | No | `""` | Preferred NVIDIA key for unified fallback endpoint |
| `MISTRAL_API_KEY` | No | `""` | Backward-compatible fallback source for `NVIDIA_API_KEY` |
| `OPENAI_API_KEY` | No | `""` | Native OpenAI key (`sk-...`) when available |
| `CLAUDE_API_KEY` | No | `""` | Anthropic Claude API key |
| `GEMINI_API_KEY` | No | `""` | Google Gemini API key |
| `HF_API_KEY` | No | `""` | HuggingFace Inference API token |
| `API_KEY` | No | `""` | Optional auth key for API protection |
| `TIMEOUT_SECONDS` | No | `10` | Provider call timeout |
| `MAX_RETRIES` | No | `1` | Retries per provider before fallback |
| `CIRCUIT_BREAKER_THRESHOLD` | No | `3` | Failures before circuit opens |
| `CIRCUIT_BREAKER_RESET_TIMEOUT` | No | `60` | Seconds before circuit tries again |
Protected endpoints (POST /ai/task, GET /history, GET /provider/status, POST /history/cleanup) require an X-API-Key header when the API_KEY environment variable is set.
```bash
curl -X POST https://ai-gateway-api-9sm2.onrender.com/ai/task \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-key" \
  -d '{"task": "summarize", "text": "Your text here"}'
```

If `API_KEY` is not set, authentication is disabled (open access).
The /ai/task endpoint is rate-limited to 30 requests per minute per IP address using flask-limiter. Exceeding the limit returns 429 Too Many Requests.
The service is designed for horizontal scaling:
- Stateless application tier — no in-memory session state; all request data is stored in PostgreSQL
- Gunicorn workers — the Docker image runs gunicorn with multiple workers (`-w 2`) for concurrent request handling; increase workers based on available CPU cores
- Database-backed persistence — circuit breaker state resets on restart, but request history is durable in PostgreSQL
- Container-ready — deploy multiple replicas behind a load balancer (e.g. Render, Kubernetes, ECS) with the same `DATABASE_URL`
| Layer | Technology |
|---|---|
| Backend | Flask 3.1 (Python 3.11) |
| Database | PostgreSQL 15 |
| ORM | Flask-SQLAlchemy |
| Monitoring | prometheus-client |
| Containerization | Docker + Docker Compose |
| CI/CD | GitHub Actions |
| Hosting | Render (free tier) |
| HTTP Server | Gunicorn |
MIT — see LICENSE for details.