Architecture
Nick edited this page Nov 27, 2025 · 3 revisions
ReliAPI is a minimal reliability layer that sits between clients and upstream APIs (HTTP or LLM providers).
┌─────────┐
│ Client │
└────┬────┘
│ HTTP Request
↓
┌─────────────────────────────────────────┐
│ ReliAPI Gateway │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Request Routing │ │
│ │ (HTTP / LLM) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Client Profile Detection │ │
│ │ (X-Client header / tenant) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Provider Key Pool Selection │ │
│ │ (health-based key selection) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Rate Scheduler │ │
│ │ (token bucket per key/tenant) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Idempotency Check │ │
│ │ (coalesce concurrent requests) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Cache Check │ │
│ │ (GET/HEAD, LLM responses) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Circuit Breaker │ │
│ │ (per-target failure detection)│ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Budget Control (LLM only) │ │
│ │ (cost estimation, caps) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Retry Logic │ │
│ │ (Retry-After, key fallback) │ │
│ └───────────┬──────────────────────┘ │
└──────────────┼──────────────────────────┘
│ Upstream Request
↓
┌──────────┐
│ Target │
│ (HTTP/ │
│ LLM) │
└────┬─────┘
│ Response
↓
┌─────────────────────────────────────────┐
│ ReliAPI Gateway │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Response Normalization │ │
│ │ (unified error format) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Cache Store │ │
│ │ (if cacheable) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Idempotency Store │ │
│ │ (if idempotency_key present) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Metrics Export │ │
│ │ (Prometheus) │ │
│ └───────────┬──────────────────────┘ │
└──────────────┼──────────────────────────┘
│ Response Envelope
↓
┌─────────┐
│ Client │
└─────────┘
For generic HTTP API requests (`POST /proxy/http`):

- Request Parsing: Extract target, method, path, headers, query, body
- Target Resolution: Load target config from `config.yaml`
- Idempotency Check: Check whether a request with the same `idempotency_key` exists
- Cache Check: For GET/HEAD, check cache
- Circuit Breaker: Check if target circuit is open
- HTTP Client: Create HTTP client with retry logic
- Upstream Request: Make request to upstream API
- Response Normalization: Convert to unified response format
- Cache Store: Store response if cacheable
- Idempotency Store: Store result if `idempotency_key` is present
- Metrics: Export Prometheus metrics
For LLM API requests (`POST /proxy/llm`):

- Request Parsing: Extract target, messages, model, parameters
- Target Resolution: Load target config with LLM settings
- Streaming Check: If `stream: true`, use the SSE streaming path
- Budget Control: Estimate cost, check hard/soft caps
- Idempotency Check: Check whether a request with the same `idempotency_key` exists
- Cache Check: Check cache for LLM response
- Circuit Breaker: Check if target circuit is open
- Adapter Selection: Select provider adapter (OpenAI, Anthropic, Mistral)
- Request Preparation: Convert generic request to provider-specific format
- Upstream Request: Make request to LLM provider
- Response Parsing: Parse provider response to normalized format
- Cost Calculation: Calculate actual cost
- Response Normalization: Convert to unified response format
- Cache Store: Store response if cacheable
- Idempotency Store: Store result if `idempotency_key` is present
- Metrics: Export Prometheus metrics
**Retry Logic**

- Error Classification: 429 (rate limit), 5xx (server error), network errors
- Backoff Strategy: Exponential backoff with jitter
- Configurable: Per-target retry matrix in `config.yaml`
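The backoff strategy above can be sketched as follows. This is a generic "full jitter" exponential backoff, shown under assumed parameter names (`base`, `cap`), not ReliAPI's actual config keys.

```python
# Exponential backoff with full jitter: each retry's delay is drawn
# uniformly from [0, min(cap, base * 2^attempt)].
import random


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Yield a jittered delay (in seconds) for each retry attempt."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        yield random.uniform(0, ceiling)           # jitter avoids thundering herds


delays = list(backoff_delays(5))
print([round(d, 2) for d in delays])
```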
**Circuit Breaker**

- Per-Target: Each target has its own circuit breaker
- Failure Threshold: Opens after N consecutive failures
- Cooldown Period: Stays open for configured duration
- Half-Open State: Allows test requests after cooldown
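A toy version of the state machine described above, with illustrative thresholds and timings (not ReliAPI's actual defaults):

```python
# Per-target circuit breaker: closed -> open after N consecutive
# failures -> half-open after the cooldown elapses.
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 10.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                      # closed: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True                      # half-open: let a test request through
        return False                         # open: fail fast

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


cb = CircuitBreaker(failure_threshold=2, cooldown_s=60.0)
cb.record_failure(); cb.record_failure()
print(cb.allow_request())  # False: circuit is open, fail fast
```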
**Caching**

- HTTP: GET/HEAD requests cached by default
- LLM: POST requests cached if enabled
- TTL-Based: Configurable TTL per target
- Redis-Backed: Uses Redis for storage
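The production cache is Redis-backed; the in-memory stand-in below only illustrates the TTL semantics. The `"GET:/path"` key layout is an assumption.

```python
# In-memory TTL cache mimicking Redis-style expiry (value is gone once
# its TTL elapses).
import time


class TTLCache:
    def __init__(self):
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object, ttl_s: float) -> None:
        self._store[key] = (time.monotonic() + ttl_s, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: behave like a Redis TTL eviction
            return None
        return value


cache = TTLCache()
cache.set("GET:/users/42", {"id": 42}, ttl_s=30.0)
print(cache.get("GET:/users/42"))  # hit while within the TTL
```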
**Idempotency**

- Key-Based: Uses the `Idempotency-Key` header or `idempotency_key` field
- Coalescing: Concurrent requests with the same key execute once
- Conflict Detection: Different request bodies with the same key return an error
- TTL-Bound: Results cached for a configured TTL
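The replay and conflict rules above can be sketched as follows. The body fingerprinting (SHA-256 of canonical JSON) is an assumed implementation detail.

```python
# Idempotency store: first result per key is stored; a replay with the
# same body returns it without re-executing; a different body under the
# same key is a conflict.
import hashlib
import json


class IdempotencyStore:
    def __init__(self):
        self._results: dict[str, tuple[str, dict]] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def execute(self, key: str, body: dict, handler):
        digest = self._fingerprint(body)
        if key in self._results:
            stored_digest, stored_result = self._results[key]
            if stored_digest != digest:
                raise ValueError("idempotency key reused with a different body")
            return stored_result              # replay: no second upstream call
        result = handler(body)
        self._results[key] = (digest, result)
        return result
```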
**Budget Control**

- Cost Estimation: Pre-call cost estimation based on model, messages, and `max_tokens`
- Hard Cap: Rejects requests exceeding the hard cap
- Soft Cap: Throttles by reducing `max_tokens` if the soft cap is exceeded
- Cost Tracking: Records actual cost in metrics
**Cost Estimation**

- Provider-Specific: Pricing tables per provider/model
- Token-Based: Estimates based on input/output tokens
- Approximate: Uses approximate token counts (not exact)
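A rough pre-call estimate in the spirit of the bullets above. The prices and the 4-characters-per-token heuristic are placeholders, not ReliAPI's actual pricing tables or tokenizer.

```python
# Approximate pre-call cost: char-length heuristic for input tokens,
# worst-case max_tokens for output, priced per model.

# USD per 1K tokens (input, output) -- illustrative numbers only
PRICING = {"gpt-4o-mini": (0.00015, 0.0006)}


def estimate_cost_usd(model: str, messages: list[dict], max_tokens: int) -> float:
    input_chars = sum(len(m.get("content", "")) for m in messages)
    input_tokens = input_chars / 4          # crude heuristic, not a tokenizer
    in_price, out_price = PRICING[model]
    # Worst case: the model uses the full max_tokens output budget
    return (input_tokens / 1000) * in_price + (max_tokens / 1000) * out_price


cost = estimate_cost_usd("gpt-4o-mini",
                         [{"role": "user", "content": "x" * 4000}],
                         max_tokens=500)
```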
**Provider Key Pool**

- Multi-Key Support: Manage multiple API keys per provider
- Health Tracking: Track key health with error scores and status
- Automatic Selection: Select best key based on load and health
- Status Management: Keys transition between active, degraded, exhausted, banned
- Metrics: Export metrics per provider key (requests, errors, QPS, status)
**Rate Scheduler**

- Token Bucket Algorithm: Per-key, per-tenant, per-client-profile rate limiting
- Burst Protection: Configurable burst size for smoothing traffic spikes
- Concurrent Limiting: Semaphore-based concurrent request limiting
- Normalized 429: Returns stable 429 errors from ReliAPI with `retry_after_s`
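A minimal token bucket matching the description above: refill at `qps` tokens per second up to `burst`, and report a `retry_after_s` hint when empty. Parameter names are illustrative.

```python
# Token bucket: allow a request per token; refuse (normalized 429)
# when the bucket is empty.
import time


class TokenBucket:
    def __init__(self, qps: float, burst: int):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.qps)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False            # caller would return 429 with retry_after_s

    def retry_after_s(self) -> float:
        return max(0.0, (1.0 - self.tokens) / self.qps)
```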
**Client Profiles**

- Profile Detection: Priority: `X-Client` header → `tenant.profile` → default
- Per-Profile Limits: Different rate limits per client type
- Configurable: `max_parallel_requests`, `max_qps_per_tenant`, `max_qps_per_provider_key`
- Use Cases: IDE clients (Cursor), API clients, different usage patterns
**HTTP Request Flow**

1. Client → POST /proxy/http
2. Parse request (target, method, path, ...)
3. Load target config
4. Check idempotency (if key present)
5. Check cache (if GET/HEAD)
6. Check circuit breaker
7. Create HTTP client
8. Make upstream request (with retries)
9. Normalize response
10. Store cache (if cacheable)
11. Store idempotency result (if key present)
12. Export metrics
13. Return response envelope
**LLM Request Flow**

1. Client → POST /proxy/llm
2. Parse request (target, messages, model, ...)
3. Load target config (with LLM settings)
4. Handle streaming (if stream: true, use SSE path)
5. Estimate cost
6. Check hard cap (reject if exceeded)
7. Check soft cap (throttle if exceeded)
8. Check idempotency (if key present)
9. Check cache
10. Check circuit breaker
11. Select adapter (OpenAI/Anthropic/Mistral)
12. Prepare provider-specific request
13. Make upstream request (with retries)
14. Parse provider response
15. Calculate actual cost
16. Normalize response
17. Store cache (if cacheable)
18. Store idempotency result (if key present)
19. Export metrics
20. Return response envelope
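Combining the fields the LLM flow parses, a request to `POST /proxy/llm` might look roughly like this. The field names and target name are assumptions inferred from the steps above, not a confirmed schema.

```python
# Hypothetical envelope for POST /proxy/llm, inferred from the flow above.
llm_request = {
    "target": "openai_main",                # resolved via config.yaml (step 3)
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this ticket."},
    ],
    "max_tokens": 256,                      # also bounds the pre-call cost estimate
    "stream": False,                        # True would take the SSE path (step 4)
    "idempotency_key": "ticket-123-summary",
}
```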
ReliAPI uses Redis for:
- Cache: TTL-based cache storage
- Idempotency: Idempotency key storage and result caching
- Circuit Breaker: Failure state storage (optional, can be in-memory)
**Configuration**

- YAML File: `config.yaml` with target definitions
- Environment Variables: API keys, Redis URL, config path
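For orientation, a target definition in `config.yaml` might look something like the fragment below. All key names here are hypothetical; check the Configuration page for the actual schema.

```yaml
# Hypothetical config.yaml fragment -- key names are illustrative only.
targets:
  openai_main:
    type: llm
    provider: openai
    retry:
      max_attempts: 3
      backoff: exponential
    circuit_breaker:
      failure_threshold: 5
      cooldown_s: 30
    cache:
      enabled: true
      ttl_s: 300
```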
**Metrics**

- `reliapi_http_requests_total`: HTTP request counts
- `reliapi_llm_requests_total`: LLM request counts
- `reliapi_errors_total`: Error counts by type
- `reliapi_cache_hits_total`: Cache hit counts
- `reliapi_latency_ms`: Request latency histogram
- `reliapi_llm_cost_usd`: LLM cost histogram
**Logging**

- Structured JSON: All logs in JSON format
- Request IDs: Every request has unique ID
- Trace IDs: Optional trace ID for distributed tracing
**Health Endpoints**

- `/healthz`: Basic health check
- `/readyz`: Readiness check (Redis, targets)
- `/livez`: Liveness check
- `/metrics`: Prometheus metrics endpoint
**Standalone Deployment**

┌─────────────┐
│ Client │
└──────┬──────┘
│
┌──────▼──────────────────┐
│ ReliAPI (Port 8000) │
└──────┬──────────────────┘
│
┌──────▼──────┐
│ Redis │
└─────────────┘
**Docker Compose Deployment**

┌─────────────┐
│ Client │
└──────┬──────┘
│
┌──────▼──────────────────┐
│ Docker Compose │
│ │
│ ┌──────────────┐ │
│ │ ReliAPI │ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Redis │ │
│ └─────────────┘ │
└───────────────────────┘
- Configuration — Configuration guide
- Reliability Features — Detailed feature explanations