Architecture
Nick edited this page Nov 27, 2025 · 3 revisions
ReliAPI is a minimal reliability layer that sits between clients and upstream APIs (HTTP or LLM providers).
┌─────────┐
│ Client │
└────┬────┘
│ HTTP Request
↓
┌─────────────────────────────────────────┐
│ ReliAPI Gateway │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Request Routing │ │
│ │ (HTTP / LLM) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Client Profile Detection │ │
│ │ (X-Client header / tenant) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Provider Key Pool Selection │ │
│ │ (health-based key selection) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Rate Scheduler │ │
│ │ (token bucket per key/tenant) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Idempotency Check │ │
│ │ (coalesce concurrent requests) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Cache Check │ │
│ │ (GET/HEAD, LLM responses) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Circuit Breaker │ │
│ │ (per-target failure detection)│ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Budget Control (LLM only) │ │
│ │ (cost estimation, caps) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Retry Logic │ │
│ │ (Retry-After, key fallback) │ │
│ └───────────┬──────────────────────┘ │
└──────────────┼──────────────────────────┘
│ Upstream Request
↓
┌──────────┐
│ Target │
│ (HTTP/ │
│ LLM) │
└────┬─────┘
│ Response
↓
┌─────────────────────────────────────────┐
│ ReliAPI Gateway │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Response Normalization │ │
│ │ (unified error format) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Cache Store │ │
│ │ (if cacheable) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Idempotency Store │ │
│ │ (if idempotency_key present) │ │
│ └───────────┬──────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────┐ │
│ │ Metrics Export │ │
│ │ (Prometheus) │ │
│ └───────────┬──────────────────────┘ │
└──────────────┼──────────────────────────┘
│ Response Envelope
↓
┌─────────┐
│ Client │
└─────────┘
For generic HTTP API requests (`POST /proxy/http`):

- Request Parsing: Extract target, method, path, headers, query, body
- Target Resolution: Load target config from `config.yaml`
- Idempotency Check: Check whether a request with the same `idempotency_key` exists
- Cache Check: For GET/HEAD, check cache
- Circuit Breaker: Check if target circuit is open
- HTTP Client: Create HTTP client with retry logic
- Upstream Request: Make request to upstream API
- Response Normalization: Convert to unified response format
- Cache Store: Store response if cacheable
- Idempotency Store: Store result if `idempotency_key` is present
- Metrics: Export Prometheus metrics
For LLM API requests (`POST /proxy/llm`):

- Request Parsing: Extract target, messages, model, parameters
- Target Resolution: Load target config with LLM settings
- Streaming Check: If `stream: true`, use the SSE streaming path
- Budget Control: Estimate cost, check hard/soft caps
- Idempotency Check: Check whether a request with the same `idempotency_key` exists
- Cache Check: Check cache for LLM response
- Circuit Breaker: Check if target circuit is open
- Adapter Selection: Select provider adapter (OpenAI, Anthropic, Mistral)
- Request Preparation: Convert generic request to provider-specific format
- Upstream Request: Make request to LLM provider
- Response Parsing: Parse provider response to normalized format
- Cost Calculation: Calculate actual cost
- Response Normalization: Convert to unified response format
- Cache Store: Store response if cacheable
- Idempotency Store: Store result if `idempotency_key` is present
- Metrics: Export Prometheus metrics
**Retry Logic**

- Error Classification: 429 (rate limit), 5xx (server error), network errors
- Backoff Strategy: Exponential backoff with jitter
- Configurable: Per-target retry matrix in `config.yaml`
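The backoff strategy above can be sketched as follows. This is a generic "full jitter" exponential backoff, shown under assumed parameter names (`base`, `cap`), not ReliAPI's actual config keys.

```python
# Exponential backoff with full jitter: each retry's delay is drawn
# uniformly from [0, min(cap, base * 2^attempt)].
import random


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Yield a jittered delay (in seconds) for each retry attempt."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        yield random.uniform(0, ceiling)           # jitter avoids thundering herds


delays = list(backoff_delays(5))
print([round(d, 2) for d in delays])
```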
**Circuit Breaker**

- Per-Target: Each target has its own circuit breaker
- Failure Threshold: Opens after N consecutive failures
- Cooldown Period: Stays open for configured duration
- Half-Open State: Allows test requests after cooldown
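A toy version of the state machine described above, with illustrative thresholds and timings (not ReliAPI's actual defaults):

```python
# Per-target circuit breaker: closed -> open after N consecutive
# failures -> half-open after the cooldown elapses.
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 10.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                      # closed: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True                      # half-open: let a test request through
        return False                         # open: fail fast

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


cb = CircuitBreaker(failure_threshold=2, cooldown_s=60.0)
cb.record_failure(); cb.record_failure()
print(cb.allow_request())  # False: circuit is open, fail fast
```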
**Caching**

- HTTP: GET/HEAD requests cached by default
- LLM: POST requests cached if enabled
- TTL-Based: Configurable TTL per target
- Redis-Backed: Uses Redis for storage
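The production cache is Redis-backed; the in-memory stand-in below only illustrates the TTL semantics. The `"GET:/path"` key layout is an assumption.

```python
# In-memory TTL cache mimicking Redis-style expiry (value is gone once
# its TTL elapses).
import time


class TTLCache:
    def __init__(self):
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object, ttl_s: float) -> None:
        self._store[key] = (time.monotonic() + ttl_s, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: behave like a Redis TTL eviction
            return None
        return value


cache = TTLCache()
cache.set("GET:/users/42", {"id": 42}, ttl_s=30.0)
print(cache.get("GET:/users/42"))  # hit while within the TTL
```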
**Idempotency**

- Key-Based: Uses the `Idempotency-Key` header or `idempotency_key` field
- Coalescing: Concurrent requests with the same key execute once
- Conflict Detection: Different request bodies with the same key return an error
- TTL-Bound: Results cached for a configured TTL
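The replay and conflict rules above can be sketched as follows. The body fingerprinting (SHA-256 of canonical JSON) is an assumed implementation detail.

```python
# Idempotency store: first result per key is stored; a replay with the
# same body returns it without re-executing; a different body under the
# same key is a conflict.
import hashlib
import json


class IdempotencyStore:
    def __init__(self):
        self._results: dict[str, tuple[str, dict]] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def execute(self, key: str, body: dict, handler):
        digest = self._fingerprint(body)
        if key in self._results:
            stored_digest, stored_result = self._results[key]
            if stored_digest != digest:
                raise ValueError("idempotency key reused with a different body")
            return stored_result              # replay: no second upstream call
        result = handler(body)
        self._results[key] = (digest, result)
        return result
```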
**Budget Control**

- Cost Estimation: Pre-call cost estimation based on model, messages, and `max_tokens`
- Hard Cap: Rejects requests exceeding the hard cap
- Soft Cap: Throttles by reducing `max_tokens` if the soft cap is exceeded
- Cost Tracking: Records actual cost in metrics
**Cost Estimation**

- Provider-Specific: Pricing tables per provider/model
- Token-Based: Estimates based on input/output tokens
- Approximate: Uses approximate token counts (not exact)
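A rough pre-call estimate in the spirit of the bullets above. The prices and the 4-characters-per-token heuristic are placeholders, not ReliAPI's actual pricing tables or tokenizer.

```python
# Approximate pre-call cost: char-length heuristic for input tokens,
# worst-case max_tokens for output, priced per model.

# USD per 1K tokens (input, output) -- illustrative numbers only
PRICING = {"gpt-4o-mini": (0.00015, 0.0006)}


def estimate_cost_usd(model: str, messages: list[dict], max_tokens: int) -> float:
    input_chars = sum(len(m.get("content", "")) for m in messages)
    input_tokens = input_chars / 4          # crude heuristic, not a tokenizer
    in_price, out_price = PRICING[model]
    # Worst case: the model uses the full max_tokens output budget
    return (input_tokens / 1000) * in_price + (max_tokens / 1000) * out_price


cost = estimate_cost_usd("gpt-4o-mini",
                         [{"role": "user", "content": "x" * 4000}],
                         max_tokens=500)
```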
**Provider Key Pool**

- Multi-Key Support: Manage multiple API keys per provider
- Health Tracking: Track key health with error scores and status
- Automatic Selection: Select best key based on load and health
- Status Management: Keys transition between active, degraded, exhausted, banned
- Metrics: Export metrics per provider key (requests, errors, QPS, status)
**Rate Scheduler**

- Token Bucket Algorithm: Per-key, per-tenant, per-client-profile rate limiting
- Burst Protection: Configurable burst size for smoothing traffic spikes
- Concurrent Limiting: Semaphore-based concurrent request limiting
- Normalized 429: Returns stable 429 errors from ReliAPI with `retry_after_s`
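A minimal token bucket matching the description above: refill at `qps` tokens per second up to `burst`, and report a `retry_after_s` hint when empty. Parameter names are illustrative.

```python
# Token bucket: allow a request per token; refuse (normalized 429)
# when the bucket is empty.
import time


class TokenBucket:
    def __init__(self, qps: float, burst: int):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.qps)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False            # caller would return 429 with retry_after_s

    def retry_after_s(self) -> float:
        return max(0.0, (1.0 - self.tokens) / self.qps)
```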
**Client Profiles**

- Profile Detection: Priority: `X-Client` header → `tenant.profile` → default
- Per-Profile Limits: Different rate limits per client type
- Configurable: `max_parallel_requests`, `max_qps_per_tenant`, `max_qps_per_provider_key`
- Use Cases: IDE clients (Cursor), API clients, different usage patterns
**HTTP Request Flow**

1. Client → POST /proxy/http
2. Parse request (target, method, path, ...)
3. Load target config
4. Check idempotency (if key present)
5. Check cache (if GET/HEAD)
6. Check circuit breaker
7. Create HTTP client
8. Make upstream request (with retries)
9. Normalize response
10. Store cache (if cacheable)
11. Store idempotency result (if key present)
12. Export metrics
13. Return response envelope
**LLM Request Flow**

1. Client → POST /proxy/llm
2. Parse request (target, messages, model, ...)
3. Load target config (with LLM settings)
4. Handle streaming (if stream: true, use SSE path)
5. Estimate cost
6. Check hard cap (reject if exceeded)
7. Check soft cap (throttle if exceeded)
8. Check idempotency (if key present)
9. Check cache
10. Check circuit breaker
11. Select adapter (OpenAI/Anthropic/Mistral)
12. Prepare provider-specific request
13. Make upstream request (with retries)
14. Parse provider response
15. Calculate actual cost
16. Normalize response
17. Store cache (if cacheable)
18. Store idempotency result (if key present)
19. Export metrics
20. Return response envelope
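Combining the fields the LLM flow parses, a request to `POST /proxy/llm` might look roughly like this. The field names and target name are assumptions inferred from the steps above, not a confirmed schema.

```python
# Hypothetical envelope for POST /proxy/llm, inferred from the flow above.
llm_request = {
    "target": "openai_main",                # resolved via config.yaml (step 3)
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this ticket."},
    ],
    "max_tokens": 256,                      # also bounds the pre-call cost estimate
    "stream": False,                        # True would take the SSE path (step 4)
    "idempotency_key": "ticket-123-summary",
}
```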
ReliAPI uses Redis for:
- Cache: TTL-based cache storage
- Idempotency: Idempotency key storage and result caching
- Circuit Breaker: Failure state storage (optional, can be in-memory)
**Configuration**

- YAML File: `config.yaml` with target definitions
- Environment Variables: API keys, Redis URL, config path
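For orientation, a target definition in `config.yaml` might look something like the fragment below. All key names here are hypothetical; check the Configuration page for the actual schema.

```yaml
# Hypothetical config.yaml fragment -- key names are illustrative only.
targets:
  openai_main:
    type: llm
    provider: openai
    retry:
      max_attempts: 3
      backoff: exponential
    circuit_breaker:
      failure_threshold: 5
      cooldown_s: 30
    cache:
      enabled: true
      ttl_s: 300
```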
**Metrics**

- `reliapi_http_requests_total`: HTTP request counts
- `reliapi_llm_requests_total`: LLM request counts
- `reliapi_errors_total`: Error counts by type
- `reliapi_cache_hits_total`: Cache hit counts
- `reliapi_latency_ms`: Request latency histogram
- `reliapi_llm_cost_usd`: LLM cost histogram
**Logging**

- Structured JSON: All logs in JSON format
- Request IDs: Every request has unique ID
- Trace IDs: Optional trace ID for distributed tracing
**Health Endpoints**

- `/healthz`: Basic health check
- `/readyz`: Readiness check (Redis, targets)
- `/livez`: Liveness check
- `/metrics`: Prometheus metrics endpoint
**Standalone Deployment**

┌─────────────┐
│ Client │
└──────┬──────┘
│
┌──────▼──────────────────┐
│ ReliAPI (Port 8000) │
└──────┬──────────────────┘
│
┌──────▼──────┐
│ Redis │
└─────────────┘
**Docker Compose Deployment**

┌─────────────┐
│ Client │
└──────┬──────┘
│
┌──────▼──────────────────┐
│ Docker Compose │
│ │
│ ┌──────────────┐ │
│ │ ReliAPI │ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Redis │ │
│ └─────────────┘ │
└───────────────────────┘
- Configuration — Configuration guide
- Reliability Features — Detailed feature explanations