Architecture
Nick edited this page Nov 13, 2025
ReliAPI is a minimal reliability layer that sits between clients and upstream APIs (HTTP or LLM providers).
```
┌─────────┐
│ Client  │
└────┬────┘
     │ HTTP Request
     ↓
┌─────────────────────────────────────────┐
│            ReliAPI Gateway              │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │         Request Routing          │   │
│  │           (HTTP / LLM)           │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │        Idempotency Check         │   │
│  │  (coalesce concurrent requests)  │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │           Cache Check            │   │
│  │    (GET/HEAD, LLM responses)     │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │         Circuit Breaker          │   │
│  │  (per-target failure detection)  │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │    Budget Control (LLM only)     │   │
│  │     (cost estimation, caps)      │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │           Retry Logic            │   │
│  │      (exponential backoff)       │   │
│  └───────────┬──────────────────────┘   │
└──────────────┼──────────────────────────┘
               │ Upstream Request
               ↓
          ┌──────────┐
          │  Target  │
          │  (HTTP/  │
          │   LLM)   │
          └────┬─────┘
               │ Response
               ↓
┌─────────────────────────────────────────┐
│            ReliAPI Gateway              │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │     Response Normalization       │   │
│  │      (unified error format)      │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │           Cache Store            │   │
│  │          (if cacheable)          │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │        Idempotency Store         │   │
│  │  (if idempotency_key present)    │   │
│  └───────────┬──────────────────────┘   │
│              ↓                          │
│  ┌──────────────────────────────────┐   │
│  │         Metrics Export           │   │
│  │          (Prometheus)            │   │
│  └───────────┬──────────────────────┘   │
└──────────────┼──────────────────────────┘
               │ Response Envelope
               ↓
          ┌─────────┐
          │ Client  │
          └─────────┘
```
For generic HTTP API requests (POST /proxy/http):
- Request Parsing: Extract target, method, path, headers, query, body
- Target Resolution: Load target config from `config.yaml`
- Idempotency Check: Check whether a request with the same `idempotency_key` already exists
- Cache Check: For GET/HEAD, check cache
- Circuit Breaker: Check if the target circuit is open
- HTTP Client: Create HTTP client with retry logic
- Upstream Request: Make request to the upstream API
- Response Normalization: Convert to unified response format
- Cache Store: Store response if cacheable
- Idempotency Store: Store result if `idempotency_key` present
- Metrics: Export Prometheus metrics
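A request to the HTTP proxy endpoint might look like the following sketch. The field names (`target`, `method`, `path`, `query`, `headers`) are assumptions inferred from the parsing step above, not a confirmed request schema:

```python
import json

# Hypothetical body for POST /proxy/http -- field names are assumptions
# based on this page's "Request Parsing" step, not a documented schema.
payload = {
    "target": "github",          # a target name defined in config.yaml
    "method": "GET",             # GET/HEAD responses are cacheable
    "path": "/repos/octocat/hello-world",
    "query": {"per_page": 10},
    "headers": {"Accept": "application/vnd.github+json"},
}

# Reusing the same Idempotency-Key lets ReliAPI coalesce duplicates.
headers = {
    "Content-Type": "application/json",
    "Idempotency-Key": "demo-key-123",
}

body = json.dumps(payload)
# e.g. with an HTTP client:
#   requests.post("http://localhost:8000/proxy/http", data=body, headers=headers)
print(body)
```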
For LLM API requests (POST /proxy/llm):
- Request Parsing: Extract target, messages, model, parameters
- Target Resolution: Load target config with LLM settings
- Streaming Rejection: Reject streaming requests (not supported yet)
- Budget Control: Estimate cost, check hard/soft caps
- Idempotency Check: Check whether a request with the same `idempotency_key` already exists
- Cache Check: Check cache for LLM response
- Circuit Breaker: Check if the target circuit is open
- Adapter Selection: Select provider adapter (OpenAI, Anthropic, Mistral)
- Request Preparation: Convert generic request to provider-specific format
- Upstream Request: Make request to the LLM provider
- Response Parsing: Parse provider response to normalized format
- Cost Calculation: Calculate actual cost
- Response Normalization: Convert to unified response format
- Cache Store: Store response if cacheable
- Idempotency Store: Store result if `idempotency_key` present
- Metrics: Export Prometheus metrics
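An LLM proxy request could be sketched as follows; again, the field names are assumptions drawn from the parsing and budget steps above, not a confirmed schema:

```python
import json

# Hypothetical body for POST /proxy/llm -- field names are assumptions.
payload = {
    "target": "openai",                  # an LLM target from config.yaml
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Summarize RFC 2119 in one sentence."}
    ],
    "max_tokens": 128,          # feeds the pre-call cost estimate
    "idempotency_key": "llm-demo-1",
    "stream": False,            # streaming requests are rejected
}
print(json.dumps(payload, indent=2))
```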
Retry logic:
- Error Classification: 429 (rate limit), 5xx (server error), network errors
- Backoff Strategy: Exponential backoff with jitter
- Configurable: Per-target retry matrix in `config.yaml`
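The backoff strategy can be sketched as full-jitter exponential backoff. Parameter names here are illustrative, not ReliAPI's actual config keys:

```python
import random

def backoff_delays(base=0.5, factor=2.0, max_delay=30.0, attempts=5,
                   rng=random.random):
    """Full-jitter backoff: each delay is drawn uniformly from
    [0, min(max_delay, base * factor**attempt)]."""
    for attempt in range(attempts):
        cap = min(max_delay, base * factor ** attempt)
        yield rng() * cap

# Retry only on errors the page classifies as retryable.
RETRYABLE_STATUS = {429} | set(range(500, 600))

delays = list(backoff_delays(rng=lambda: 1.0))  # deterministic upper bounds
print(delays)  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Jitter matters here because synchronized retries from many clients would hammer a recovering upstream at the same instants.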
Circuit breaker:
- Per-Target: Each target has its own circuit breaker
- Failure Threshold: Opens after N consecutive failures
- Cooldown Period: Stays open for the configured duration
- Half-Open State: Allows test requests after cooldown
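A minimal sketch of such a per-target breaker; the class and parameter names are illustrative, not ReliAPI's implementation:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits a
    half-open trial request once `cooldown` seconds have passed."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            return True  # half-open: let one trial request through
        return False

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()

cb = CircuitBreaker(threshold=2, cooldown=10, clock=lambda: 0.0)
cb.record(False); cb.record(False)
print(cb.allow())  # False: circuit is open
```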
Caching:
- HTTP: GET/HEAD requests cached by default
- LLM: POST requests cached if enabled
- TTL-Based: Configurable TTL per target
- Redis-Backed: Uses Redis for storage
Idempotency:
- Key-Based: Uses the `Idempotency-Key` header or the `idempotency_key` field
- Coalescing: Concurrent requests with the same key execute once
- Conflict Detection: Different request bodies with the same key return an error
- TTL-Bound: Results cached for the configured TTL
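The replay and conflict-detection behavior can be sketched as below. In ReliAPI the store is Redis-backed with a TTL; a plain dict stands in here, and all names are illustrative:

```python
import hashlib, json

store = {}  # key -> (body_hash, result); Redis-backed (with TTL) in ReliAPI

def idempotent_execute(key, body, handler):
    """Replay the stored result for a repeated key; flag a conflict
    when the same key arrives with a different request body."""
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    if key in store:
        stored_digest, result = store[key]
        if stored_digest != digest:
            raise ValueError("idempotency key reused with a different body")
        return result  # replay the cached result
    result = handler(body)
    store[key] = (digest, result)
    return result

calls = []
handler = lambda body: calls.append(body) or "ok"
print(idempotent_execute("k1", {"a": 1}, handler))  # "ok" (executed)
print(idempotent_execute("k1", {"a": 1}, handler))  # "ok" (replayed)
print(len(calls))  # 1: the handler ran only once
```

True coalescing of in-flight concurrent requests additionally needs a lock or an in-flight marker per key, which this sketch omits.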
Budget control (LLM only):
- Cost Estimation: Pre-call cost estimation based on model, messages, `max_tokens`
- Hard Cap: Rejects requests exceeding the hard cap
- Soft Cap: Throttles by reducing `max_tokens` if the soft cap is exceeded
- Cost Tracking: Records actual cost in metrics

Cost estimation:
- Provider-Specific: Pricing tables per provider/model
- Token-Based: Estimates based on input/output tokens
- Approximate: Uses approximate token counts (not exact)
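A rough sketch of pre-call estimation under a common ~4-characters-per-token approximation. The pricing table and function are hypothetical, not ReliAPI's actual tables:

```python
# Hypothetical pricing table (USD per 1M tokens); real tables are
# per provider/model and change over time.
PRICING = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def estimate_cost_usd(model, messages, max_tokens):
    """Approximate pre-call estimate: ~4 characters per input token,
    worst-case max_tokens for output."""
    input_tokens = sum(len(m["content"]) for m in messages) / 4
    p = PRICING[model]
    return (input_tokens * p["input"] + max_tokens * p["output"]) / 1_000_000

cost = estimate_cost_usd("gpt-4o-mini",
                         [{"role": "user", "content": "x" * 400}], 1000)
print(round(cost, 8))  # 0.000615
```

Budget checks compare this estimate against the caps: reject outright if it exceeds the hard cap, or shrink `max_tokens` if only the soft cap is exceeded.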
End-to-end flow for an HTTP request:
1. Client → POST /proxy/http
2. Parse request (target, method, path, ...)
3. Load target config
4. Check idempotency (if key present)
5. Check cache (if GET/HEAD)
6. Check circuit breaker
7. Create HTTP client
8. Make upstream request (with retries)
9. Normalize response
10. Store cache (if cacheable)
11. Store idempotency result (if key present)
12. Export metrics
13. Return response envelope
End-to-end flow for an LLM request:
1. Client → POST /proxy/llm
2. Parse request (target, messages, model, ...)
3. Load target config (with LLM settings)
4. Reject streaming (if requested)
5. Estimate cost
6. Check hard cap (reject if exceeded)
7. Check soft cap (throttle if exceeded)
8. Check idempotency (if key present)
9. Check cache
10. Check circuit breaker
11. Select adapter (OpenAI/Anthropic/Mistral)
12. Prepare provider-specific request
13. Make upstream request (with retries)
14. Parse provider response
15. Calculate actual cost
16. Normalize response
17. Store cache (if cacheable)
18. Store idempotency result (if key present)
19. Export metrics
20. Return response envelope
ReliAPI uses Redis for:
- Cache: TTL-based cache storage
- Idempotency: Idempotency key storage and result caching
- Circuit Breaker: Failure state storage (optional, can be in-memory)
Configuration:
- YAML File: `config.yaml` with target definitions
- Environment Variables: API keys, Redis URL, config path
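As an illustration of the shape such target definitions could take, here is a hypothetical `config.yaml` sketch. Every key name below is an assumption, not ReliAPI's documented schema:

```yaml
# Hypothetical sketch -- key names are illustrative only.
targets:
  github:
    base_url: https://api.github.com
    retry:
      max_attempts: 3
      backoff: exponential
    circuit_breaker:
      failure_threshold: 5
      cooldown_seconds: 30
    cache:
      ttl_seconds: 60
  openai:
    kind: llm
    provider: openai
    budget:
      hard_cap_usd: 1.00
      soft_cap_usd: 0.50
```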
Metrics (Prometheus):
- `reliapi_http_requests_total`: HTTP request counts
- `reliapi_llm_requests_total`: LLM request counts
- `reliapi_errors_total`: Error counts by type
- `reliapi_cache_hits_total`: Cache hit counts
- `reliapi_latency_ms`: Request latency histogram
- `reliapi_llm_cost_usd`: LLM cost histogram
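What a scrape of the counters above looks like can be sketched with a minimal stdlib renderer of the Prometheus text exposition format; ReliAPI itself presumably uses a metrics client library, and the counter values here are made up:

```python
# Made-up counter values for the metric names this page lists.
counters = {
    "reliapi_http_requests_total": 42,
    "reliapi_llm_requests_total": 7,
    "reliapi_cache_hits_total": 13,
}

def render_exposition(counters):
    """Render counters in the Prometheus text exposition format:
    a '# TYPE' line followed by 'name value' for each metric."""
    lines = []
    for name, value in counters.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_exposition(counters))
```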
Logging:
- Structured JSON: All logs in JSON format
- Request IDs: Every request gets a unique ID
- Trace IDs: Optional trace ID for distributed tracing
Health endpoints:
- `/healthz`: Basic health check
- `/readyz`: Readiness check (Redis, targets)
- `/livez`: Liveness check
- `/metrics`: Prometheus metrics endpoint
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
┌──────▼──────────────────┐
│  ReliAPI (Port 8000)    │
└──────┬──────────────────┘
       │
┌──────▼──────┐
│    Redis    │
└─────────────┘
```
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
┌──────▼──────────────────┐
│    Docker Compose       │
│                         │
│   ┌──────────────┐      │
│   │   ReliAPI    │      │
│   └──────┬───────┘      │
│          │              │
│   ┌──────▼──────┐       │
│   │    Redis    │       │
│   └─────────────┘       │
└─────────────────────────┘
```
- Configuration — Configuration guide
- Reliability Features — Detailed feature explanations