# LiteLLM Metadata Extraction Gateway

A FastAPI-based metadata extraction gateway that sits in front of LiteLLM to inject evaluation metadata into LLM requests and track completions for distributed evaluation workflows.

## Overview

The Metadata Gateway is a proxy service that enhances LiteLLM by:
- **Extracting metadata from URL paths** and injecting it as Langfuse tags
- **Managing Langfuse credentials** per project without exposing them to clients
- **Tracking completion insertion IDs** in Redis for completeness verification
- **Fetching and validating traces** from Langfuse with built-in retry logic

This enables distributed evaluation systems to track which LLM completions belong to which evaluation runs, ensuring data completeness and proper attribution.

## Architecture

```
┌─────────────┐
│   Client    │
│  (SDK/CLI)  │
└──────┬──────┘
       │ Authorization: Bearer <api_key>
       │ POST /rollout_id/{id}/invocation_id/{id}/.../chat/completions
       ▼
┌─────────────────────────┐
│   Metadata Gateway      │
│   (FastAPI Service)     │
│ - Extract metadata      │
│ - Inject Langfuse keys  │
│ - Generate UUID7 IDs    │
└──────┬──────────┬───────┘
       │          │
       ▼          ▼
  ┌────────┐  ┌─────────────┐
  │ Redis  │  │  LiteLLM    │
  │        │  │  Backend    │
  │ Track  │  │             │
  │ IDs    │  └──────┬──────┘
  └────────┘         │
                     ▼
              ┌─────────────┐
              │  Langfuse   │
              │  (Tracing)  │
              └─────────────┘
```

### Components

#### 1. **Metadata Gateway** (`proxy_core/`)
   - **`app.py`**: Main FastAPI application with route definitions
   - **`litellm.py`**: LiteLLM client for forwarding requests
   - **`langfuse.py`**: Langfuse trace fetching with retry logic
   - **`redis_utils.py`**: Redis operations for insertion ID tracking
   - **`models.py`**: Pydantic models for configuration and responses
   - **`auth.py`**: Authentication provider interface (extensible)
   - **`main.py`**: Entry point for running the service

#### 2. **Redis**
   - Stores insertion IDs per rollout for completeness checking
   - Uses Redis Sets: `rollout_id -> {insertion_id_1, insertion_id_2, ...}`
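
Under this scheme each rollout maps to one Redis Set. A minimal in-memory sketch of the same pattern (a plain `dict` of `set`s stands in for a live Redis connection; the class and method names are illustrative, not the gateway's actual helpers):

```python
from collections import defaultdict


class InsertionIdStore:
    """Mirrors the gateway's Redis usage: SADD on completion, SMEMBERS on fetch."""

    def __init__(self) -> None:
        self._sets: dict[str, set[str]] = defaultdict(set)

    def track(self, rollout_id: str, insertion_id: str) -> None:
        # Redis equivalent: SADD <rollout_id> <insertion_id>
        self._sets[rollout_id].add(insertion_id)

    def expected_ids(self, rollout_id: str) -> set[str]:
        # Redis equivalent: SMEMBERS <rollout_id>
        return set(self._sets[rollout_id])


store = InsertionIdStore()
store.track("abc123", "id-1")
store.track("abc123", "id-2")
store.track("abc123", "id-2")  # sets deduplicate retried completions
print(store.expected_ids("abc123"))  # → {'id-1', 'id-2'} (set order may vary)
```

Using a Set rather than a list means retried or duplicated completions never inflate the expected count.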

#### 3. **LiteLLM Backend**
   - Standard LiteLLM proxy for routing to LLM providers
   - Configured with Langfuse callbacks for automatic tracing

## Key Features

### Metadata Injection
URL paths encode evaluation metadata that gets injected as Langfuse tags:
- `rollout_id`: Unique ID for a batch evaluation run
- `invocation_id`: ID for a single invocation within a rollout
- `experiment_id`: Experiment identifier
- `run_id`: Run identifier within an experiment
- `row_id`: Dataset row identifier
- `insertion_id`: Auto-generated UUID7 for this specific completion
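
As a sketch, a client could assemble such a metadata path by interleaving keys and values into URL segments (the helper name is hypothetical; only the path shape comes from the routes above):

```python
def build_metadata_path(metadata: dict[str, str], suffix: str = "chat/completions") -> str:
    """Interleave metadata keys and values into /key/value/... path segments."""
    segments = [part for key, value in metadata.items() for part in (key, value)]
    return "/" + "/".join(segments + [suffix])


path = build_metadata_path({
    "rollout_id": "abc123",
    "invocation_id": "inv1",
    "experiment_id": "exp1",
    "run_id": "run1",
    "row_id": "row1",
})
print(path)
# → /rollout_id/abc123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions
```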

### Completeness Tracking
1. **On chat completion**: Generate UUID7 insertion_id and store in Redis
2. **On trace fetch**: Verify all expected insertion_ids are present in Langfuse
3. **Retry logic**: Automatic retries with exponential backoff for incomplete traces
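
UUID7 is used because it is time-ordered: IDs sort by creation time, which makes debugging a rollout easier. An illustrative generator following the RFC 9562 UUIDv7 bit layout (the gateway may well use a library for this instead):

```python
import secrets
import time
import uuid


def uuid7() -> uuid.UUID:
    """Illustrative UUIDv7: 48-bit ms timestamp, then version, variant, and randomness."""
    ts_ms = time.time_ns() // 1_000_000
    value = (ts_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms (48 bits)
    value |= 0x7 << 76                       # version = 7
    value |= secrets.randbits(12) << 64      # rand_a (12 bits)
    value |= 0b10 << 62                      # RFC variant bits
    value |= secrets.randbits(62)            # rand_b (62 bits)
    return uuid.UUID(int=value)


first = uuid7()
time.sleep(0.002)
second = uuid7()
print(first < second)  # later IDs compare greater because the timestamp occupies the top bits
```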

### Multi-Project Support
- Store Langfuse credentials for multiple projects in `secrets.json`
- Route requests to the correct project via a `project_id` URL segment, falling back to the default project
- Credentials are never exposed to clients

## Setup

### Prerequisites
- Docker and Docker Compose (recommended)
- Python 3.11+ (for local development)

### Local Development: Docker Compose

1. **Create secrets file:**
   ```bash
   cp proxy_core/secrets.json.example proxy_core/secrets.json
   ```

2. **Edit `proxy_core/secrets.json`** with your Langfuse credentials.
   **Important**: replace `my-project` with the ID of your Langfuse project, which looks like `cmg00asdf0123...`.
   ```json
   {
     "langfuse_keys": {
       "my-project": {
         "public_key": "pk-lf-...",
         "secret_key": "sk-lf-..."
       }
     },
     "default_project_id": "my-project"
   }
   ```

3. **Start services:**
   ```bash
   docker-compose up -d
   ```

4. **Verify services are running:**
   ```bash
   curl http://localhost:4000/health
   # Expected: {"status":"healthy","service":"metadata-proxy"}
   ```

The gateway will be available at `http://localhost:4000`.

## API Reference

### Chat Completions

#### With Full Metadata
```
POST /rollout_id/{rollout_id}/invocation_id/{invocation_id}/experiment_id/{experiment_id}/run_id/{run_id}/row_id/{row_id}/chat/completions
POST /project_id/{project_id}/rollout_id/{rollout_id}/.../chat/completions
```

**Features:**
- Extracts metadata from the URL path
- Generates a UUID7 insertion_id
- Injects Langfuse credentials
- Tracks the insertion_id in Redis
- Forwards to LiteLLM

**Request:**
```bash
curl -X POST http://localhost:4000/rollout_id/abc123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

**Response:** Standard OpenAI chat completion response

#### With Project Only
```
POST /project_id/{project_id}/chat/completions
```

For completions that don't need rollout tracking.

#### With Encoded Base URL
```
POST /rollout_id/{rollout_id}/.../encoded_base_url/{encoded_base_url}/chat/completions
```

The `encoded_base_url` segment is a base64-encoded URL string that is decoded and injected into the request body as `base_url`.
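
A sketch of producing such a segment with the standard library (URL-safe base64 is assumed here because the value travels inside a URL path; confirm the alphabet against the gateway's decoder):

```python
import base64


def encode_base_url(base_url: str) -> str:
    """Encode a backend URL for use as a path segment (URL-safe alphabet assumed)."""
    return base64.urlsafe_b64encode(base_url.encode()).decode()


def decode_base_url(segment: str) -> str:
    """Inverse of encode_base_url."""
    return base64.urlsafe_b64decode(segment.encode()).decode()


segment = encode_base_url("https://api.example.com/v1")
assert decode_base_url(segment) == "https://api.example.com/v1"
```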

### Trace Fetching

#### Fetch Langfuse Traces
```
GET /traces?tags=rollout_id:abc123
GET /project_id/{project_id}/traces?tags=rollout_id:abc123
```

**Required Query Parameters:**
- `tags`: Array of tags (must include at least one `rollout_id:*` tag)

**Optional Query Parameters:**
- `limit`: Max traces to fetch (default: 100)
- `sample_size`: Random sample size if more traces are found
- `user_id`, `session_id`, `name`, `environment`, `version`, `release`: Langfuse filters
- `fields`: Comma-separated fields to include
- `hours_back`: Fetch traces from the last N hours
- `from_timestamp`, `to_timestamp`: ISO datetime strings for a time range
- `sleep_between_gets`: Delay between trace.get calls (default: 2.5s)
- `max_retries`: Retry attempts for incomplete traces (default: 3)
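
Repeated `tags` parameters can be assembled with the standard library (sketch; the parameter names match the list above, the helper name is illustrative):

```python
from urllib.parse import urlencode


def traces_url(base: str, tags: list[str], **params: object) -> str:
    """Build a /traces query URL; doseq=True repeats the tags parameter per element."""
    query = urlencode({"tags": tags, **params}, doseq=True)
    return f"{base}/traces?{query}"


url = traces_url("http://localhost:4000", ["rollout_id:abc123"], limit=50, hours_back=2)
print(url)
# → http://localhost:4000/traces?tags=rollout_id%3Aabc123&limit=50&hours_back=2
```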

**Completeness Logic:**
1. Fetches traces from Langfuse matching the given tags
2. Extracts insertion_ids from trace tags
3. Compares them with the expected insertion_ids in Redis
4. Retries with exponential backoff if incomplete
5. Returns 404 if still incomplete after max_retries
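
That loop can be sketched as follows (pure Python; a caller-supplied `fetch` callable stands in for the Langfuse query, and all names are illustrative rather than the gateway's actual API):

```python
import time
from typing import Callable, Optional


def fetch_complete_traces(
    fetch: Callable[[], list[dict]],
    expected_ids: set[str],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[list[dict]]:
    """Retry until every expected insertion_id appears among the fetched trace tags."""
    for attempt in range(max_retries + 1):
        traces = fetch()
        found = {
            tag.split(":", 1)[1]
            for trace in traces
            for tag in trace.get("tags", [])
            if tag.startswith("insertion_id:")
        }
        if expected_ids <= found:
            return traces
        if attempt < max_retries:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return None  # caller maps this to a 404
```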

**Response:**
```json
{
  "project_id": "my-project",
  "total_traces": 42,
  "traces": [
    {
      "id": "trace-123",
      "name": "chat-completion",
      "tags": ["rollout_id:abc123", "insertion_id:uuid7..."],
      "input": {...},
      "output": {...},
      "observations": [...]
    }
  ]
}
```

### Health Check
```
GET /health
```

Returns service health status.

### Catch-All Proxy
```
ANY /{path}
```

Forwards any other request to the LiteLLM backend with API key injection.

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `LITELLM_URL` | Yes | - | URL of LiteLLM backend |
| `REDIS_HOST` | Yes | - | Redis hostname |
| `REDIS_PORT` | No | 6379 | Redis port |
| `REDIS_PASSWORD` | No | - | Redis password |
| `SECRETS_PATH` | No | `proxy_core/secrets.json` | Path to secrets file |
| `REQUEST_TIMEOUT` | No | 300.0 | Request timeout in seconds |
| `LOG_LEVEL` | No | INFO | Logging level |
| `PORT` | No | 4000 | Gateway port |

### Secrets Configuration

Create `proxy_core/secrets.json`:
```json
{
  "langfuse_keys": {
    "project-1": {
      "public_key": "pk-lf-...",
      "secret_key": "sk-lf-..."
    },
    "project-2": {
      "public_key": "pk-lf-...",
      "secret_key": "sk-lf-..."
    }
  },
  "default_project_id": "project-1"
}
```
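
Resolving a request's project to its credential pair might look like this (a sketch; the function name and error handling are assumptions, only the `secrets.json` field names come from the file above):

```python
import json
from typing import Optional


def resolve_langfuse_keys(secrets: dict, project_id: Optional[str] = None) -> dict:
    """Pick the credential pair for project_id, falling back to the default project."""
    project = project_id or secrets["default_project_id"]
    try:
        return secrets["langfuse_keys"][project]
    except KeyError:
        raise KeyError(f"No Langfuse credentials configured for project {project!r}")


secrets = json.loads("""
{
  "langfuse_keys": {
    "project-1": {"public_key": "pk-lf-1", "secret_key": "sk-lf-1"},
    "project-2": {"public_key": "pk-lf-2", "secret_key": "sk-lf-2"}
  },
  "default_project_id": "project-1"
}
""")
print(resolve_langfuse_keys(secrets)["public_key"])               # → pk-lf-1 (default)
print(resolve_langfuse_keys(secrets, "project-2")["public_key"])  # → pk-lf-2
```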

**Security:** Add `secrets.json` to `.gitignore` (already configured).

### LiteLLM Configuration

The `config_no_cache.yaml` configures LiteLLM:
```yaml
model_list:
  - model_name: "*"
    litellm_params:
      model: "*"
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: True
general_settings:
  allow_client_side_credentials: true
```

Key settings:
- **Wildcard model support**: Route any model to any provider
- **Langfuse callbacks**: Automatic tracing on success/failure
- **Client-side credentials**: Accept API keys from the request body

## Security Considerations

### Authentication
- **Default**: No authentication (`NoAuthProvider`)
- **Extensible**: Implement a custom `AuthProvider` for production
- **API Keys**: Client API keys are forwarded to LiteLLM, never stored

### Trace Fetching Security
- **Required rollout_id tag**: Prevents fetching all traces
- **Project isolation**: Projects can only access their own Langfuse data
- **Optional auth**: The `/traces` endpoint can require authentication

### Best Practices
1. **Never commit `secrets.json`** - use environment variables in production
2. **Use HTTPS** in production deployments
3. **Implement proper authentication** for production use
4. **Rotate Langfuse keys** regularly
5. **Monitor Redis memory** usage for large rollouts

## Deployment

### Docker Compose (Development)
```bash
docker-compose up -d
```

### Kubernetes
Create a deployment with:
- Secrets for `secrets.json` and Redis credentials
- A Service for internal/external access
- A ConfigMap for the LiteLLM config
- A Redis StatefulSet or a managed Redis service

## Development

### Project Structure
```
eval_protocol/proxy/
├── proxy_core/              # Main application package
│   ├── __init__.py
│   ├── app.py               # FastAPI routes
│   ├── litellm.py           # LiteLLM client
│   ├── langfuse.py          # Langfuse integration
│   ├── redis_utils.py       # Redis operations
│   ├── models.py            # Pydantic models
│   ├── auth.py              # Authentication
│   ├── main.py              # Entry point
│   └── secrets.json.example
├── docker-compose.yml       # Local development stack
├── Dockerfile.gateway       # Gateway container
├── config_no_cache.yaml     # LiteLLM config
├── requirements.txt         # Python dependencies
└── README.md                # This file
```

### Adding Custom Authentication

Extend `AuthProvider` in `auth.py`:
```python
from typing import Optional

from fastapi import HTTPException

from .auth import AuthProvider


class MyAuthProvider(AuthProvider):
    def validate(self, api_key: Optional[str]) -> Optional[str]:
        if not api_key or not self.is_valid(api_key):
            raise HTTPException(status_code=401, detail="Invalid API key")
        return api_key

    def is_valid(self, api_key: str) -> bool:
        # Your validation logic
        return True
```

Then pass it to `create_app`:
```python
from proxy_core import create_app
from my_auth import MyAuthProvider

app = create_app(auth_provider=MyAuthProvider())
```

### Testing

#### Test chat completion:
```bash
curl -X POST http://localhost:4000/rollout_id/test123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```

#### Test trace fetching:
```bash
curl "http://localhost:4000/traces?tags=rollout_id:test123" \
  -H "Authorization: Bearer your-auth-token"
```

#### Check Redis:
```bash
redis-cli
> SMEMBERS test123  # View insertion_ids for rollout
```