Commit c2ec0c8 ("readme"), 1 parent 0388ce2

eval_protocol/proxy/README.md (1 file changed, +392 −0)
# LiteLLM Metadata Extraction Gateway

A FastAPI-based metadata extraction gateway that sits in front of LiteLLM to inject evaluation metadata into LLM requests and track completions for distributed evaluation workflows.

## Overview

The Metadata Gateway is a proxy service that enhances LiteLLM by:

- **Extracting metadata from URL paths** and injecting it as Langfuse tags
- **Managing Langfuse credentials** per project without exposing them to clients
- **Tracking completion insertion IDs** in Redis for completeness verification
- **Fetching and validating traces** from Langfuse with built-in retry logic

This enables distributed evaluation systems to track which LLM completions belong to which evaluation runs, ensuring data completeness and proper attribution.
## Architecture

```
┌─────────────┐
│   Client    │
│  (SDK/CLI)  │
└──────┬──────┘
       │ Authorization: Bearer <api_key>
       │ POST /rollout_id/{id}/invocation_id/{id}/.../chat/completions
       ▼
┌─────────────────────────┐
│   Metadata Gateway      │
│   (FastAPI Service)     │
│  - Extract metadata     │
│  - Inject Langfuse keys │
│  - Generate UUID7 IDs   │
└──────┬──────────┬───────┘
       │          │
       ▼          ▼
 ┌────────┐  ┌─────────────┐
 │ Redis  │  │  LiteLLM    │
 │        │  │  Backend    │
 │ Track  │  │             │
 │  IDs   │  └──────┬──────┘
 └────────┘         │
                    ▼
             ┌─────────────┐
             │  Langfuse   │
             │  (Tracing)  │
             └─────────────┘
```
### Components

#### 1. **Metadata Gateway** (`proxy_core/`)
- **`app.py`**: Main FastAPI application with route definitions
- **`litellm.py`**: LiteLLM client for forwarding requests
- **`langfuse.py`**: Langfuse trace fetching with retry logic
- **`redis_utils.py`**: Redis operations for insertion ID tracking
- **`models.py`**: Pydantic models for configuration and responses
- **`auth.py`**: Authentication provider interface (extensible)
- **`main.py`**: Entry point for running the service

#### 2. **Redis**
- Stores insertion IDs per rollout for completeness checking
- Uses Redis Sets: `rollout_id -> {insertion_id_1, insertion_id_2, ...}`

#### 3. **LiteLLM Backend**
- Standard LiteLLM proxy for routing to LLM providers
- Configured with Langfuse callbacks for automatic tracing
## Key Features

### Metadata Injection

URL paths encode evaluation metadata that gets injected as Langfuse tags:

- `rollout_id`: Unique ID for a batch evaluation run
- `invocation_id`: ID for a single invocation within a rollout
- `experiment_id`: Experiment identifier
- `run_id`: Run identifier within an experiment
- `row_id`: Dataset row identifier
- `insertion_id`: Auto-generated UUID7 for this specific completion
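As a sketch, path metadata can be turned into `key:value` Langfuse tags like this. `langfuse_tags` is a hypothetical helper for illustration; the actual extraction lives in `proxy_core/app.py`.

```python
def langfuse_tags(metadata: dict) -> list[str]:
    """Turn extracted path metadata into `key:value` Langfuse tags.

    Illustrative helper, not the gateway's real code; unset fields are skipped.
    """
    return [f"{key}:{value}" for key, value in metadata.items() if value is not None]

tags = langfuse_tags({"rollout_id": "abc123", "invocation_id": "inv1", "row_id": None})
```

This matches the tag format visible on fetched traces (e.g. `rollout_id:abc123`).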
### Completeness Tracking

1. **On chat completion**: Generate a UUID7 `insertion_id` and store it in Redis
2. **On trace fetch**: Verify all expected `insertion_id`s are present in Langfuse
3. **Retry logic**: Automatic retries with exponential backoff for incomplete traces
### Multi-Project Support

- Store Langfuse credentials for multiple projects in `secrets.json`
- Route requests to the correct project via `project_id` in the URL, or fall back to the default
- Credentials are never exposed to clients

## Setup

### Prerequisites

- Docker and Docker Compose (recommended)
- Python 3.11+ (for local development)

### Local Development: Docker Compose

1. **Create secrets file:**
   ```bash
   cp proxy_core/secrets.json.example proxy_core/secrets.json
   ```

2. **Edit `proxy_core/secrets.json`** with your Langfuse credentials.
   **Important**: replace `"my-project"` with your Langfuse project ID, which looks like `cmg00asdf0123...`.
   ```json
   {
     "langfuse_keys": {
       "my-project": {
         "public_key": "pk-lf-...",
         "secret_key": "sk-lf-..."
       }
     },
     "default_project_id": "my-project"
   }
   ```
3. **Start services:**
   ```bash
   docker-compose up -d
   ```

4. **Verify services are running:**
   ```bash
   curl http://localhost:4000/health
   # Expected: {"status":"healthy","service":"metadata-proxy"}
   ```

The gateway will be available at `http://localhost:4000`.
## API Reference

### Chat Completions

#### With Full Metadata
```
POST /rollout_id/{rollout_id}/invocation_id/{invocation_id}/experiment_id/{experiment_id}/run_id/{run_id}/row_id/{row_id}/chat/completions
POST /project_id/{project_id}/rollout_id/{rollout_id}/.../chat/completions
```

**Features:**
- Extracts metadata from the URL path
- Generates a UUID7 `insertion_id`
- Injects Langfuse credentials
- Tracks the `insertion_id` in Redis
- Forwards the request to LiteLLM

**Request:**
```bash
curl -X POST http://localhost:4000/rollout_id/abc123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

**Response:** Standard OpenAI chat completion response.
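Clients can assemble the metadata path programmatically. A minimal sketch, with segment order following the route above (`gateway_url` is a hypothetical helper, not part of any SDK):

```python
def gateway_url(base: str, metadata: dict, suffix: str = "chat/completions") -> str:
    # Each metadata key becomes a `/key/value` path segment pair, in the
    # order the route expects (Python dicts preserve insertion order).
    segments = [f"{key}/{value}" for key, value in metadata.items()]
    return "/".join([base.rstrip("/")] + segments + [suffix])

url = gateway_url(
    "http://localhost:4000",
    {"rollout_id": "abc123", "invocation_id": "inv1", "experiment_id": "exp1",
     "run_id": "run1", "row_id": "row1"},
)
```

Any OpenAI-compatible client can then POST to the resulting URL.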
#### With Project Only
```
POST /project_id/{project_id}/chat/completions
```

For completions that don't need rollout tracking.

#### With Encoded Base URL
```
POST /rollout_id/{rollout_id}/.../encoded_base_url/{encoded_base_url}/chat/completions
```

The `encoded_base_url` is a base64-encoded URL string that is injected into the request body as `base_url`.
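Producing the encoded value might look like this. URL-safe base64 is assumed here because standard base64 can emit `/` and `+`, which break path segments; confirm which alphabet the gateway's decoder expects before relying on it.

```python
import base64

def encode_base_url(url: str) -> str:
    # URL-safe alphabet (- and _) keeps the value valid as one path segment
    return base64.urlsafe_b64encode(url.encode()).decode()

def decode_base_url(encoded: str) -> str:
    return base64.urlsafe_b64decode(encoded.encode()).decode()

encoded = encode_base_url("https://api.fireworks.ai/inference/v1")
```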
### Trace Fetching

#### Fetch Langfuse Traces
```
GET /traces?tags=rollout_id:abc123
GET /project_id/{project_id}/traces?tags=rollout_id:abc123
```

**Required Query Parameters:**
- `tags`: Array of tags (must include at least one `rollout_id:*` tag)

**Optional Query Parameters:**
- `limit`: Max traces to fetch (default: 100)
- `sample_size`: Random sample size if more traces are found
- `user_id`, `session_id`, `name`, `environment`, `version`, `release`: Langfuse filters
- `fields`: Comma-separated fields to include
- `hours_back`: Fetch traces from the last N hours
- `from_timestamp`, `to_timestamp`: ISO datetime strings for a time range
- `sleep_between_gets`: Delay between trace.get calls (default: 2.5s)
- `max_retries`: Retry attempts for incomplete traces (default: 3)

**Completeness Logic:**
1. Fetches traces from Langfuse matching the tags
2. Extracts `insertion_id`s from trace tags
3. Compares them with the expected `insertion_id`s in Redis
4. Retries with exponential backoff if incomplete
5. Returns 404 if still incomplete after `max_retries`
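The retry loop in steps 4 and 5 can be sketched as below. `fetch` stands in for the Langfuse query and `expected` for the Redis set; the actual backoff parameters in `langfuse.py` may differ.

```python
import time

def fetch_until_complete(fetch, expected, max_retries=3, base_delay=2.5, sleep=time.sleep):
    """Call `fetch()` until every expected insertion_id appears,
    doubling the delay after each incomplete attempt."""
    missing = set(expected)
    for attempt in range(max_retries + 1):
        found = set(fetch())
        missing = set(expected) - found
        if not missing:
            return found
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 2.5s, 5s, 10s, ...
    # The gateway surfaces this failure to the caller as an HTTP 404.
    raise LookupError(f"missing insertion_ids after {max_retries} retries: {sorted(missing)}")

# Simulated run: the second fetch returns the full set, so one backoff sleep happens.
responses = iter([["id-1"], ["id-1", "id-2"]])
delays = []
found = fetch_until_complete(lambda: next(responses), ["id-1", "id-2"], sleep=delays.append)
```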
**Response:**
```json
{
  "project_id": "my-project",
  "total_traces": 42,
  "traces": [
    {
      "id": "trace-123",
      "name": "chat-completion",
      "tags": ["rollout_id:abc123", "insertion_id:uuid7..."],
      "input": {...},
      "output": {...},
      "observations": [...]
    }
  ]
}
```
### Health Check
```
GET /health
```

Returns service health status.

### Catch-All Proxy
```
ANY /{path}
```

Forwards any other request to the LiteLLM backend with API key injection.
## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `LITELLM_URL` | Yes | - | URL of the LiteLLM backend |
| `REDIS_HOST` | Yes | - | Redis hostname |
| `REDIS_PORT` | No | 6379 | Redis port |
| `REDIS_PASSWORD` | No | - | Redis password |
| `SECRETS_PATH` | No | `proxy_core/secrets.json` | Path to secrets file |
| `REQUEST_TIMEOUT` | No | 300.0 | Request timeout in seconds |
| `LOG_LEVEL` | No | INFO | Logging level |
| `PORT` | No | 4000 | Gateway port |
### Secrets Configuration

Create `proxy_core/secrets.json`:
```json
{
  "langfuse_keys": {
    "project-1": {
      "public_key": "pk-lf-...",
      "secret_key": "sk-lf-..."
    },
    "project-2": {
      "public_key": "pk-lf-...",
      "secret_key": "sk-lf-..."
    }
  },
  "default_project_id": "project-1"
}
```

**Security:** Add `secrets.json` to `.gitignore` (already configured).

### LiteLLM Configuration

The `config_no_cache.yaml` configures LiteLLM:
```yaml
model_list:
  - model_name: "*"
    litellm_params:
      model: "*"

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: True

general_settings:
  allow_client_side_credentials: true
```
Key settings:
- **Wildcard model support**: Route any model to any provider
- **Langfuse callbacks**: Automatic tracing on success/failure
- **Client-side credentials**: Accept API keys from the request body

## Security Considerations

### Authentication
- **Default**: No authentication (`NoAuthProvider`)
- **Extensible**: Implement a custom `AuthProvider` for production
- **API Keys**: Client API keys are forwarded to LiteLLM, never stored

### Trace Fetching Security
- **Required rollout_id tag**: Prevents fetching all traces
- **Project isolation**: Projects can only access their own Langfuse data
- **Optional auth**: The `/traces` endpoint can require authentication

### Best Practices
1. **Never commit `secrets.json`** - use environment variables in production
2. **Use HTTPS** in production deployments
3. **Implement proper authentication** for production use
4. **Rotate Langfuse keys** regularly
5. **Monitor Redis memory** usage for large rollouts
## Deployment

### Docker Compose (Development)
```bash
docker-compose up -d
```

### Kubernetes
Create a deployment with:
- Secrets for `secrets.json` and Redis credentials
- A Service for internal/external access
- A ConfigMap for the LiteLLM config
- A Redis StatefulSet or a managed Redis service
## Development

### Project Structure
```
eval_protocol/proxy/
├── proxy_core/              # Main application package
│   ├── __init__.py
│   ├── app.py               # FastAPI routes
│   ├── litellm.py           # LiteLLM client
│   ├── langfuse.py          # Langfuse integration
│   ├── redis_utils.py       # Redis operations
│   ├── models.py            # Pydantic models
│   ├── auth.py              # Authentication
│   ├── main.py              # Entry point
│   └── secrets.json.example
├── docker-compose.yml       # Local development stack
├── Dockerfile.gateway       # Gateway container
├── config_no_cache.yaml     # LiteLLM config
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
### Adding Custom Authentication

Extend `AuthProvider` in `auth.py`:
```python
from typing import Optional

from fastapi import HTTPException

from .auth import AuthProvider

class MyAuthProvider(AuthProvider):
    def validate(self, api_key: Optional[str]) -> Optional[str]:
        if not api_key or not self.is_valid(api_key):
            raise HTTPException(status_code=401, detail="Invalid API key")
        return api_key

    def is_valid(self, api_key: str) -> bool:
        # Your validation logic
        return True
```

Then pass it to `create_app`:
```python
from proxy_core import create_app
from my_auth import MyAuthProvider

app = create_app(auth_provider=MyAuthProvider())
```
### Testing

#### Test chat completion:
```bash
curl -X POST http://localhost:4000/rollout_id/test123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```

#### Test trace fetching:
```bash
curl "http://localhost:4000/traces?tags=rollout_id:test123" \
  -H "Authorization: Bearer your-auth-token"
```

#### Check Redis:
```bash
redis-cli
> SMEMBERS test123  # View insertion_ids for the rollout
```