Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves Hawki wrapper observability and health diagnostics by reducing API key exposure in logs, making model diagnostics use a provided API key, and adding periodic cleanup of stored per-key model usage.
Changes:
- Mask API keys in chat completion request logs.
- Make model diagnostics/health checks accept an explicit API key and wire it through
/health/detailsand startup checks. - Add a background task to periodically remove stale entries from
KEY_MODELS_USAGE.
Comments suppressed due to low confidence (1)
wrapper.py:383
health_check_modelusesprocess_chat_request, which records model usage viaadd_model_usage(...). This means running diagnostics (now potentially for arbitrary user-provided API keys) will inflate usage stats for that key with health-check traffic, making the reported usage misleading. Consider adding a way to disable usage tracking for internal diagnostics or using a separate code path that bypassesadd_model_usage.
response = await process_chat_request(request, {"Authorization": f"Bearer {api_key}"}, use_cache=use_cache)
# process_chat_request returns a FastAPI JSONResponse; extract JSON body and headers
response_body = json.loads(response.body.decode())
result["started_at"] = request_start_time.isoformat()
# Measure runtime based on epoch seconds captured before the request
result["runtime_in_ms"] = (time.time() - start_epoch) * 1000
result["prompt"] = HEALTH_CHECK_PROMPT
result["response"] = response_body
result["status"] = "available" if response.status_code == 200 else "unavailable"
result["cached"] = response.headers.get("X-Cache-Hit") == "true"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Check key | ||
| if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key): |
There was a problem hiding this comment.
The invalid-key path calls is_api_key_working(api_key), which performs an upstream invoke() (even if cached later). Because /health/details is unauthenticated and easy to hit, this can be abused to trigger expensive upstream calls (DoS) and churn the TTL cache. Consider avoiding active validation here (e.g., only allow ALLOWED_KEYS for diagnostics, or return a 401 without testing the key) and/or add rate limiting.
| # Check key | |
| if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key): | |
| # Check key using only locally-known keys to avoid expensive upstream validation | |
| if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE: |
wrapper.py
Outdated
| diagnostics = await run_model_diagnostics(api_key) | ||
| if api_key in KEY_MODELS_USAGE: | ||
| for model in diagnostics["models"]: | ||
| if model in KEY_MODELS_USAGE[api_key]: | ||
| usage_per_hour = KEY_MODELS_USAGE[api_key][model].getUsagePerHour() | ||
| diagnostics["models"][model]["usage"] = { | ||
| str(-(index+1)): value for index, value in enumerate(usage_per_hour) | ||
| } | ||
| result = diagnostics["models"] |
There was a problem hiding this comment.
run_model_diagnostics(api_key) updates global state (hawkiClient.models.set(...)) based on diagnostics executed with the caller-provided API key. If different keys have different permissions/quotas, a single request to /health/details could change the globally advertised/used model list for all users. Consider running diagnostics with a dedicated internal key (e.g., ALLOWED_KEYS[0]) or isolating diagnostics in a separate client instance without mutating global hawkiClient state.
| async def health_check_model(model: str, api_key: str, use_cache: bool = False) -> dict: | ||
| """ |
There was a problem hiding this comment.
Changing health_check_model’s signature to require api_key will break existing unit tests/mocks that patch health_check_model(model, use_cache=False) (e.g., test_health_endpoints.py’s default_health_check). Update the tests/mocks accordingly (and consider keeping a default internal key if you want to preserve the old call pattern).
| stale_keys = [ | ||
| key for key, models in list(KEY_MODELS_USAGE.items()) | ||
| if all(len(usage.getTimestamps()) == 0 for usage in models.values()) | ||
| ] |
There was a problem hiding this comment.
cleanup_stale_model_usage calls usage.getTimestamps() inside all(...), which builds a full list copy each time just to check emptiness. This can be unnecessarily expensive for keys/models with many timestamps. Consider adding a lightweight method (e.g., is_empty() / count()) that runs cleanup and returns len(self._timestamps) without allocating a list.
| # Start background cleanup for KEY_MODELS_USAGE | ||
| asyncio.create_task(cleanup_stale_model_usage()) | ||
| logger.info("KEY_MODELS_USAGE cleanup task started (runs hourly)") |
There was a problem hiding this comment.
The cleanup task is started via asyncio.create_task(...) inside run_startup_checks(), but run_startup_checks() is only invoked in the __main__ path. In typical deployments (e.g., uvicorn wrapper:app), this task (and the diagnostics) won’t run. Consider registering startup/shutdown handlers with FastAPI (lifespan or @app.on_event("startup")) so background tasks are managed consistently across entrypoints.
wrapper.py
Outdated
| client_cache = TTLCache(maxsize=CLIENT_CACHE_MAXSIZE, ttl=CLIENT_CACHE_TTL) | ||
|
|
||
| KEY_MODELS_USAGE = dict() # api_key -> list of ModelUsage instances | ||
| KEY_MODELS_USAGE: dict = dict() # api_key -> dict[model_name -> ModelUsage] |
There was a problem hiding this comment.
KEY_MODELS_USAGE is annotated as a plain dict, but it’s used as a nested mapping (api_key -> model_name -> ModelUsage). Consider adding a precise type like dict[str, dict[str, ModelUsage]] (or defaultdict) to improve readability and static checking, and keep the comment consistent with the actual structure.
| KEY_MODELS_USAGE: dict = dict() # api_key -> dict[model_name -> ModelUsage] | |
| KEY_MODELS_USAGE: dict[str, dict[str, "ModelUsage"]] = {} # api_key -> model_name -> ModelUsage |
| if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key): | ||
| result = "Couldn't verify API key provided in Authorization header. Provide a valid API key to get model usage details." | ||
| else: | ||
| diagnostics = await run_model_diagnostics(api_key) | ||
| if api_key in KEY_MODELS_USAGE: | ||
| for model in diagnostics["models"]: | ||
| if model in KEY_MODELS_USAGE[api_key]: | ||
| usage_per_hour = KEY_MODELS_USAGE[api_key][model].getUsagePerHour() | ||
| diagnostics["models"][model]["usage"] = { | ||
| str(-(index+1)): value for index, value in enumerate(usage_per_hour) | ||
| } | ||
| result = diagnostics["models"] | ||
| else: | ||
| result = "No API key provided in Authorization header. Provide a valid API key to get model usage details." | ||
|
|
||
| return { | ||
| "status": "healthy", | ||
| "timestamp": datetime.now().isoformat(), | ||
| "completion_cache_size": len(completion_cache.cache), | ||
| "model_check": diagnostics["models"], | ||
| "model_check": result, | ||
| } |
There was a problem hiding this comment.
/health/details now returns a string message in model_check when the Authorization header is missing/invalid, whereas other cases return a dict of models. This makes the response schema inconsistent and will break callers/tests that expect model_check to always be an object with .items(). Consider always returning a consistent object shape (e.g., { "error": ..., "models": ... }) and/or use an appropriate HTTP status code for invalid auth.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @asynccontextmanager | ||
| async def lifespan(app: FastAPI): | ||
| """Manage startup and shutdown of background tasks and startup checks.""" | ||
| await run_startup_checks() | ||
| yield |
There was a problem hiding this comment.
lifespan() now awaits run_startup_checks(), which contains an unbounded retry loop (while not await test_hawki_endpoint(): ... sleep(10)). If Hawki is down (or ALLOWED_KEYS[0] is empty/misconfigured), the app will never finish startup (and TestClient-based tests can hang). Consider bounding retries / adding a startup timeout, and/or making startup checks optional via an env flag in test/development environments.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| # Start background cleanup for KEY_MODELS_USAGE | ||
| asyncio.create_task(cleanup_stale_model_usage()) | ||
| logger.info("KEY_MODELS_USAGE cleanup task started (runs hourly)") |
There was a problem hiding this comment.
A background task is started via asyncio.create_task(cleanup_stale_model_usage()) but the task handle isn’t retained and nothing cancels it on shutdown. This can leak tasks (and in tests can produce “Task was destroyed but it is pending” warnings). Prefer creating the task inside lifespan() (store the handle), then cancel/await it in the shutdown portion after yield.
| str(-(index+1)): value for index, value in enumerate(usage_per_hour) | ||
| } | ||
| # Check key | ||
| if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key): |
There was a problem hiding this comment.
health_details calls is_api_key_working(api_key) directly inside an async endpoint. is_api_key_working performs a synchronous test_client.invoke(...) (requests-based) call, which will block the event loop while validating the key. Consider running it in a thread (await asyncio.to_thread(...)) or making the key validation path fully async.
| if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key): | |
| if ( | |
| api_key not in ALLOWED_KEYS | |
| and api_key not in KEY_MODELS_USAGE | |
| and not await asyncio.to_thread(is_api_key_working, api_key) | |
| ): |
| # Provide model usage for api key passed | ||
| if authorization: | ||
| logger.info(f"Authorization header provided for health details") | ||
| api_key = authorization.replace("Bearer ", "") |
There was a problem hiding this comment.
Authorization parsing via authorization.replace("Bearer ", "") can remove occurrences mid-string and doesn’t trim whitespace. Using removeprefix("Bearer ") (and strip()) is safer and avoids accidentally accepting malformed headers.
| api_key = authorization.replace("Bearer ", "") | |
| auth_value = authorization.strip() | |
| api_key = auth_value.removeprefix("Bearer ").strip() |
| client_cache = TTLCache(maxsize=CLIENT_CACHE_MAXSIZE, ttl=CLIENT_CACHE_TTL) | ||
|
|
||
| KEY_MODELS_USAGE = dict() # api_key -> list of ModelUsage instances | ||
| KEY_MODELS_USAGE: dict[str, dict[str, ModelUsage]] = {} |
There was a problem hiding this comment.
KEY_MODELS_USAGE is annotated with ModelUsage before ModelUsage is defined. On Python 3.11 (see Dockerfile), module-level annotations are evaluated eagerly, so this will raise NameError: name 'ModelUsage' is not defined at import time. Move the declaration below the ModelUsage class, add from __future__ import annotations, or quote the type ('ModelUsage').
| KEY_MODELS_USAGE: dict[str, dict[str, ModelUsage]] = {} | |
| KEY_MODELS_USAGE: dict[str, dict[str, "ModelUsage"]] = {} |
No description provided.