Hawki improvements #9 (Merged)

dschiese merged 4 commits into main from hawki_improvements on Mar 12, 2026.

Conversation

@dschiese (Member) commented Mar 9, 2026

No description provided.

Copilot AI left a comment

Pull request overview

This PR improves Hawki wrapper observability and health diagnostics by reducing API key exposure in logs, making model diagnostics use a provided API key, and adding periodic cleanup of stored per-key model usage.

Changes:

  • Mask API keys in chat completion request logs.
  • Make model diagnostics/health checks accept an explicit API key and wire it through /health/details and startup checks.
  • Add a background task to periodically remove stale entries from KEY_MODELS_USAGE.
Comments suppressed due to low confidence (1)

wrapper.py:383

  • health_check_model uses process_chat_request, which records model usage via add_model_usage(...). This means running diagnostics (now potentially for arbitrary user-provided API keys) will inflate usage stats for that key with health-check traffic, making the reported usage misleading. Consider adding a way to disable usage tracking for internal diagnostics or using a separate code path that bypasses add_model_usage.
    response = await process_chat_request(request, {"Authorization": f"Bearer {api_key}"}, use_cache=use_cache)
    # process_chat_request returns a FastAPI JSONResponse; extract JSON body and headers
    response_body = json.loads(response.body.decode())
    result["started_at"] = request_start_time.isoformat()
    # Measure runtime based on epoch seconds captured before the request
    result["runtime_in_ms"] = (time.time() - start_epoch) * 1000
    result["prompt"] = HEALTH_CHECK_PROMPT
    result["response"] = response_body
    result["status"] = "available" if response.status_code == 200 else "unavailable"
    result["cached"] = response.headers.get("X-Cache-Hit") == "true"
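The separate code path suggested above (diagnostics that bypass usage tracking) could be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the `track_usage` flag and the bodies of `process_chat_request` and `add_model_usage` are assumptions modeled on the names discussed in this review.

```python
import asyncio

# Assumed shape: api_key -> model -> request count (stand-in for ModelUsage)
KEY_MODELS_USAGE: dict[str, dict[str, int]] = {}

def add_model_usage(api_key: str, model: str) -> None:
    # Record one request for this key/model pair.
    KEY_MODELS_USAGE.setdefault(api_key, {})
    KEY_MODELS_USAGE[api_key][model] = KEY_MODELS_USAGE[api_key].get(model, 0) + 1

async def process_chat_request(model: str, api_key: str,
                               track_usage: bool = True) -> dict:
    # ... upstream call elided in this sketch ...
    if track_usage:
        add_model_usage(api_key, model)
    return {"model": model, "status": 200}

async def health_check_model(model: str, api_key: str) -> dict:
    # Internal diagnostics: skip usage recording so health-check traffic
    # does not inflate the per-key statistics.
    return await process_chat_request(model, api_key, track_usage=False)
```

With this flag, only regular user traffic reaches `add_model_usage`, so usage reported for a key stays meaningful even when diagnostics run against it.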



Comment on lines +333 to +334
# Check key
if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key):

Copilot AI Mar 9, 2026


The invalid-key path calls is_api_key_working(api_key), which performs an upstream invoke() (even if cached later). Because /health/details is unauthenticated and easy to hit, this can be abused to trigger expensive upstream calls (DoS) and churn the TTL cache. Consider avoiding active validation here (e.g., only allow ALLOWED_KEYS for diagnostics, or return a 401 without testing the key) and/or add rate limiting.

Suggested change:

- # Check key
- if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key):
+ # Check key using only locally-known keys to avoid expensive upstream validation
+ if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE:

wrapper.py Outdated
Comment on lines +337 to +345
diagnostics = await run_model_diagnostics(api_key)
if api_key in KEY_MODELS_USAGE:
for model in diagnostics["models"]:
if model in KEY_MODELS_USAGE[api_key]:
usage_per_hour = KEY_MODELS_USAGE[api_key][model].getUsagePerHour()
diagnostics["models"][model]["usage"] = {
str(-(index+1)): value for index, value in enumerate(usage_per_hour)
}
result = diagnostics["models"]

Copilot AI Mar 9, 2026


run_model_diagnostics(api_key) updates global state (hawkiClient.models.set(...)) based on diagnostics executed with the caller-provided API key. If different keys have different permissions/quotas, a single request to /health/details could change the globally advertised/used model list for all users. Consider running diagnostics with a dedicated internal key (e.g., ALLOWED_KEYS[0]) or isolating diagnostics in a separate client instance without mutating global hawkiClient state.

Comment on lines +356 to 357
async def health_check_model(model: str, api_key: str, use_cache: bool = False) -> dict:
"""

Copilot AI Mar 9, 2026


Changing health_check_model’s signature to require api_key will break existing unit tests/mocks that patch health_check_model(model, use_cache=False) (e.g., test_health_endpoints.py’s default_health_check). Update the tests/mocks accordingly (and consider keeping a default internal key if you want to preserve the old call pattern).

Comment on lines +553 to +556
stale_keys = [
key for key, models in list(KEY_MODELS_USAGE.items())
if all(len(usage.getTimestamps()) == 0 for usage in models.values())
]

Copilot AI Mar 9, 2026


cleanup_stale_model_usage calls usage.getTimestamps() inside all(...), which builds a full list copy each time just to check emptiness. This can be unnecessarily expensive for keys/models with many timestamps. Consider adding a lightweight method (e.g., is_empty() / count()) that runs cleanup and returns len(self._timestamps) without allocating a list.
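A lightweight emptiness check along the lines suggested above might look like the sketch below. The `ModelUsage` internals (`_timestamps` as a deque, a fixed retention window) are assumptions for illustration; the real class may differ.

```python
import time
from collections import deque

class ModelUsage:
    """Sketch of a ModelUsage with a cheap count()/is_empty() (names assumed)."""

    WINDOW_SECONDS = 24 * 3600  # assumed retention window

    def __init__(self) -> None:
        self._timestamps = deque()  # monotonically ordered epoch seconds

    def _prune(self, now=None) -> None:
        # Drop timestamps that fell out of the retention window.
        cutoff = (now if now is not None else time.time()) - self.WINDOW_SECONDS
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()

    def count(self) -> int:
        # Prune, then report size without materialising a list copy.
        self._prune()
        return len(self._timestamps)

    def is_empty(self) -> bool:
        return self.count() == 0
```

The stale-key scan can then use `all(usage.is_empty() for usage in models.values())` without allocating a timestamp list per model.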

Comment on lines +580 to +582
# Start background cleanup for KEY_MODELS_USAGE
asyncio.create_task(cleanup_stale_model_usage())
logger.info("KEY_MODELS_USAGE cleanup task started (runs hourly)")

Copilot AI Mar 9, 2026


The cleanup task is started via asyncio.create_task(...) inside run_startup_checks(), but run_startup_checks() is only invoked in the __main__ path. In typical deployments (e.g., uvicorn wrapper:app), this task (and the diagnostics) won’t run. Consider registering startup/shutdown handlers with FastAPI (lifespan or @app.on_event("startup")) so background tasks are managed consistently across entrypoints.

wrapper.py Outdated
client_cache = TTLCache(maxsize=CLIENT_CACHE_MAXSIZE, ttl=CLIENT_CACHE_TTL)

- KEY_MODELS_USAGE = dict()  # api_key -> list of ModelUsage instances
+ KEY_MODELS_USAGE: dict = dict()  # api_key -> dict[model_name -> ModelUsage]

Copilot AI Mar 9, 2026


KEY_MODELS_USAGE is annotated as a plain dict, but it’s used as a nested mapping (api_key -> model_name -> ModelUsage). Consider adding a precise type like dict[str, dict[str, ModelUsage]] (or defaultdict) to improve readability and static checking, and keep the comment consistent with the actual structure.

Suggested change:

- KEY_MODELS_USAGE: dict = dict()  # api_key -> dict[model_name -> ModelUsage]
+ KEY_MODELS_USAGE: dict[str, dict[str, "ModelUsage"]] = {}  # api_key -> model_name -> ModelUsage

Comment on lines +334 to 354
if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key):
result = "Couldn't verify API key provided in Authorization header. Provide a valid API key to get model usage details."
else:
diagnostics = await run_model_diagnostics(api_key)
if api_key in KEY_MODELS_USAGE:
for model in diagnostics["models"]:
if model in KEY_MODELS_USAGE[api_key]:
usage_per_hour = KEY_MODELS_USAGE[api_key][model].getUsagePerHour()
diagnostics["models"][model]["usage"] = {
str(-(index+1)): value for index, value in enumerate(usage_per_hour)
}
result = diagnostics["models"]
else:
result = "No API key provided in Authorization header. Provide a valid API key to get model usage details."

return {
"status": "healthy",
"timestamp": datetime.now().isoformat(),
"completion_cache_size": len(completion_cache.cache),
-    "model_check": diagnostics["models"],
+    "model_check": result,
}

Copilot AI Mar 9, 2026


/health/details now returns a string message in model_check when the Authorization header is missing/invalid, whereas other cases return a dict of models. This makes the response schema inconsistent and will break callers/tests that expect model_check to always be an object with .items(). Consider always returning a consistent object shape (e.g., { "error": ..., "models": ... }) and/or use an appropriate HTTP status code for invalid auth.


Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.



Comment on lines +37 to +41
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Manage startup and shutdown of background tasks and startup checks."""
await run_startup_checks()
yield

Copilot AI Mar 9, 2026


lifespan() now awaits run_startup_checks(), which contains an unbounded retry loop (while not await test_hawki_endpoint(): ... sleep(10)). If Hawki is down (or ALLOWED_KEYS[0] is empty/misconfigured), the app will never finish startup (and TestClient-based tests can hang). Consider bounding retries / adding a startup timeout, and/or making startup checks optional via an env flag in test/development environments.
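A bounded-retry variant of the startup check could be sketched like this; `wait_for_hawki`, `MAX_STARTUP_RETRIES`, and the injected `check` callable are hypothetical names, not the module's real API:

```python
import asyncio

MAX_STARTUP_RETRIES = 6    # assumption: could be read from an env var
RETRY_DELAY_SECONDS = 10.0

async def wait_for_hawki(check, retries: int = MAX_STARTUP_RETRIES,
                         delay: float = RETRY_DELAY_SECONDS) -> bool:
    """Retry an async `check()` a bounded number of times instead of
    looping forever, so startup cannot hang indefinitely."""
    for attempt in range(1, retries + 1):
        if await check():
            return True
        if attempt < retries:
            await asyncio.sleep(delay)
    return False  # caller decides: fail fast, or start in degraded mode
```

`lifespan()` could then log and continue (or abort) when this returns `False`, rather than blocking app startup and hanging `TestClient`-based tests.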

@dschiese (Member, Author) commented:

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +592 to +594
# Start background cleanup for KEY_MODELS_USAGE
asyncio.create_task(cleanup_stale_model_usage())
logger.info("KEY_MODELS_USAGE cleanup task started (runs hourly)")

Copilot AI Mar 9, 2026


A background task is started via asyncio.create_task(cleanup_stale_model_usage()) but the task handle isn’t retained and nothing cancels it on shutdown. This can leak tasks (and in tests can produce “Task was destroyed but it is pending” warnings). Prefer creating the task inside lifespan() (store the handle), then cancel/await it in the shutdown portion after yield.
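Owning the task inside `lifespan()` and cancelling it on shutdown, as suggested, could look like the sketch below. It is shown without FastAPI so the pattern stays self-contained; in the real app the same context manager would be passed as `FastAPI(lifespan=lifespan)`, and the body of `cleanup_stale_model_usage` is elided.

```python
import asyncio
import contextlib
from contextlib import asynccontextmanager

async def cleanup_stale_model_usage() -> None:
    while True:
        await asyncio.sleep(3600)
        # ... prune stale KEY_MODELS_USAGE entries here ...

@asynccontextmanager
async def lifespan(app=None):
    # Keep the handle so the task can be cancelled on shutdown.
    task = asyncio.create_task(cleanup_stale_model_usage())
    try:
        yield
    finally:
        # Cancel and await, so no "Task was destroyed but it is
        # pending" warning leaks out of shutdown.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task

async def main() -> bool:
    async with lifespan():
        await asyncio.sleep(0)  # the app would serve requests here
    return True
```

Retaining the handle also guards against the task being garbage-collected mid-flight, which can happen with fire-and-forget `create_task` calls.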

# Check key
if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key):

Copilot AI Mar 9, 2026


health_details calls is_api_key_working(api_key) directly inside an async endpoint. is_api_key_working performs a synchronous test_client.invoke(...) (requests-based) call, which will block the event loop while validating the key. Consider running it in a thread (await asyncio.to_thread(...)) or making the key validation path fully async.

Suggested change:

- if api_key not in ALLOWED_KEYS and api_key not in KEY_MODELS_USAGE and not is_api_key_working(api_key):
+ if (
+     api_key not in ALLOWED_KEYS
+     and api_key not in KEY_MODELS_USAGE
+     and not await asyncio.to_thread(is_api_key_working, api_key)
+ ):

# Provide model usage for api key passed
if authorization:
logger.info(f"Authorization header provided for health details")
api_key = authorization.replace("Bearer ", "")

Copilot AI Mar 9, 2026


Authorization parsing via authorization.replace("Bearer ", "") can remove occurrences mid-string and doesn’t trim whitespace. Using removeprefix("Bearer ") (and strip()) is safer and avoids accidentally accepting malformed headers.

Suggested change:

- api_key = authorization.replace("Bearer ", "")
+ auth_value = authorization.strip()
+ api_key = auth_value.removeprefix("Bearer ").strip()

client_cache = TTLCache(maxsize=CLIENT_CACHE_MAXSIZE, ttl=CLIENT_CACHE_TTL)

- KEY_MODELS_USAGE = dict()  # api_key -> list of ModelUsage instances
+ KEY_MODELS_USAGE: dict[str, dict[str, ModelUsage]] = {}

Copilot AI Mar 9, 2026


KEY_MODELS_USAGE is annotated with ModelUsage before ModelUsage is defined. On Python 3.11 (see Dockerfile), module-level annotations are evaluated eagerly, so this will raise NameError: name 'ModelUsage' is not defined at import time. Move the declaration below the ModelUsage class, add from __future__ import annotations, or quote the type ('ModelUsage').

Suggested change:

- KEY_MODELS_USAGE: dict[str, dict[str, ModelUsage]] = {}
+ KEY_MODELS_USAGE: dict[str, dict[str, "ModelUsage"]] = {}


Copilot AI commented Mar 12, 2026

@dschiese I've opened a new pull request, #10, to work on those changes. Once the pull request is ready, I'll request review from you.

@dschiese dschiese merged commit 16f2d20 into main Mar 12, 2026
1 check passed