This is a wrapper for the HAWKI framework to support quasi-standard API requests using the OpenAI API format.
To separate libraries and their versions from your global pip/Python installation, you can use a virtual environment tailored to each project:

```shell
# create the environment
python -m venv myfirstproject
# activate it
source myfirstproject/bin/activate
```

For further details, see https://www.w3schools.com/python/python_virtualenv.asp
In the `.env` file, several variables need to be configured:

```shell
ALLOWED_KEYS=ALLOWED_KEYS       # the proxy keys clients use to authenticate to the API; choose them yourself
PRIMARY_API_KEY=PRIMARY_API_KEY # your Hawki Web UI key (REQUIRED; the application will not start without it)
PORT=8080                       # adjust as you like; optional
HAWKI_API_URL=HAWKI_API_URL     # the Hawki Web UI endpoint; defaults to https://hawki2.htwk-leipzig.de/api/ai-req
```

Then start the wrapper:

```shell
python wrapper.py
```

After that, you can access the application at http://localhost:8080.
From within the repository, execute `docker build . -t YOUR_IMAGE_NAME`. You need to set the environment variables here, too: either set them in the `.env` file before building, as described above, set them as `ENV ALLOWED_KEYS=…` in the Dockerfile (see the comments there), or pass them as environment variables when running the container.
The request format follows the OpenAI standard, i.e.:
```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are Auto Router, a large language model from openrouter.\n\nFormatting Rules:\n- Use Markdown **only when semantically appropriate**. Examples: `inline code`, ```code fences```, tables, and lists.\n- In assistant responses, format file names, directory paths, function names, and class names with backticks (`).\n- For math: use \\( and \\) for inline expressions, and \\[ and \\] for display (block) math."},
    {"role": "user", "content": "What's up?"}
  ]
}
```

The Authorization header accepts two types of keys:
- **Proxy key** – one of the keys configured in `ALLOWED_KEYS` in the `.env` file. This is the normal case for trusted clients that share a single upstream Hawki key managed by the wrapper operator.
- **Hawki Web UI key** – your personal Hawki Web UI key. If you already have direct access to the Hawki instance, you can pass your own key and the wrapper will forward requests under that key without any additional setup in the env file.
```
Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>
```

By default the wrapper enforces a `global_timeout` of 60 seconds per request (configurable in `service_config/files/.env`).
This timeout bounds the retry loop used to work around rate limits: requests are retried until the configured timeout is exceeded.
You can override this on a per-request basis by setting the X-Hawki-Request-Timeout header to the desired timeout in seconds.
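The steps above can be sketched as a small client helper. This is a hedged example, not part of the wrapper itself: the `/v1/chat/completions` path and the key value are assumptions you should replace with your own deployment's values.

```python
import json

# Hypothetical values -- substitute your own proxy key and wrapper URL.
WRAPPER_URL = "http://localhost:8080/v1/chat/completions"  # assumed path
PROXY_KEY = "my-proxy-key"

def build_request(model, user_message, timeout_s=None):
    """Assemble headers and an OpenAI-format body for the wrapper."""
    headers = {
        "Authorization": f"Bearer {PROXY_KEY}",
        "Content-Type": "application/json",
    }
    if timeout_s is not None:
        # Per-request override of the wrapper's global_timeout
        headers["X-Hawki-Request-Timeout"] = str(timeout_s)
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, json.dumps(body)

headers, payload = build_request("gpt-4o", "What's up?", timeout_s=120)
print(headers["X-Hawki-Request-Timeout"])  # 120
```

The returned headers and payload can then be sent with any HTTP client (e.g. `requests.post(WRAPPER_URL, headers=headers, data=payload)`).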
```
X-Hawki-Request-Timeout: 120
```

The wrapper ships with a pre-configured list of models, defined via the `MODELS` variable in `config/models.env`.
These are the models that were/are offered by the HAWKI instance and are referred to as the initial models throughout the codebase.
At the time of writing, the default initial model list is:
| Model | Provider |
|---|---|
| … | OpenAI |
| … | OpenAI |
| … | OpenAI |
| … | OpenAI |
| … | OpenAI |
| … | OpenAI |
| … | OpenAI |
| … | … |
| … | … |
| … | … |
| … | Meta |
| … | Meta |
| … | DeepSeek |
| … | DeepSeek |
| … | Mistral |
| … | Mistral |
| … | Alibaba |
| … | Alibaba |
| … | Google DeepMind |
| … | Google DeepMind |
Not all initial models may be available at any given time. The set of models accessible through the underlying Hawki instance can change — models may be temporarily disabled, rate-limited, or removed by the Hawki operator (ITSZ) without notice.
During startup the wrapper probes all initial models and removes unavailable ones from its active list. This ensures that only working models are served to clients after startup.
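The startup filtering described above can be illustrated with a short sketch. This is not the wrapper's actual code; `probe` stands in for a real test request against each model.

```python
# Illustrative sketch of startup model filtering -- not the wrapper's
# actual implementation. `probe(model)` stands in for a real test request.
def filter_available(initial_models, probe):
    """Keep only the models whose probe succeeds."""
    available = []
    for model in initial_models:
        try:
            if probe(model):
                available.append(model)
        except Exception:
            pass  # a failed probe means the model is treated as unavailable
    return available

# Example with a fake probe that only reports gpt-4o as reachable:
models = filter_available(["gpt-4o", "gpt-4o-mini"], lambda m: m == "gpt-4o")
print(models)  # ['gpt-4o']
```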
> ℹ️ The active model list is updated on every call to the detailed health endpoint described below. To avoid unnecessary requests to the availability status, simply try your desired model directly.
A lightweight health endpoint intended for liveness and readiness probes.
Returns a JSON object with:
- `status` – always `"healthy"` when the service is running
- `timestamp` – current server time in ISO 8601 format
- `completion_cache_size` – number of entries currently stored in the LRU completion cache
- `initial_models` – list of all configured models (regardless of their current availability)
Example response:
```json
{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "initial_models": ["gpt-4o", "gpt-4o-mini"]
}
```

A detailed diagnostic endpoint that actively probes each configured model by sending a real test request. This endpoint is more expensive to call and should not be used for frequent liveness probes.
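A liveness probe can key off the response shape shown above. The snippet below is a hypothetical readiness check that assumes exactly the fields documented for the health endpoint; the embedded JSON mirrors the example response.

```python
import json

# Sample payload matching the health endpoint's documented shape.
health_json = """
{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "initial_models": ["gpt-4o", "gpt-4o-mini"]
}
"""

def is_ready(raw):
    """Treat the wrapper as ready when it reports healthy and has models configured."""
    data = json.loads(raw)
    # "status" is always "healthy" while the service is running, so any
    # well-formed response with configured models means the wrapper is up.
    return data.get("status") == "healthy" and len(data.get("initial_models", [])) > 0

print(is_ready(health_json))  # True
```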
Each model is probed twice – once without caching and once with caching – to verify both live availability and cache behaviour. The endpoint also updates the internal list of available models based on the probe results.
> ❗ A valid `Authorization` header must be supplied when calling this endpoint.
Per-model usage statistics for the past 24 hours are included in the response when the key has recorded usage.
Returns a JSON object with:
- `status` – always `"healthy"` when the service is running
- `timestamp` – current server time in ISO 8601 format
- `completion_cache_size` – number of entries currently stored in the LRU completion cache
- `model_check` – a map of model names to their diagnostic results, each containing:
  - `requests` – an array of two probe results (uncached, then cached), each with:
    - `started_at` – timestamp of the probe request
    - `runtime_in_ms` – round-trip time in milliseconds
    - `prompt` – the health-check prompt that was sent
    - `response` – the model's response body
    - `status` – `"available"` or `"unavailable"`
    - `cached` – whether the response was served from cache
  - `usage` (optional, only present when an `Authorization` header is provided) – cumulative usage counts for the past 24 hours, keyed by relative hour offset (`"-1"` = last hour, `"-2"` = last 2 hours, …, `"-24"` = last 24 hours)
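Because the usage counts are cumulative, per-hour figures are differences of adjacent entries. A small worked example, using the values from the example response below:

```python
# The usage map is cumulative: "-2" counts everything in the last 2 hours,
# including the last hour. Per-hour counts are differences of adjacent entries.
usage = {"-1": 5, "-2": 12}

last_hour = usage["-1"]                  # requests in the most recent hour
hour_before = usage["-2"] - usage["-1"]  # requests in the hour before that
print(last_hour, hour_before)  # 5 7
```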
Example response:
```json
{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "model_check": {
    "gpt-4o": {
      "requests": [
        {
          "started_at": "2026-03-05T11:59:58.000000",
          "runtime_in_ms": 1234.5,
          "prompt": "Health check test. Response with 'OK' if you are operational.",
          "response": "OK",
          "status": "available",
          "cached": false
        },
        {
          "started_at": "2026-03-05T11:59:59.000000",
          "runtime_in_ms": 3.2,
          "prompt": "Health check test. Response with 'OK' if you are operational.",
          "response": "OK",
          "status": "available",
          "cached": true
        }
      ],
      "usage": {
        "-1": 5,
        "-2": 12
      }
    }
  }
}
```

If you face long waiting times for responses, that may be due to the `GLOBAL_TIMEOUT` setting in `service_config/files/.env` (default: 60 seconds). Increase it as needed; the same applies when responses take longer due to large prompts. You can also override the timeout per request using the `X-Hawki-Request-Timeout` header, as described above.
We are happy to receive your contributions. Please create a pull request or an issue. As this tool is published under the MIT license, feel free to fork it and use it in your own projects.