
Hawki API Wrapper (beta)

Building and Running the Application

Running locally with Python 3.10+

Install dependencies

Create a venv environment and activate it

To separate the libraries and their versions from your global pip/python installation, you can use virtual environments that you tailor to each project.

To create one: python -m venv myfirstproject

To activate it: source myfirstproject/bin/activate

Install Python dependencies
pip install -r requirements.txt

Note: If you are using a virtual environment, make sure to activate it before running the command.
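Putting the steps together, a minimal local setup on Linux/macOS could look like this (myfirstproject is just an example name for the environment):

# create and activate a project-local virtual environment
python -m venv myfirstproject
source myfirstproject/bin/activate

# install the wrapper's dependencies into the active environment
pip install -r requirements.txt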

Set the environment variables

In the .env file, there are several variables that need to be configured.

ALLOWED_KEYS=ALLOWED_KEYS # These are the proxy keys to use for auth to the API, choose yourself
PRIMARY_API_KEY=PRIMARY_API_KEY # Your Hawki Web UI key — REQUIRED, the application will not start without it
PORT=8080 # optional, adjust as you like
HAWKI_API_URL=HAWKI_API_URL # the Hawki Web UI endpoint, defaults to https://hawki2.htwk-leipzig.de/api/ai-req
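For illustration, a filled-in .env could look like the following. All values are made-up placeholders; if ALLOWED_KEYS accepts more than one key, check the code for the expected separator.

ALLOWED_KEYS=my-proxy-key-1            # placeholder, choose your own
PRIMARY_API_KEY=hawki-web-ui-key-123   # placeholder for your Hawki Web UI key
PORT=8080
HAWKI_API_URL=https://hawki2.htwk-leipzig.de/api/ai-req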

Run the Application

python wrapper.py

After that, you can access the application at http://localhost:8080.
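To verify that the service is up, you can query the /health endpoint described below:

curl http://localhost:8080/health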

Build and run with Docker

Build

From within the repository, execute docker build . -t YOUR_IMAGE_NAME. You need to set the environment variables here, too: either set them in the .env file before building, as described above, or set them as ENV ALLOWED_KEYS=… in the Dockerfile (see the comments there). A third option is to pass them as environment variables when running the container.
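For example, after configuring the .env file (hawki-api-wrapper is just an example image name):

# build the image from the repository root
docker build . -t hawki-api-wrapper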

Running

Run docker run YOUR_IMAGE_NAME, optionally adding -e ALLOWED_KEYS=.. (and further -e flags) if you want to pass the environment variables here.
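A full invocation could look like this, assuming the default port of 8080 and the placeholder values from above:

# publish the service port and pass the env vars at runtime
docker run -p 8080:8080 \
  -e ALLOWED_KEYS=my-proxy-key-1 \
  -e PRIMARY_API_KEY=hawki-web-ui-key-123 \
  hawki-api-wrapper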

Exemplary request

The request format follows the OpenAI standard, i.e.:

{
  "model":"gpt-4o",
  "messages":
  [
    {"role":"system","content":"You are Auto Router, a large language model from openrouter.\n\nFormatting Rules:\n- Use Markdown **only when semantically appropriate**. Examples: `inline code`, ```code fences```, tables, and lists.\n- In assistant responses, format file names, directory paths, function names, and class names with backticks (`).\n- For math: use \\( and \\) for inline expressions, and \\[ and \\] for display (block) math."},
    {"role":"user","content":"Whats up?"}
  ]
}
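Such a request could be sent with curl as follows. Note that the completion route is not documented in this README; /v1/chat/completions is assumed here from the OpenAI standard, so adjust the path if the wrapper exposes a different one. See the Authentication section below for the Bearer key.

# send an OpenAI-style chat completion request (the route is an assumption)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'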

Authentication

The Authorization header accepts two types of keys:

  • Proxy key – one of the keys configured in ALLOWED_KEYS in the .env file. This is the normal case for trusted clients that share a single upstream Hawki key managed by the wrapper operator.

  • Hawki Web UI key – your personal Hawki Web UI key. If you already have direct access to the Hawki instance, you can pass your own key and the wrapper will forward requests under that key without any additional setup in the env file.

Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>

Controlling the request timeout

By default the wrapper enforces a GLOBAL_TIMEOUT of 60 seconds per request (configurable in service_config/files/.env). This timeout is used to work around the rate limit: requests are retried until the configured timeout is exceeded. You can override this on a per-request basis by setting the X-Hawki-Request-Timeout header to the desired timeout in seconds.

X-Hawki-Request-Timeout: 120
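For example, to allow up to two minutes for a single request (using the same assumed route as in the example above):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>" \
  -H "X-Hawki-Request-Timeout: 120" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'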

Models

Initial model list

The wrapper ships with a pre-configured list of models, defined via the MODELS variable in config/models.env. These are the models that are (or were) offered by the Hawki instance and are referred to as the initial models throughout the codebase.
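The exact syntax is defined in config/models.env itself; purely as an illustration, and not verified against the repository, the variable might hold a list along these lines:

# hypothetical shape of the MODELS variable; check config/models.env for the real format
MODELS=gpt-4o,gpt-4o-mini,gpt-4.1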

At the time of writing, the default initial model list is:

Model                            Provider
gpt-4o                           OpenAI
gpt-4o-mini                      OpenAI
gpt-4.1                          OpenAI
gpt-4.1-mini                     OpenAI
gpt-5                            OpenAI
o1-mini                          OpenAI
o4-mini                          OpenAI
gemini-1.5-flash                 Google
gemini-2.0-flash-lite            Google
gemini-2.5-pro-exp-03-25         Google
meta-llama-3.1-8b-instruct       Meta
meta-llama-3.1-70b-instruct      Meta
deepseek-r1                      DeepSeek
deepseek-r1-distill-llama-70b    DeepSeek
mistral-large-instruct           Mistral
codestral-22b                    Mistral
qwen2.5-72b-instruct             Alibaba
qwen3-32b                        Alibaba
gemma-3-27b-it                   Google DeepMind
medgemma-3-27b-it                Google DeepMind

Model availability

Not all initial models may be available at any given time. The set of models accessible through the underlying Hawki instance can change — models may be temporarily disabled, rate-limited, or removed by the Hawki operator (ITSZ) without notice.

During startup the wrapper probes all initial models and removes unavailable ones from its active list. This ensures that only working models are served to clients after startup.

ℹ️ The active model list is updated on every call to /health/details. Use that endpoint to get the current availability status of each model and to force a refresh of the active list.

To avoid unnecessary availability probes, try your desired model first and use the /health/details endpoint only when you run into problems.

API Endpoints

GET /health

A lightweight health endpoint intended for liveness and readiness probes.

Returns a JSON object with:

  • status – always "healthy" when the service is running

  • timestamp – current server time in ISO 8601 format

  • completion_cache_size – number of entries currently stored in the LRU completion cache

  • initial_models – list of all configured models (regardless of their current availability)

Example response:

{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "initial_models": ["gpt-4o", "gpt-4o-mini"]
}

GET /health/details

A detailed diagnostic endpoint that actively probes each configured model by sending a real test request. This endpoint is more expensive to call and should not be used for frequent liveness probes.

Each model is probed twice – once without caching and once with caching – to verify both live availability and cache behaviour. The endpoint also updates the internal list of available models based on the probe results.

A valid Authorization: Bearer <api-key> header is required. Without it (or if the key cannot be verified), no model diagnostics are run and model_check will not contain usage details.

Per-model usage statistics for the past 24 hours are included in the response when the key has recorded usage.
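A typical call looks like this:

curl http://localhost:8080/health/details \
  -H "Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>"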

Returns a JSON object with:

  • status – always "healthy" when the service is running

  • timestamp – current server time in ISO 8601 format

  • completion_cache_size – number of entries currently stored in the LRU completion cache

  • model_check – a map of model names to their diagnostic results, each containing:

    • requests – array of two probe results (uncached, then cached), each with:

      • started_at – timestamp of the probe request

      • runtime_in_ms – round-trip time in milliseconds

      • prompt – the health-check prompt that was sent

      • response – the model’s response body

      • status"available" or "unavailable"

      • cached – whether the response was served from cache

    • usage (optional, only when Authorization header is provided) – cumulative usage counts for the past 24 hours, keyed by relative hour offset ("-1" = last hour, "-2" = last 2 hours, …, "-24" = last 24 hours)

Example response:

{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "model_check": {
    "gpt-4o": {
      "requests": [
        {
          "started_at": "2026-03-05T11:59:58.000000",
          "runtime_in_ms": 1234.5,
          "prompt": "Health check test. Response with 'OK' if you are operational.",
          "response": "OK",
          "status": "available",
          "cached": false
        },
        {
          "started_at": "2026-03-05T11:59:59.000000",
          "runtime_in_ms": 3.2,
          "prompt": "Health check test. Response with 'OK' if you are operational.",
          "response": "OK",
          "status": "available",
          "cached": true
        }
      ],
      "usage": {
        "-1": 5,
        "-2": 12
      }
    }
  }
}

Troubleshooting (Cooldowns)

If you face long waiting times for responses, this may be due to the GLOBAL_TIMEOUT setting in service_config/files/.env (default: 60 seconds). Increase it as needed; the same applies when responses take longer because of large prompts. You can also override the timeout per request using the X-Hawki-Request-Timeout header, as described above.
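For example, to allow up to three minutes per request by default:

# in service_config/files/.env
GLOBAL_TIMEOUT=180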

Contribute

We are happy to receive your contributions. Please create a pull request or an issue. As this tool is published under the MIT license, feel free to fork it and use it in your own projects.

Disclaimer

This tool only stores image data temporarily. It is provided "as is" and without any warranty, express or implied.