
Hawki API Wrapper (beta)

Building and Running the Application

Running locally with Python 3.10+

Install dependencies

Create a venv environment and activate it

To separate the libraries and their versions from your global pip/python installation, you can use virtual environments that you tailor to each project.

To create one: python -m venv myfirstproject

To activate it: source myfirstproject/bin/activate

Install Python dependencies
pip install -r requirements.txt

Note: If you are using a virtual environment, make sure to activate it before running the command.
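Putting the steps together, a minimal local setup on Linux/macOS could look like this (myfirstproject is just an example name for the environment):

# create and activate a project-local virtual environment
python -m venv myfirstproject
source myfirstproject/bin/activate

# install the wrapper's dependencies into the active environment
pip install -r requirements.txt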

Set the environment variables

In the .env file, there are several variables that need to be configured.

ALLOWED_KEYS=ALLOWED_KEYS # These are the proxy keys to use for auth to the API, choose yourself
PRIMARY_API_KEY=PRIMARY_API_KEY # Your Hawki Web UI key — REQUIRED, the application will not start without it
PORT=8080 # optional, adjust as you like
HAWKI_API_URL=HAWKI_API_URL # the Hawki Web UI endpoint, defaults to https://hawki2.htwk-leipzig.de/api/ai-req
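For illustration, a filled-in .env could look like the following. All values are made-up placeholders; if ALLOWED_KEYS accepts more than one key, check the code for the expected separator.

ALLOWED_KEYS=my-proxy-key-1            # placeholder, choose your own
PRIMARY_API_KEY=hawki-web-ui-key-123   # placeholder for your Hawki Web UI key
PORT=8080
HAWKI_API_URL=https://hawki2.htwk-leipzig.de/api/ai-req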

Run the Application

python wrapper.py

After that, you can access the application at http://localhost:8080.
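To verify that the service is up, you can query the /health endpoint described below:

curl http://localhost:8080/health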

Build and run with Docker

Build

From within the repository, execute docker build . -t YOUR_IMAGE_NAME. You need to set the environment variables here, too: either set them in the .env file before building, as described above, or set them as ENV ALLOWED_KEYS=… in the Dockerfile (see the comments there). A third option is to pass them as environment variables when running the container.
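For example, after configuring the .env file (hawki-api-wrapper is just an example image name):

# build the image from the repository root
docker build . -t hawki-api-wrapper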

Running

Run docker run YOUR_IMAGE_NAME, optionally adding -e ALLOWED_KEYS=.. (and further -e flags) if you want to pass the environment variables here.
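A full invocation could look like this, assuming the default port of 8080 and the placeholder values from above:

# publish the service port and pass the env vars at runtime
docker run -p 8080:8080 \
  -e ALLOWED_KEYS=my-proxy-key-1 \
  -e PRIMARY_API_KEY=hawki-web-ui-key-123 \
  hawki-api-wrapper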

Exemplary request

The request format follows the OpenAI standard, i.e.:

{
  "model":"gpt-4o",
  "messages":
  [
    {"role":"system","content":"You are Auto Router, a large language model from openrouter.\n\nFormatting Rules:\n- Use Markdown **only when semantically appropriate**. Examples: `inline code`, ```code fences```, tables, and lists.\n- In assistant responses, format file names, directory paths, function names, and class names with backticks (`).\n- For math: use \\( and \\) for inline expressions, and \\[ and \\] for display (block) math."},
    {"role":"user","content":"Whats up?"}
  ]
}
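Such a request could be sent with curl as follows. Note that the completion route is not documented in this README; /v1/chat/completions is assumed here from the OpenAI standard, so adjust the path if the wrapper exposes a different one. See the Authentication section below for the Bearer key.

# send an OpenAI-style chat completion request (the route is an assumption)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'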

Authentication

The Authorization header accepts two types of keys:

  • Proxy key – one of the keys configured in ALLOWED_KEYS in the .env file. This is the normal case for trusted clients that share a single upstream Hawki key managed by the wrapper operator.

  • Hawki Web UI key – your personal Hawki Web UI key. If you already have direct access to the Hawki instance, you can pass your own key and the wrapper will forward requests under that key without any additional setup in the env file.

Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>

Controlling the request timeout

By default the wrapper enforces a GLOBAL_TIMEOUT of 60 seconds per request (configurable in service_config/files/.env). This timeout is used to work around the rate limit: requests are retried until the configured timeout is exceeded. You can override this on a per-request basis by setting the X-Hawki-Request-Timeout header to the desired timeout in seconds.

X-Hawki-Request-Timeout: 120
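For example, to allow up to two minutes for a single request (using the same assumed route as in the example above):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>" \
  -H "X-Hawki-Request-Timeout: 120" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'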

Models

Initial model list

The wrapper ships with a pre-configured list of models, defined via the MODELS variable in config/models.env. These are the models that are (or were) offered by the Hawki instance and are referred to as the initial models throughout the codebase.
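The exact syntax is defined in config/models.env itself; purely as an illustration, and not verified against the repository, the variable might hold a list along these lines:

# hypothetical shape of the MODELS variable; check config/models.env for the real format
MODELS=gpt-4o,gpt-4o-mini,gpt-4.1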

At the time of writing, the default initial model list is:

Model                            Provider
gpt-4o                           OpenAI
gpt-4o-mini                      OpenAI
gpt-4.1                          OpenAI
gpt-4.1-mini                     OpenAI
gpt-5                            OpenAI
o1-mini                          OpenAI
o4-mini                          OpenAI
gemini-1.5-flash                 Google
gemini-2.0-flash-lite            Google
gemini-2.5-pro-exp-03-25         Google
meta-llama-3.1-8b-instruct       Meta
meta-llama-3.1-70b-instruct      Meta
deepseek-r1                      DeepSeek
deepseek-r1-distill-llama-70b    DeepSeek
mistral-large-instruct           Mistral
codestral-22b                    Mistral
qwen2.5-72b-instruct             Alibaba
qwen3-32b                        Alibaba
gemma-3-27b-it                   Google DeepMind
medgemma-3-27b-it                Google DeepMind

Model availability

Not all initial models may be available at any given time. The set of models accessible through the underlying Hawki instance can change — models may be temporarily disabled, rate-limited, or removed by the Hawki operator (ITSZ) without notice.

During startup the wrapper probes all initial models and removes unavailable ones from its active list. This ensures that only working models are served to clients after startup.

ℹ️ The active model list is updated on every call to /health/details. Use that endpoint to get the current availability status of each model and to force a refresh of the active list.

To avoid unnecessary availability probes, try your desired model first and use the /health/details endpoint only when you run into problems.

API Endpoints

GET /health

A lightweight health endpoint intended for liveness and readiness probes.

Returns a JSON object with:

  • status – always "healthy" when the service is running

  • timestamp – current server time in ISO 8601 format

  • completion_cache_size – number of entries currently stored in the LRU completion cache

  • initial_models – list of all configured models (regardless of their current availability)

Example response:

{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "initial_models": ["gpt-4o", "gpt-4o-mini"]
}

GET /health/details

A detailed diagnostic endpoint that actively probes each configured model by sending a real test request. This endpoint is more expensive to call and should not be used for frequent liveness probes.

Each model is probed twice – once without caching and once with caching – to verify both live availability and cache behaviour. The endpoint also updates the internal list of available models based on the probe results.

A valid Authorization: Bearer <api-key> header is required. Without it (or if the key cannot be verified), no model diagnostics are run and model_check will not contain usage details.

Per-model usage statistics for the past 24 hours are included in the response when the key has recorded usage.
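A typical call looks like this:

curl http://localhost:8080/health/details \
  -H "Authorization: Bearer <your-proxy-key-or-hawki-web-ui-key>"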

Returns a JSON object with:

  • status – always "healthy" when the service is running

  • timestamp – current server time in ISO 8601 format

  • completion_cache_size – number of entries currently stored in the LRU completion cache

  • model_check – a map of model names to their diagnostic results, each containing:

    • requests – array of two probe results (uncached, then cached), each with:

      • started_at – timestamp of the probe request

      • runtime_in_ms – round-trip time in milliseconds

      • prompt – the health-check prompt that was sent

      • response – the model’s response body

      • status"available" or "unavailable"

      • cached – whether the response was served from cache

    • usage (optional, only when Authorization header is provided) – cumulative usage counts for the past 24 hours, keyed by relative hour offset ("-1" = last hour, "-2" = last 2 hours, …, "-24" = last 24 hours)

Example response:

{
  "status": "healthy",
  "timestamp": "2026-03-05T12:00:00.000000",
  "completion_cache_size": 42,
  "model_check": {
    "gpt-4o": {
      "requests": [
        {
          "started_at": "2026-03-05T11:59:58.000000",
          "runtime_in_ms": 1234.5,
          "prompt": "Health check test. Response with 'OK' if you are operational.",
          "response": "OK",
          "status": "available",
          "cached": false
        },
        {
          "started_at": "2026-03-05T11:59:59.000000",
          "runtime_in_ms": 3.2,
          "prompt": "Health check test. Response with 'OK' if you are operational.",
          "response": "OK",
          "status": "available",
          "cached": true
        }
      ],
      "usage": {
        "-1": 5,
        "-2": 12
      }
    }
  }
}

Troubleshooting (Cooldowns)

If you face long waiting times for responses, this may be due to the GLOBAL_TIMEOUT setting in service_config/files/.env (default: 60 seconds). Increase it as needed; the same applies when responses take longer because of large prompts. You can also override the timeout per request using the X-Hawki-Request-Timeout header, as described above.
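For example, to allow up to three minutes per request by default:

# in service_config/files/.env
GLOBAL_TIMEOUT=180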

Contribute

We are happy to receive your contributions. Please create a pull request or an issue. As this tool is published under the MIT license, feel free to fork it and use it in your own projects.

Disclaimer

This tool only stores image data temporarily. It is provided "as is" and without any warranty, express or implied.