
feat(embeddings): Add embeddings calibration script (hardware-aware auto-tuning) #28

@marco0560

Description


Status


Summary

Introduce a calibration script that runs on a target machine (CPU/GPU), determines optimal embedding parameters, and generates configuration values compatible with the configuration system introduced in #17.


Motivation

Embeddings performance depends heavily on:

  • CPU core count
  • memory bandwidth
  • GPU availability and VRAM
  • model characteristics

Static defaults are suboptimal across machines.

A calibration step allows:

  • reproducible performance tuning
  • optimal hardware utilization
  • reduced manual configuration effort

Goal

Provide a deterministic, reproducible calibration tool that:

  • benchmarks embedding execution on the current machine
  • selects optimal parameters
  • outputs configuration compatible with Codira config files

Scope

Calibration targets

  • device selection (cpu / gpu / auto)
  • thread count
  • batch size
  • GPU memory usage limits

Proposed Interface

CLI

codira calibrate embeddings

Output modes

codira calibrate embeddings --print
codira calibrate embeddings --write
codira calibrate embeddings --output <path>
  • --print → stdout (TOML snippet)
  • --write → writes to user config
  • --output → writes to specified file
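A minimal sketch of how the three output modes could be parsed, assuming an argparse-based CLI. The parser layout and the helper names `build_parser` and `output_mode` are hypothetical, not Codira's actual CLI code. The modes are made mutually exclusive, and running without a flag is assumed to behave like `--print`, so nothing is written unless `--write` or `--output` is given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout for `codira calibrate embeddings`.
    parser = argparse.ArgumentParser(prog="codira")
    commands = parser.add_subparsers(dest="command", required=True)
    calibrate = commands.add_parser("calibrate")
    targets = calibrate.add_subparsers(dest="target", required=True)
    embeddings = targets.add_parser("embeddings")

    # The three output modes cannot be combined.
    mode = embeddings.add_mutually_exclusive_group()
    mode.add_argument("--print", dest="print_", action="store_true",
                      help="print a TOML snippet to stdout")
    mode.add_argument("--write", action="store_true",
                      help="write the result to the user config")
    mode.add_argument("--output", metavar="PATH",
                      help="write the result to PATH")
    return parser


def output_mode(args: argparse.Namespace) -> str:
    # Assumption: with no flag, default to printing so no file is mutated.
    if args.write:
        return "write"
    if args.output is not None:
        return "output"
    return "print"
```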

Output Format

Example:

[embeddings]
enabled = true
device = "gpu"
threads = 8
batch_size = 64

[embeddings.gpu]
device_id = 0
memory_limit_mb = 6144
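The Python standard library can read TOML (`tomllib`, Python 3.11+) but not write it, so the snippet above has to be produced either with a third-party writer such as `tomli-w` or with a small hand-rolled renderer. The sketch below takes the second route and assumes calibration results are held as a hypothetical `{table_name: {key: value}}` dict; only booleans, integers, and strings are supported, which covers the example above.

```python
import json


def format_value(value: object) -> str:
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # JSON string escapes (\", \\, \n, \uXXXX) are also valid in TOML basic strings.
        return json.dumps(value)
    raise TypeError(f"unsupported TOML value: {value!r}")


def render_toml(sections: dict) -> str:
    """Render {table: {key: value}} as TOML; dotted table names become nested tables."""
    lines = []
    for table, values in sections.items():
        if lines:
            lines.append("")  # blank line between tables
        lines.append(f"[{table}]")
        for key, value in values.items():
            lines.append(f"{key} = {format_value(value)}")
    return "\n".join(lines) + "\n"


print(render_toml({
    "embeddings": {"enabled": True, "device": "gpu", "threads": 8, "batch_size": 64},
    "embeddings.gpu": {"device_id": 0, "memory_limit_mb": 6144},
}))
```

On Python 3.11+, passing the rendered text through `tomllib.loads` is a cheap way to confirm that the output parses as valid TOML.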

Calibration Method

Deterministic benchmarking

  • fixed input dataset (bundled or generated deterministically)
  • fixed number of iterations
  • warm-up phase
  • measure:
    • throughput (texts/sec)
    • latency
    • memory usage
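The benchmarking steps above could look roughly like the sketch below. It assumes a hypothetical `embed(batch)` callable that runs the real embedding pipeline under the configuration being tested. The dataset is generated from a seeded private RNG, so inputs are identical across runs. Wall-clock timings still vary slightly, so determinism here comes from fixing the inputs, the iteration count, and the warm-up. Memory is measured with `tracemalloc`, which only tracks Python-heap allocations; native and GPU memory would need backend-specific queries.

```python
import random
import statistics
import time
import tracemalloc
from typing import Callable, Sequence

# Hypothetical hook: embeds one batch of texts with the configuration under test.
EmbedFn = Callable[[Sequence[str]], object]


def make_dataset(n_texts: int = 256, seed: int = 0) -> list:
    """Synthetic corpus; the same seed yields the same texts (for a given Python version)."""
    rng = random.Random(seed)  # private RNG, global random state is untouched
    vocab = ["def", "class", "return", "import", "index", "symbol", "query", "token"]
    return [" ".join(rng.choice(vocab) for _ in range(rng.randint(8, 64)))
            for _ in range(n_texts)]


def benchmark(embed: EmbedFn, texts: list, batch_size: int,
              iterations: int = 3, warmup: int = 1) -> dict:
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    # Warm-up: trigger lazy initialization and caches; timings are discarded.
    for _ in range(warmup):
        for batch in batches:
            embed(batch)

    # Timed phase: a fixed number of passes over the fixed dataset.
    latencies = []
    start = time.perf_counter()
    for _ in range(iterations):
        for batch in batches:
            t0 = time.perf_counter()
            embed(batch)
            latencies.append(time.perf_counter() - t0)
    elapsed = max(time.perf_counter() - start, 1e-9)

    # Memory: one extra untimed pass, so tracing overhead does not skew timings.
    tracemalloc.start()
    for batch in batches:
        embed(batch)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "throughput_texts_per_s": iterations * len(texts) / elapsed,
        "latency_p50_s": statistics.median(latencies),  # per-batch median
        "peak_python_bytes": float(peak),
    }
```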

Parameter search space

  • threads: {1, 2, 4, 8, auto}
  • batch_size: {8, 16, 32, 64, 128}
  • device:
    • cpu
    • gpu (if available)
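Enumerating this search space in a fixed order keeps the calibration run deterministic. The sketch below resolves `auto` to the logical CPU count and skips thread counts larger than the machine has; both rules are assumptions, not part of the proposal. It also drops duplicates, since `auto` can coincide with an explicit value (e.g. on an 8-CPU machine).

```python
import itertools

# Search space from the proposal; "auto" is resolved per machine.
THREADS = (1, 2, 4, 8, "auto")
BATCH_SIZES = (8, 16, 32, 64, 128)


def candidates(cpu_count: int, gpu_available: bool) -> list:
    """Return (device, threads, batch_size) tuples in a fixed, deterministic order."""
    devices = ["cpu"] + (["gpu"] if gpu_available else [])
    out = []
    seen = set()
    for device, threads, batch_size in itertools.product(devices, THREADS, BATCH_SIZES):
        # Assumption: "auto" means one thread per logical CPU.
        n = cpu_count if threads == "auto" else threads
        # Assumption: thread counts above the CPU count are skipped, not clamped.
        if n > cpu_count:
            continue
        cand = (device, n, batch_size)
        if cand not in seen:  # "auto" may duplicate an explicit value
            seen.add(cand)
            out.append(cand)
    return out
```

With these rules, an 8-CPU machine without a GPU yields 4 thread values × 5 batch sizes = 20 candidates, which bounds the number of benchmark runs.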

Selection criteria

  • maximize throughput
  • respect memory limits
  • avoid instability (OOM, timeouts)
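The three selection criteria can be expressed as a filter followed by a maximum. This sketch assumes a hypothetical `Result` record per benchmarked configuration. Failed runs and runs over the memory limit are removed first. The highest throughput wins, and exact ties are broken (as an assumption) in favor of smaller batches and fewer threads, so the choice does not depend on iteration order. An empty result tells the caller to fall back to safe defaults.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Result:
    device: str
    threads: int
    batch_size: int
    throughput: float        # texts/sec
    peak_memory_mb: float
    failed: bool = False     # OOM, timeout, or other instability


def select_best(results: Iterable[Result], memory_limit_mb: float) -> Optional[Result]:
    # Respect memory limits and drop unstable configurations.
    stable = [r for r in results
              if not r.failed and r.peak_memory_mb <= memory_limit_mb]
    if not stable:
        return None  # caller falls back to safe defaults
    # Maximize throughput; tie-break toward smaller batches, then fewer threads.
    return max(stable, key=lambda r: (r.throughput, -r.batch_size, -r.threads))
```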

Design Constraints

  • deterministic results for identical hardware
  • no network dependency
  • no external services
  • reproducible across runs
  • bounded execution time

Hardware Detection

  • CPU:
    • core count
  • GPU:
    • availability
    • device id
    • VRAM (if accessible)
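A possible detection routine, assuming a PyTorch-based embedding backend. That backend is an assumption, since the issue does not name one. `torch` is imported lazily, so CPU-only installs without it still work. Note that `os.cpu_count()` reports logical CPUs rather than physical cores and may return `None`.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GpuInfo:
    device_id: int
    vram_mb: Optional[int]  # None for backends that cannot report VRAM


@dataclass
class Hardware:
    cpu_count: int
    gpus: List[GpuInfo] = field(default_factory=list)


def detect_hardware() -> Hardware:
    # Logical CPU count; fall back to 1 if the platform cannot report it.
    hw = Hardware(cpu_count=os.cpu_count() or 1)
    try:
        import torch  # assumption: a PyTorch-based embedding backend
    except ImportError:
        return hw  # no GPU backend installed: CPU-only calibration
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            total = torch.cuda.get_device_properties(i).total_memory  # bytes
            hw.gpus.append(GpuInfo(device_id=i, vram_mb=total // (1024 * 1024)))
    return hw
```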

Safety Mechanisms

  • detect out-of-memory (OOM) errors and discard the offending configuration
  • fall back to safe defaults
  • limit total calibration duration
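The three safety mechanisms can share one loop. In this sketch, `measure(candidate)` is a hypothetical function returning throughput, and `SAFE_DEFAULTS` and the 120-second budget are placeholder values. OOM is recognized either as `MemoryError` or by an "out of memory" message, which is how PyTorch's CUDA OOM error (a `RuntimeError` subclass) reports itself. Other exceptions are re-raised rather than silently swallowed.

```python
import time
from typing import Callable, Dict, Iterable, Optional

Config = Dict[str, object]

# Hypothetical fallback values; real safe defaults would come from the #17 config system.
SAFE_DEFAULTS: Config = {"device": "cpu", "threads": 1, "batch_size": 8}


def is_oom(exc: BaseException) -> bool:
    # MemoryError covers host RAM; GPU backends raise their own types with an OOM message.
    return isinstance(exc, MemoryError) or "out of memory" in str(exc).lower()


def run_calibration(candidates: Iterable[Config],
                    measure: Callable[[Config], float],
                    budget_s: float = 120.0) -> Config:
    """Return the highest-throughput candidate, or SAFE_DEFAULTS if none succeeds."""
    deadline = time.monotonic() + budget_s
    best: Optional[Config] = None
    best_throughput = 0.0
    for cand in candidates:
        if time.monotonic() >= deadline:
            break  # bounded duration: keep the best result found so far
        try:
            throughput = measure(cand)
        except Exception as exc:
            if is_oom(exc):
                continue  # discard configurations that ran out of memory
            raise  # anything else is treated as a genuine bug
        if throughput > best_throughput:
            best, best_throughput = cand, throughput
    return dict(best) if best is not None else dict(SAFE_DEFAULTS)
```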

Integration with #17

  • output must match config schema
  • compatible with:
    • user config
    • repo config
  • no direct mutation unless --write is used
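Before anything is printed or written, the generated values could be checked against the schema. The actual #17 schema is not reproduced in this issue, so the `SCHEMA` table below is a hypothetical subset inferred from the example output, and the allowed device values are taken from the calibration targets. Types are compared exactly so that `true` is not accepted where an integer is expected.

```python
from typing import Any, Dict, List

# Hypothetical subset of the #17 schema, inferred from the example output.
SCHEMA = {
    "embeddings": {"enabled": bool, "device": str, "threads": int, "batch_size": int},
    "embeddings.gpu": {"device_id": int, "memory_limit_mb": int},
}
ALLOWED_DEVICES = {"cpu", "gpu", "auto"}


def validate(sections: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of schema violations; an empty list means the output is valid."""
    errors = []
    for section, values in sections.items():
        expected = SCHEMA.get(section)
        if expected is None:
            errors.append(f"unknown section [{section}]")
            continue
        for key, value in values.items():
            typ = expected.get(key)
            if typ is None:
                errors.append(f"unknown key {section}.{key}")
            elif type(value) is not typ:  # exact check: bool is not accepted as int
                errors.append(f"{section}.{key}: expected {typ.__name__}")
    device = sections.get("embeddings", {}).get("device")
    if device is not None and device not in ALLOWED_DEVICES:
        errors.append(f"embeddings.device: {device!r} not in {sorted(ALLOWED_DEVICES)}")
    return errors
```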

Non-goals

  • dynamic runtime adaptation
  • continuous auto-tuning
  • model selection optimization
  • distributed benchmarking

Acceptance Criteria

  • calibration command runs successfully on CPU-only systems
  • calibration command runs on GPU-enabled systems
  • outputs valid TOML config
  • calibrated settings improve throughput compared with the defaults
  • results are reproducible on same hardware
  • no crashes under constrained environments

Implementation Notes

  • isolate calibration logic in dedicated module
  • reuse the existing embedding pipeline
  • avoid impacting normal runtime paths
  • ensure compatibility with future embedding providers
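One way to keep the calibration module isolated and provider-agnostic is to have it depend only on a small interface. The `EmbeddingProvider` protocol and its `configure`/`embed` methods below are hypothetical names, not an existing Codira API. The toy `HashingProvider` shows that such an interface also lets calibration logic be tested deterministically without loading a model or touching the network.

```python
from typing import List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class EmbeddingProvider(Protocol):
    """Hypothetical interface the calibration module would depend on."""
    name: str

    def configure(self, device: str, threads: int, batch_size: int) -> None: ...

    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class HashingProvider:
    """Toy provider for tests: deterministic, no model, no network."""
    name = "hashing"

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.batch_size = 1

    def configure(self, device: str, threads: int, batch_size: int) -> None:
        self.batch_size = batch_size  # device and threads are ignored by this toy

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # Derives a fixed-size vector from text length only.
        return [[float((len(t) * (i + 1)) % 7) for i in range(self.dim)] for t in texts]
```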

Dependencies


Notes

This feature enables:

  • portable performance tuning
  • simplified onboarding on new machines
  • better utilization of heterogeneous environments
