
Performance snapshot system for comparing benchmarks across commits #59

@mpryor


Summary

We need a system for capturing performance benchmark timings at specific commits and comparing them against the baselines or against other saved snapshots.

Problem

Currently test_perf.py only asserts pass/fail against hardcoded baselines with a 2x multiplier. There's no way to:

  • See actual timings vs baseline in a table
  • Save a snapshot at a commit for later comparison
  • Compare two arbitrary snapshots (e.g. before/after a change)

Features

  • poetry run python bench_snapshot.py — run benchmarks, print comparison table against baselines
  • poetry run python bench_snapshot.py --save — save timings as a named snapshot (keyed by git short hash)
  • poetry run python bench_snapshot.py --compare <commit> — compare a saved snapshot to baselines
  • poetry run python bench_snapshot.py --list — list saved snapshots

Technical Considerations

  • pytest --durations includes test setup/teardown overhead (~0.3-0.5s per test for Textual app init), so timings won't match the pure operation baselines exactly
  • Ideally instrument the tests to print actual operation timings (the elapsed variable in each test) rather than relying on --durations
  • Consider a pytest plugin or conftest fixture that captures and reports the measured elapsed values
  • Snapshots should be gitignored (machine-specific)
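One way to capture the actual operation timings instead of --durations is a small recorder the perf tests call with their own elapsed values. The sketch below is hypothetical (TIMINGS, timed, record and dump_timings are illustrative names, not existing code): timed wraps only the operation being measured, so Textual app startup and teardown stay out of the number, and record stores an elapsed value a test already computed.

```python
"""Hypothetical timing recorder the perf tests could call (names are assumptions)."""
import json
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

# Benchmark name -> measured operation time in seconds, for the current process.
TIMINGS: dict[str, float] = {}


@contextmanager
def timed(name: str) -> Iterator[None]:
    """Time only the wrapped block, so app startup/teardown is excluded.

    Only blocks that finish without raising are recorded.
    """
    start = time.perf_counter()
    yield
    TIMINGS[name] = time.perf_counter() - start


def record(name: str, elapsed: float) -> None:
    """Store an elapsed value that a test already measured itself."""
    TIMINGS[name] = elapsed


def dump_timings(path: Path) -> None:
    """Write all recorded timings as JSON (the path should be gitignored)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(TIMINGS, indent=2, sort_keys=True))
```

A test would either wrap its operation in with timed("some_benchmark"): or call record("some_benchmark", elapsed) after its existing measurement. Wiring this into pytest (for example, a conftest.py fixture that returns record, plus a pytest_sessionfinish hook that calls dump_timings) is not shown here and is left as an assumption.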
