
# ExcelBench — Sprint Tracker

Single source of truth for the 7-Dimension Extension initiative. Each row tracks one self-contained sprint (one branch, one PR, one row flip). Resume cold by reading this file and the most recent [*INCOMPLETE*] marker.

Last updated: 2026-04-27 (S3 shipped)

## Status Table

| # | Dimension | Status | Sprint size | Branch | PR | Acceptance commit range |
|---|-----------|--------|-------------|--------|----|-------------------------|
| S1 | Memory honesty + Tracker bootstrap | Shipped | S (3–5 d) | `feat/perf-mem-honesty` | #28 | `50dc104..HEAD@PR#28` |
| S2 | Data shape (int/str/date/formula) | Shipped | M (1 wk) | `feat/perf-data-shape` | #31 | `373c896..cbb530f` |
| S3 | File shape (wide/tall/sparse) | Shipped | M (1 wk) | `feat/perf-file-shape` | #32 | `eb89aba..c7cf25d` |
| S4 | High-cost operations | Planned | M (1 wk) | `feat/perf-operations` | | |
| S5 | Workbook complexity perf | Planned | M (1 wk) | `feat/perf-complexity` | | |
| S6 | Cold-start / warm path | Planned | S (3–5 d) | `feat/perf-cold-start` | | |
| S7 | Round-trip fidelity (LibreOffice) | Planned | L (~2 wk) | `feat/fidelity-roundtrip` | | |

Status legend: Planned → In Progress → Shipped (or Blocked with reason).

## How to Flip a Row

When a sprint lands:

  1. Update the row's Status to Shipped.
  2. Fill in the PR column (#NN).
  3. Fill in Acceptance commit range (abc1234..def5678).
  4. Bump the Last updated line at the top of this file.
  5. Append a sprint acceptance entry (template below) to the Acceptance Notes section.
  6. Add the corresponding DEC-NNN entry to decisions.md if not already done.

If a sprint stalls, switch its status to Blocked and add a one-line reason in the row.
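For illustration, steps 1–3 of the flip could be scripted. The sketch below is entirely hypothetical (the tracker is normally edited by hand; `flip_row` and its signature are invented for this example): it flips a sprint row from Planned to Shipped and appends the PR number and commit range.

```python
import re


def flip_row(tracker_text: str, sprint: str, pr: str, commit_range: str) -> str:
    """Hypothetical helper: flip one sprint row from Planned to Shipped.

    Appends the PR number and acceptance commit range to the row,
    covering steps 1-3 of the row-flip protocol above.
    """
    pattern = re.compile(rf"^({re.escape(sprint)}\s.*?)Planned(.*)$", re.MULTILINE)

    def _sub(m: re.Match) -> str:
        return f"{m.group(1)}Shipped{m.group(2)} {pr} {commit_range}"

    new_text, n = pattern.subn(_sub, tracker_text)
    if n != 1:
        # Refuse ambiguous flips rather than silently editing the wrong row.
        raise ValueError(f"expected exactly one Planned row for {sprint}, found {n}")
    return new_text
```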

## Sprint Acceptance Template

Use this template when appending to Acceptance Notes below.

### S<N> — <Dimension> (YYYY-MM-DD)

**Branch**: `feat/...`  ·  **PR**: #NN  ·  **Commit range**: `abc1234..def5678`

**What shipped**:
- <one-line bullet per major piece>

**Verification**:
- `uv run pytest tests/`
- `uv run ruff check src/ tests/`
- `uv run mypy src/`
- `excelbench <new-subcommand> ...` ✓ (16 adapters, no crashes)
- Dashboard regenerated, results.json + history.jsonl appended.

**Decisions**: DEC-NNN logged in `decisions.md`.

**Deferred / out-of-scope**:
- <items intentionally left for follow-up>

## Acceptance Notes

### S3 — File shape (wide/tall/sparse) (2026-04-27)

**Branch**: `feat/perf-file-shape`  ·  **PR**: #32  ·  **Commit range**: `eb89aba..c7cf25d`

**What shipped**:

- 12 file-shape benchmark scenarios across wide, tall, sparse, and many-sheets categories.
- `excelbench perf-file-shape` CLI with category filtering, tier caps, on-demand fixture regeneration, and Sprint 1 memory-mode support.
- `n_sheets` / `sheet_pattern` workload fan-out so many-sheets runs exercise per-sheet overhead without duplicating dtype logic.
- `_section_file_shape` dashboard heatmaps for read/write throughput by shape category.
- Cross-command staleness guards for `data_shape_*` and `file_shape_*` manifests, fixing the Codex P1 review finding.
- 54 focused data-shape/file-shape tests covering CLI helpers, staleness detection, runner fan-out, and dashboard rendering.
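The `n_sheets` / `sheet_pattern` fan-out described above can be sketched as a small cross-product over structural parameters. This is an illustration only — the names `Workload`, `fan_out`, and the tier values are invented, not the real runner's API:

```python
from itertools import product
from typing import NamedTuple


class Workload(NamedTuple):
    scenario: str
    n_sheets: int
    sheet_pattern: str


def fan_out(
    scenario: str,
    n_sheets_tiers: tuple[int, ...] = (1, 4, 16),       # illustrative tiers
    patterns: tuple[str, ...] = ("uniform", "skewed"),  # illustrative patterns
) -> list[Workload]:
    # One workload per (sheet count, pattern) pair; dtype logic stays in the
    # shared runner, so the fan-out varies only structural parameters.
    return [Workload(scenario, n, p) for n, p in product(n_sheets_tiers, patterns)]
```

The point of the design is that many-sheets scenarios reuse the same per-sheet write path, so per-sheet overhead is measured without a second copy of the dtype branches.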

**Verification**:

- `uv run pytest tests/test_perf_file_shape.py tests/test_perf_data_shape.py -v --no-cov` ✓ (54 passed)
- `uv run ruff check src/ tests/ scripts/`
- `uv run mypy src/`
- PR #32 CI ✓: lint, test 3.11, test 3.12, benchmark, rust_smoke.

**Decisions**: DEC-020 logged in `decisions.md`.

**Deferred / out-of-scope**:

- Full 16+ adapter run at the 1M tier remains a bench-machine task.
- Cross-product of file shape × dtype remains deferred until dashboard data shows that interaction is worth the matrix cost.

### S2 — Data shape (int/str/date/formula) (2026-04-27)

**Branch**: `feat/perf-data-shape`  ·  **PR**: #31  ·  **Commit range**: `373c896..cbb530f`

**What shipped**:

- 10 dtypes × 4 cell-count tiers (1k/10k/100k/1m) data-shape benchmark matrix. Dtypes: int, float, string_short, string_long, boolean, date, datetime, formula_simple, formula_cross_sheet, mixed_realistic.
- `excelbench perf-shape` CLI subcommand with on-demand fixture regen, staleness detection, and `--types`/`--rows` filtering; inherits Sprint 1's `--memory-mode` plumbing.
- `scripts/generate_throughput_fixtures.py` extended with `generate_data_shape_scenarios()` and the `--shape-only`/`--include-1m` flags. Generator and runner content stay in lockstep via a shared 1-based offset convention (fixed in PR review).
- `_run_workload_write` extended with 6 new value_type branches (date, datetime, boolean, formula_simple, formula_cross_sheet, mixed_realistic), plus float coverage via the existing number path.
- `_section_data_shape` dashboard helper — per-dtype log-normalized heatmaps for read and write at the largest available tier; the tooltip shows the full per-tier curve.
- mixed_realistic ratio (60/30/5/3/2 string/int/date/formula/blank) documented in `fixtures/synthetic_calibration/sample_set.md`.
- 31 new tests in `tests/test_perf_data_shape.py` covering all value_type branches, CLI helpers, dashboard rendering, and end-to-end perf_shape invocation.
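The documented mixed_realistic ratio can be illustrated with a small sampler. This is a sketch under assumptions — the real generator lives in `scripts/generate_throughput_fixtures.py` and its internals (function names, cell payloads) may differ; only the 60/30/5/3/2 weights come from the source:

```python
import random
from datetime import date

# Documented mixed_realistic ratio: 60% string, 30% int, 5% date,
# 3% formula, 2% blank (fixtures/synthetic_calibration/sample_set.md).
WEIGHTS = {"string": 60, "int": 30, "date": 5, "formula": 3, "blank": 2}


def mixed_realistic_cells(n: int, seed: int = 0) -> list[object]:
    """Hypothetical sampler: draw n cell values following the documented ratio."""
    rng = random.Random(seed)  # seeded so fixture content is reproducible
    kinds = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()), k=n)
    out: list[object] = []
    for i, kind in enumerate(kinds):
        if kind == "string":
            out.append(f"item-{i}")
        elif kind == "int":
            out.append(i)
        elif kind == "date":
            out.append(date(2026, 1, 1 + i % 28))
        elif kind == "formula":
            out.append(f"=SUM(A1:A{i + 1})")
        else:
            out.append(None)  # blank cell
    return out
```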

**Verification**:

- `uv run pytest tests/` ✓ (1171 passed, 32 skipped, 6 xfailed)
- `uv run ruff check src/ tests/ scripts/`
- `uv run mypy src/`
- Local coverage 67.64%; Linux CI ~65.9% (gate: 65%).
- All 5 CI jobs green: lint, test 3.11/3.12, benchmark, rust_smoke.
- 9 Copilot inline review comments addressed (tier cap mismatch, generator/runner offset for boolean/date/datetime, README wording, scenario-vs-row count, isinstance form), with reply threads.

**Decisions**: DEC-019 logged in `decisions.md`.

**Deferred / out-of-scope**:

- High-cost operations (append_rows, iter_rows_values, modify_one_cell, cell.font access) — Sprint 4.
- File shape (wide/tall/sparse/many-sheets) — Sprint 3.
- Cold-import cost per dtype — Sprint 6 (subprocess isolation needed).
- Calibration of mixed_realistic against a corpus larger than the 50-file sample — flagged in DEC-019; revisit if dashboard data suggests the ratio is wrong.

### S1 — Memory honesty + Tracker bootstrap (2026-04-27)

**Branch**: `feat/perf-mem-honesty`  ·  **PR**: #28  ·  **Commit range**: `50dc104..HEAD` (final range fills in on merge)

**What shipped**:

- `TRACKER.md` (this file) — 7-row sprint table, row-flip protocol, acceptance template.
- `src/excelbench/perf/memory.py` — three-mode memory harness (`getrusage` / `tracemalloc` / `time` via a `/usr/bin/time -l` subprocess, plus an `all` composite). `MemoryProbe` context manager for the in-process modes; `parse_time_l_stderr` cross-platform parser (macOS BSD `time` + GNU `time -l`).
- `PerfOpResult` extended with `rss_kb_via_time` and `python_heap_peak_kb` fields (the existing `rss_peak_mb` is preserved — backwards-compatible).
- `src/excelbench/perf/_iter_subprocess.py` — internal subprocess entrypoint that runs one iteration per invocation; the parent wraps it under `/usr/bin/time -l`.
- `excelbench perf --memory-mode={getrusage,tracemalloc,time,all}` CLI flag.
- The HTML dashboard renders dual RSS (MB) cells — getrusage / `time -l` — with a tooltip explaining divergence whenever any entry has a `time -l` measurement.
- DEC-018 documents why the three modes coexist and what each is honest about.

**Verification** (run on macOS 25.2, Python 3.13):

- `uv run pytest tests/` ✓ (1140 passed, 32 skipped, 6 xfailed)
- `uv run ruff check src/excelbench/perf/ src/excelbench/cli.py src/excelbench/results/html_dashboard.py`
- `uv run mypy src/excelbench/perf/` ✓ no issues
- `excelbench perf --memory-mode=all --feature cell_values --adapter wolfxl --adapter openpyxl --warmup 1 --iters 2`:
  - All three fields populated as expected.
  - Python-heap honesty signal landed: openpyxl uses 16× (read) and 227× (write) more Python heap than wolfxl on the same workload, confirming wolfxl pushes allocations into Rust.
  - `time -l`/getrusage ratio ~0.97× on small fixtures (subprocess startup dominates); expected to diverge meaningfully once Sprint 2 lands ≥1M-cell fixtures.

**Decisions**: DEC-018 logged in `decisions.md`.

**Deferred / out-of-scope**:

- Tracemalloc reset semantics across nested probes — the current code calls `reset_peak()` when a probe re-enters an already-traced context. Revisit if any caller starts tracemalloc outside the probe.
- `time -l` subprocess support on Windows — skipped silently (no `/usr/bin/time`). Sprint 6 (cold-start) will set the precedent for cross-platform subprocess handling.
- Visualizing the `time -l`/getrusage divergence as a dedicated chart — a single dual cell with tooltip is sufficient until S2 ships larger fixtures that make the gap visible.
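The `reset_peak()` nesting behavior flagged above can be illustrated with a minimal probe sketch. This is assumed behavior, not the real `MemoryProbe`: the name `heap_peak_probe` and the dict-result shape are invented; only the start-vs-reset decision reflects the approach described in the deferred note.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def heap_peak_probe():
    """Minimal sketch of a nested-safe Python-heap peak probe."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    else:
        # Re-entered an already-traced context: reset the peak counter so the
        # inner probe reports its own peak. Side effect (the documented caveat):
        # this also clobbers the outer probe's peak reading.
        tracemalloc.reset_peak()
    result: dict[str, int] = {}
    try:
        yield result
    finally:
        _, peak = tracemalloc.get_traced_memory()
        result["python_heap_peak_kb"] = peak // 1024
        if started_here:
            tracemalloc.stop()  # only the outermost probe stops tracing
```

Allocating inside the probe then reading `result["python_heap_peak_kb"]` after the `with` block shows the inner probe's own peak, not the outer trace's high-water mark.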

## Reference

- Plan: see the wolfxl session that produced this tracker (multi-sprint roadmap).
- Architecture: `architecture.md`
- Decisions: `decisions.md`
- Key seams: `src/excelbench/perf/runner.py`, `src/excelbench/harness/adapters/base.py`, `src/excelbench/results/html_dashboard.py`.