Single source of truth for the 7-Dimension Extension initiative. Each row tracks one self-contained sprint (one branch, one PR, one row flip). Resume cold by reading this file and the most recent `[*INCOMPLETE*]` marker.

Last updated: 2026-04-27 (S3 shipped)

| # | Dimension | Status | Sprint size | Branch | PR | Acceptance commit range |
|---|---|---|---|---|---|---|
| S1 | Memory honesty + Tracker bootstrap | Shipped | S (3–5 d) | `feat/perf-mem-honesty` | #28 | `50dc104..HEAD@PR#28` |
| S2 | Data shape (int/str/date/formula) | Shipped | M (1 wk) | `feat/perf-data-shape` | #31 | `373c896..cbb530f` |
| S3 | File shape (wide/tall/sparse) | Shipped | M (1 wk) | `feat/perf-file-shape` | #32 | `eb89aba..c7cf25d` |
| S4 | High-cost operations | Planned | M (1 wk) | `feat/perf-operations` | — | — |
| S5 | Workbook complexity perf | Planned | M (1 wk) | `feat/perf-complexity` | — | — |
| S6 | Cold-start / warm path | Planned | S (3–5 d) | `feat/perf-cold-start` | — | — |
| S7 | Round-trip fidelity (LibreOffice) | Planned | L (~2 wk) | `feat/fidelity-roundtrip` | — | — |

Status legend: Planned → In Progress → Shipped (or Blocked with reason).
When a sprint lands:
- Update the row's Status to `Shipped`.
- Fill in the PR column (`#NN`).
- Fill in Acceptance commit range (`abc1234..def5678`).
- Bump the Last updated line at the top of this file.
- Append a sprint acceptance entry (template below) to the Acceptance Notes section.
- Add the corresponding `DEC-NNN` entry to `decisions.md` if not already done.

If a sprint stalls, switch its status to Blocked and add a one-line reason in the row.
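
The row flip described above is mechanical enough to script. The following is a minimal sketch and not part of the repo: the function name `mark_shipped`, the commit-range regex, and the assumption that every table row has exactly seven cells are all illustrative.

```python
import re

# Assumed shape of a commit range: two abbreviated or full SHAs joined by "..".
COMMIT_RANGE = re.compile(r"^[0-9a-f]{7,40}\.\.[0-9a-f]{7,40}$")


def mark_shipped(table: str, sprint: str, pr: str, commit_range: str) -> str:
    """Return the tracker table with one sprint row flipped to Shipped.

    Hypothetical helper: it edits the Status, PR, and commit-range cells
    of the row whose first cell equals `sprint` and leaves every other
    line untouched.
    """
    if not COMMIT_RANGE.match(commit_range):
        raise ValueError(f"not a sha..sha commit range: {commit_range!r}")

    out = []
    found = False
    for line in table.splitlines():
        # Split "| a | b | c |" into ["a", "b", "c"].
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 7 and cells[0] == sprint:
            cells[2] = "Shipped"
            cells[5] = pr
            cells[6] = f"`{commit_range}`"
            line = "| " + " | ".join(cells) + " |"
            found = True
        out.append(line)

    if not found:
        raise KeyError(f"no row for sprint {sprint!r}")
    return "\n".join(out)
```

Calling `mark_shipped(table_text, "S4", "#NN", "abc1234..def5678")` would cover the first three bullets; bumping the Last updated line and appending the acceptance entry remain manual steps in this sketch.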
Use this template when appending to Acceptance Notes below.
### S<N> — <Dimension> (YYYY-MM-DD)
**Branch**: `feat/...` · **PR**: #NN · **Commit range**: `abc1234..def5678`
**What shipped**:
- <one-line bullet per major piece>
**Verification**:
- `uv run pytest tests/` ✓
- `uv run ruff check src/ tests/` ✓
- `uv run mypy src/` ✓
- `excelbench <new-subcommand> ...` ✓ (16 adapters, no crashes)
- Dashboard regenerated, results.json + history.jsonl appended.
**Decisions**: DEC-NNN logged in `decisions.md`.
**Deferred / out-of-scope**:
- <items intentionally left for follow-up>

## Acceptance Notes

### S3 — File shape (wide/tall/sparse)

**Branch**: `feat/perf-file-shape` · **PR**: #32 · **Commit range**: `eb89aba..c7cf25d`

**What shipped**:
- 12 file-shape benchmark scenarios across wide, tall, sparse, and many-sheets categories.
- `excelbench perf-file-shape` CLI with category filtering, tier caps, on-demand fixture regeneration, and Sprint 1 memory-mode support.
- `n_sheets`/`sheet_pattern` workload fan-out so many-sheets runs exercise per-sheet overhead without duplicating dtype logic.
- `_section_file_shape` dashboard heatmaps for read/write throughput by shape category.
- Cross-command staleness guards for `data_shape_*` and `file_shape_*` manifests, fixing the Codex P1 review finding.
- 54 focused data-shape/file-shape tests covering CLI helpers, staleness detection, runner fan-out, and dashboard rendering.

**Verification**:
- `uv run pytest tests/test_perf_file_shape.py tests/test_perf_data_shape.py -v --no-cov` ✓ (54 passed)
- `uv run ruff check src/ tests/ scripts/` ✓
- `uv run mypy src/` ✓
- PR #32 CI ✓: lint, test 3.11, test 3.12, benchmark, rust_smoke.

**Decisions**: DEC-020 logged in `decisions.md`.

**Deferred / out-of-scope**:
- Full 16+ adapter run at the 1M tier remains a bench-machine task.
- Cross-product of file shape × dtype remains deferred until dashboard data shows that interaction is worth the matrix cost.

### S2 — Data shape (int/str/date/formula)

**Branch**: `feat/perf-data-shape` · **PR**: #31 · **Commit range**: `373c896..cbb530f`

**What shipped**:
- 10 dtypes × 4 cell-count tiers (1k/10k/100k/1m) data-shape benchmark matrix. Dtypes: `int`, `float`, `string_short`, `string_long`, `boolean`, `date`, `datetime`, `formula_simple`, `formula_cross_sheet`, `mixed_realistic`.
- `excelbench perf-shape` CLI subcommand with on-demand fixture regen, staleness detection, `--types`/`--rows` filtering, inherits Sprint 1's `--memory-mode` plumbing.
- `scripts/generate_throughput_fixtures.py` extended with `generate_data_shape_scenarios()`, `--shape-only`, `--include-1m` flags. Generator/runner content stays in lockstep via shared 1-based offset convention (fixed in PR review).
- `_run_workload_write` extended with 6 new `value_type` branches (`date`, `datetime`, `boolean`, `formula_simple`, `formula_cross_sheet`, `mixed_realistic`) plus float coverage via existing `number` path.
- `_section_data_shape` dashboard helper — per-dtype log-normalized heatmaps for read and write at the largest available tier; tooltip shows the full per-tier curve.
- `mixed_realistic` ratio (60/30/5/3/2 string/int/date/formula/blank) documented in `fixtures/synthetic_calibration/sample_set.md`.
- 31 new tests in `tests/test_perf_data_shape.py` covering all `value_type` branches, CLI helpers, dashboard rendering, and end-to-end perf_shape invocation.
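
For readers unfamiliar with log-normalized heatmaps, here is one common way such normalization is done. This is a sketch under assumptions, not the code in `_section_data_shape`: the function name `log_normalize`, the use of base-10 logs, and the min/max rescaling are all illustrative choices.

```python
import math


def log_normalize(throughput: dict[str, float]) -> dict[str, float]:
    """Rescale positive throughputs onto [0, 1] on a log10 scale.

    Hypothetical helper: the slowest entry maps to 0.0 and the fastest
    to 1.0, so a 10x gap looks the same whether it is 10 vs 100 or
    1,000 vs 10,000 cells per second.
    """
    if not throughput:
        return {}
    if any(v <= 0 for v in throughput.values()):
        raise ValueError("log scale needs strictly positive values")

    logs = {name: math.log10(v) for name, v in throughput.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:
        # All entries equal: nothing to contrast, colour them all the same.
        return {name: 1.0 for name in logs}
    return {name: (x - lo) / (hi - lo) for name, x in logs.items()}
```

A log scale keeps one very fast dtype from washing out the differences among the slower ones, which a linear scale would compress into nearly identical colours.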

**Verification**:
- `uv run pytest tests/` ✓ (1171 passed, 32 skipped, 6 xfailed)
- `uv run ruff check src/ tests/ scripts/` ✓
- `uv run mypy src/` ✓
- Local coverage 67.64%, Linux CI ~65.9% (gate 65%).
- All 5 CI jobs green: lint, test 3.11/3.12, benchmark, rust_smoke.
- 9 Copilot inline review comments addressed (tier cap mismatch, generator/runner offset for boolean/date/datetime, README wording, scenario-vs-row count, isinstance form) with reply threads.

**Decisions**: DEC-019 logged in `decisions.md`.

**Deferred / out-of-scope**:
- High-cost operations (`append_rows`, `iter_rows_values`, `modify_one_cell`, `cell.font` access) — Sprint 4.
- File shape (wide/tall/sparse/many-sheets) — Sprint 3.
- Cold-import cost per dtype — Sprint 6 (subprocess isolation needed).
- Calibration of `mixed_realistic` against a corpus larger than the 50-file sample — flagged in DEC-019, revisit if dashboard data suggests the ratio is wrong.
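
To make the `mixed_realistic` ratio concrete, the sketch below draws cell kinds with the documented 60/30/5/3/2 weights and measures the resulting mix, which is the kind of comparison a recalibration against a larger corpus would need. It is illustrative only: the names `MIXED_REALISTIC_WEIGHTS`, `mixed_realistic_kinds`, and `observed_ratio` are assumptions, and the real generator may assign kinds deterministically rather than by seeded sampling.

```python
import random
from collections import Counter

# Weights taken from the documented ratio: string/int/date/formula/blank.
MIXED_REALISTIC_WEIGHTS = {
    "string": 60,
    "int": 30,
    "date": 5,
    "formula": 3,
    "blank": 2,
}


def mixed_realistic_kinds(n_cells: int, seed: int = 0) -> list[str]:
    """Draw one cell kind per cell using the documented weights.

    Hypothetical helper: a fixed seed keeps the fixture reproducible.
    """
    rng = random.Random(seed)
    kinds = list(MIXED_REALISTIC_WEIGHTS)
    weights = list(MIXED_REALISTIC_WEIGHTS.values())
    return rng.choices(kinds, weights=weights, k=n_cells)


def observed_ratio(kinds: list[str]) -> dict[str, float]:
    """Return the fraction of cells of each kind (all zeros if empty)."""
    total = len(kinds)
    counts = Counter(kinds)
    return {k: (counts[k] / total if total else 0.0) for k in MIXED_REALISTIC_WEIGHTS}
```

Running `observed_ratio` over kinds extracted from real workbooks, instead of over generated ones, would give the corpus-side numbers to compare against the 60/30/5/3/2 target.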

### S1 — Memory honesty + Tracker bootstrap

**Branch**: `feat/perf-mem-honesty` · **PR**: #28 · **Commit range**: `50dc104..HEAD` (final range fills in on merge)

**What shipped**:
- `TRACKER.md` (this file) — 7-row sprint table, row-flip protocol, acceptance template.
- `src/excelbench/perf/memory.py` — three-mode memory harness (`getrusage`/`tracemalloc`/`time` via `/usr/bin/time -l` subprocess + `all` composite).
- `MemoryProbe` context manager for in-process modes; `parse_time_l_stderr` cross-platform parser (macOS BSD time + GNU time-l).
- `PerfOpResult` extended with `rss_kb_via_time` and `python_heap_peak_kb` fields (existing `rss_peak_mb` preserved — backwards-compatible).
- `src/excelbench/perf/_iter_subprocess.py` — internal subprocess entrypoint that runs one iteration per invocation; wrapped by parent under `/usr/bin/time -l`.
- `excelbench perf --memory-mode={getrusage,tracemalloc,time,all}` CLI flag.
- HTML dashboard renders dual `RSS (MB) — getrusage / time -l` cells with a tooltip explaining divergence whenever any entry has a `time -l` measurement.
- DEC-018 documents why three modes coexist and what each is honest about.
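
As an illustration of what parsing `/usr/bin/time` output involves, the sketch below extracts peak RSS from two well-known stderr layouts. It is not the repo's `parse_time_l_stderr`: the name `parse_max_rss_kb` is hypothetical, the GNU branch assumes the verbose (`-v`) layout, and the BSD branch assumes macOS, where `time -l` reports the value in bytes.

```python
import re
from typing import Optional

# BSD/macOS `time -l` prints "<number>  maximum resident set size".
_BSD_MAX_RSS = re.compile(r"^\s*(\d+)\s+maximum resident set size\s*$", re.MULTILINE)
# GNU time verbose output prints "Maximum resident set size (kbytes): <number>".
_GNU_MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def parse_max_rss_kb(stderr: str, bsd_reports_bytes: bool = True) -> Optional[int]:
    """Return peak RSS in KiB, or None if no known layout matches.

    Hypothetical helper. `bsd_reports_bytes` is an assumption that holds
    on macOS; other BSDs may report kilobytes instead.
    """
    gnu = _GNU_MAX_RSS.search(stderr)
    if gnu:
        return int(gnu.group(1))

    bsd = _BSD_MAX_RSS.search(stderr)
    if bsd:
        raw = int(bsd.group(1))
        return raw // 1024 if bsd_reports_bytes else raw

    return None
```

Returning `None` instead of raising lets a caller leave a field such as `rss_kb_via_time` unset on platforms where the wrapper is unavailable.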

**Verification** (run on macOS 25.2, Python 3.13):
- `uv run pytest tests/` ✓ 1140 passed, 32 skipped, 6 xfailed
- `uv run ruff check src/excelbench/perf/ src/excelbench/cli.py src/excelbench/results/html_dashboard.py` ✓
- `uv run mypy src/excelbench/perf/` ✓ no issues
- `excelbench perf --memory-mode=all --feature cell_values --adapter wolfxl --adapter openpyxl --warmup 1 --iters 2`:
  - All three fields populated as expected.
  - Python-heap honesty signal landed: openpyxl uses 16× (read) and 227× (write) more Python heap than wolfxl on the same workload, confirming wolfxl pushes allocations into Rust.
  - `time -l`/`getrusage` ratio ~0.97x on small fixtures (subprocess startup dominates); expected to diverge meaningfully once Sprint 2 lands ≥1M-cell fixtures.

**Decisions**: DEC-018 logged in `decisions.md`.

**Deferred / out-of-scope**:
- Tracemalloc reset semantics across nested probes — current code uses `reset_peak()` when a probe re-enters an already-traced context. Should be revisited if any caller starts tracemalloc outside the probe.
- `time -l` subprocess support on Windows — skipped silently (no `/usr/bin/time`). Sprint 6 (cold-start) will set the precedent for cross-platform subprocess handling.
- Visualizing the time-l/getrusage divergence as a dedicated chart — single dual-cell with tooltip is sufficient until S2 ships larger fixtures that make the gap visible.
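
The nested-probe concern in the first deferred item can be reproduced with a few lines of standard-library code. This is a simplified stand-in for `MemoryProbe`, not its actual implementation: the name `heap_probe` and the result-dict shape are assumptions, and only the `tracemalloc` calls are real API (`reset_peak()` requires Python 3.9+).

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def heap_probe():
    """Record the Python-heap peak (KiB) seen while the block runs.

    Hypothetical probe. If tracing is already active (a nested probe or
    an outside caller), it resets the peak instead of restarting, which
    also discards the peak the outer tracer had accumulated so far.
    """
    already_tracing = tracemalloc.is_tracing()
    if already_tracing:
        tracemalloc.reset_peak()
    else:
        tracemalloc.start()

    result = {}
    try:
        yield result
    finally:
        _current, peak = tracemalloc.get_traced_memory()
        result["python_heap_peak_kb"] = peak / 1024
        if not already_tracing:
            # Only the probe that started tracing is allowed to stop it.
            tracemalloc.stop()
```

With this shape, an outer probe that allocates and frees a large buffer before entering an inner probe will report a peak that no longer includes that buffer, because the inner `reset_peak()` cleared it. That is the behaviour to revisit if any caller starts tracemalloc outside the probe.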

## References

- Plan: see the wolfxl session that produced this tracker (multi-sprint roadmap).
- Architecture: `architecture.md`
- Decisions: `decisions.md`
- Key seams: `src/excelbench/perf/runner.py`, `src/excelbench/harness/adapters/base.py`, `src/excelbench/results/html_dashboard.py`.