This file is the high-level map of the ExcelBench codebase: what lives where, the allowed dependency direction, and the main runtime/data flows.
If you are starting a new session:

- Read this file (`architecture.md`) to orient.
- Read `CLAUDE.md` for commands, workflows, and repo conventions.
- Check active trackers under `docs/trackers/` for current status and run history.
ExcelBench has two complementary tracks:
- Fidelity (correctness): "Does this library preserve Excel semantics for feature X?"
- Performance (speed/memory): "How fast is it at feature X or a scaled workload?"
A key design principle is reproducibility:
- Canonical fixtures are generated by real Excel and committed to git.
- Benchmarks produce JSON results as the source of truth, and render all other views from it.
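The "JSON as source of truth" principle means every other view is a pure function of the results file. A minimal sketch of that idea; the JSON shape here is hypothetical, the real schema is owned by the runner:

```python
import json

# Hypothetical results.json shape; the real schema is defined by the runner output.
results = json.loads("""
{
  "library": "openpyxl",
  "features": [
    {"name": "borders", "score": 0.92},
    {"name": "number_formats", "score": 1.0}
  ]
}
""")

# Every rendered view (markdown, csv, dashboard) derives from the same JSON,
# so re-rendering never requires re-running the benchmark.
rows = [f"{results['library']},{f['name']},{f['score']}" for f in results["features"]]
print("\n".join(rows))
```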
At a high level, ExcelBench is split into six layers:
- Fixtures + generator (Excel as ground truth)
- Fidelity harness (adapters + scoring + diagnostics + semantic workbook diff)
- External oracle helpers (optional subprocess validators/generators)
- Performance harness (throughput workloads + best-effort memory)
- Rendering + publishing (markdown/csv + HTML dashboard + plots)
- Optional Rust acceleration (PyO3 extension + Rust-backed adapters)
One-way dependencies only:
- `models.py` defines the core contracts.
- Adapters depend on models.
- Runners depend on adapters and models.
- Renderers depend on runner output schemas.
In practice:

```
models
  ^
adapters
  ^
harness runner (fidelity)
  ^
results renderer / visualizations

perf runner (performance) -> perf renderer
```
Rust extension (optional) is called by Rust-backed adapters; it should not depend on Python code.
WolfXL is an external PyPI dependency — it does not depend on ExcelBench.
Rule of thumb: keep adapters thin and deterministic. Any cross-library normalization should live in runner utilities, not in adapter implementations.
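The rule of thumb above can be sketched as code. This is a hypothetical illustration of the split, not the real contracts: `CellValue` stands in for the dataclasses in `models.py`, and the adapter/runner names are assumptions:

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical stand-in for the real contracts in src/excelbench/models.py.
@dataclass(frozen=True)
class CellValue:
    raw: object

class Adapter(Protocol):
    """Illustrative adapter surface: thin, deterministic, no normalization."""
    name: str
    def read_cell(self, path: str, sheet: str, ref: str) -> CellValue: ...

def normalize_float_as_int(value: CellValue) -> CellValue:
    """Cross-library normalization lives in runner utilities, not in adapters."""
    if isinstance(value.raw, float) and value.raw.is_integer():
        return CellValue(int(value.raw))
    return value

print(normalize_float_as_int(CellValue(3.0)).raw)  # 3
```

Keeping normalization out of the adapters means every library is compared through the same lens, and adapter bugs cannot hide behind per-adapter fixups.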
Most-touched top-level directories:
- `src/excelbench/`
  - `cli.py`: Typer CLI entrypoint (`excelbench ...`)
  - `models.py`: dataclasses/contracts (`CellValue`, `CellFormat`, `BorderInfo`, ...)
  - `generator/`: fixture generation (xlwings + Excel)
  - `harness/`: fidelity benchmark runner, adapters, and optional external oracle helpers
  - `harness/workbook_snapshot.py`: normalized workbook/package snapshots used by semantic diffs
  - `harness/semantic_diff.py`: structured workbook diff artifacts for diagnostics/context lanes
  - `harness/roundtrip_runner.py`: open/save idempotence context lane
  - `harness/compat_cases.py`: openpyxl-style compatibility snippet lane
  - `harness/artifact_context.py`: chart and macro package evidence lanes
  - `harness/external_fixture_specs/`: tool-specific external oracle fixture definitions imported by the fixture-pack generator
  - `perf/`: performance runner + renderer
  - `results/`: fidelity result renderers (md/csv) + dashboards/plots
- `fixtures/`
  - `excel/`: canonical `.xlsx` fixtures (git-tracked, Excel-generated)
  - `excel_xls/`: canonical `.xls` fixtures
  - `throughput_xlsx/`: scale fixtures for perf/throughput workloads
- `tools/external-oracles/`: optional subprocess helpers for non-Python oracle tools.
  - `excelize/`: Go helper that generates and inspects `.xlsx` fixtures with Excelize. Run from that directory with `go run .`.
  - `libreoffice/`: Python helper that runs LibreOffice headless open/save and PDF render validation.
  - `exceljs/`: Node helper that generates workbooks with ExcelJS tables, formulas, data validations, rich strings, comments, hyperlinks, images, merges, freeze panes, and sheet protection.
  - `apache-poi/`: Java helper that generates workbooks with Apache POI tables, formulas, data validations, rich strings, comments, hyperlinks, images, merges, freeze panes, and sheet protection.
  - `closedxml/`: .NET helper that generates workbooks with ClosedXML tables, pivots, and conditional formatting.
  - `npoi/`: .NET helper that generates POI-style workbooks with NPOI formulas, comments, rich text, merged ranges, and sheet protection.
- `rust/excelbench_rust/` (optional, local-only)
  - PyO3 crate for ExcelBench-specific Rust backends (umya-spreadsheet, basic calamine)
  - The core WolfXL backends (calamine-styled, rust_xlsxwriter, xlsx patcher) are now in the standalone `wolfxl` package on PyPI (`pip install wolfxl`)
- WolfXL (external dependency, `pip install wolfxl`)
  - Standalone repo: https://github.com/SynthGL/wolfxl
  - Openpyxl-compatible API: `load_workbook`, `Workbook`, `Font`, `PatternFill`, etc.
  - Three modes: read (calamine-styled), write (rust_xlsxwriter), modify (XlsxPatcher)
  - Installed as an optional dependency: `uv sync --extra rust`
- `tests/`: pytest suites (fidelity + adapter unit tests + visualization smoke tests)
- `docs/`: plans and trackers (treat as source of truth for methodology and run logs)
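Because WolfXL mirrors openpyxl's API, the same snippet can target either library. A minimal sketch of that compatibility; the try/except fallback and the file path are illustrative, and only the names listed above (`Workbook`, `load_workbook`) are taken from this document:

```python
import os
import tempfile

# WolfXL exposes an openpyxl-compatible API, so this code runs against either
# library. The fallback to openpyxl here is purely for illustration.
try:
    from wolfxl import Workbook, load_workbook
except ImportError:
    from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

wb = Workbook()        # write mode (rust_xlsxwriter-backed in WolfXL)
ws = wb.active
ws["A1"] = "hello"
wb.save(path)

print(load_workbook(path).active["A1"].value)  # hello
```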
Important "scratch" conventions:

- `test_files/` is local scratch (gitignored).
- `results_dev_*` directories are local/ephemeral benchmark outputs (often gitignored).
Common starting points by intent:
- Add a new adapter:
  - `src/excelbench/harness/adapters/base.py`
  - `src/excelbench/harness/adapters/__init__.py`
- Add a new external oracle:
  - `src/excelbench/harness/external_oracles.py`
  - `docs/trackers/external-oracle-expansion.md`
  - Keep helpers optional and subprocess-isolated until promoted into normal benchmark flows.
- Add a new scored feature:
  - Generator: `src/excelbench/generator/features/`
  - Harness exercise/scoring: `src/excelbench/harness/runner.py`
- Extend Tier 2/3 OOXML parsing:
  - WolfXL (external): calamine-styled + ooxml utilities live in the wolfxl repo
  - ExcelBench-local Rust: `rust/excelbench_rust/src/` (basic calamine, umya bindings)
- Performance track:
  - Runner: `src/excelbench/perf/runner.py`
  - Renderer: `src/excelbench/perf/renderer.py`
  - Throughput driver: `scripts/run_throughput_dashboard.py`
  - Memory profiling: `scripts/memory_profile.py`
- Dashboards/plots:
  - HTML dashboard: `src/excelbench/results/html_dashboard.py`
  - Scatter plots: `src/excelbench/results/scatter.py`
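For the "add a new adapter" path, a registration pattern like the following is a plausible shape. This is a hypothetical sketch only; the real base class and registry live in `src/excelbench/harness/adapters/` and may differ:

```python
# Hypothetical sketch of an adapter registry; real code lives in
# src/excelbench/harness/adapters/base.py and __init__.py.
class BaseAdapter:
    name = "base"

    def read_cell(self, path, sheet, ref):
        raise NotImplementedError

_REGISTRY = {}

def register(cls):
    """Decorator that makes an adapter discoverable by the harness runner."""
    _REGISTRY[cls.name] = cls
    return cls

@register
class MyLibAdapter(BaseAdapter):
    name = "mylib"  # hypothetical new adapter under test

    def read_cell(self, path, sheet, ref):
        return None  # delegate to the library under test

def get_all_adapters():
    return [cls() for cls in _REGISTRY.values()]

print([a.name for a in get_all_adapters()])  # ['mylib']
```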
Fixture generation flow:

```
xlwings -> Excel
  -> writes feature workbooks
  -> writes manifest.json
  -> fixtures committed to git
```

Command: `uv run excelbench generate --output fixtures/excel`
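The generation flow above can be sketched as a guarded xlwings call plus a manifest record. The manifest fields and function names here are illustrative assumptions, not the real schema; only the xlwings calls are standard API:

```python
import json

try:
    import xlwings as xw  # requires a real Excel installation
    HAVE_EXCEL = True
except ImportError:
    HAVE_EXCEL = False

def generate_fixture(path: str) -> None:
    # Drive Excel itself, so the fixture captures real Excel semantics.
    app = xw.App(visible=False)
    try:
        wb = app.books.add()
        wb.sheets[0]["A1"].value = 42
        wb.save(path)
    finally:
        app.quit()

# Illustrative manifest shape; the real schema is owned by the generator.
manifest = {"fixtures": [{"file": "demo.xlsx", "feature": "cell_values"}]}
if HAVE_EXCEL:
    generate_fixture("fixtures/excel/demo.xlsx")
print(json.dumps(manifest))
```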
Fidelity benchmark flow:

```
fixtures + manifest
  -> runner loads adapters
  -> adapters read/write
  -> oracle verification (Excel via xlwings; fallback openpyxl)
  -> results.json + diagnostics
  -> renderers produce README.md/matrix.csv/plots
```

Command: `uv run excelbench benchmark --tests fixtures/excel --output results`
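The heart of the oracle-verification step is comparing what an adapter reads against what the oracle reads. A hedged sketch of that comparison; the function name and the per-cell scoring rule are illustrative, not the actual scoring model:

```python
# Illustrative fidelity score: the fraction of cells where the adapter's
# reading matches the oracle's. The real scoring model may weight differently.
def score_feature(adapter_values: dict, oracle_values: dict) -> float:
    if not oracle_values:
        return 0.0
    hits = sum(1 for ref, v in oracle_values.items() if adapter_values.get(ref) == v)
    return hits / len(oracle_values)

oracle = {"A1": 42, "B1": "x", "C1": 3.5}
adapter = {"A1": 42, "B1": "x", "C1": 3.0}  # one mismatch out of three cells
print(score_feature(adapter, oracle))       # 2/3 of cells match
```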
External oracle flow:

```
JSON fixture manifest
  -> external_oracles.py launches helper subprocess
  -> Excelize / LibreOffice / Apache POI / ClosedXML helper creates or validates workbook
  -> helper prints JSON diagnostics
  -> local-only results are reviewed before promoting stable cases to fixtures
  -> optional WolfXL validation script checks read + in-place modify-save preservation
```
External oracle helpers are intentionally not part of `get_all_adapters()` yet. Missing Go/Java/.NET/LibreOffice commands should skip cleanly, not fail the core suite. The first local pack is generated by `scripts/generate_external_oracle_fixtures.py` into `results_dev_external/` and validated against WolfXL with `scripts/validate_external_oracle_fixtures_with_wolfxl.py`.
Semantic diff / context-lane flow:

```
workbook(s)
  -> workbook_snapshot.py normalizes workbook semantics and selected OOXML parts
  -> semantic_diff.py compares category-level snapshots
  -> diff-workbooks / roundtrip-context / compatibility-context / chart and macro context commands
  -> JSON + markdown artifacts with explicit skips for unsupported adapters
```

Commands:

- `uv run excelbench diff-workbooks --left a.xlsx --right b.xlsx --output results-workbook-diff`
- `uv run excelbench roundtrip-context --tests fixtures/excel --output results-roundtrip`
- `uv run excelbench compatibility-context --output results-compatibility`
- `uv run excelbench cross-language-chart-context --output results-cross-language-charts`
- `uv run excelbench macro-context --tests fixtures/excel_xlsm --output results-macros`
These are context/evidence lanes. They do not change the core fidelity scoring model unless a future decision explicitly promotes their outputs into scored benchmark features.
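A category-level snapshot diff in the spirit of `semantic_diff.py` can be sketched as a keyed comparison of normalized snapshots. The snapshot categories and the diff-entry shape below are illustrative assumptions:

```python
# Illustrative category-level diff: compare two normalized snapshots and emit
# one structured entry per category that differs (real artifact shape may vary).
def diff_snapshots(left: dict, right: dict) -> list[dict]:
    entries = []
    for category in sorted(set(left) | set(right)):
        lval, rval = left.get(category), right.get(category)
        if lval != rval:
            entries.append({"category": category, "left": lval, "right": rval})
    return entries

left = {"merges": ["A1:B2"], "freeze_panes": "B2"}
right = {"merges": ["A1:B2"], "freeze_panes": None}
print(diff_snapshots(left, right))
# [{'category': 'freeze_panes', 'left': 'B2', 'right': None}]
```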
Performance flow:

```
fixtures + throughput fixtures
  -> perf runner executes workloads (no oracle)
  -> wall/cpu/rss (and optional phase breakdown)
  -> perf/results.json
  -> perf renderer produces markdown/csv
```

Command: `uv run excelbench perf --tests fixtures/excel --output results`
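A minimal sketch of the wall/cpu measurement shape; the function name is an assumption, and the real runner also captures RSS (platform-dependent, e.g. via `resource.getrusage` or psutil), which is omitted here:

```python
import time

def measure(workload) -> dict:
    """Run a workload once and report wall-clock and CPU time (seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    workload()
    return {
        "wall_s": time.perf_counter() - wall0,
        "cpu_s": time.process_time() - cpu0,
    }

stats = measure(lambda: sum(i * i for i in range(100_000)))
print(sorted(stats))  # ['cpu_s', 'wall_s']
```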
- Local: `uv run excelbench html`, `uv run excelbench scatter`, `uv run excelbench heatmap`
- CI: `.github/workflows/deploy-dashboard.yml` auto-builds and deploys the HTML dashboard to Vercel
Update architecture.md when:
- A new top-level module/directory is introduced.
- Dependency direction changes (new allowed imports / new shared utilities).
- A new CLI command or major runner mode is added.
- A new dashboard/output becomes a supported interface.