This file is the high-level map of the ExcelBench codebase: what lives where, the allowed dependency direction, and the main runtime/data flows.
If you are starting a new session:

- Read this file (`architecture.md`) to orient.
- Read `CLAUDE.md` for commands, workflows, and repo conventions.
- Check active trackers under `docs/trackers/` for current status and run history.
ExcelBench has two complementary tracks:
- Fidelity (correctness): "Does this library preserve Excel semantics for feature X?"
- Performance (speed/memory): "How fast is it at feature X or a scaled workload?"
A key design principle is reproducibility:
- Canonical fixtures are generated by real Excel and committed to git.
- Benchmarks produce JSON results as the source of truth, and render all other views from it.
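The "JSON as source of truth" principle means every other view is a pure function of the results file. A minimal sketch of that idea; the JSON shape here is hypothetical, the real schema is owned by the runner:

```python
import json

# Hypothetical results.json shape; the real schema is defined by the runner output.
results = json.loads("""
{
  "library": "openpyxl",
  "features": [
    {"name": "borders", "score": 0.92},
    {"name": "number_formats", "score": 1.0}
  ]
}
""")

# Every rendered view (markdown, csv, dashboard) derives from the same JSON,
# so re-rendering never requires re-running the benchmark.
rows = [f"{results['library']},{f['name']},{f['score']}" for f in results["features"]]
print("\n".join(rows))
```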
At a high level, ExcelBench is split into six layers:
- Fixtures + generator (Excel as ground truth)
- Fidelity harness (adapters + scoring + diagnostics + semantic workbook diff)
- External oracle helpers (optional subprocess validators/generators)
- Performance harness (throughput workloads + best-effort memory)
- Rendering + publishing (markdown/csv + HTML dashboard + plots)
- Optional Rust acceleration (PyO3 extension + Rust-backed adapters)
One-way dependencies only:
- `models.py` defines the core contracts.
- Adapters depend on models.
- Runners depend on adapters and models.
- Renderers depend on runner output schemas.
In practice:

```
models
  ^
adapters
  ^
harness runner (fidelity)
  ^
results renderer / visualizations

perf runner (performance) -> perf renderer
```
Rust extension (optional) is called by Rust-backed adapters; it should not depend on Python code.
WolfXL is an external PyPI dependency — it does not depend on ExcelBench.
Rule of thumb: keep adapters thin and deterministic. Any cross-library normalization should live in runner utilities, not in adapter implementations.
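The rule of thumb above can be sketched as code. This is a hypothetical illustration of the split, not the real contracts: `CellValue` stands in for the dataclasses in `models.py`, and the adapter/runner names are assumptions:

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical stand-in for the real contracts in src/excelbench/models.py.
@dataclass(frozen=True)
class CellValue:
    raw: object

class Adapter(Protocol):
    """Illustrative adapter surface: thin, deterministic, no normalization."""
    name: str
    def read_cell(self, path: str, sheet: str, ref: str) -> CellValue: ...

def normalize_float_as_int(value: CellValue) -> CellValue:
    """Cross-library normalization lives in runner utilities, not in adapters."""
    if isinstance(value.raw, float) and value.raw.is_integer():
        return CellValue(int(value.raw))
    return value

print(normalize_float_as_int(CellValue(3.0)).raw)  # 3
```

Keeping normalization out of the adapters means every library is compared through the same lens, and adapter bugs cannot hide behind per-adapter fixups.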
Most-touched top-level directories:
- `src/excelbench/`
  - `cli.py`: Typer CLI entrypoint (`excelbench ...`)
  - `models.py`: dataclasses/contracts (`CellValue`, `CellFormat`, `BorderInfo`, ...)
  - `generator/`: fixture generation (xlwings + Excel)
  - `harness/`: fidelity benchmark runner, adapters, and optional external oracle helpers
  - `harness/workbook_snapshot.py`: normalized workbook/package snapshots used by semantic diffs
  - `harness/semantic_diff.py`: structured workbook diff artifacts for diagnostics/context lanes
  - `harness/roundtrip_runner.py`: open/save idempotence context lane
  - `harness/compat_cases.py`: openpyxl-style compatibility snippet lane
  - `harness/artifact_context.py`: chart and macro package evidence lanes
  - `harness/external_fixture_specs/`: tool-specific external oracle fixture definitions imported by the fixture-pack generator
  - `perf/`: performance runner + renderer
  - `results/`: fidelity result renderers (md/csv) + dashboards/plots
- `fixtures/`
  - `excel/`: canonical `.xlsx` fixtures (git-tracked, Excel-generated)
  - `excel_xls/`: canonical `.xls` fixtures
  - `throughput_xlsx/`: scale fixtures for perf/throughput workloads
- `tools/external-oracles/`: optional subprocess helpers for non-Python oracle tools.
  - `excelize/`: Go helper that generates and inspects `.xlsx` fixtures with Excelize. Run from that directory with `go run .`.
  - `libreoffice/`: Python helper that runs LibreOffice headless open/save and PDF render validation.
  - `exceljs/`: Node helper that generates workbooks with ExcelJS tables, formulas, data validations, rich strings, comments, hyperlinks, images, merges, freeze panes, and sheet protection.
  - `apache-poi/`: Java helper that generates workbooks with Apache POI tables, formulas, data validations, rich strings, comments, hyperlinks, images, merges, freeze panes, and sheet protection.
  - `closedxml/`: .NET helper that generates workbooks with ClosedXML tables, pivots, and conditional formatting.
  - `npoi/`: .NET helper that generates POI-style workbooks with NPOI formulas, comments, rich text, merged ranges, and sheet protection.
- `rust/excelbench_rust/` (optional, local-only)
  - PyO3 crate for ExcelBench-specific Rust backends (umya-spreadsheet, basic calamine)
  - The core WolfXL backends (calamine-styled, rust_xlsxwriter, xlsx patcher) are now in the standalone `wolfxl` package on PyPI (`pip install wolfxl`)
- WolfXL (external dependency, `pip install wolfxl`)
  - Standalone repo: https://github.com/SynthGL/wolfxl
  - Openpyxl-compatible API: `load_workbook`, `Workbook`, `Font`, `PatternFill`, etc.
  - Three modes: read (calamine-styled), write (rust_xlsxwriter), modify (XlsxPatcher)
  - Installed as an optional dependency: `uv sync --extra rust`
- `tests/`: pytest suites (fidelity + adapter unit tests + visualization smoke tests)
- `docs/`: plans and trackers (treat as source of truth for methodology and run logs)
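Because WolfXL mirrors openpyxl's API, the same snippet can target either library. A minimal sketch of that compatibility; the try/except fallback and the file path are illustrative, and only the names listed above (`Workbook`, `load_workbook`) are taken from this document:

```python
import os
import tempfile

# WolfXL exposes an openpyxl-compatible API, so this code runs against either
# library. The fallback to openpyxl here is purely for illustration.
try:
    from wolfxl import Workbook, load_workbook
except ImportError:
    from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

wb = Workbook()        # write mode (rust_xlsxwriter-backed in WolfXL)
ws = wb.active
ws["A1"] = "hello"
wb.save(path)

print(load_workbook(path).active["A1"].value)  # hello
```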
Important "scratch" conventions:

- `test_files/` is local scratch (gitignored).
- `results_dev_*` directories are local/ephemeral benchmark outputs (often gitignored).
Common starting points by intent:
- Add a new adapter:
  - `src/excelbench/harness/adapters/base.py`
  - `src/excelbench/harness/adapters/__init__.py`
- Add a new external oracle:
  - `src/excelbench/harness/external_oracles.py`
  - `docs/trackers/external-oracle-expansion.md`
  - Keep helpers optional and subprocess-isolated until promoted into normal benchmark flows.
- Add a new scored feature:
  - Generator: `src/excelbench/generator/features/`
  - Harness exercise/scoring: `src/excelbench/harness/runner.py`
- Extend Tier 2/3 OOXML parsing:
  - WolfXL (external): calamine-styled + ooxml utilities live in the wolfxl repo
  - ExcelBench-local Rust: `rust/excelbench_rust/src/` (basic calamine, umya bindings)
- Performance track:
  - Runner: `src/excelbench/perf/runner.py`
  - Renderer: `src/excelbench/perf/renderer.py`
  - Throughput driver: `scripts/run_throughput_dashboard.py`
  - Memory profiling: `scripts/memory_profile.py`
- Dashboards/plots:
  - HTML dashboard: `src/excelbench/results/html_dashboard.py`
  - Scatter plots: `src/excelbench/results/scatter.py`
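For the "add a new adapter" path, a registration pattern like the following is a plausible shape. This is a hypothetical sketch only; the real base class and registry live in `src/excelbench/harness/adapters/` and may differ:

```python
# Hypothetical sketch of an adapter registry; real code lives in
# src/excelbench/harness/adapters/base.py and __init__.py.
class BaseAdapter:
    name = "base"

    def read_cell(self, path, sheet, ref):
        raise NotImplementedError

_REGISTRY = {}

def register(cls):
    """Decorator that makes an adapter discoverable by the harness runner."""
    _REGISTRY[cls.name] = cls
    return cls

@register
class MyLibAdapter(BaseAdapter):
    name = "mylib"  # hypothetical new adapter under test

    def read_cell(self, path, sheet, ref):
        return None  # delegate to the library under test

def get_all_adapters():
    return [cls() for cls in _REGISTRY.values()]

print([a.name for a in get_all_adapters()])  # ['mylib']
```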
Fixture generation flow:

```
xlwings -> Excel
  -> writes feature workbooks
  -> writes manifest.json
  -> fixtures committed to git
```

Command: `uv run excelbench generate --output fixtures/excel`
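The generation flow above can be sketched as a guarded xlwings call plus a manifest record. The manifest fields and function names here are illustrative assumptions, not the real schema; only the xlwings calls are standard API:

```python
import json

try:
    import xlwings as xw  # requires a real Excel installation
    HAVE_EXCEL = True
except ImportError:
    HAVE_EXCEL = False

def generate_fixture(path: str) -> None:
    # Drive Excel itself, so the fixture captures real Excel semantics.
    app = xw.App(visible=False)
    try:
        wb = app.books.add()
        wb.sheets[0]["A1"].value = 42
        wb.save(path)
    finally:
        app.quit()

# Illustrative manifest shape; the real schema is owned by the generator.
manifest = {"fixtures": [{"file": "demo.xlsx", "feature": "cell_values"}]}
if HAVE_EXCEL:
    generate_fixture("fixtures/excel/demo.xlsx")
print(json.dumps(manifest))
```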
Fidelity benchmark flow:

```
fixtures + manifest
  -> runner loads adapters
  -> adapters read/write
  -> oracle verification (Excel via xlwings; fallback openpyxl)
  -> results.json + diagnostics
  -> renderers produce README.md/matrix.csv/plots
```

Command: `uv run excelbench benchmark --tests fixtures/excel --output results`
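The heart of the oracle-verification step is comparing what an adapter reads against what the oracle reads. A hedged sketch of that comparison; the function name and the per-cell scoring rule are illustrative, not the actual scoring model:

```python
# Illustrative fidelity score: the fraction of cells where the adapter's
# reading matches the oracle's. The real scoring model may weight differently.
def score_feature(adapter_values: dict, oracle_values: dict) -> float:
    if not oracle_values:
        return 0.0
    hits = sum(1 for ref, v in oracle_values.items() if adapter_values.get(ref) == v)
    return hits / len(oracle_values)

oracle = {"A1": 42, "B1": "x", "C1": 3.5}
adapter = {"A1": 42, "B1": "x", "C1": 3.0}  # one mismatch out of three cells
print(score_feature(adapter, oracle))       # 2/3 of cells match
```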
External oracle flow:

```
JSON fixture manifest
  -> external_oracles.py launches helper subprocess
  -> Excelize / LibreOffice / Apache POI / ClosedXML helper creates or validates workbook
  -> helper prints JSON diagnostics
  -> local-only results are reviewed before promoting stable cases to fixtures
  -> optional WolfXL validation script checks read + in-place modify-save preservation
```
External oracle helpers are intentionally not part of `get_all_adapters()` yet. Missing Go/Java/.NET/LibreOffice commands should skip cleanly, not fail the core suite. The first local pack is generated by `scripts/generate_external_oracle_fixtures.py` into `results_dev_external/` and validated against WolfXL with `scripts/validate_external_oracle_fixtures_with_wolfxl.py`.
Semantic diff / context-lane flow:

```
workbook(s)
  -> workbook_snapshot.py normalizes workbook semantics and selected OOXML parts
  -> semantic_diff.py compares category-level snapshots
  -> diff-workbooks / roundtrip-context / compatibility-context / chart and macro context commands
  -> JSON + markdown artifacts with explicit skips for unsupported adapters
```

Commands:

- `uv run excelbench diff-workbooks --left a.xlsx --right b.xlsx --output results-workbook-diff`
- `uv run excelbench roundtrip-context --tests fixtures/excel --output results-roundtrip`
- `uv run excelbench compatibility-context --output results-compatibility`
- `uv run excelbench cross-language-chart-context --output results-cross-language-charts`
- `uv run excelbench macro-context --tests fixtures/excel_xlsm --output results-macros`
These are context/evidence lanes. They do not change the core fidelity scoring model unless a future decision explicitly promotes their outputs into scored benchmark features.
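A category-level snapshot diff in the spirit of `semantic_diff.py` can be sketched as a keyed comparison of normalized snapshots. The snapshot categories and the diff-entry shape below are illustrative assumptions:

```python
# Illustrative category-level diff: compare two normalized snapshots and emit
# one structured entry per category that differs (real artifact shape may vary).
def diff_snapshots(left: dict, right: dict) -> list[dict]:
    entries = []
    for category in sorted(set(left) | set(right)):
        lval, rval = left.get(category), right.get(category)
        if lval != rval:
            entries.append({"category": category, "left": lval, "right": rval})
    return entries

left = {"merges": ["A1:B2"], "freeze_panes": "B2"}
right = {"merges": ["A1:B2"], "freeze_panes": None}
print(diff_snapshots(left, right))
# [{'category': 'freeze_panes', 'left': 'B2', 'right': None}]
```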
Performance flow:

```
fixtures + throughput fixtures
  -> perf runner executes workloads (no oracle)
  -> wall/cpu/rss (and optional phase breakdown)
  -> perf/results.json
  -> perf renderer produces markdown/csv
```

Command: `uv run excelbench perf --tests fixtures/excel --output results`
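A minimal sketch of the wall/cpu measurement shape; the function name is an assumption, and the real runner also captures RSS (platform-dependent, e.g. via `resource.getrusage` or psutil), which is omitted here:

```python
import time

def measure(workload) -> dict:
    """Run a workload once and report wall-clock and CPU time (seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    workload()
    return {
        "wall_s": time.perf_counter() - wall0,
        "cpu_s": time.process_time() - cpu0,
    }

stats = measure(lambda: sum(i * i for i in range(100_000)))
print(sorted(stats))  # ['cpu_s', 'wall_s']
```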
- Local: `uv run excelbench html`, `uv run excelbench scatter`, `uv run excelbench heatmap`
- CI: `.github/workflows/deploy-dashboard.yml` auto-builds and deploys the HTML dashboard to Vercel
Update architecture.md when:
- A new top-level module/directory is introduced.
- Dependency direction changes (new allowed imports / new shared utilities).
- A new CLI command or major runner mode is added.
- A new dashboard/output becomes a supported interface.