
Architecture Map (Read This Before Editing)

This file is the high-level map of the ExcelBench codebase: what lives where, the allowed dependency direction, and the main runtime/data flows.

If you are starting a new session:

  1. Read this file (architecture.md) to orient yourself.
  2. Read CLAUDE.md for commands, workflows, and repo conventions.
  3. Check active trackers under docs/trackers/ for current status and run history.

Big Picture

ExcelBench has two complementary tracks:

  • Fidelity (correctness): "Does this library preserve Excel semantics for feature X?"
  • Performance (speed/memory): "How fast is it at feature X or a scaled workload?"

A key design principle is reproducibility:

  • Canonical fixtures are generated by real Excel and committed to git.
  • Benchmarks produce a JSON results file as the source of truth; all other views are rendered from it.

Core Layers

At a high level, ExcelBench is split into six layers:

  1. Fixtures + generator (Excel as ground truth)
  2. Fidelity harness (adapters + scoring + diagnostics + semantic workbook diff)
  3. External oracle helpers (optional subprocess validators/generators)
  4. Performance harness (throughput workloads + best-effort memory)
  5. Rendering + publishing (markdown/csv + HTML dashboard + plots)
  6. Optional Rust acceleration (PyO3 extension + Rust-backed adapters)

Dependency Direction (No Cycles)

One-way dependencies only:

  • models.py defines the core contracts.
  • Adapters depend on models.
  • Runners depend on adapters and models.
  • Renderers depend on runner output schemas.

In practice:

models
  ^
adapters
  ^
harness runner (fidelity)
  ^
results renderer / visualizations

perf runner (performance) -> perf renderer

Rust extension (optional) is called by Rust-backed adapters; it should not depend on Python code.

WolfXL is an external PyPI dependency — it does not depend on ExcelBench.

Rule of thumb: keep adapters thin and deterministic. Any cross-library normalization should live in runner utilities, not in adapter implementations.
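
The one-way dependency rule above can be checked mechanically. A minimal sketch using only the standard library ast module (the layer names are illustrative stand-ins for the real modules under src/excelbench/):

```python
import ast

# Allowed one-way dependency direction: lower layers must never import
# higher ones. Names here are illustrative, not the real module paths.
LAYER_ORDER = ["models", "adapters", "runner", "renderer"]

def imported_layers(source: str) -> set[str]:
    """Collect top-level layer names imported by a module's source."""
    layers = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                layers.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            layers.add(node.module.split(".")[0])
    return layers & set(LAYER_ORDER)

def violates_direction(layer: str, source: str) -> bool:
    """True if `source` (belonging to `layer`) imports a higher layer."""
    rank = LAYER_ORDER.index(layer)
    return any(LAYER_ORDER.index(dep) > rank for dep in imported_layers(source))

# An adapter importing models is fine; importing the runner breaks the rule.
assert not violates_direction("adapters", "import models\n")
assert violates_direction("adapters", "from runner import run\n")
```

A check like this could run in CI to keep the graph acyclic, but the repo's actual enforcement (if any) is convention, not this script.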

Repo Map (Where Things Live)

Most-touched top-level directories:

  • src/excelbench/

    • cli.py: Typer CLI entrypoint (excelbench ...)
    • models.py: dataclasses/contracts (CellValue, CellFormat, BorderInfo, ...)
    • generator/: fixture generation (xlwings + Excel)
    • harness/: fidelity benchmark runner, adapters, and optional external oracle helpers
    • harness/workbook_snapshot.py: normalized workbook/package snapshots used by semantic diffs
    • harness/semantic_diff.py: structured workbook diff artifacts for diagnostics/context lanes
    • harness/roundtrip_runner.py: open/save idempotence context lane
    • harness/compat_cases.py: openpyxl-style compatibility snippet lane
    • harness/artifact_context.py: chart and macro package evidence lanes
    • harness/external_fixture_specs/: tool-specific external oracle fixture definitions imported by the fixture-pack generator
    • perf/: performance runner + renderer
    • results/: fidelity result renderers (md/csv) + dashboards/plots
  • fixtures/

    • excel/: canonical .xlsx fixtures (git-tracked, Excel-generated)
    • excel_xls/: canonical .xls fixtures
    • throughput_xlsx/: scale fixtures for perf/throughput workloads
  • tools/external-oracles/

    • Optional subprocess helpers for non-Python oracle tools.
    • excelize/: Go helper that generates and inspects .xlsx fixtures with Excelize. Run it from that directory with: go run .
    • libreoffice/: Python helper that runs LibreOffice headless open/save and PDF render validation.
    • exceljs/: Node helper that generates workbooks with ExcelJS tables, formulas, data validations, rich strings, comments, hyperlinks, images, merges, freeze panes, and sheet protection.
    • apache-poi/: Java helper that generates workbooks with Apache POI tables, formulas, data validations, rich strings, comments, hyperlinks, images, merges, freeze panes, and sheet protection.
    • closedxml/: .NET helper that generates workbooks with ClosedXML tables, pivots, and conditional formatting.
    • npoi/: .NET helper that generates POI-style workbooks with NPOI formulas, comments, rich text, merged ranges, and sheet protection.
  • rust/excelbench_rust/ (optional, local-only)

    • PyO3 crate for ExcelBench-specific Rust backends (umya-spreadsheet, basic calamine)
    • The core WolfXL backends (calamine-styled, rust_xlsxwriter, xlsx patcher) are now in the standalone wolfxl package on PyPI (pip install wolfxl)
  • WolfXL (external dependency, pip install wolfxl)

    • Standalone repo: https://github.com/SynthGL/wolfxl
    • Openpyxl-compatible API: load_workbook, Workbook, Font, PatternFill, etc.
    • Three modes: read (calamine-styled), write (rust_xlsxwriter), modify (XlsxPatcher)
    • Installed as optional dependency: uv sync --extra rust
  • tests/: pytest suites (fidelity + adapter unit tests + visualization smoke tests)

  • docs/: plans and trackers (treat as source of truth for methodology and run logs)
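
The contracts in models.py anchor the whole dependency chain above. A minimal sketch of what such frozen dataclasses might look like; the class names come from the list above, but every field name here is an assumption, not the real schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hedged sketch of the contracts named above (CellValue, CellFormat,
# BorderInfo); the real fields in src/excelbench/models.py may differ.
@dataclass(frozen=True)
class BorderInfo:
    style: Optional[str] = None   # e.g. "thin", "medium" (assumed values)
    color: Optional[str] = None   # e.g. an ARGB hex string (assumed)

@dataclass(frozen=True)
class CellFormat:
    font_name: Optional[str] = None
    bold: bool = False
    number_format: Optional[str] = None
    border: Optional[BorderInfo] = None

@dataclass(frozen=True)
class CellValue:
    value: object                  # str | float | bool | datetime | None
    formula: Optional[str] = None  # kept separate from the cached value
    fmt: Optional[CellFormat] = None
```

Frozen dataclasses make the contracts hashable and safe to pass between adapters and the runner without defensive copies.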

Important "scratch" conventions:

  • test_files/ is local scratch (gitignored).
  • results_dev_* directories are local/ephemeral benchmark outputs (often gitignored).

Key Entry Points (Jump List)

Common starting points by intent:

  • Add a new adapter:

    • src/excelbench/harness/adapters/base.py
    • src/excelbench/harness/adapters/__init__.py
  • Add a new external oracle:

    • src/excelbench/harness/external_oracles.py
    • docs/trackers/external-oracle-expansion.md
    • Keep helpers optional and subprocess-isolated until promoted into normal benchmark flows.
  • Add a new scored feature:

    • Generator: src/excelbench/generator/features/
    • Harness exercise/scoring: src/excelbench/harness/runner.py
  • Extend Tier 2/3 OOXML parsing:

    • WolfXL (external): calamine-styled + ooxml utilities live in wolfxl repo
    • ExcelBench-local Rust: rust/excelbench_rust/src/ (basic calamine, umya bindings)
  • Performance track:

    • Runner: src/excelbench/perf/runner.py
    • Renderer: src/excelbench/perf/renderer.py
    • Throughput driver: scripts/run_throughput_dashboard.py
    • Memory profiling: scripts/memory_profile.py
  • Dashboards/plots:

    • HTML dashboard: src/excelbench/results/html_dashboard.py
    • Scatter plots: src/excelbench/results/scatter.py
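
For the first intent above (adding a new adapter), the shape of the work is subclassing the base contract. A hedged sketch, using a hypothetical interface; the real one lives in src/excelbench/harness/adapters/base.py and may look quite different:

```python
from abc import ABC, abstractmethod

# Hypothetical base contract; the real interface in
# harness/adapters/base.py may differ in names and methods.
class BaseAdapter(ABC):
    name: str

    @abstractmethod
    def read_cell(self, path: str, sheet: str, ref: str) -> object: ...

class DictAdapter(BaseAdapter):
    """Toy adapter backed by an in-memory {(sheet, ref): value} mapping,
    standing in for a real spreadsheet library."""
    name = "dict"

    def __init__(self, data: dict):
        self.data = data

    def read_cell(self, path: str, sheet: str, ref: str) -> object:
        # Keep adapters thin: return raw values and leave any
        # cross-library normalization to the runner utilities.
        return self.data.get((sheet, ref))
```

New adapters would then be registered in harness/adapters/__init__.py so the runner discovers them.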

Main Flows

1) Fixture generation (ground truth)

xlwings -> Excel
  -> writes feature workbooks
  -> writes manifest.json
  -> fixtures committed to git

Command: uv run excelbench generate --output fixtures/excel
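
The manifest.json written alongside the fixtures can be sketched as plain JSON. The field names below are assumptions for illustration, not the real manifest schema:

```python
import json
import os
import tempfile

# Hedged sketch of a fixtures manifest; field names are illustrative only.
manifest = {
    "generator": "excel+xlwings",
    "fixtures": [
        {"file": "cell_values.xlsx", "feature": "cell_values"},
        {"file": "borders.xlsx", "feature": "borders"},
    ],
}

def write_manifest(directory: str) -> str:
    """Serialize the manifest deterministically so git diffs stay small."""
    path = os.path.join(directory, "manifest.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return path

def load_features(path: str) -> list[str]:
    """List the feature names the harness would iterate over."""
    with open(path, encoding="utf-8") as fh:
        return [f["feature"] for f in json.load(fh)["fixtures"]]
```

Deterministic serialization (sorted keys, fixed indent) matters because the fixtures and manifest are committed to git as ground truth.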

2) Fidelity benchmark (correctness)

fixtures + manifest
  -> runner loads adapters
  -> adapters read/write
  -> oracle verification (Excel via xlwings; fallback openpyxl)
  -> results.json + diagnostics
  -> renderers produce README.md/matrix.csv/plots

Command: uv run excelbench benchmark --tests fixtures/excel --output results
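
The scoring step of this flow can be sketched as a loop over adapters comparing each reading against the oracle, with everything landing in a results.json-style structure. A minimal sketch; the real logic lives in harness/runner.py and scores far richer structures than flat cell maps:

```python
import json

# Hedged sketch of fidelity scoring: adapters' readings vs. the oracle.
def score(adapters: dict, oracle: dict) -> dict:
    """adapters: {name: {cell_ref: value}}; oracle: {cell_ref: value}."""
    results = []
    for name, cells in adapters.items():
        passed = sum(1 for ref, want in oracle.items() if cells.get(ref) == want)
        results.append({"adapter": name, "passed": passed, "total": len(oracle)})
    # results.json is the source of truth; markdown/csv are rendered from it.
    return {"results": results}

oracle = {"A1": 1.5, "B2": "text"}
report = score({"good": {"A1": 1.5, "B2": "text"}, "lossy": {"A1": 1.5}}, oracle)
print(json.dumps(report, indent=2))
```

Because the JSON report is the source of truth, renderers only ever consume it; they never re-run adapters.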

3) External oracle pass (optional pre-release hardening)

JSON fixture manifest
  -> external_oracles.py launches helper subprocess
  -> Excelize / LibreOffice / Apache POI / ClosedXML helper creates or validates workbook
  -> helper prints JSON diagnostics
  -> local-only results are reviewed before promoting stable cases to fixtures
  -> optional WolfXL validation script checks read + in-place modify-save preservation

External oracle helpers are intentionally not part of get_all_adapters() yet. Missing Go/Java/.NET/LibreOffice commands should skip cleanly, not fail the core suite. The first local pack is generated by scripts/generate_external_oracle_fixtures.py into results_dev_external/ and validated against WolfXL with scripts/validate_external_oracle_fixtures_with_wolfxl.py.

3a) Semantic diff and context lanes

workbook(s)
  -> workbook_snapshot.py normalizes workbook semantics and selected OOXML parts
  -> semantic_diff.py compares category-level snapshots
  -> diff-workbooks / roundtrip-context / compatibility-context / chart and macro context commands
  -> JSON + markdown artifacts with explicit skips for unsupported adapters

Commands:

  • uv run excelbench diff-workbooks --left a.xlsx --right b.xlsx --output results-workbook-diff
  • uv run excelbench roundtrip-context --tests fixtures/excel --output results-roundtrip
  • uv run excelbench compatibility-context --output results-compatibility
  • uv run excelbench cross-language-chart-context --output results-cross-language-charts
  • uv run excelbench macro-context --tests fixtures/excel_xlsm --output results-macros

These are context/evidence lanes. They do not change the core fidelity scoring model unless a future decision explicitly promotes their outputs into scored benchmark features.
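
The category-level comparison in this lane can be sketched as a diff over nested mappings, reporting adds, removes, and changes per category. A minimal sketch; the real snapshot and diff structures in workbook_snapshot.py and semantic_diff.py are richer:

```python
# Hedged sketch of a category-level semantic diff. Snapshots are modeled
# as {category: {key: value}}; the real normalized snapshots differ.
def diff_snapshots(left: dict, right: dict) -> dict:
    diff = {}
    for category in sorted(set(left) | set(right)):
        l, r = left.get(category, {}), right.get(category, {})
        changes = {
            "added": sorted(set(r) - set(l)),
            "removed": sorted(set(l) - set(r)),
            "changed": sorted(k for k in set(l) & set(r) if l[k] != r[k]),
        }
        if any(changes.values()):
            diff[category] = changes
    return diff
```

Grouping by category keeps the artifact readable: a styles-only regression shows up under "styles" instead of being buried in thousands of cell-level entries.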

4) Performance benchmark (speed/memory)

fixtures + throughput fixtures
  -> perf runner executes workloads (no oracle)
  -> wall/cpu/rss (and optional phase breakdown)
  -> perf/results.json
  -> perf renderer produces markdown/csv

Command: uv run excelbench perf --tests fixtures/excel --output results
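
The wall/cpu/memory measurement in this flow can be sketched with the standard library. Note the hedge on memory: tracemalloc below tracks Python-level allocations only, whereas the real runner's rss figure comes from the OS and will differ:

```python
import time
import tracemalloc

# Hedged sketch of one perf measurement: wall time, CPU time, and a
# best-effort memory figure. The real perf runner in perf/runner.py
# reports OS-level rss, which tracemalloc does not capture.
def measure(workload, *args) -> dict:
    tracemalloc.start()
    wall0, cpu0 = time.perf_counter(), time.process_time()
    workload(*args)
    wall, cpu = time.perf_counter() - wall0, time.process_time() - cpu0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"wall_s": wall, "cpu_s": cpu, "peak_bytes": peak}
```

Because there is no oracle in this track, a workload is just a callable over a fixture; correctness is the fidelity track's job.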

5) Publishing

  • Local: uv run excelbench html, uv run excelbench scatter, uv run excelbench heatmap
  • CI: .github/workflows/deploy-dashboard.yml auto-builds and deploys the HTML dashboard to Vercel

Updating This Map

Update architecture.md when:

  • A new top-level module/directory is introduced.
  • Dependency direction changes (new allowed imports / new shared utilities).
  • A new CLI command or major runner mode is added.
  • A new dashboard/output becomes a supported interface.