backtest: implement walk-forward optimization and grid search #13

Merged: marwinsteiner merged 7 commits into main from phase-4/optimize on Feb 22, 2026
@marwinsteiner
Owner

Summary

Implements src/sysls/backtest/optimize.py with walk-forward analysis, grid search, and parameter optimization for the vectorized backtester.

  • ParameterGrid: Generates Cartesian product combinations from dict[str, list[Any]] via itertools.product. Iterable with __len__.
  • GridSearchResult / WalkForwardSplit / WalkForwardResult: Frozen Pydantic v2 models for optimization results with JSON serialization support.
  • grid_search(): Evaluates every parameter combination via run_vectorized_backtest, sorts by chosen metric (ascending for max_drawdown, descending otherwise). Validates metric against BacktestResult fields.
  • TimeSeriesSplit: Expanding-window time-series cross-validation with configurable train_ratio and n_splits. Last split absorbs integer division remainder.
  • walk_forward(): For each split, grid-searches training data then backtests OOS with best params. Concatenates and scales OOS equity curves for combined performance metrics.

Key design choices

  • Pure computation, no async code (as required by CLAUDE.md for vectorized backtest path)
  • BacktestResult imported at runtime (Pydantic needs it for field resolution)
  • run_vectorized_backtest and summarize_backtest imported at runtime in function bodies
  • structlog for logging, no print statements
  • All type hints, Google-style docstrings on public APIs

Commit discipline (7 atomic commits)

  1. Wireframe with stubs and type annotations
  2. ParameterGrid with Cartesian product iteration
  3. Pydantic models (GridSearchResult, WalkForwardSplit, WalkForwardResult)
  4. grid_search() with metric-based optimization
  5. TimeSeriesSplit expanding window helper
  6. walk_forward() with expanding window optimization
  7. Edge case tests and cleanup

Test plan

  • 37 tests in tests/backtest/test_optimize.py - all passing
  • Full suite: 688 passed, 10 skipped (Windows-only arcticdb skips)
  • ruff check src/ tests/ - all passed
  • ruff format --check src/ tests/ - all passed
  • ParameterGrid: basic grid, single/empty/multi-param, length, re-iteration
  • Pydantic models: construction, immutability, JSON round-trip
  • grid_search: basic search, metric sorting, max_drawdown ascending, invalid metric error
  • TimeSeriesSplit: expanding window, no overlap, coverage, invalid inputs
  • walk_forward: basic, OOS concatenation, combined metrics, equity continuity, single split
  • Edge cases: multi-param MA crossover, costs pass-through, 3-param grid, train_ratio validation

Commit 1 (wireframe): module docstring, all imports, class/function
stubs raising NotImplementedError, full type annotations, and
Google-style docstrings. The test file includes class stubs and helper
functions.

Commit 2 (ParameterGrid): uses itertools.product over the value lists
and supports __iter__ and __len__. An empty grid yields a single empty
dict. Includes 6 tests covering basic grid, single param, empty grid,
length, and re-iteration.

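
The iteration behavior described in this commit can be sketched roughly as follows. This is an illustrative stand-in, not the code from src/sysls/backtest/optimize.py: the class name `ParameterGridSketch` and its internals are assumptions. Only the use of itertools.product, `__iter__`, `__len__`, and the single-empty-dict behavior come from the commit message.

```python
import itertools
import math
from collections.abc import Iterator
from typing import Any


class ParameterGridSketch:
    """Hypothetical stand-in for ParameterGrid (not the PR's actual code)."""

    def __init__(self, params: dict[str, list[Any]]) -> None:
        self._keys = list(params)
        self._values = [list(params[key]) for key in self._keys]

    def __iter__(self) -> Iterator[dict[str, Any]]:
        # A fresh product is built on every call, so the grid can be re-iterated.
        for combo in itertools.product(*self._values):
            yield dict(zip(self._keys, combo))

    def __len__(self) -> int:
        # math.prod of an empty sequence is 1, matching the single empty dict
        # that itertools.product() yields for an empty grid.
        return math.prod(len(values) for values in self._values)


grid = ParameterGridSketch({"fast": [5, 10], "slow": [20, 50]})
combos = list(grid)  # four dicts, varying "slow" fastest
empty_combos = list(ParameterGridSketch({}))  # one empty dict
```

Because `__iter__` is a generator method, each `for` loop over the grid starts from the first combination again, which is the re-iteration property the tests check.
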
Commit 3 (Pydantic models): three frozen Pydantic v2 models:
GridSearchResult (best_params, best_score, all_results),
WalkForwardSplit (split boundaries and OOS result), and
WalkForwardResult (splits, combined equity, combined metrics).
Includes 5 tests: construction, immutability, and JSON serialization
round-trip.

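
As a rough illustration of what "frozen Pydantic v2 model with JSON round-trip" means in practice, here is a minimal sketch. The class name `GridSearchResultSketch` and all field types are assumptions (the commit message gives only the field names), and the values are made up for the example.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, ValidationError


class GridSearchResultSketch(BaseModel):
    """Hypothetical model; field types are guesses, not the PR's definitions."""

    model_config = ConfigDict(frozen=True)  # instances reject attribute assignment

    best_params: dict[str, Any]
    best_score: float
    all_results: list[dict[str, Any]]  # element type is an assumption


result = GridSearchResultSketch(
    best_params={"fast": 5, "slow": 20},
    best_score=1.25,
    all_results=[{"fast": 5, "slow": 20, "score": 1.25}],
)

# JSON round-trip: serialize, then validate the payload back into a model.
payload = result.model_dump_json()
restored = GridSearchResultSketch.model_validate_json(payload)

# Assigning to a field of a frozen model raises ValidationError in Pydantic v2.
try:
    result.best_score = 0.0
    is_frozen = False
except ValidationError:
    is_frozen = True
```
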
Commit 4 (grid_search): evaluates every ParameterGrid combination via
run_vectorized_backtest. Sorts results by the chosen metric (descending
for most, ascending for max_drawdown). Validates the metric against
BacktestResult fields. Includes 6 tests: basic search, best-params
ordering, max_drawdown ascending sort, single-param grid, all-results
population, and invalid metric error.

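
The sort-direction rule and metric validation described here can be sketched as below. This is not the PR's implementation: `grid_search_sketch`, `KNOWN_METRICS`, and `toy_evaluate` are hypothetical names, the `evaluate` callable stands in for run_vectorized_backtest, and a stdlib frozen dataclass stands in for the Pydantic result model so the example has no third-party dependencies.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable

# Metrics where smaller is better, so results are sorted ascending.
ASCENDING_METRICS = {"max_drawdown"}
# Assumed metric names; the PR validates against BacktestResult fields instead.
KNOWN_METRICS = {"sharpe_ratio", "total_return", "max_drawdown"}


@dataclass(frozen=True)
class GridSearchSketchResult:
    best_params: dict[str, Any]
    best_score: float
    all_results: list[tuple[dict[str, Any], float]]


def grid_search_sketch(
    evaluate: Callable[[dict[str, Any]], dict[str, float]],
    param_grid: dict[str, list[Any]],
    metric: str = "sharpe_ratio",
) -> GridSearchSketchResult:
    if metric not in KNOWN_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    keys = list(param_grid)
    scored = []
    # Evaluate every combination in the Cartesian product of the value lists.
    for combo in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, combo))
        scored.append((params, float(evaluate(params)[metric])))
    # Best result first: descending unless the metric is minimized.
    scored.sort(key=lambda item: item[1], reverse=metric not in ASCENDING_METRICS)
    best_params, best_score = scored[0]
    return GridSearchSketchResult(best_params, best_score, scored)


def toy_evaluate(params: dict[str, Any]) -> dict[str, float]:
    # Made-up metrics: Sharpe peaks at window=20, drawdown grows with window.
    window = params["window"]
    return {
        "sharpe_ratio": 2.0 - abs(window - 20) / 10,
        "total_return": window / 100,
        "max_drawdown": window / 200,
    }


by_sharpe = grid_search_sketch(toy_evaluate, {"window": [10, 20, 30]})
by_drawdown = grid_search_sketch(
    toy_evaluate, {"window": [10, 20, 30]}, metric="max_drawdown"
)
```

With the toy metrics, the Sharpe search picks window=20 (the largest score) while the drawdown search picks window=10 (the smallest), which is the asymmetry the max_drawdown test covers.
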
Commit 5 (TimeSeriesSplit): generates expanding train/OOS index tuples
for walk-forward analysis. Training always starts at index 0 and grows
by oos_step each split. The last split absorbs any remainder from
integer division. Validates n_splits >= 1, train_ratio in (0, 1), and
sufficient data length. Includes 7 tests: basic splits, expanding
window, no overlap, length, single split, coverage, and
too-short-data error.

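
One way the expanding-window indices could be computed is sketched below. The exact arithmetic in the PR is not shown in the commit message, so this is an assumption: the initial training window is taken as `int(n_samples * train_ratio)` and `oos_step` as the remaining length integer-divided by `n_splits`. The function name `expanding_splits` is hypothetical.

```python
def expanding_splits(
    n_samples: int, n_splits: int = 3, train_ratio: float = 0.5
) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return ((train_start, train_end), (oos_start, oos_end)) half-open ranges."""
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    initial_train = int(n_samples * train_ratio)
    oos_step = (n_samples - initial_train) // n_splits
    if initial_train < 1 or oos_step < 1:
        raise ValueError("not enough data for the requested splits")

    splits = []
    for i in range(n_splits):
        # Training always starts at 0 and grows by oos_step per split.
        train_end = initial_train + i * oos_step
        # The last OOS window runs to the end, absorbing the remainder.
        oos_end = n_samples if i == n_splits - 1 else train_end + oos_step
        splits.append(((0, train_end), (train_end, oos_end)))
    return splits


splits = expanding_splits(100, n_splits=3, train_ratio=0.5)
# [((0, 50), (50, 66)), ((0, 66), (66, 82)), ((0, 82), (82, 100))]
```

In this example oos_step is 50 // 3 = 16, so the first two OOS windows hold 16 points and the last holds 18. The OOS windows do not overlap and together cover every index after the initial training window.
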
Commit 6 (walk_forward): for each split, runs grid_search on the
training data, then backtests OOS with the best parameters. OOS equity
curves are scaled and concatenated to produce a combined performance
estimate. Combined metrics are computed via summarize_backtest.
Includes 5 tests: basic walk-forward, OOS equity concatenation,
combined metrics, split params variation, and invalid n_splits error.

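
The per-split loop and the equity scaling can be sketched as follows. Everything here is illustrative: `walk_forward_sketch`, `best_params_on`, and `toy_backtest` are hypothetical names, final equity is used as the selection score in place of a configurable metric, and the scaling rule (each OOS curve is rescaled to start at the level where the previous one ended) is an assumption about what "scaled and concatenated" means.

```python
import itertools
from typing import Any, Callable, Sequence

Backtest = Callable[[Sequence[float], dict[str, Any]], list[float]]
Split = tuple[tuple[int, int], tuple[int, int]]


def best_params_on(
    train: Sequence[float], param_grid: dict[str, list[Any]], backtest: Backtest
) -> dict[str, Any]:
    keys = list(param_grid)
    candidates = [
        dict(zip(keys, combo))
        for combo in itertools.product(*(param_grid[key] for key in keys))
    ]
    # Score = final equity on the training slice (stand-in for a real metric).
    return max(candidates, key=lambda params: backtest(train, params)[-1])


def walk_forward_sketch(
    prices: Sequence[float],
    splits: Sequence[Split],
    param_grid: dict[str, list[Any]],
    backtest: Backtest,
) -> tuple[list[dict[str, Any]], list[float]]:
    chosen, combined = [], []
    level = 1.0  # equity level the next OOS segment should start from
    for (train_start, train_end), (oos_start, oos_end) in splits:
        params = best_params_on(prices[train_start:train_end], param_grid, backtest)
        oos_equity = backtest(prices[oos_start:oos_end], params)
        # Rescale so this segment continues from the previous segment's end.
        factor = level / oos_equity[0]
        combined.extend(value * factor for value in oos_equity)
        level = combined[-1]
        chosen.append(params)
    return chosen, combined


def toy_backtest(prices: Sequence[float], params: dict[str, Any]) -> list[float]:
    # Constant long (+1) or short (-1) exposure; equity starts at 1.0.
    equity = [1.0]
    for prev, cur in zip(prices, prices[1:]):
        equity.append(equity[-1] * (1.0 + params["side"] * (cur / prev - 1.0)))
    return equity


prices = [100.0, 102.0, 104.0, 106.0, 100.0, 94.0, 90.0, 86.0]
splits = [((0, 4), (4, 6)), ((0, 6), (6, 8))]
chosen, combined = walk_forward_sketch(prices, splits, {"side": [1, -1]}, toy_backtest)
```

In this toy run the first training window is rising, so the long side is selected and then loses money out of sample; the second training window includes the drop, so the short side is selected. The combined curve has no jump at the boundary because the second segment is rescaled to start at the first segment's final value.
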
Commit 7 (edge cases and cleanup): 8 additional edge-case tests:
multi-param grid search, total_return metric, commission/slippage
pass-through, 3-param grid, invalid train_ratio, equity continuity
across walk-forward splits, combined metrics length, and single-split
walk-forward. Applied ruff format. Full suite: 688 passed, 10 skipped.

@marwinsteiner merged commit 06fa4a9 into main on Feb 22, 2026
1 check passed
@marwinsteiner deleted the phase-4/optimize branch on February 22, 2026, 11:30