Sun Tzu: The Unfought Battle

A turn-based strategy game built as a headless Python/Flask API, designed as a benchmark for measuring strategic reasoning in LLM-based agents under incomplete information.

Each player deploys 5 forces with hidden power values (1-5, each used exactly once). Power 1 is the Sovereign — lose it, lose the game. Every force looks identical to the opponent. The game is about information asymmetry: what you know, what you don't, and what you can make your opponent believe.
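The deployment rule can be sketched as follows. This is a minimal illustration, not the code in models.py or state.py: the names `is_valid_deployment` and `opponent_view`, and the dict-based force representation, are assumptions made for the example.

```python
# Hypothetical sketch of the deployment rule; names are illustrative only.
POWERS = (1, 2, 3, 4, 5)
SOVEREIGN_POWER = 1  # losing the power-1 force loses the game


def is_valid_deployment(assignment):
    """Return True if the 5 forces use each power 1-5 exactly once."""
    return sorted(assignment.values()) == list(POWERS)


def opponent_view(assignment, revealed=frozenset()):
    """Hide every power the opponent has not seen revealed."""
    return {
        force_id: (power if force_id in revealed else None)
        for force_id, power in assignment.items()
    }


deployment = {"f1": 3, "f2": 1, "f3": 5, "f4": 2, "f5": 4}
print(is_valid_deployment(deployment))
print(opponent_view(deployment, revealed={"f3"}))
```

Until a power is revealed, the opponent sees five identical forces, which is what makes bluffing about the Sovereign's location possible.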

Getting Started

# Create a virtual environment
python -m venv .venv && source .venv/bin/activate

# Install production dependencies
pip install -r requirements.txt

# Install dev dependencies (testing, linting)
pip install -e ".[dev]"

# Run the API server
python app.py

# Play via CLI
python play_cli.py

Running Tests

# All tests
pytest

# Unit tests only (~2s)
pytest tests/test_models.py tests/test_state.py tests/test_orders.py \
       tests/test_resolution.py tests/test_upkeep.py tests/test_map_gen.py

# Benchmark tests (~30s)
pytest tests/test_benchmark.py

# Gameplay tests (~3min)
pytest tests/test_gameplay.py

Linting

ruff check .        # lint
ruff format .       # format

Game Mechanics

Orders

Order	Cost	Effect
Move	0 Shih	Adjacent hex movement
Charge	2 Shih	Move up to 2 hexes, +2 attack bonus
Scout	2 Shih	Noisy intel on one enemy within 2 hexes
Fortify	2 Shih	+2 defense this turn
Ambush	3 Shih	+2 defense, hidden from opponent

All orders except Move require supply chain to the Sovereign.
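As a rough sketch, the cost table and the supply requirement combine into a single legality check. The `ORDER_COSTS` mapping mirrors the table above; the function name `can_issue` and its arguments are assumptions for illustration and may not match orders.py.

```python
# Hypothetical sketch; costs come from the table above, names are illustrative.
ORDER_COSTS = {
    "move": 0,
    "charge": 2,
    "scout": 2,
    "fortify": 2,
    "ambush": 3,
}


def can_issue(order, shih, has_supply):
    """Check whether a force may issue an order given Shih and supply status."""
    cost = ORDER_COSTS[order]
    # Every order except Move needs an intact supply chain to the Sovereign.
    if order != "move" and not has_supply:
        return False
    return shih >= cost


print(can_issue("ambush", shih=3, has_supply=True))
print(can_issue("scout", shih=5, has_supply=False))
```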

Supply Chain

To use special orders, a force must chain back to the Sovereign (max 2 hops, range 2). A force with broken supply can only Move.
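One way to read the supply rule is as a bounded breadth-first search outward from the Sovereign. The sketch below assumes axial hex coordinates, that each hop links two friendly forces at most 2 hexes apart, and that a chain may use at most 2 hops; the function names are hypothetical and the real check in state.py or orders.py may differ.

```python
from collections import deque


def hex_distance(a, b):
    """Distance between two hexes in axial (q, r) coordinates."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def has_supply(force, sovereign, friendly, max_hops=2, hop_range=2):
    """Assumed rule: reach `force` from the Sovereign in <= max_hops links."""
    frontier = deque([(sovereign, 0)])
    seen = {sovereign}
    while frontier:
        pos, hops = frontier.popleft()
        if pos == force:
            return True
        if hops == max_hops:
            continue  # chain is already at maximum length
        for other in friendly:
            if other not in seen and hex_distance(pos, other) <= hop_range:
                seen.add(other)
                frontier.append((other, hops + 1))
    return False


positions = {(0, 0), (2, 0), (4, 0), (6, 0)}  # Sovereign at (0, 0)
print(has_supply((4, 0), (0, 0), positions))
print(has_supply((6, 0), (0, 0), positions))
```

Under this reading, the force at (4, 0) is supplied through the relay at (2, 0), while the force at (6, 0) would need a third hop and is limited to Move.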

Combat

effective_power = base_power + modifiers + random(±2). After combat, both forces' power values are permanently revealed. The loser retreats if the power difference is 2 or less; otherwise it is eliminated.
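The combat formula can be sketched as below. Several details are assumptions, not confirmed by this README: the random term is drawn as a uniform integer in [-2, 2], the retreat threshold is applied to the difference in effective power, and a tie produces no winner. The function name and return shape are hypothetical; see resolution.py for the actual behavior.

```python
import random


def resolve_combat(attacker_base, defender_base, attacker_mod=0,
                   defender_mod=0, variance=2, rng=None):
    """Sketch of one combat under the assumptions stated above."""
    rng = rng or random.Random()
    attacker = attacker_base + attacker_mod + rng.randint(-variance, variance)
    defender = defender_base + defender_mod + rng.randint(-variance, variance)
    diff = abs(attacker - defender)
    if diff == 0:
        # Tie handling is an assumption; the README does not specify it.
        return {"winner": None, "loser_fate": None, "difference": 0}
    winner = "attacker" if attacker > defender else "defender"
    # Close fights end in a retreat, lopsided ones in elimination.
    fate = "retreat" if diff <= 2 else "eliminated"
    return {"winner": winner, "loser_fate": fate, "difference": diff}


# A charging power-3 force (+2) attacks a fortified power-4 force (+2).
print(resolve_combat(3, 4, attacker_mod=2, defender_mod=2,
                     rng=random.Random(0)))
```

Passing `variance=0` removes the random term, which is convenient for deterministic tests.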

The Noose

Every 5 turns, the outermost ring of hexes becomes Scorched. Forces on Scorched hexes die.
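A minimal sketch of the shrinking board, assuming the 7x7 grid is addressed by (col, row) offsets, that a hex's ring index is its distance to the nearest board edge, and that the first ring is scorched on turn 5 itself. These are illustrative assumptions; upkeep.py and config.json define the real schedule.

```python
def scorched_ring_count(turn, interval=5):
    """Number of rings scorched so far (one more every `interval` turns)."""
    return turn // interval


def is_scorched(col, row, turn, size=7, interval=5):
    """True if the hex at (col, row) lies in an already-scorched ring."""
    ring = min(col, row, size - 1 - col, size - 1 - row)
    return ring < scorched_ring_count(turn, interval)


print(is_scorched(0, 3, turn=4))
print(is_scorched(0, 3, turn=5))
print(is_scorched(3, 3, turn=15))
```

On a 7x7 board the center hex is ring 3, so under these assumptions it would not be scorched until turn 20.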

Victory Conditions

Sovereign Capture — destroy the enemy's power-1 force
Domination — hold 2+ Contentious hexes for 4 consecutive turns
Elimination — destroy all enemy forces
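The three conditions can be checked once per turn, roughly as sketched here. The function name, its inputs, and the order in which conditions are tested are assumptions for illustration; the actual check lives in upkeep.py.

```python
def check_victory(enemy_alive_powers, held_history, min_hexes=2, min_turns=4):
    """Return the victory type achieved against the enemy, or None.

    enemy_alive_powers: set of power values the enemy still has on the board.
    held_history: Contentious hexes held by the player on each past turn.
    """
    if not enemy_alive_powers:
        return "elimination"
    if 1 not in enemy_alive_powers:
        return "sovereign_capture"
    recent = held_history[-min_turns:]
    # Domination needs an unbroken streak of qualifying turns.
    if len(recent) == min_turns and all(n >= min_hexes for n in recent):
        return "domination"
    return None


print(check_victory({2, 4, 5}, [0, 1]))
print(check_victory({1, 2, 3}, [1, 2, 2, 3, 2]))
```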

Project Structure

app.py                  Flask REST API
models.py               Force, Player, Hex data classes
state.py                Game state management, fog of war
orders.py               Order processing, noisy scouting
resolution.py           Combat resolution with variance
upkeep.py               Turn finalization, Noose, victory conditions
map_gen.py              7x7 hex grid generation
config.json             Tunable game parameters
play_cli.py             CLI play mode
benchmark/
  telemetry.py          AgentReport, EventLog, BeliefState schemas
  metrics.py            Brier score, log loss, calibration, ToM delta
  llm_agent_interface.py  LLM agent abstract class + MockLLMAgent
tests/
  conftest.py           Shared fixtures and helpers
  simulate.py           Game simulation + Tier 1 strategies
  strategies_advanced.py  Tiers 2-4 strategies
  test_*.py             Unit and integration tests

Benchmark

The benchmark instrumentation measures LLM strategic reasoning via:

Noisy scouting: Scout reveals exact power with probability 0.7, otherwise returns a band (low 1-2, mid 3, high 4-5)
Per-turn telemetry: Belief distributions, action predictions, confidence
Metrics: Brier score, log loss, calibration error, information gain, theory-of-mind delta
LLM agent interface: Abstract LLMAgent class + MockLLMAgent for testing the harness without API calls
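The noisy scouting rule and the Brier score can be sketched together. The report format, the function names, and the use of the multi-class Brier score (sum of squared errors over the five possible powers) are assumptions for this example; orders.py and benchmark/metrics.py hold the actual definitions.

```python
import random

# Band boundaries are taken from the scouting rule described above.
BANDS = {1: "low", 2: "low", 3: "mid", 4: "high", 5: "high"}


def scout(true_power, rng, accuracy=0.7):
    """Return the exact power with probability `accuracy`, else its band."""
    if rng.random() < accuracy:
        return {"kind": "exact", "value": true_power}
    return {"kind": "band", "value": BANDS[true_power]}


def brier_score(belief, true_power):
    """Multi-class Brier score of a belief {power: probability}."""
    return sum(
        (belief.get(power, 0.0) - (1.0 if power == true_power else 0.0)) ** 2
        for power in range(1, 6)
    )


rng = random.Random(42)
print(scout(4, rng))

uniform = {p: 0.2 for p in range(1, 6)}
print(brier_score(uniform, true_power=3))
```

A belief that puts all of its mass on the true power scores 0, and a uniform belief over five powers scores about 0.8, so lower is better.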

Strategy Tiers

Tier	Strategy	Approach
1	Aggressive	Charge-first sovereign rush
1	Cautious	Scout first, attack with intel advantage
1	Ambush	Set traps, fortify, wait for attackers
1	Blitzer	Charge-first blitz
1	Sovereign Hunter	Prioritize finding and killing the Sovereign
1	Coordinator	Maintain formation for support bonuses
1	Noose Dodger	Stay ahead of the shrinking board
2	Pattern Reader	Track enemy movement patterns
2	Supply Cutter	Break enemy supply chains
3	Bayesian Hunter	Bayesian inference over hidden powers
4	Lookahead	Forward simulation across belief states

Deployment

Deployed to Google Cloud App Engine. See app.yaml for configuration.

gcloud app deploy
