Tycoon Learning Environment (TycoonLE) is a reinforcement learning environment for economically grounded, long-horizon planning. Agents operate in a simulated logistics economy where they allocate capital, build transport routes, move cargo, manage debt, and optimize delayed returns.
It is designed to study action legality, candidate-frontier decision interfaces, financing timing, delayed rewards, procedural variation, and replayable audit traces.
TycoonLE uses a fixed-shape interface. Agents choose among valid route, finance, and wait candidates, making rollouts compatible with JAX transformations such as jit, vmap, and scan.
The replay UI makes policies inspectable through route choices, cargo flow, financing behavior, reward, score, and profit over time.
TycoonBench provides a companion benchmark report for comparing agent and model performance on TycoonLE planning tasks: vrtnis.github.io/tycoonbench.
Use Python 3.11 or 3.12:
py -3.12 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -e ".[test]"
npm installimport jax
from tycoonle_jax import TycoonLE
env = TycoonLE(split="dev", family="chain")
state, timestep = env.reset(jax.random.PRNGKey(0))
action = timestep.observation.action_mask.argmax()
state, timestep = env.step(state, action)Export a replay:
.\.venv\Scripts\python.exe examples\quickstart.py
npm run devOpen the browser UI and load runs/quickstart/replay.json.
Run tests:
.\.venv\Scripts\python.exe -m pytest
npm run buildInstall the notebook dependencies:
.\.venv\Scripts\python.exe -m pip install -e ".[test,notebooks]"Open a marimo notebook:
.\.venv\Scripts\python.exe -m marimo edit notebooks\01_quickstart_rollout.pyFor the read-only app view:
.\.venv\Scripts\python.exe -m marimo run notebooks\01_quickstart_rollout.pyThe molab badges open server-mode notebooks so TycoonLE can use JAX and the repo package. Molab may ask you to sign in before launching the server. The plain GitHub preview is read-only and may show source or dependency errors; the exact local app view is still marimo run.
The notebook set covers:
| Notebook | What it covers | Run on molab |
|---|---|---|
01_quickstart_rollout.py |
Interactive reset, rollout, OpenGFX sprite map render, candidate table, metrics, and replay export. | |
02_action_candidates_and_rewards.py |
Inspect candidate actions, execute a selected action, and compare reward/metric deltas with before/after sprite maps. | |
03_jax_rollouts_and_training_smoke.py |
Run JAX jit/vmap/scan rollouts and a tiny PPO smoke train. |
The benchmark report is generated from JSON artifacts in src/benchmark/generated/.
Create the fixed benchmark task files:
npm run benchmark:generatePreview OpenRouter requests without making API calls:
npm run benchmark:run:preview -- --models gpt-55 --tasks singleRouteRun configured OpenRouter models:
$env:OPENROUTER_API_KEY="sk-or-..."
npm run benchmark:run:openrouter -- --models gpt-55,gemini-35-flash --tasks singleRoute,chain
npm run benchmark:extract
npm run buildModel slugs live in benchmark/config/openrouter-models.mjs. Keep provider.allow_fallbacks disabled for benchmark runs unless the benchmark target is the router itself.
Run a small PPO smoke train:
.\.venv\Scripts\python.exe examples\train_ppo_jax.py --updates 1 --num-envs 4 --rollout-length 4 --update-epochs 1 --hidden-sizes 32If you find this work useful, consider citing:
@software{tycoonle,
title = {TycoonLE},
author = {TycoonLE contributors},
year = {2026},
url = {https://github.com/vrtnis/tycoon-learning-environment}
}TycoonLE uses sprite artwork from OpenGFX, an open-source graphics base set for OpenTTD.
