Arena is an open-source rollout, verification, and trajectory plane for agentic reinforcement learning.
It provides the missing infrastructure layer between RL trainers (veRL, ROLL, TRL) and agent execution environments. Whether you are building a coding agent, a web agent, or a general-purpose autonomous system, Arena gives you a reproducible, observable, and RL-ready execution pipeline.
Training agents with reinforcement learning requires more than just an LLM API. You need:
- Controlled rollouts β deterministic sampling, token budgets, and trajectory capture
- Sandboxed execution β safe, reproducible environments for your agents
- Decoupled verification β reward computation independent from agent logic
- Structured trajectory data β training-grade data for PPO, GRPO, DPO, and more
Arena provides all four as composable, language-agnostic planes.
| Plane | Purpose | Status |
|---|---|---|
| Rollout Control Plane | LLM proxy with sampling injection and trajectory capture | β Available |
| Sandbox Plane | Containerized agent execution (Docker v1) | β Available |
| Verification Plane | Structured SWE-bench verification + multi-language parsers | β Available |
| Trajectory Data Plane | Structured, append-only trajectory storage | β Available |
See docs/architecture.md for the full design.
Get your first rollout running in under 5 minutes.
- Docker
- Go 1.25+
- Python 3.10+
- uv (for Python development)
git clone https://github.com/albert-lv/OpenAgora.git
cd OpenAgora
make build./bin/openagora-server
# Server listening on :9090Note: The quickstart uses the Docker sandbox provider by default. Make sure Docker is installed and running before proceeding. If you do not have Docker, you can start the server with a mock sandbox instead:
./bin/openagora-server --sandbox=mockThe mock provider does not create real containers, but the rest of the flow (proxy, trajectory, verification) works normally.
Note on LLM backend: The default
task.jsonpoints to a mock LLM. Arena supports Ollama, vLLM, and SGLang as inference backends. The proxy injectslogprobsfor all backends andtop_logprobsfor vLLM/SGLang. See docs/getting-started.md for backend setup instructions.
In another terminal:
cd examples/quickstart
./run.shYou should see a rollout complete with captured trajectory steps and a reward.
For more details, check out examples/quickstart/README.md and docs/getting-started.md.
For a complete end-to-end demo that shows live agent duels and a real GRPO training loop improving a model, run the Code Colosseum stack.
Key points:
- The trainer starts an OpenAI-compatible LLM server that serves the current actor policy.
- Every Arena rollout calls this server, so each GRPO update is immediately reflected in the next generation.
- The Dashboard shows reward/loss curves improving over iterations in real time.
docker compose -f examples/code-colosseum/docker-compose.yml up --buildThen open http://localhost:3000. The first run downloads the configured model (default Qwen/Qwen2.5-0.5B-Instruct) into the mounted HuggingFace cache (~/.cache/huggingface).
To use a different model, edit MODEL_NAME in examples/code-colosseum/docker-compose.yml, e.g. Qwen/Qwen3.5-0.8B.
Run the services separately (useful when hacking on the UI, orchestrator, or trainer):
-
Install Python dependencies
cd examples/code-colosseum/backend python3 -m venv .venv source .venv/bin/activate pip install -e ../../../python/openagora-sdk pip install fastapi uvicorn pydantic
The trainer also needs
torch,transformers,peft,fastapi, anduvicorn(install them in the same or another venv). -
Start the Arena server
./bin/openagora-server
-
Start the Code Colosseum orchestrator
cd examples/code-colosseum PROBLEMS_DIR=./problems TRAINING_METRICS_PATH=./backend/data/metrics.jsonl \ uvicorn backend.main:app --host 0.0.0.0 --port 8080 -
Start the GRPO trainer / policy LLM server
cd examples/code-colosseum/training python3 train_colosseum.pyThe trainer starts an LLM backend on port
8000and writes metrics toMETRICS_PATH. The orchestrator serves them at/api/training/status. -
Start the Dashboard
cd examples/code-colosseum/dashboard npm install npm run devThen open http://localhost:5173.
- π ζζ₯δΈεΏ β the epic Arena + GRPO command center: live duel, agent code, battle log, and GRPO reward distribution in one screen.
- βοΈ Arena β pick a problem, start a duel between two agents, and watch the live battle with code panes and battle logs.
- π Leaderboard β Elo ratings and win/loss/draw records.
- π Training β live GRPO reward/loss/KL curves and per-group reward distribution.
See examples/code-colosseum/README.md for the full demo guide.
A minimal end-to-end PPO example that teaches a small language model to reply to a partner's message in a more empathetic way. It uses:
- Actor model:
Qwen/Qwen3.5-0.8B(LoRA-tuned on CPU) - Rollout backend:
qwen3.5:0.8bvia Ollama - Sandbox: local (no extra Docker-in-Docker needed on macOS)
- Verification: a simple rubric scorer that checks for required/avoided phrases
cd examples/relationship-chat-rl
docker compose up --buildThe stack starts Ollama, the Arena server, and the CPU trainer. The first run uses the HuggingFace cache mounted from ~/.cache/huggingface, so make sure Qwen/Qwen3.5-0.8B is pre-downloaded there.
After the rollout and PPO update complete, open the Arena Dashboard at http://localhost:9091:
| Rollouts | Verify Stats | Token Stats |
|---|---|---|
![]() |
![]() |
![]() |
The trainer writes metrics to examples/relationship-chat-rl/data/metrics.jsonl and saves the LoRA checkpoint to examples/relationship-chat-rl/checkpoints/checkpoint-1/.
See examples/relationship-chat-rl/README.md for the full guide.
| Capability | Arena | ROCK | LiteLLM | E2B | SWE-Gym |
|---|---|---|---|---|---|
| LLM Proxy with active control | β | β | passive | β | β |
| Sampling injection per rollout | β | β | β | β | β |
| Independent verification plane | β | β | β | β | coupled |
| RL-grade trajectory schema | β | β | β | β | β |
| Language-agnostic agent contract | β | partial | N/A | partial | partial |
OpenAgora/
βββ go/ # Go core (server, proxy, sandbox orchestration)
β βββ cmd/ # Binaries (openagora-server, demo)
β βββ pkg/ # Reusable packages
βββ proto/ # Protobuf / gRPC schemas
βββ python/ # Python ecosystem
β βββ openagora-sdk/ # Python client for Arena
β βββ openagora-verify/ # Verification plugins
β βββ openagora-verl/ # veRL trainer adapter
βββ docker/ # Docker images
βββ docs/ # Documentation
βββ examples/ # Quickstart and trainer integrations
βββ Makefile # Common development tasks
βββ README.md # You are here
make build
# Output: ./bin/openagora-servercd python/openagora-sdk
uv syncmake docker-server # openagora-server:latest
make docker-agent # openagora-agent-minimal:latestAny container that follows the Sandbox Contract can run in Arena. The contract is simple:
- Read the task from
/sandbox/.arena/task.json - Route LLM calls through the
OPENAI_BASE_URLinjected by Arena - Signal completion by writing
/sandbox/.arena/done
That is it β language-agnostic and framework-agnostic.
from openagora_sdk.client import ArenaClient
client = ArenaClient("localhost:9090")
rollout_id = client.create_rollout(
task_id="my-task",
image="openagora-agent-minimal:latest",
llm_backend="http://localhost:8000/v1",
)
result = client.wait(rollout_id)
print(f"Status: {result['status']}, Reward: {result['reward']}")More examples live in examples/.
We are building Arena in public. Here is what is coming next:
- Additional sandbox providers (E2B, OpenSandbox)
- Parquet and S3 trajectory backends
- Streaming trajectory consumption for online RL
- Structured SWE-bench style verification
- LLM-as-judge verification
- Distributed rollout workers
- Observability dashboards
Have an idea? Open a discussion or issue.
We love contributions! Please read our Contributing Guide to get started.
A few quick ways to help:
- Report bugs β open an issue
- Request features β open an issue
- Submit improvements β open a pull request
- Spread the word β star the repo and share with others
Please note that this project is released with a Contributor Code of Conduct. By participating, you agree to abide by its terms.
- π¬ GitHub Discussions β ask questions, share ideas
- π GitHub Issues β bug reports and feature requests
- π§ For security issues, please email the maintainers directly instead of opening a public issue
OpenAgora is licensed under the Apache License 2.0.
Built with β€οΈ for the open agentic RL community.


