OpenAgora

Arena is an open-source rollout, verification, and trajectory plane for agentic reinforcement learning.

It provides the missing infrastructure layer between RL trainers (veRL, ROLL, TRL) and agent execution environments. Whether you are building a coding agent, a web agent, or a general-purpose autonomous system, Arena gives you a reproducible, observable, and RL-ready execution pipeline.

What is Arena?

Training agents with reinforcement learning requires more than just an LLM API. You need:

Controlled rollouts — deterministic sampling, token budgets, and trajectory capture
Sandboxed execution — safe, reproducible environments for your agents
Decoupled verification — reward computation independent from agent logic
Structured trajectory data — training-grade data for PPO, GRPO, DPO, and more

Arena provides all four as composable, language-agnostic planes.

Four Planes

Plane	Purpose	Status
Rollout Control Plane	LLM proxy with sampling injection and trajectory capture	✅ Available
Sandbox Plane	Containerized agent execution (Docker v1)	✅ Available
Verification Plane	Structured SWE-bench verification + multi-language parsers	✅ Available
Trajectory Data Plane	Structured, append-only trajectory storage	✅ Available

See docs/architecture.md for the full design.

Quick Start

Get your first rollout running in under 5 minutes.

Prerequisites

Docker
Go 1.25+
Python 3.10+
uv (for Python development)

1. Clone and Build

git clone https://github.com/albert-lv/OpenAgora.git
cd OpenAgora
make build

2. Start the Arena Server

./bin/openagora-server
# Server listening on :9090

Note: The quickstart uses the Docker sandbox provider by default. Make sure Docker is installed and running before proceeding. If you do not have Docker, you can start the server with a mock sandbox instead:
./bin/openagora-server --sandbox=mock
The mock provider does not create real containers, but the rest of the flow (proxy, trajectory, verification) works normally.

Note on LLM backend: The default task.json points to a mock LLM. Arena supports Ollama, vLLM, and SGLang as inference backends. The proxy injects logprobs for all backends and top_logprobs for vLLM/SGLang. See docs/getting-started.md for backend setup instructions.

3. Run Your First Rollout

In another terminal:

cd examples/quickstart
./run.sh

You should see a rollout complete with captured trajectory steps and a reward.

For more details, check out examples/quickstart/README.md and docs/getting-started.md.

Demo: Code Colosseum Dashboard

For a complete end-to-end demo that shows live agent duels and a real GRPO training loop improving a model, run the Code Colosseum stack.

Key points:

The trainer starts an OpenAI-compatible LLM server that serves the current actor policy.
Every Arena rollout calls this server, so each GRPO update is immediately reflected in the next generation.
The Dashboard shows reward/loss curves improving over iterations in real time.

One-command demo

docker compose -f examples/code-colosseum/docker-compose.yml up --build

Then open http://localhost:3000. The first run downloads the configured model (default Qwen/Qwen2.5-0.5B-Instruct) into the mounted HuggingFace cache (~/.cache/huggingface).

To use a different model, edit MODEL_NAME in examples/code-colosseum/docker-compose.yml, e.g. Qwen/Qwen3.5-0.8B.

Local development

Run the services separately (useful when hacking on the UI, orchestrator, or trainer):

Install Python dependencies

cd examples/code-colosseum/backend
python3 -m venv .venv
source .venv/bin/activate
pip install -e ../../../python/openagora-sdk
pip install fastapi uvicorn pydantic

The trainer also needs torch, transformers, peft, fastapi, and uvicorn (install them in the same or another venv).

Start the Arena server
```
./bin/openagora-server
```

Start the Code Colosseum orchestrator

cd examples/code-colosseum
PROBLEMS_DIR=./problems TRAINING_METRICS_PATH=./backend/data/metrics.jsonl \
  uvicorn backend.main:app --host 0.0.0.0 --port 8080

Start the GRPO trainer / policy LLM server
```
cd examples/code-colosseum/training
python3 train_colosseum.py
```
The trainer starts an LLM backend on port 8000 and writes metrics to METRICS_PATH. The orchestrator serves them at /api/training/status.

Start the Dashboard

cd examples/code-colosseum/dashboard
npm install
npm run dev

Then open http://localhost:5173.

Dashboard tabs

🌌 指挥中心 — the epic Arena + GRPO command center: live duel, agent code, battle log, and GRPO reward distribution in one screen.
⚔️ Arena — pick a problem, start a duel between two agents, and watch the live battle with code panes and battle logs.
🏆 Leaderboard — Elo ratings and win/loss/draw records.
📈 Training — live GRPO reward/loss/KL curves and per-group reward distribution.

See examples/code-colosseum/README.md for the full demo guide.

Demo: Relationship Chat RL

A minimal end-to-end PPO example that teaches a small language model to reply to a partner's message in a more empathetic way. It uses:

Actor model: Qwen/Qwen3.5-0.8B (LoRA-tuned on CPU)
Rollout backend: qwen3.5:0.8b via Ollama
Sandbox: local (no extra Docker-in-Docker needed on macOS)
Verification: a simple rubric scorer that checks for required/avoided phrases

One-command demo

cd examples/relationship-chat-rl
docker compose up --build

The stack starts Ollama, the Arena server, and the CPU trainer. The first run uses the HuggingFace cache mounted from ~/.cache/huggingface, so make sure Qwen/Qwen3.5-0.8B is pre-downloaded there.

What you will see

After the rollout and PPO update complete, open the Arena Dashboard at http://localhost:9091:

Rollouts	Verify Stats	Token Stats

The trainer writes metrics to examples/relationship-chat-rl/data/metrics.jsonl and saves the LoRA checkpoint to examples/relationship-chat-rl/checkpoints/checkpoint-1/.

See examples/relationship-chat-rl/README.md for the full guide.

Why Arena?

Capability	Arena	ROCK	LiteLLM	E2B	SWE-Gym
LLM Proxy with active control	✅	❌	passive	❌	❌
Sampling injection per rollout	✅	❌	❌	❌	❌
Independent verification plane	✅	❌	❌	❌	coupled
RL-grade trajectory schema	✅	❌	❌	❌	❌
Language-agnostic agent contract	✅	partial	N/A	partial	partial

Project Structure

OpenAgora/
├── go/                      # Go core (server, proxy, sandbox orchestration)
│   ├── cmd/                 # Binaries (openagora-server, demo)
│   └── pkg/                 # Reusable packages
├── proto/                   # Protobuf / gRPC schemas
├── python/                  # Python ecosystem
│   ├── openagora-sdk/           # Python client for Arena
│   ├── openagora-verify/        # Verification plugins
│   └── openagora-verl/          # veRL trainer adapter
├── docker/                  # Docker images
├── docs/                    # Documentation
├── examples/                # Quickstart and trainer integrations
├── Makefile                 # Common development tasks
└── README.md                # You are here

Installation

Go Server

make build
# Output: ./bin/openagora-server

Python SDK

cd python/openagora-sdk
uv sync

Docker Images

make docker-server    # openagora-server:latest
make docker-agent     # openagora-agent-minimal:latest

Usage Examples

Build a Custom Agent

Any container that follows the Sandbox Contract can run in Arena. The contract is simple:

Read the task from /sandbox/.arena/task.json
Route LLM calls through the OPENAI_BASE_URL injected by Arena
Signal completion by writing /sandbox/.arena/done

That is it — language-agnostic and framework-agnostic.

Python Client

from openagora_sdk.client import ArenaClient

client = ArenaClient("localhost:9090")

rollout_id = client.create_rollout(
    task_id="my-task",
    image="openagora-agent-minimal:latest",
    llm_backend="http://localhost:8000/v1",
)

result = client.wait(rollout_id)
print(f"Status: {result['status']}, Reward: {result['reward']}")

More examples live in examples/.

Roadmap

We are building Arena in public. Here is what is coming next:

Additional sandbox providers (E2B, OpenSandbox)
Parquet and S3 trajectory backends
Streaming trajectory consumption for online RL
Structured SWE-bench style verification
LLM-as-judge verification
Distributed rollout workers
Observability dashboards

Have an idea? Open a discussion or issue.

Contributing

We love contributions! Please read our Contributing Guide to get started.

A few quick ways to help:

Report bugs — open an issue
Request features — open an issue
Submit improvements — open a pull request
Spread the word — star the repo and share with others

Please note that this project is released with a Contributor Code of Conduct. By participating, you agree to abide by its terms.

Community

💬 GitHub Discussions — ask questions, share ideas
🐛 GitHub Issues — bug reports and feature requests
📧 For security issues, please email the maintainers directly instead of opening a public issue

License

OpenAgora is licensed under the Apache License 2.0.

Built with ❤️ for the open agentic RL community.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
.github		.github
docker		docker
docs		docs
examples		examples
go		go
proto/openagora/v1		proto/openagora/v1
python		python
screenshots		screenshots
scripts		scripts
.dockerignore		.dockerignore
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenAgora

What is Arena?

Four Planes

Quick Start

Prerequisites

1. Clone and Build

2. Start the Arena Server

3. Run Your First Rollout

Demo: Code Colosseum Dashboard

One-command demo

Local development

Dashboard tabs

Demo: Relationship Chat RL

One-command demo

What you will see

Why Arena?

Project Structure

Installation

Go Server

Python SDK

Docker Images

Usage Examples

Build a Custom Agent

Python Client

Roadmap

Contributing

Community

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenAgora

What is Arena?

Four Planes

Quick Start

Prerequisites

1. Clone and Build

2. Start the Arena Server

3. Run Your First Rollout

Demo: Code Colosseum Dashboard

One-command demo

Local development

Dashboard tabs

Demo: Relationship Chat RL

One-command demo

What you will see

Why Arena?

Project Structure

Installation

Go Server

Python SDK

Docker Images

Usage Examples

Build a Custom Agent

Python Client

Roadmap

Contributing

Community

License

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages