A Claude Code plugin that brings MassGen's quality enforcement patterns to any project. It runs iterative quality loops in which specialized agents evaluate, improve, and verify work products until they genuinely meet a high bar.
The /refine skill runs an autonomous quality loop: produce a deliverable,
evaluate it with a brutally honest critic agent, iterate until it's genuinely
good. Each round gets a fresh context to prevent approach anchoring.
/refine "Build a landing page for an AI startup"
That's it. The loop handles criteria generation, builder agents, regression guards, evaluation, and iteration automatically.
Sandboxes run Claude Code in an isolated microVM. With the recommended tmux launch helper, the refinement loop can auto-clear context between rounds and run unattended. Requires Docker Desktop 4.58+.
# 1. Clone the plugin
git clone https://github.com/massgen/massgen-refinery.git
# 2. Create the sandbox with your project + plugin mounted
docker sandbox run claude \
~/my-project \
/path/to/massgen-refinery \
-- --plugin-dir /path/to/massgen-refinery
# 3. Authenticate (one time — persists across sessions)
# When Claude Code starts, type:
/login
# Then Ctrl+C to exit.
# 4. Install uv + massgen + AI coding CLIs (one time — persists)
# Run as separate calls to avoid connection timeout on long installs
docker sandbox exec claude-my-project bash -c \
"curl -LsSf https://astral.sh/uv/install.sh | sh && \
~/.local/bin/uv tool install massgen"
docker sandbox exec claude-my-project bash -c \
"npm install -g @openai/codex @google/gemini-cli"
docker sandbox exec claude-my-project bash -c \
"curl -fsSL https://gh.io/copilot-install | bash"
# 5. Reconnect with the sandbox tmux launch helper
docker sandbox exec -it claude-my-project bash -lc \
"bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh"
# 6. Use it
/massgen-refinery:refine "Build a landing page for an AI model router"

After setup, reconnect anytime with just step 5. Launching Claude without the tmux helper still works, but context clearing falls back to manual selection.
docker sandbox ls # List all sandboxes
docker sandbox exec -it claude-my-project bash # Shell access
docker sandbox rm claude-my-project # Delete sandbox

# 1. Clone the plugin
git clone https://github.com/massgen/massgen-refinery.git
# 2. Run the macOS setup script (checks/installs tmux, uv, massgen via Homebrew)
bash massgen-refinery/scripts/setup-local-mac.sh
# 3. Set your API key (or use /login after launching)
export ANTHROPIC_API_KEY=sk-ant-api03-xxxxx
# 4. Launch with tmux helper (enables auto-clear between refinement rounds)
cd ~/my-project
bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh
# 5. Use it
/massgen-refinery:refine "Build a landing page for an AI model router"

With eval backend CLIs (Codex, Gemini, Copilot):
bash massgen-refinery/scripts/setup-local-mac.sh --with-eval-backends

API keys from your host shell do not pass into the sandbox. Instead:
- Create the sandbox: docker sandbox run claude ~/my-project
- When Claude Code starts, type /login and follow the prompts
- Press Ctrl+C to exit
- Reconnect; the login persists across sessions
The OAuth shortcut (CLAUDE_CODE_OAUTH_TOKEN / claude setup-token) does not work reliably with Docker sandboxes. Use /login instead.
Codex
docker sandbox exec -it claude-my-project bash -c "codex login --device-auth"
# Prints a URL + code → open in browser, approve, done
docker sandbox exec -it claude-my-project bash -c "codex"

Gemini: prints an OAuth URL; open it in your browser, then paste the code back
docker sandbox exec -it claude-my-project bash -c "gemini"

Copilot
COPILOT_TOKEN=$(security find-generic-password -s "copilot-cli" -w 2>/dev/null)
docker sandbox exec claude-my-project bash -c \
"echo 'export COPILOT_GITHUB_TOKEN=$COPILOT_TOKEN' >> ~/.bashrc"
docker sandbox exec -it claude-my-project bash -c "copilot"

Set your API key in your shell:
export ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

For image/video/audio generation and vision, set these via export or in a
.env file in your project root:
OPENAI_API_KEY=sk-... # Image gen, vision, audio
GOOGLE_API_KEY=AI... # Gemini vision, video, image
XAI_API_KEY=xai-... # Grok video gen
The plugin hooks detect keys from both environment variables and .env files.
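As a rough illustration of detecting keys from both sources, the sketch below merges a .env file with the process environment. It is not the plugin's actual hook code: the function names, the simple KEY=VALUE parser, and the precedence rule (environment wins over .env) are assumptions made for this example.

```python
import os
from pathlib import Path

# Key names taken from the list above; everything else here is illustrative.
MEDIA_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "XAI_API_KEY")


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop a trailing inline comment, then surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def detect_keys(environ, dotenv_text=""):
    """Return {key name: source} for every media key that is set."""
    from_file = parse_dotenv(dotenv_text)
    found = {}
    for name in MEDIA_KEYS:
        if environ.get(name):
            found[name] = "environment"  # assumed precedence: env wins
        elif from_file.get(name):
            found[name] = ".env"
    return found


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    # Report only which keys exist and where they came from, never the values.
    for name, source in detect_keys(os.environ, text).items():
        print(f"{name}: found in {source}")
```

Run from a project root, this prints one line per detected key without echoing any key value.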
This repo includes a pre-commit config with:
- basic hygiene checks (trailing-whitespace, end-of-file-fixer, YAML validation, merge-conflict detection)
- detect-private-key
- a local staged-diff scanner that blocks obvious API keys and tokens before commit
- a matching pre-push scanner that checks outgoing commits, so local snapshot branches or accidental later commits do not slip through push
Install it with:
uv tool install pre-commit
pre-commit install
pre-commit install --hook-type pre-push

Run it manually with:
pre-commit run --all-files
pre-commit run block-pushed-secrets --hook-stage pre-push

The secret scanner only inspects added git diff lines and never prints secret values back to the terminal.
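To make the "added lines only, never echo the value" behavior concrete, here is a minimal sketch of such a scanner. It is not the repository's actual hook: the patterns, the function name, and the report format are illustrative assumptions.

```python
import re

# Illustrative patterns only; a real scanner would carry a longer, tuned list.
SECRET_PATTERNS = {
    "anthropic-style key": re.compile(r"sk-ant-[A-Za-z0-9_-]{16,}"),
    "generic sk- key": re.compile(r"\bsk-[A-Za-z0-9]{20,}"),
    "xai-style key": re.compile(r"\bxai-[A-Za-z0-9]{20,}"),
}


def scan_added_lines(diff_text):
    """Return (file, line number, pattern name) for each suspicious added line.

    Only lines starting with '+' inside a hunk are inspected, so secrets that
    a commit removes, or that sit in unchanged context, are not reported.
    """
    findings = []
    current_file = None
    new_line_no = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Header naming the new version of the file, e.g. "+++ b/app.py".
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
            continue
        if line.startswith("@@"):
            # Hunk header "@@ -a,b +c,d @@": c is the first new-file line.
            match = re.search(r"\+(\d+)", line)
            new_line_no = int(match.group(1)) if match else 0
            continue
        if line.startswith("+"):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line[1:]):
                    # Record where and what kind, never the matched text.
                    findings.append((current_file, new_line_no, name))
                    break
            new_line_no += 1
        elif line.startswith(("-", "\\")):
            continue  # removed lines and "\ No newline" markers: no new line
        else:
            new_line_no += 1  # context line
    return findings
```

In a hook, the input could come from something like `git diff --cached --unified=0`; that wiring is also an assumption, not a description of the shipped scanner.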
# Full quality refinement loop (main feature)
/massgen-refinery:refine "Build a landing page for an AI model router"
# With custom evaluation criteria
/massgen-refinery:refine "Build an API" --criteria "E1: REST conventions, E2: Error handling"
# Improve an existing deliverable
/massgen-refinery:refine "Rewrite the docs" --prior-answer ./docs/api.md
# Quick one-shot evaluation (no iteration)
/massgen-refinery:evaluate ./output.html
# Media skills
/massgen-refinery:read-media screenshot.png # Vision analysis
/massgen-refinery:image-generation "A sunset..." # Image generation
/massgen-refinery:video-generation "A timelapse..." # Video generation
/massgen-refinery:audio-generation "Narrate..." # Audio/TTS

┌─────────────────────────────────────────────────────┐
│                       /refine                       │
│                                                     │
│  1. Setup                                           │
│     ├─ git init (if needed)                         │
│     ├─ init_session → timestamped session dir       │
│     └─ generate_eval_criteria → 3-7 criteria        │
│                                                     │
│  2. Answer Production (inner loop)                  │
│     ├─ Build/improve deliverable                    │
│     ├─ Spawn builder(s) in parallel for large work  │
│     ├─ Output-first verification (read_media)       │
│     └─ Build → verify → fix → re-verify → repeat    │
│                                                     │
│  3. Pre-Submission Gate                             │
│     └─ regression-guard → pass/fail/mixed verdict   │
│                                                     │
│  4. Submit                                          │
│     └─ new_answer → snapshot deliverables           │
│                                                     │
│  5. Evaluation (parallel)                           │
│     ├─ round-evaluator → verdict + critique + tasks │
│     └─ trace-analyzer → process report + scores     │
│                                                     │
│  6. Synthesize                                      │
│     ├─ Read evaluation results                      │
│     ├─ Save process learnings to memory             │
│     └─ converged → done, iterate → continue         │
│                                                     │
│  7. Context Boundary (if iterating)                 │
│     └─ Plan mode → next-round handoff               │
│        └─ Next iteration starts fresh               │
└─────────────────────────────────────────────────────┘
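The control flow in the diagram can be sketched as a small loop. This is a simplified model, not the plugin's implementation: `refine`, `RoundResult`, and the three callables are hypothetical stand-ins for the builder, regression-guard, and round-evaluator agents, and `max_rounds` is an assumed safety limit.

```python
from dataclasses import dataclass, field


@dataclass
class RoundResult:
    verdict: str                                # "converged" or "iterate"
    tasks: list = field(default_factory=list)   # improvement tasks for next round


def refine(task, produce, guard, evaluate, max_rounds=5):
    """Run produce -> guard -> evaluate until the evaluator says 'converged'.

    produce(task, prior, tasks) -> deliverable
    guard(prior, candidate)     -> True if no prior strengths were lost
    evaluate(candidate)         -> RoundResult
    """
    prior, tasks, history = None, [], []
    for round_no in range(1, max_rounds + 1):
        candidate = produce(task, prior, tasks)            # step 2
        if prior is not None and not guard(prior, candidate):
            # Step 3 failed: do not submit; try again next round.
            history.append((round_no, "regression"))
            continue
        history.append((round_no, "submitted"))            # step 4
        result = evaluate(candidate)                       # step 5
        prior = candidate
        if result.verdict == "converged":                  # step 6
            return candidate, history
        # Step 7: only the task list crosses the context boundary.
        tasks = result.tasks
    return prior, history
```

Passing only `tasks` and the prior deliverable into the next round mirrors the idea that each iteration starts from a handoff rather than from the previous round's full context.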
| Agent | Purpose | When Used |
|---|---|---|
| round-evaluator | Brutally honest critique + structured improvement spec | After each submission |
| execution-trace-analyzer | Process optimization (errors, effort, tools) | Parallel with evaluator |
| builder | Large artifact generation from prescriptive spec | During answer production |
| regression-guard | Verify improvements don't lose prior strengths | Before submission |
| media-worker | Background media generation/analysis | When media ops would block |
Provided by the massgen package:
- quality-tools: init_session, generate_eval_criteria, submit_checklist, propose_improvements, reset_evaluation
- workflow-tools: new_answer, vote
- media-tools: generate_media, read_media
- Output-first verification: Run, interact, verify — don't just write
- Default to iterate: First drafts are 6s, not 8s
- Rebuild, don't patch: Use existing work as reference, not starting point
- Breakthrough amplification: Spread techniques that work to weaker components
- No feature accumulation: Features on mediocre foundation = mediocre result
Between refinement rounds, the plugin automatically clears context to prevent
approach anchoring and token bloat. When the evaluator's verdict is "iterate,"
Claude finishes a handoff plan and shows the plan approval dialog. When Claude
is launched through the recommended sandbox tmux helper, a lightweight watcher
polls the Claude pane and presses Enter when that dialog appears with the
default "Yes, clear context" option selected.
Requires tmux inside the sandbox and the tmux helper launch command for automatic clearing. Without that watcher, the plan approval UI appears and you select option 1 manually.
This technique is adapted from the VNX Orchestration System's context rotation pattern, which uses tmux keystroke injection to manage context boundaries in multi-agent workflows.
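The watcher idea described above can be sketched in a few lines. This is not the code in launch-claude-tmux.sh: the function names, the polling interval, and the pane target are assumptions. Only the `tmux capture-pane` and `tmux send-keys` commands and the "Yes, clear context" option text come from tmux and from the description above.

```python
import subprocess
import time

DIALOG_MARKER = "Yes, clear context"  # option text quoted in the section above


def dialog_visible(pane_text):
    """Return True when the plan approval dialog appears in the pane text."""
    return DIALOG_MARKER in pane_text


def capture_pane(target):
    """Read the visible contents of a tmux pane as plain text."""
    done = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def send_enter(target):
    """Press Enter in the pane, accepting the preselected default option."""
    subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)


def poll_once(pane_text, was_visible, press_enter):
    """Press Enter only on the transition from 'no dialog' to 'dialog shown'."""
    visible = dialog_visible(pane_text)
    if visible and not was_visible:
        press_enter()
    return visible


def watch(target, interval=2.0):
    """Poll the pane forever; `target` is a hypothetical pane name."""
    was_visible = False
    while True:
        was_visible = poll_once(
            capture_pane(target), was_visible, lambda: send_enter(target)
        )
        time.sleep(interval)
```

Tracking the previous state keeps the watcher from sending Enter repeatedly while the same dialog stays on screen, which would otherwise leak keystrokes into the next round.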
Status: Coming soon. The single-agent /refine loop above is the main supported workflow. Multi-agent evaluation via MassGen is experimental.
The /refine skill supports opt-in multi-model evaluation where multiple LLM
backends (GPT, Gemini, Claude, Grok) evaluate your deliverable in parallel and
reach consensus:
/massgen-refinery:refine "Build a landing page" \
--eval-backends openai/gpt-5.4 gemini/gemini-3-flash-preview

Or run a standalone multi-agent session:
/massgen-refinery:massgen-run "Explain quantum computing" \
--models openai/gpt-5.4 gemini/gemini-3-flash-preview

/team-massgen-run: Multi-model orchestration via MassGen step mode. The
lead session launches each backend as a background process, tracks answers/votes
via a shared session directory, and synthesizes the consensus winner.
Agent Teams integration (experimental) is in progress — currently blocked by a known limitation where teammate background tasks get killed on idle (#17764, #28875, #29271). The current approach uses lead-managed background tasks instead.
Requires API keys for each backend. Falls back to the normal single-model round-evaluator if massgen or API keys are unavailable.
See skills/massgen-run/ for details.
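The fallback rule above can be expressed as a small decision function. This is a sketch under stated assumptions, not the skill's logic: the provider-to-key mapping is a guess based on the key names listed earlier, and treating any missing key as a reason to fall back is one possible reading of "requires API keys for each backend".

```python
import shutil

# Assumed mapping from backend prefix to the API key it needs.
BACKEND_KEYS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "grok": "XAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def missing_keys(backends, environ):
    """List the key names (or unknown providers) that block multi-model eval."""
    missing = []
    for spec in backends:
        provider = spec.split("/", 1)[0]
        key_name = BACKEND_KEYS.get(provider)
        if key_name is None:
            missing.append(provider)       # provider not in the assumed mapping
        elif not environ.get(key_name):
            missing.append(key_name)
    return missing


def choose_evaluator(backends, environ, massgen_available):
    """Pick multi-model evaluation only when every prerequisite is met."""
    if backends and massgen_available and not missing_keys(backends, environ):
        return "multi-model"
    return "round-evaluator"               # single-model fallback


def massgen_on_path():
    """Assumes the package installs an executable named 'massgen'."""
    return shutil.which("massgen") is not None
```

For example, `choose_evaluator(["openai/gpt-5.4"], os.environ, massgen_on_path())` would select the fallback whenever OPENAI_API_KEY is unset.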
Use this setup to experiment with a local massgen checkout instead of the published package.
Edit .claude-plugin/plugin.json to point MCP servers at your local massgen
using uv run --directory instead of uvx:
{
"quality-tools": {
"type": "stdio",
"command": "uv",
"args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.quality_server"]
},
"workflow-tools": {
"type": "stdio",
"command": "uv",
"args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.workflow_server"]
},
"media-tools": {
"type": "stdio",
"command": "uv",
"args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.media_server"]
}
}

Mount your project, massgen source, and the plugin, all read-write because
uv sync needs to create a venv in the massgen directory:
docker sandbox run claude \
~/my-project \
/path/to/massgen \
/path/to/massgen-refinery

docker sandbox exec -it claude-my-project bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc
# Sync massgen dependencies
cd /path/to/massgen && uv sync
exit

# With tmux helper (recommended; enables auto-clear between rounds)
docker sandbox exec -it claude-my-project bash -lc \
"bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh"
# Without tmux (no auto-clear)
docker sandbox run claude-my-project -- \
--plugin-dir /path/to/massgen-refinery

# With tmux helper (recommended)
bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh
# Without tmux
claude --plugin-dir /path/to/massgen-refinery

Reload after plugin changes without restarting:
/reload-plugins
MIT