massgen-refinery

A Claude Code plugin that brings MassGen's quality enforcement patterns to any project. It runs iterative quality loops in which specialized agents evaluate, improve, and verify work products until they genuinely meet a high bar.

What It Does

The /refine skill runs an autonomous quality loop: produce a deliverable, evaluate it with a brutally honest critic agent, iterate until it's genuinely good. Each round gets a fresh context to prevent approach anchoring.

/refine "Build a landing page for an AI startup"

That's it. The loop handles criteria generation, builder agents, regression guards, evaluation, and iteration automatically.

Getting Started

Option A: Docker Sandbox (recommended)

Sandboxes run Claude Code in an isolated microVM. With the recommended tmux launch helper, the refinement loop can auto-clear context between rounds and run unattended. Requires Docker Desktop 4.58+.

# 1. Clone the plugin
git clone https://github.com/massgen/massgen-refinery.git

# 2. Create the sandbox with your project + plugin mounted
docker sandbox run claude \
  ~/my-project \
  /path/to/massgen-refinery \
  -- --plugin-dir /path/to/massgen-refinery

# 3. Authenticate (one time — persists across sessions)
#    When Claude Code starts, type:
/login
#    Then Ctrl+C to exit.

# 4. Install uv + massgen + AI coding CLIs (one time — persists)
#    Run as separate calls to avoid connection timeout on long installs
docker sandbox exec claude-my-project bash -c \
  "curl -LsSf https://astral.sh/uv/install.sh | sh && \
   ~/.local/bin/uv tool install massgen"

docker sandbox exec claude-my-project bash -c \
  "npm install -g @openai/codex @google/gemini-cli"

docker sandbox exec claude-my-project bash -c \
  "curl -fsSL https://gh.io/copilot-install | bash"

# 5. Reconnect with the sandbox tmux launch helper
docker sandbox exec -it claude-my-project bash -lc \
  "bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh"

# 6. Use it
/massgen-refinery:refine "Build a landing page for an AI model router"

After setup, reconnect anytime with just step 5. Launching Claude without the tmux helper still works, but context clearing falls back to manual selection.

docker sandbox ls                           # List all sandboxes
docker sandbox exec -it claude-my-project bash  # Shell access
docker sandbox rm claude-my-project         # Delete sandbox

Option B: Local on macOS (no Docker)

# 1. Clone the plugin
git clone https://github.com/massgen/massgen-refinery.git

# 2. Run the macOS setup script (checks/installs tmux, uv, massgen via Homebrew)
bash massgen-refinery/scripts/setup-local-mac.sh

# 3. Set your API key (or use /login after launching)
export ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

# 4. Launch with tmux helper (enables auto-clear between refinement rounds)
cd ~/my-project
bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh

# 5. Use it
/massgen-refinery:refine "Build a landing page for an AI model router"

With eval backend CLIs (Codex, Gemini, Copilot):

bash massgen-refinery/scripts/setup-local-mac.sh --with-eval-backends

Authentication

Docker Sandbox

API keys from your host shell do not pass into the sandbox. Instead:

1. Create the sandbox: docker sandbox run claude ~/my-project
2. When Claude Code starts, type /login and follow the prompts
3. Press Ctrl+C to exit
4. Reconnect; the login persists across sessions

The OAuth shortcut (CLAUDE_CODE_OAUTH_TOKEN / claude setup-token) does not reliably work with Docker sandboxes. Use /login instead.

Authenticating other CLIs (one time — persists in named sandbox)

Codex

docker sandbox exec -it claude-my-project bash -c "codex login --device-auth"
# Prints a URL + code → open in browser, approve, done

docker sandbox exec -it claude-my-project bash -c "codex"

Gemini — prints an OAuth URL; open it in your browser, paste the code back

docker sandbox exec -it claude-my-project bash -c "gemini"

Copilot

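# On the macOS host: read the Copilot token from the keychain
COPILOT_TOKEN=$(security find-generic-password -s "copilot-cli" -w 2>/dev/null)
# $COPILOT_TOKEN expands on the host, so the sandbox's ~/.bashrc receives the literal token
docker sandbox exec claude-my-project bash -c \
  "echo 'export COPILOT_GITHUB_TOKEN=$COPILOT_TOKEN' >> ~/.bashrc"
# Start Copilot once to confirm the token is picked up
docker sandbox exec -it claude-my-project bash -c "copilot"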

Without Docker

Set your API key in your shell:

export ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

Media tools (optional)

For image/video/audio generation and vision, set these via export or in a .env file in your project root:

OPENAI_API_KEY=sk-...        # Image gen, vision, audio
GOOGLE_API_KEY=AI...         # Gemini vision, video, image
XAI_API_KEY=xai-...          # Grok video gen

The plugin hooks detect keys from both environment variables and .env files.
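The lookup order is not spelled out here. As a minimal sketch, assuming the process environment takes precedence over a .env file in the project root, key detection could look like the following. `parse_dotenv` and `find_api_key` are hypothetical helpers written for illustration, not the plugin's actual hook code.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop a trailing " # comment", then surrounding whitespace and quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def find_api_key(name: str, project_root: Path, environ=os.environ) -> str | None:
    """Assumed precedence: environment variable first, then <project_root>/.env."""
    if environ.get(name):
        return environ[name]
    env_file = project_root / ".env"
    if env_file.is_file():
        return parse_dotenv(env_file.read_text()).get(name) or None
    return None
```

For example, `find_api_key("OPENAI_API_KEY", Path.cwd())` returns the exported value if one is set and otherwise falls back to the .env entry.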

Git Hooks

This repo includes a pre-commit config with:

basic hygiene checks (trailing-whitespace, end-of-file-fixer, YAML validation, merge-conflict detection)
detect-private-key
a local staged-diff scanner that blocks obvious API keys and tokens before commit
a matching pre-push scanner that checks outgoing commits, so local snapshot branches or accidental later commits do not slip through push

Install it with:

uv tool install pre-commit
pre-commit install
pre-commit install --hook-type pre-push

Run it manually with:

pre-commit run --all-files
pre-commit run block-pushed-secrets --hook-stage pre-push

The secret scanner only inspects added git diff lines and never prints secret values back to the terminal.
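To illustrate the "added lines only, never echo the value" behaviour, here is a rough sketch of such a scanner over unified diff text. The patterns and the `scan_added_lines` function are assumptions made for this example; the repository's real hook may use different rules and a different implementation.

```python
import re

# Illustrative patterns only; the repository's scanner may use a different rule set.
SECRET_PATTERNS = {
    "anthropic-key": re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
    "private-key-header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_added_lines(diff_text: str) -> list[tuple[str, int, str]]:
    """Return (file, new_line_number, rule_name) for each match on an added line."""
    findings = []
    current_file = "<unknown>"
    new_lineno = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the file on the new side of the diff.
            current_file = line[4:].removeprefix("b/")
        elif line.startswith("@@"):
            # Hunk header: @@ -old_start,old_len +new_start,new_len @@
            match = re.search(r"\+(\d+)", line)
            new_lineno = int(match.group(1)) if match else 0
        elif line.startswith("+"):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line[1:]):
                    # Record location and rule only, never the matched text.
                    findings.append((current_file, new_lineno, rule))
            new_lineno += 1
        elif line.startswith(" "):
            new_lineno += 1  # context lines advance the new-side counter
        # Removed lines ("-") do not exist on the new side, so they are skipped.
    return findings
```

A pre-commit style wrapper could feed this the output of `git diff --cached` and exit non-zero when the returned list is not empty.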

Skills

# Full quality refinement loop (main feature)
/massgen-refinery:refine "Build a landing page for an AI model router"

# With custom evaluation criteria
/massgen-refinery:refine "Build an API" --criteria "E1: REST conventions, E2: Error handling"

# Improve an existing deliverable
/massgen-refinery:refine "Rewrite the docs" --prior-answer ./docs/api.md

# Quick one-shot evaluation (no iteration)
/massgen-refinery:evaluate ./output.html

# Media skills
/massgen-refinery:read-media screenshot.png        # Vision analysis
/massgen-refinery:image-generation "A sunset..."    # Image generation
/massgen-refinery:video-generation "A timelapse..." # Video generation
/massgen-refinery:audio-generation "Narrate..."     # Audio/TTS
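The `--criteria` value shown above reads like a comma-separated list of `ID: description` pairs. Assuming that format (the plugin's actual parsing rules are not documented here), a hypothetical parser could be sketched as:

```python
def parse_criteria(spec: str) -> dict[str, str]:
    """Split 'E1: text, E2: text' into {'E1': 'text', 'E2': 'text'}."""
    criteria: dict[str, str] = {}
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        ident, sep, description = chunk.partition(":")
        if not sep or not ident.strip() or not description.strip():
            raise ValueError(f"expected 'ID: description', got {chunk!r}")
        criteria[ident.strip()] = description.strip()
    return criteria
```

This simple split would break on descriptions that contain commas, so treat it as a reading aid for the flag's shape rather than a specification.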

How `/refine` Works

┌─────────────────────────────────────────────────────┐
│                    /refine                           │
│                                                     │
│  1. Setup                                           │
│     ├─ git init (if needed)                         │
│     ├─ init_session → timestamped session dir       │
│     └─ generate_eval_criteria → 3-7 criteria        │
│                                                     │
│  2. Answer Production (inner loop)                  │
│     ├─ Build/improve deliverable                    │
│     ├─ Spawn builder(s) in parallel for large work  │
│     ├─ Output-first verification (read_media)       │
│     └─ Build → verify → fix → re-verify → repeat   │
│                                                     │
│  3. Pre-Submission Gate                             │
│     └─ regression-guard → pass/fail/mixed verdict   │
│                                                     │
│  4. Submit                                          │
│     └─ new_answer → snapshot deliverables           │
│                                                     │
│  5. Evaluation (parallel)                           │
│     ├─ round-evaluator → verdict + critique + tasks │
│     └─ trace-analyzer → process report + scores     │
│                                                     │
│  6. Synthesize                                      │
│     ├─ Read evaluation results                      │
│     ├─ Save process learnings to memory             │
│     └─ converged → done, iterate → continue         │
│                                                     │
│  7. Context Boundary (if iterating)                 │
│     ├─ Plan mode → next-round handoff               │
│     └─ Next iteration starts fresh                  │
└─────────────────────────────────────────────────────┘
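Read as control flow, the diagram is a loop that ends when the evaluator reports convergence. The sketch below is a simplified model under stated assumptions: `produce`, `regression_guard`, and `evaluate` are hypothetical callables standing in for the agents, a "mixed" gate verdict is treated like a pass, and `max_rounds` is an illustrative safety cap that the source does not mention.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Evaluation:
    verdict: str                              # "converged" or "iterate"
    tasks: list = field(default_factory=list)  # improvement tasks for the next round


def refine(
    produce: Callable[[Optional[str], list], str],
    regression_guard: Callable[[str, str], str],
    evaluate: Callable[[str], Evaluation],
    max_rounds: int = 5,                       # hypothetical cap, not from the plugin
) -> tuple:
    """Run produce -> gate -> submit -> evaluate until the verdict is 'converged'."""
    prior: Optional[str] = None
    tasks: list = []
    for round_no in range(1, max_rounds + 1):
        # Step 2: each round sees only the handoff (prior answer + tasks),
        # modelling the fresh context described in step 7.
        candidate = produce(prior, tasks)
        # Step 3: a candidate that fails the gate is not submitted.
        if prior is not None and regression_guard(prior, candidate) == "fail":
            continue
        # Steps 4-6: submit, evaluate, then decide whether to stop.
        result = evaluate(candidate)
        prior = candidate
        if result.verdict == "converged":
            return candidate, round_no
        tasks = result.tasks
    return prior or "", max_rounds
```

The model leaves out the parallel trace analysis and the memory step; it only shows how the verdict drives the decision to stop or continue.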

Agents

| Agent | Purpose | When Used |
|---|---|---|
| round-evaluator | Brutally honest critique + structured improvement spec | After each submission |
| execution-trace-analyzer | Process optimization (errors, effort, tools) | Parallel with evaluator |
| builder | Large artifact generation from prescriptive spec | During answer production |
| regression-guard | Verify improvements don't lose prior strengths | Before submission |
| media-worker | Background media generation/analysis | When media ops would block |
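One way to picture the regression-guard's pass/fail/mixed verdict is as a per-criterion comparison between the previous answer and the new one. The numeric scores, the `tolerance` parameter, and the verdict rules below are all assumptions made for illustration; how the agent actually reaches its verdict is not specified here.

```python
def regression_verdict(prior: dict, new: dict, tolerance: float = 0.0) -> str:
    """Compare per-criterion scores and summarize as 'pass', 'mixed', or 'fail'."""
    # A criterion missing from the new scores counts as 0, i.e. a lost strength.
    improved = [c for c in prior if new.get(c, 0.0) > prior[c] + tolerance]
    regressed = [c for c in prior if new.get(c, 0.0) < prior[c] - tolerance]
    if not regressed:
        return "pass"    # nothing got worse
    if improved:
        return "mixed"   # some gains, but at the cost of a prior strength
    return "fail"        # only losses
```

Under these assumed rules, a new answer that raises E1 but lowers E2 is "mixed", which signals that the gain came at the expense of something the earlier answer did well.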

MCP Tools

Provided by the massgen package:

quality-tools: init_session, generate_eval_criteria, submit_checklist, propose_improvements, reset_evaluation
workflow-tools: new_answer, vote
media-tools: generate_media, read_media

Key Principles

Output-first verification: Run, interact, verify — don't just write
Default to iterate: First drafts are 6s, not 8s
Rebuild, don't patch: Use existing work as reference, not starting point
Breakthrough amplification: Spread techniques that work to weaker components
No feature accumulation: Features on mediocre foundation = mediocre result

Context Rotation

Between refinement rounds, the plugin automatically clears context to prevent approach anchoring and token bloat. When the evaluator's verdict is "iterate," Claude finishes a handoff plan and shows the plan approval dialog. When Claude is launched through the recommended sandbox tmux helper, a lightweight watcher polls the Claude pane and presses Enter when that dialog appears with the default "Yes, clear context" option selected.

Automatic clearing requires tmux inside the sandbox and the tmux helper launch command. Without that watcher, the plan approval UI still appears and you select option 1 manually.
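The watcher idea can be sketched in a few lines. This is not the plugin's launch helper: the pane target, the polling interval, and matching on the literal text "Yes, clear context" are assumptions, and the sketch checks only that the dialog text is visible, not which option is highlighted. It relies on the standard tmux commands `capture-pane -p` and `send-keys`.

```python
import subprocess
import time

PROMPT_MARKER = "Yes, clear context"  # dialog text quoted in the section above


def dialog_ready(pane_text: str) -> bool:
    """True when the captured pane shows the plan approval dialog."""
    return PROMPT_MARKER in pane_text


def capture_pane(target: str) -> str:
    """Return the visible contents of a tmux pane as text."""
    result = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def watch(target, interval=2.0, capture=capture_pane, send=None, max_polls=None):
    """Poll the pane and press Enter once per dialog appearance; return the press count."""
    if send is None:
        def send():
            subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)
    presses, polls, was_ready = 0, 0, False
    while max_polls is None or polls < max_polls:
        ready = dialog_ready(capture(target))
        if ready and not was_ready:  # act once per appearance, not once per poll
            send()
            presses += 1
        was_ready = ready
        polls += 1
        time.sleep(interval)
    return presses
```

Calling `watch("claude:0.0")` with a real pane target would run until interrupted; the `capture` and `send` parameters exist so the loop can be exercised without tmux.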

This technique is adapted from the VNX Orchestration System's context rotation pattern, which uses tmux keystroke injection to manage context boundaries in multi-agent workflows.

Multi-Agent Evaluation (Experimental)

Status: Coming soon. The single-agent /refine loop above is the main supported workflow. Multi-agent evaluation via MassGen is experimental.

The /refine skill supports opt-in multi-model evaluation where multiple LLM backends (GPT, Gemini, Claude, Grok) evaluate your deliverable in parallel and reach consensus:

/massgen-refinery:refine "Build a landing page" \
  --eval-backends openai/gpt-5.4 gemini/gemini-3-flash-preview

Or run a standalone multi-agent session:

/massgen-refinery:massgen-run "Explain quantum computing" \
  --models openai/gpt-5.4 gemini/gemini-3-flash-preview

/team-massgen-run: Multi-model orchestration via MassGen step mode. The lead session launches each backend as a background process, tracks answers/votes via a shared session directory, and synthesizes the consensus winner.

Agent Teams integration (experimental) is in progress — currently blocked by a known limitation where teammate background tasks get killed on idle (#17764, #28875, #29271). The current approach uses lead-managed background tasks instead.

Multi-model evaluation requires an API key for each backend. If massgen or the API keys are unavailable, the loop falls back to the normal single-model round-evaluator.
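The fallback rule can be sketched as a small selection function. Everything here is an assumption for illustration: the `pick_evaluators` name, the mapping from backend prefix to environment variable (borrowed from the key names listed under "Media tools"), and the choice to keep whichever requested backends have keys rather than falling back as soon as one is missing.

```python
import os
import shutil

# Assumed mapping from backend prefix to API key variable.
BACKEND_KEYS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}

FALLBACK = ["round-evaluator"]


def pick_evaluators(requested, environ=os.environ, massgen_installed=None):
    """Return the usable backends, or the single-model fallback if none qualify."""
    if massgen_installed is None:
        # Treat massgen as available when its executable is on PATH.
        massgen_installed = shutil.which("massgen") is not None
    if not massgen_installed:
        return list(FALLBACK)
    usable = []
    for model in requested:
        prefix = model.split("/", 1)[0]
        key_name = BACKEND_KEYS.get(prefix)
        if key_name and environ.get(key_name):
            usable.append(model)
    return usable or list(FALLBACK)
```

With only `OPENAI_API_KEY` set, a request for both example backends would keep `openai/gpt-5.4` under this sketch; with no keys at all it returns the single-model evaluator.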

See skills/massgen-run/ for details.

Development Setup (massgen from local source)

Use this setup to experiment with a local massgen checkout instead of the published package.

1. Update plugin.json for local source

Edit .claude-plugin/plugin.json to point MCP servers at your local massgen using uv run --directory instead of uvx:

{
  "quality-tools": {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.quality_server"]
  },
  "workflow-tools": {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.workflow_server"]
  },
  "media-tools": {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.media_server"]
  }
}
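Since the three entries above differ only in server name and module, they can be generated rather than hand-edited. The helper below is a hypothetical convenience script, not part of the plugin; it prints the same structure for whatever checkout path you pass in, and leaves it to you to paste the result into the right place in plugin.json.

```python
import json

# Server name -> module, copied from the entries shown above.
SERVERS = {
    "quality-tools": "massgen.mcp_tools.standalone.quality_server",
    "workflow-tools": "massgen.mcp_tools.standalone.workflow_server",
    "media-tools": "massgen.mcp_tools.standalone.media_server",
}


def local_server_entries(massgen_dir: str) -> dict:
    """Build stdio server entries that run massgen from a local checkout via uv."""
    return {
        name: {
            "type": "stdio",
            "command": "uv",
            "args": ["run", "--directory", massgen_dir, "python", "-m", module],
        }
        for name, module in SERVERS.items()
    }


if __name__ == "__main__":
    # Replace the placeholder with the path to your own massgen checkout.
    print(json.dumps(local_server_entries("/path/to/massgen"), indent=2))
```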

2. Create the sandbox (with local source mounted)

Mount your project, massgen source, and the plugin — all read-write since uv sync needs to create a venv in the massgen directory:

docker sandbox run claude \
  ~/my-project \
  /path/to/massgen \
  /path/to/massgen-refinery

3. Install uv and sync massgen (one time — persists)

docker sandbox exec -it claude-my-project bash

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

# Sync massgen dependencies
cd /path/to/massgen && uv sync

exit

4. Launch with the plugin

# With tmux helper (recommended — enables auto-clear between rounds)
docker sandbox exec -it claude-my-project bash -lc \
  "bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh"

# Without tmux (no auto-clear)
docker sandbox run claude-my-project -- \
  --plugin-dir /path/to/massgen-refinery

Local dev without Docker

# With tmux helper (recommended)
bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh

# Without tmux
claude --plugin-dir /path/to/massgen-refinery

Reload after plugin changes without restarting:

/reload-plugins

License

MIT
