
massgen-refinery

A Claude Code plugin that brings MassGen's quality enforcement patterns to any project. It runs iterative quality loops in which specialized agents evaluate, improve, and verify work products until they genuinely meet a high bar.

What It Does

The /refine skill runs an autonomous quality loop: produce a deliverable, evaluate it with a brutally honest critic agent, and iterate until it's genuinely good. Each round gets a fresh context to prevent approach anchoring, where later rounds stay stuck on an earlier round's approach because it is still in context.

/refine "Build a landing page for an AI startup"

That's it. The loop handles criteria generation, builder agents, regression guards, evaluation, and iteration automatically.


Getting Started

Option A: Docker Sandbox (recommended)

Sandboxes run Claude Code in an isolated microVM. With the recommended tmux launch helper, the refinement loop can auto-clear context between rounds and run unattended. Requires Docker Desktop 4.58+.

# 1. Clone the plugin
git clone https://github.com/massgen/massgen-refinery.git

# 2. Create the sandbox with your project + plugin mounted
docker sandbox run claude \
  ~/my-project \
  /path/to/massgen-refinery \
  -- --plugin-dir /path/to/massgen-refinery

# 3. Authenticate (one time — persists across sessions)
#    When Claude Code starts, type:
/login
#    Then Ctrl+C to exit.

# 4. Install uv + massgen + AI coding CLIs (one time — persists)
#    Run as separate calls to avoid connection timeouts on long installs
docker sandbox exec claude-my-project bash -c \
  "curl -LsSf https://astral.sh/uv/install.sh | sh && \
   ~/.local/bin/uv tool install massgen"

docker sandbox exec claude-my-project bash -c \
  "npm install -g @openai/codex @google/gemini-cli"

docker sandbox exec claude-my-project bash -c \
  "curl -fsSL https://gh.io/copilot-install | bash"

# 5. Reconnect with the sandbox tmux launch helper
docker sandbox exec -it claude-my-project bash -lc \
  "bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh"

# 6. Use it
/massgen-refinery:refine "Build a landing page for an AI model router"

After setup, reconnect anytime with just step 5. Launching Claude without the tmux helper still works, but context clearing falls back to manual selection.

docker sandbox ls                                # List all sandboxes
docker sandbox exec -it claude-my-project bash   # Shell access
docker sandbox rm claude-my-project              # Delete sandbox

Option B: Local on macOS (no Docker)

# 1. Clone the plugin
git clone https://github.com/massgen/massgen-refinery.git

# 2. Run the macOS setup script (checks/installs tmux, uv, massgen via Homebrew)
bash massgen-refinery/scripts/setup-local-mac.sh

# 3. Set your API key (or use /login after launching)
export ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

# 4. Launch with tmux helper (enables auto-clear between refinement rounds)
cd ~/my-project
bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh

# 5. Use it
/massgen-refinery:refine "Build a landing page for an AI model router"

To also install the eval backend CLIs (Codex, Gemini, Copilot):

bash massgen-refinery/scripts/setup-local-mac.sh --with-eval-backends

Authentication

Docker Sandbox

API keys from your host shell do not pass into the sandbox. Instead:

  1. Create the sandbox: docker sandbox run claude ~/my-project
  2. When Claude Code starts, type /login and follow the prompts
  3. Press Ctrl+C to exit
  4. Reconnect — login persists across sessions

The OAuth shortcut (CLAUDE_CODE_OAUTH_TOKEN / claude setup-token) does not reliably work with Docker sandboxes. Use /login instead.

Authenticating other CLIs (one time — persists in named sandbox)

Codex

docker sandbox exec -it claude-my-project bash -c "codex login --device-auth"
# Prints a URL + code → open in browser, approve, done

docker sandbox exec -it claude-my-project bash -c "codex"

Gemini

Prints an OAuth URL: open it in your browser, then paste the code back into the terminal.

docker sandbox exec -it claude-my-project bash -c "gemini"

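Copilot

Run these from a macOS host. They copy the token stored under "copilot-cli" in your Keychain into the sandbox's ~/.bashrc, then start Copilot inside the sandbox.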

COPILOT_TOKEN=$(security find-generic-password -s "copilot-cli" -w 2>/dev/null)
docker sandbox exec claude-my-project bash -c \
  "echo 'export COPILOT_GITHUB_TOKEN=$COPILOT_TOKEN' >> ~/.bashrc"
docker sandbox exec -it claude-my-project bash -c "copilot"

Without Docker

Set your API key in your shell:

export ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

Media tools (optional)

For image/video/audio generation and vision, set these via export or in a .env file in your project root:

OPENAI_API_KEY=sk-...        # Image gen, vision, audio
GOOGLE_API_KEY=AI...         # Gemini vision, video, image
XAI_API_KEY=xai-...          # Grok video gen

The plugin hooks detect keys from both environment variables and .env files.
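This README doesn't document how the hooks resolve keys. The Python sketch below shows one plausible approach. It assumes a simple KEY=value .env format, and it assumes that a variable already exported in the shell takes precedence over the file. `parse_dotenv` and `detect_media_keys` are hypothetical names, not the plugin's API.

```python
import os
from pathlib import Path

# The three media keys listed above; which keys the real hooks check is an assumption.
MEDIA_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "XAI_API_KEY")


def parse_dotenv(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blanks, comments, and inline '#' notes."""
    values = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "sk-...   # Image gen".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def detect_media_keys(project_root: Path, environ=None) -> dict:
    """Map each configured media key to where it was found ('env' or '.env').

    Assumption: an exported environment variable wins over the .env file.
    Empty values count as unset.
    """
    environ = os.environ if environ is None else environ
    dotenv = parse_dotenv(project_root / ".env")
    found = {}
    for key in MEDIA_KEYS:
        if environ.get(key):
            found[key] = "env"
        elif dotenv.get(key):
            found[key] = ".env"
    return found
```

For example, `detect_media_keys(Path("."))` run from the project root lists which media tools have credentials, without ever returning the key values.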


Git Hooks

This repo includes a pre-commit config with:

  • basic hygiene checks (trailing-whitespace, end-of-file-fixer, YAML validation, merge-conflict detection)
  • detect-private-key
  • a local staged-diff scanner that blocks obvious API keys and tokens before commit
  • a matching pre-push scanner that checks outgoing commits, so secrets in local snapshot branches or later accidental commits don't slip through on push

Install it with:

uv tool install pre-commit
pre-commit install
pre-commit install --hook-type pre-push

Run it manually with:

pre-commit run --all-files
pre-commit run block-pushed-secrets --hook-stage pre-push

The secret scanner only inspects added git diff lines and never prints secret values back to the terminal.
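The Python sketch below makes the "only inspects added lines" rule concrete. It is not the repo's actual hook: the regex patterns and function names are illustrative assumptions, and so is the use of `git diff --cached -U0`. Findings report the file and pattern name but never the matched value.

```python
import re
import subprocess

# Illustrative patterns only; the repo's real pre-commit rules may differ.
SECRET_PATTERNS = {
    "anthropic": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "openai": re.compile(r"sk-(?:proj-)?[A-Za-z0-9]{20,}"),
    "github": re.compile(r"gh[pousr]_[A-Za-z0-9]{30,}"),
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def added_lines(diff_text: str):
    """Yield (file, text) for each added line, skipping '+++' file headers."""
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current_file = line[4:].removeprefix("b/")
        elif line.startswith("+"):
            yield current_file, line[1:]


def scan_diff(diff_text: str) -> list:
    """Return (file, pattern name) hits; the secret itself is never returned."""
    hits = []
    for path, text in added_lines(diff_text):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                hits.append((path, name))
    return hits


def scan_staged() -> list:
    # -U0 drops context lines, so only changed lines reach the scanner.
    diff = subprocess.run(
        ["git", "diff", "--cached", "-U0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    return scan_diff(diff)
```

A pre-commit wrapper would print each `(file, pattern)` pair from `scan_staged()` and exit non-zero when the list is non-empty. A pre-push variant would diff the outgoing commit range rather than the staged changes.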


Skills

# Full quality refinement loop (main feature)
/massgen-refinery:refine "Build a landing page for an AI model router"

# With custom evaluation criteria
/massgen-refinery:refine "Build an API" --criteria "E1: REST conventions, E2: Error handling"

# Improve an existing deliverable
/massgen-refinery:refine "Rewrite the docs" --prior-answer ./docs/api.md

# Quick one-shot evaluation (no iteration)
/massgen-refinery:evaluate ./output.html

# Media skills
/massgen-refinery:read-media screenshot.png        # Vision analysis
/massgen-refinery:image-generation "A sunset..."    # Image generation
/massgen-refinery:video-generation "A timelapse..." # Video generation
/massgen-refinery:audio-generation "Narrate..."     # Audio/TTS
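The --criteria value above is a list of `ID: description` entries. The skill's own parser isn't shown here. This hypothetical Python helper only makes that shape explicit. It splits at a comma only when the comma is followed by the next `E<n>:`, so a description may itself contain commas.

```python
import re
from typing import Dict

# "E<n>: text" up to the next ", E<m>:" or the end of the string.
_CRITERION = re.compile(r"(E\d+)\s*:\s*(.*?)(?=,\s*E\d+\s*:|$)")


def parse_criteria(spec: str) -> Dict[str, str]:
    """Hypothetical: turn 'E1: a, E2: b' into {'E1': 'a', 'E2': 'b'}."""
    return {cid: text.strip() for cid, text in _CRITERION.findall(spec)}
```

With the example above, `parse_criteria("E1: REST conventions, E2: Error handling")` gives `{"E1": "REST conventions", "E2": "Error handling"}`.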

How /refine Works

┌─────────────────────────────────────────────────────┐
│                    /refine                          │
│                                                     │
│  1. Setup                                           │
│     ├─ git init (if needed)                         │
│     ├─ init_session → timestamped session dir       │
│     └─ generate_eval_criteria → 3-7 criteria        │
│                                                     │
│  2. Answer Production (inner loop)                  │
│     ├─ Build/improve deliverable                    │
│     ├─ Spawn builder(s) in parallel for large work  │
│     ├─ Output-first verification (read_media)       │
│     └─ Build → verify → fix → re-verify → repeat    │
│                                                     │
│  3. Pre-Submission Gate                             │
│     └─ regression-guard → pass/fail/mixed verdict   │
│                                                     │
│  4. Submit                                          │
│     └─ new_answer → snapshot deliverables           │
│                                                     │
│  5. Evaluation (parallel)                           │
│     ├─ round-evaluator → verdict + critique + tasks │
│     └─ trace-analyzer → process report + scores     │
│                                                     │
│  6. Synthesize                                      │
│     ├─ Read evaluation results                      │
│     ├─ Save process learnings to memory             │
│     └─ converged → done, iterate → continue         │
│                                                     │
│  7. Context Boundary (if iterating)                 │
│     ├─ Plan mode → next-round handoff               │
│     └─ Next iteration starts fresh                  │
└─────────────────────────────────────────────────────┘
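The diagram's control flow can be sketched in Python as a simplified, hypothetical model. It is not the plugin's code.

- `build`, `regression_guard`, and `evaluate` are caller-supplied stand-ins for the builder, regression-guard, and round-evaluator agents.
- `max_rounds` is an assumed cap.
- Only a "fail" gate verdict triggers one rebuild; that rule is also an assumption.

The sketch shows that only the evaluator's task list crosses the round boundary, which is the separation step 7's fresh context enforces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Evaluation:
    verdict: str                                     # "converged" or "iterate"
    tasks: List[str] = field(default_factory=list)   # improvement spec for next round


def refine(
    build: Callable[[List[str]], str],
    regression_guard: Callable[[Optional[str], str], str],
    evaluate: Callable[[str], Evaluation],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Run build -> gate -> submit -> evaluate until converged or out of rounds."""
    snapshots: List[str] = []   # stands in for new_answer's deliverable snapshots
    tasks: List[str] = []       # the only state carried across the context boundary
    for round_no in range(1, max_rounds + 1):
        previous = snapshots[-1] if snapshots else None
        candidate = build(tasks)
        # Pre-submission gate; a single rebuild on "fail" is an assumption here.
        if regression_guard(previous, candidate) == "fail":
            candidate = build(tasks + ["restore strengths lost since last round"])
        snapshots.append(candidate)
        result = evaluate(candidate)
        if result.verdict == "converged":
            return candidate, round_no
        tasks = result.tasks    # next round starts fresh from the handoff only
    return snapshots[-1], max_rounds
```

Passing the agents in as plain callables keeps the loop testable without any LLM calls.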

Agents

| Agent | Purpose | When used |
| --- | --- | --- |
| round-evaluator | Brutally honest critique + structured improvement spec | After each submission |
| execution-trace-analyzer | Process optimization (errors, effort, tools) | Parallel with evaluator |
| builder | Large artifact generation from prescriptive spec | During answer production |
| regression-guard | Verify improvements don't lose prior strengths | Before submission |
| media-worker | Background media generation/analysis | When media ops would block |
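The regression-guard's pass/fail/mixed verdict can be illustrated with per-criterion scores. This Python sketch is hypothetical. The agent reaches its verdict by reviewing the work, not necessarily by comparing numbers. The score dictionaries, the `tolerance` parameter, and the function name are all assumptions.

```python
from typing import Dict


def regression_verdict(
    before: Dict[str, float], after: Dict[str, float], tolerance: float = 0.0
) -> str:
    """Classify a candidate against the previous round, criterion by criterion.

    pass  -> no shared criterion got worse
    fail  -> something got worse and nothing got better
    mixed -> some criteria improved while others regressed
    """
    shared = before.keys() & after.keys()
    regressed = [c for c in shared if after[c] < before[c] - tolerance]
    improved = [c for c in shared if after[c] > before[c] + tolerance]
    if not regressed:
        return "pass"
    return "mixed" if improved else "fail"
```

A "mixed" result is the case the guard exists for. The new round fixed one thing but lost a strength the previous answer already had.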

MCP Tools

Provided by the massgen package:

  • quality-tools: init_session, generate_eval_criteria, submit_checklist, propose_improvements, reset_evaluation
  • workflow-tools: new_answer, vote
  • media-tools: generate_media, read_media
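These tools are called over MCP. The sketch below illustrates the wire format: it encodes a JSON-RPC `tools/call` request as one JSON message per line, the framing the MCP stdio transport uses. Each tool's argument names aren't listed here, so the example passes an empty `arguments` object. A real client must first complete the `initialize` handshake, and `tools/list` returns each tool's actual input schema. `tool_call` is a hypothetical helper.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call(name: str, arguments: Optional[Dict[str, Any]] = None) -> str:
    """Encode an MCP tools/call request as one newline-terminated JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    # stdio framing: one JSON object per line; json.dumps escapes embedded newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"
```

For example, `tool_call("init_session")` produces a single line that could be written to the quality-tools server's stdin.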

Key Principles

  • Output-first verification: Run, interact, verify — don't just write
  • Default to iterate: treat a first draft as a 6/10, not an 8/10
  • Rebuild, don't patch: Use existing work as reference, not starting point
  • Breakthrough amplification: Spread techniques that work to weaker components
  • No feature accumulation: Features on mediocre foundation = mediocre result

Context Rotation

Between refinement rounds, the plugin automatically clears context to prevent approach anchoring and token bloat. When the evaluator's verdict is "iterate," Claude finishes a handoff plan and shows the plan approval dialog. When Claude is launched through the recommended sandbox tmux helper, a lightweight watcher polls the Claude pane and presses Enter when that dialog appears with the default "Yes, clear context" option selected.

Automatic clearing requires tmux inside the sandbox and launching Claude through the tmux helper. Without the watcher, the plan approval dialog still appears, and you select option 1 manually.

This technique is adapted from the VNX Orchestration System's context rotation pattern, which uses tmux keystroke injection to manage context boundaries in multi-agent workflows.
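The watcher loop can be sketched in Python with two standard tmux commands:

- `tmux capture-pane -p` reads the pane.
- `tmux send-keys ... Enter` presses Enter.

This is not the helper script's code. Two details are assumptions about Claude Code's UI: the `❯` selection cursor and the exact way the dialog renders. `watch`, `interval`, and `cooldown` are hypothetical names.

```python
import subprocess
import time

# Assumptions: the selected option is drawn with a leading "❯" cursor and its
# label contains "Yes, clear context". Check against your Claude Code version.
CURSOR = "❯"
DIALOG_OPTION = "Yes, clear context"


def dialog_ready(pane_text: str) -> bool:
    """True only when the clear-context option is visible *and* selected."""
    for line in pane_text.splitlines():
        stripped = line.strip()
        if DIALOG_OPTION in stripped and stripped.startswith(CURSOR):
            return True
    return False


def capture_pane(target: str) -> str:
    # -p prints the visible pane contents to stdout instead of a paste buffer.
    return subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        capture_output=True, text=True, check=True,
    ).stdout


def watch(target: str, interval: float = 1.0, cooldown: float = 5.0) -> None:
    """Poll the Claude pane forever and press Enter when the dialog is ready."""
    while True:
        if dialog_ready(capture_pane(target)):
            subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)
            time.sleep(cooldown)  # give the UI time to dismiss the dialog
        time.sleep(interval)
```

You might run `watch("claude:0.0")` from a second tmux window, where `claude:0.0` is a placeholder for the pane running Claude. Requiring the cursor to be on the "Yes, clear context" line means Enter is only sent when it would pick that default option.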


Multi-Agent Evaluation (Experimental)

Status: Coming soon. The single-agent /refine loop above is the main supported workflow. Multi-agent evaluation via MassGen is experimental.

The /refine skill supports opt-in multi-model evaluation where multiple LLM backends (GPT, Gemini, Claude, Grok) evaluate your deliverable in parallel and reach consensus:

/massgen-refinery:refine "Build a landing page" \
  --eval-backends openai/gpt-5.4 gemini/gemini-3-flash-preview

Or run a standalone multi-agent session:

/massgen-refinery:massgen-run "Explain quantum computing" \
  --models openai/gpt-5.4 gemini/gemini-3-flash-preview

/team-massgen-run: Multi-model orchestration via MassGen step mode. The lead session launches each backend as a background process, tracks answers/votes via a shared session directory, and synthesizes the consensus winner.
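Here is a hypothetical illustration of tallying votes from a shared session directory. The Python sketch assumes a layout with `answers/<agent>.md` files and `votes/<agent>.json` files containing `{"vote_for": "<agent id>"}`. MassGen's real file layout and tie-breaking rules aren't described here, so both the layout and the alphabetical tie-break are assumptions.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Optional


def consensus_winner(session_dir: Path) -> Optional[str]:
    """Return the agent id whose answer received the most valid votes.

    Hypothetical layout: answers/<agent>.md and votes/<agent>.json with
    {"vote_for": "<agent id>"}. Returns None when there are no valid votes.
    """
    answers = {p.stem for p in (session_dir / "answers").glob("*.md")}
    tally: Counter = Counter()
    for vote_file in sorted((session_dir / "votes").glob("*.json")):
        choice = json.loads(vote_file.read_text(encoding="utf-8")).get("vote_for")
        if choice in answers:  # ignore votes for answers that don't exist
            tally[choice] += 1
    if not tally:
        return None
    top = max(tally.values())
    # Tie-break: alphabetical agent id (arbitrary but deterministic).
    return min(agent for agent, n in tally.items() if n == top)
```

The lead session would then synthesize from the winning answer file.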

Agent Teams integration (experimental) is in progress — currently blocked by a known limitation where teammate background tasks get killed on idle (#17764, #28875, #29271). The current approach uses lead-managed background tasks instead.

Requires API keys for each backend. Falls back to the normal single-model round-evaluator if massgen or API keys are unavailable.

See skills/massgen-run/ for details.


Development Setup (massgen from local source)

For experimenting with a local massgen checkout instead of the published package.

1. Update plugin.json for local source

Edit .claude-plugin/plugin.json to point MCP servers at your local massgen using uv run --directory instead of uvx:

{
  "quality-tools": {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.quality_server"]
  },
  "workflow-tools": {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.workflow_server"]
  },
  "media-tools": {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "--directory", "/path/to/massgen", "python", "-m", "massgen.mcp_tools.standalone.media_server"]
  }
}
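If you switch between massgen checkouts often, a small Python helper can regenerate the three entries above for a given path. It reproduces only the server names and module paths shown in this config. `local_mcp_servers` is a hypothetical helper, and you still merge its output into your plugin.json yourself.

```python
import json
from pathlib import Path

# Server names and modules copied from the config above.
SERVERS = {
    "quality-tools": "massgen.mcp_tools.standalone.quality_server",
    "workflow-tools": "massgen.mcp_tools.standalone.workflow_server",
    "media-tools": "massgen.mcp_tools.standalone.media_server",
}


def local_mcp_servers(massgen_dir: str) -> dict:
    """Build stdio entries that run each server module from a local checkout."""
    directory = str(Path(massgen_dir).expanduser())  # allow "~/src/massgen"
    return {
        name: {
            "type": "stdio",
            "command": "uv",
            "args": ["run", "--directory", directory, "python", "-m", module],
        }
        for name, module in SERVERS.items()
    }
```

Usage: `print(json.dumps(local_mcp_servers("/path/to/massgen"), indent=2))` prints entries in the same form as the snippet above.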

2. Create the sandbox (with local source mounted)

Mount your project, massgen source, and the plugin — all read-write since uv sync needs to create a venv in the massgen directory:

docker sandbox run claude \
  ~/my-project \
  /path/to/massgen \
  /path/to/massgen-refinery

3. Install uv and sync massgen (one time — persists)

docker sandbox exec -it claude-my-project bash

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

# Sync massgen dependencies
cd /path/to/massgen && uv sync

exit

4. Launch with the plugin

# With tmux helper (recommended — enables auto-clear between rounds)
docker sandbox exec -it claude-my-project bash -lc \
  "bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh"

# Without tmux (no auto-clear)
docker sandbox run claude-my-project -- \
  --plugin-dir /path/to/massgen-refinery

Local dev without Docker

# With tmux helper (recommended)
bash /path/to/massgen-refinery/scripts/launch-claude-tmux.sh

# Without tmux
claude --plugin-dir /path/to/massgen-refinery

Reload after plugin changes without restarting:

/reload-plugins

License

MIT

About

Claude Code plugin that brings MassGen's iterative quality enforcement to any project. Evaluate, improve, and verify work through structured refinement loops until it genuinely meets a high bar — not just "good enough."
