Autonomous optimization loop for Claude Code, inspired by Karpathy's autoresearch.
Iteratively modify code, measure results, and keep what works — powered by a Stop Hook that keeps Claude running until the goal is met or the budget is exhausted.
/deeploop make the search function faster
- Setup: Claude explores the code, picks an eval, creates an isolated git worktree
- Loop: Hypothesize → Change → Measure → Keep/Discard from an anchor commit (driven by Stop Hook)
- Finish: Summary + merge prompt
You interact 3 times: start, confirm plan, decide merge. Everything else is autonomous.
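The Setup → Loop → Finish flow above can be sketched as a plain Python loop. Everything here is illustrative — the real loop is driven by the Stop Hook and Claude itself, and the callable names (`measure`, `propose`, `apply_change`, `discard`) are hypothetical:

```python
# Illustrative sketch of the deeploop iteration loop.
# All names are hypothetical; in deeploop the Stop Hook drives
# Claude through these steps rather than a Python script.

def run_loop(measure, propose, apply_change, discard, budget=10):
    """Keep changes that improve the metric; discard the rest."""
    best = measure()                      # baseline from the anchor commit
    history = [("baseline", best, "keep")]
    for _ in range(budget):
        hypothesis = propose(history)     # Hypothesize
        apply_change(hypothesis)          # Change
        value = measure()                 # Measure
        if value > best:                  # Keep: new best
            best = value
            history.append((hypothesis, value, "keep"))
        else:                             # Discard: restore anchor state
            discard()
            history.append((hypothesis, value, "discard"))
    return best, history
```

With stubbed callables this runs standalone, which is also how the keep/discard bookkeeping is easiest to reason about.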
~/.claude/skills/deeploop/
├── SKILL.md # Main protocol injected into Claude's prompt
├── references/
│ ├── protocol.md # state.json / results.tsv / anchor semantics
│ └── evals.md # eval modes, parse rules, guardrails
└── hooks/
└── stop-hook.sh # Stop Hook — keeps Claude iterating
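The hook itself is a bash script (stop-hook.sh), and the authoritative state.json schema lives in references/protocol.md. As a hedged sketch only, the kind of decision it makes — continue while the goal is unmet and budget remains — could look like this, with the field names `goal_met`, `iteration`, and `budget` being assumptions:

```python
import json

def should_continue(state_json: str) -> bool:
    """Sketch of a Stop-Hook decision: keep Claude iterating until the
    goal is met or the iteration budget is exhausted.
    Field names (goal_met, iteration, budget) are hypothetical."""
    state = json.loads(state_json)
    if state.get("goal_met"):
        return False
    return state.get("iteration", 0) < state.get("budget", 0)
```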
Per-project (runtime):
.deeploop/<tag>/
├── worktree/ # Git worktree (isolated code checkout)
├── state.json # Session state (iteration, best value, session_id)
├── results.tsv # Structured experiment ledger for hook/resume logic
└── journal.md # Experiment journal — hypotheses, observations, learnings
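The exact results.tsv columns are defined in references/protocol.md; the sketch below assumes a minimal `(iteration, hypothesis, value, verdict)` schema purely to show how a tab-separated ledger supports hook/resume logic with nothing but the standard library:

```python
import csv
import io

def best_kept(results_tsv: str):
    """Return the best value among kept iterations in a results.tsv
    ledger. Assumed columns: iteration, hypothesis, value, verdict."""
    reader = csv.DictReader(io.StringIO(results_tsv), delimiter="\t")
    kept = [float(row["value"]) for row in reader if row["verdict"] == "keep"]
    return max(kept) if kept else None
```

On resume, a helper like this lets the hook recover the best-so-far value without replaying any experiments.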
| Decision | Choice | Why |
|---|---|---|
| Isolation | Git worktree | Branch + code + state unified under .deeploop/<tag>/ |
| Loop driver | Stop Hook | Truly continuous — not limited by turn length |
| Multi-session | session_id matching | Each Claude terminal matches its own worktree |
| Code/metadata separation | worktree/ subdir | state.json outside git — zero pollution on merge |
| Experiment rollback | Anchor commit restore | Discard resets the whole allowed scope to the last known-good state |
| Recovery data | results.tsv | Machine-readable iteration history for hook/resume logic |
| Experiment log | journal.md | Captures reasoning, surprises, and learnings per iteration |
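The anchor-commit rollback in the table can be modeled in miniature. This toy class is not how deeploop implements it — the real mechanism is a git anchor commit inside the worktree — it only illustrates the semantics: keeping promotes the current state to the new anchor, discarding resets the whole allowed scope to the last known-good state:

```python
class AnchorScope:
    """Toy model of anchor-commit semantics (illustrative only;
    deeploop uses a real git commit in the worktree)."""

    def __init__(self, files: dict):
        self.files = dict(files)      # current working state
        self.anchor = dict(files)     # last known-good snapshot

    def commit_anchor(self):
        # Keep: the current state becomes the new anchor.
        self.anchor = dict(self.files)

    def discard(self):
        # Discard: reset the entire scope to the anchor.
        self.files = dict(self.anchor)
```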
- Python 3.9+
- bash
- jq
No pip dependencies — uses only the Python standard library.
# Clone into Claude Code's skills directory
git clone https://github.com/VOIDXAI/deeploop.git ~/.claude/skills/deeploop

Restart Claude Code to activate. The /deeploop command will appear in the skill menu.
rm -rf ~/.claude/skills/deeploop

# Free-form goal
/deeploop make the search function faster
/deeploop improve test coverage for auth module to 80%
/deeploop iterate on the system prompt until accuracy > 90%
# Session management
ls .deeploop/*/state.json # list sessions
git worktree list # list worktrees

| Mode | When | How Claude measures |
|---|---|---|
| Command | Benchmark script exists | Run command, parse metric |
| Test suite | Tests exist | Run tests, track pass/fail |
| LLM-judge | Subjective quality | Claude scores against criteria |
| Diff-check | Structural goal | Claude analyzes the diff |
| User-checkpoint | Human judgment needed | Pause and ask |
Modes can be combined, but one metric should be declared primary and the rest treated as guardrails.
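For Command mode, the real parse rules live in references/evals.md; a hedged sketch of the idea — parse the primary metric out of benchmark output, then let guardrails veto a keep — might look like this (the regex and function names are assumptions):

```python
import re

def parse_metric(output: str, pattern: str = r"metric[:=]\s*([0-9.]+)"):
    """Command-mode sketch: parse the primary metric from a benchmark
    command's stdout. The pattern is a hypothetical example."""
    match = re.search(pattern, output)
    return float(match.group(1)) if match else None

def improved(value, best, guardrails_ok=True):
    """The primary metric decides keep/discard; secondary modes
    (e.g. the test suite) act as guardrails that can veto a keep."""
    return guardrails_ok and value is not None and (best is None or value > best)
```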
Each session runs in its own worktree with a unique session_id. Multiple Claude Code terminals can run different optimization loops on the same repo simultaneously.
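Session matching can be sketched as a scan over `.deeploop/*/state.json` for the entry carrying this terminal's session_id. The `session_id` field matches the doc; the rest of the schema and the function name are assumptions:

```python
import json
from pathlib import Path

def find_my_session(deeploop_root: Path, session_id: str):
    """Sketch of session_id matching: locate the .deeploop/<tag>/
    directory whose state.json belongs to this Claude session."""
    for state_file in deeploop_root.glob("*/state.json"):
        state = json.loads(state_file.read_text())
        if state.get("session_id") == session_id:
            return state_file.parent
    return None
```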
# Terminal 1
/deeploop optimize search performance # → .deeploop/search-perf/
# Terminal 2
/deeploop improve prompt accuracy # → .deeploop/prompt-accuracy/

Three test layers, all in Python:
# Layer 1+2: Interaction contract + golden transcripts (fast, no LLM)
python3 -m unittest discover -s tests -p "test_*.py"
# Layer 3: Live E2E — real Claude CLI calls (slow, opt-in)
DEEPLOOP_LIVE_E2E=1 python3 -m unittest tests.test_live_e2e -v

Live E2E requires the claude CLI and is skipped by default. Set DEEPLOOP_LIVE_E2E_MODEL to override the model and DEEPLOOP_LIVE_E2E_TIMEOUT to adjust the timeout (default: 600s).
python3 benchmarks/capability_benchmark.py

This benchmark is optional and is not part of the default test suite.
- Karpathy's autoresearch — the core idea
- Ralph loop — Stop Hook pattern inspiration
MIT