Autonomous optimization loop for Claude Code, inspired by Karpathy's autoresearch.
Iteratively modify code, measure results, and keep what works — powered by a Stop Hook that keeps Claude running until the goal is met or the budget is exhausted.
/deeploop make the search function faster
- Setup: Claude explores the code, picks an eval, creates an isolated git worktree
- Loop: Hypothesize → Change → Measure → Keep/Discard from an anchor commit (driven by Stop Hook)
- Finish: Summary + merge prompt
You interact 3 times: start, confirm plan, decide merge. Everything else is autonomous.
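The Setup → Loop → Finish flow above can be sketched as a plain Python loop. Everything here is illustrative — the real loop is driven by the Stop Hook and Claude itself, and the callable names (`measure`, `propose`, `apply_change`, `discard`) are hypothetical:

```python
# Illustrative sketch of the deeploop iteration loop.
# All names are hypothetical; in deeploop the Stop Hook drives
# Claude through these steps rather than a Python script.

def run_loop(measure, propose, apply_change, discard, budget=10):
    """Keep changes that improve the metric; discard the rest."""
    best = measure()                      # baseline from the anchor commit
    history = [("baseline", best, "keep")]
    for _ in range(budget):
        hypothesis = propose(history)     # Hypothesize
        apply_change(hypothesis)          # Change
        value = measure()                 # Measure
        if value > best:                  # Keep: new best
            best = value
            history.append((hypothesis, value, "keep"))
        else:                             # Discard: restore anchor state
            discard()
            history.append((hypothesis, value, "discard"))
    return best, history
```

With stubbed callables this runs standalone, which is also how the keep/discard bookkeeping is easiest to reason about.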
~/.claude/skills/deeploop/
├── SKILL.md # Main protocol injected into Claude's prompt
├── references/
│ ├── protocol.md # state.json / results.tsv / anchor semantics
│ └── evals.md # eval modes, parse rules, guardrails
└── hooks/
└── stop-hook.sh # Stop Hook — keeps Claude iterating
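The hook itself is a bash script (stop-hook.sh), and the authoritative state.json schema lives in references/protocol.md. As a hedged sketch only, the kind of decision it makes — continue while the goal is unmet and budget remains — could look like this, with the field names `goal_met`, `iteration`, and `budget` being assumptions:

```python
import json

def should_continue(state_json: str) -> bool:
    """Sketch of a Stop-Hook decision: keep Claude iterating until the
    goal is met or the iteration budget is exhausted.
    Field names (goal_met, iteration, budget) are hypothetical."""
    state = json.loads(state_json)
    if state.get("goal_met"):
        return False
    return state.get("iteration", 0) < state.get("budget", 0)
```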
Per-project (runtime):
.deeploop/<tag>/
├── worktree/ # Git worktree (isolated code checkout)
├── state.json # Session state (iteration, best value, session_id)
├── results.tsv # Structured experiment ledger for hook/resume logic
└── journal.md # Experiment journal — hypotheses, observations, learnings
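The exact results.tsv columns are defined in references/protocol.md; the sketch below assumes a minimal `(iteration, hypothesis, value, verdict)` schema purely to show how a tab-separated ledger supports hook/resume logic with nothing but the standard library:

```python
import csv
import io

def best_kept(results_tsv: str):
    """Return the best value among kept iterations in a results.tsv
    ledger. Assumed columns: iteration, hypothesis, value, verdict."""
    reader = csv.DictReader(io.StringIO(results_tsv), delimiter="\t")
    kept = [float(row["value"]) for row in reader if row["verdict"] == "keep"]
    return max(kept) if kept else None
```

On resume, a helper like this lets the hook recover the best-so-far value without replaying any experiments.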
| Decision | Choice | Why |
|---|---|---|
| Isolation | Git worktree | Branch + code + state unified under .deeploop/<tag>/ |
| Loop driver | Stop Hook | Truly continuous — not limited by turn length |
| Multi-session | session_id matching | Each Claude terminal matches its own worktree |
| Code/metadata separation | worktree/ subdir | state.json outside git — zero pollution on merge |
| Experiment rollback | Anchor commit restore | Discard resets the whole allowed scope to the last known-good state |
| Recovery data | results.tsv | Machine-readable iteration history for hook/resume logic |
| Experiment log | journal.md | Captures reasoning, surprises, and learnings per iteration |
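The anchor-commit rollback in the table can be modeled in miniature. This toy class is not how deeploop implements it — the real mechanism is a git anchor commit inside the worktree — it only illustrates the semantics: keeping promotes the current state to the new anchor, discarding resets the whole allowed scope to the last known-good state:

```python
class AnchorScope:
    """Toy model of anchor-commit semantics (illustrative only;
    deeploop uses a real git commit in the worktree)."""

    def __init__(self, files: dict):
        self.files = dict(files)      # current working state
        self.anchor = dict(files)     # last known-good snapshot

    def commit_anchor(self):
        # Keep: the current state becomes the new anchor.
        self.anchor = dict(self.files)

    def discard(self):
        # Discard: reset the entire scope to the anchor.
        self.files = dict(self.anchor)
```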
- Python 3.9+
- bash
- jq
No pip dependencies — uses only the Python standard library.
# Clone into Claude Code's skills directory
git clone https://github.com/VOIDXAI/deeploop.git ~/.claude/skills/deeploop

Restart Claude Code to activate. The /deeploop command will appear in the skill menu.
rm -rf ~/.claude/skills/deeploop

# Free-form goal
/deeploop make the search function faster
/deeploop improve test coverage for auth module to 80%
/deeploop iterate on the system prompt until accuracy > 90%
# Session management
ls .deeploop/*/state.json # list sessions
git worktree list # list worktrees

| Mode | When | How Claude measures |
|---|---|---|
| Command | Benchmark script exists | Run command, parse metric |
| Test suite | Tests exist | Run tests, track pass/fail |
| LLM-judge | Subjective quality | Claude scores against criteria |
| Diff-check | Structural goal | Claude analyzes the diff |
| User-checkpoint | Human judgment needed | Pause and ask |
Modes can be combined, but one metric should be declared primary and the rest treated as guardrails.
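For Command mode, the real parse rules live in references/evals.md; a hedged sketch of the idea — parse the primary metric out of benchmark output, then let guardrails veto a keep — might look like this (the regex and function names are assumptions):

```python
import re

def parse_metric(output: str, pattern: str = r"metric[:=]\s*([0-9.]+)"):
    """Command-mode sketch: parse the primary metric from a benchmark
    command's stdout. The pattern is a hypothetical example."""
    match = re.search(pattern, output)
    return float(match.group(1)) if match else None

def improved(value, best, guardrails_ok=True):
    """The primary metric decides keep/discard; secondary modes
    (e.g. the test suite) act as guardrails that can veto a keep."""
    return guardrails_ok and value is not None and (best is None or value > best)
```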
Each session runs in its own worktree with a unique session_id. Multiple Claude Code terminals can run different optimization loops on the same repo simultaneously.
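Session matching can be sketched as a scan over `.deeploop/*/state.json` for the entry carrying this terminal's session_id. The `session_id` field matches the doc; the rest of the schema and the function name are assumptions:

```python
import json
from pathlib import Path

def find_my_session(deeploop_root: Path, session_id: str):
    """Sketch of session_id matching: locate the .deeploop/<tag>/
    directory whose state.json belongs to this Claude session."""
    for state_file in deeploop_root.glob("*/state.json"):
        state = json.loads(state_file.read_text())
        if state.get("session_id") == session_id:
            return state_file.parent
    return None
```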
# Terminal 1
/deeploop optimize search performance # → .deeploop/search-perf/
# Terminal 2
/deeploop improve prompt accuracy # → .deeploop/prompt-accuracy/

Three test layers, all in Python:
# Layer 1+2: Interaction contract + golden transcripts (fast, no LLM)
python3 -m unittest discover -s tests -p "test_*.py"
# Layer 3: Live E2E — real Claude CLI calls (slow, opt-in)
DEEPLOOP_LIVE_E2E=1 python3 -m unittest tests.test_live_e2e -v

Live E2E requires the claude CLI and is skipped by default. Set DEEPLOOP_LIVE_E2E_MODEL to override the model and DEEPLOOP_LIVE_E2E_TIMEOUT to adjust the timeout (default: 600s).
python3 benchmarks/capability_benchmark.py

This benchmark is optional and is not part of the default test suite.
- Karpathy's autoresearch — the core idea
- Ralph loop — Stop Hook pattern inspiration
MIT