Deeploop


Autonomous optimization loop for Claude Code, inspired by Karpathy's autoresearch.

Iteratively modify code, measure results, and keep what works — powered by a Stop Hook that keeps Claude running until the goal is met or budget exhausted.

How it works

/deeploop make the search function faster
  1. Setup: Claude explores the code, picks an eval, creates an isolated git worktree
  2. Loop: Hypothesize → Change → Measure → Keep/Discard from an anchor commit (driven by Stop Hook)
  3. Finish: Summary + merge prompt

You interact three times: to start, to confirm the plan, and to decide whether to merge. Everything else is autonomous.
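
The keep/discard cycle in step 2 can be sketched as follows. Here `apply_change`, `measure`, `commit_anchor`, and `restore_anchor` are hypothetical stand-ins for what Claude actually does in the worktree (edit code, run the chosen eval, commit or restore against the anchor commit):

```python
# Minimal sketch of the Deeploop iteration cycle -- not the real implementation.
# The callbacks are illustrative stand-ins for Claude's actions in the worktree.

def optimization_loop(apply_change, measure, commit_anchor, restore_anchor,
                      budget=10, goal=None):
    best = measure()                  # baseline metric from the anchor commit
    for iteration in range(budget):
        apply_change(iteration)       # hypothesize + change
        value = measure()             # measure
        if value > best:
            best = value
            commit_anchor()           # keep: this state becomes the new anchor
        else:
            restore_anchor()          # discard: reset to last known-good state
        if goal is not None and best >= goal:
            break                     # goal met -> the Stop Hook lets Claude finish
    return best
```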

Architecture

~/.claude/skills/deeploop/
├── SKILL.md              # Main protocol injected into Claude's prompt
├── references/
│   ├── protocol.md       # state.json / results.tsv / anchor semantics
│   └── evals.md          # eval modes, parse rules, guardrails
└── hooks/
    └── stop-hook.sh      # Stop Hook — keeps Claude iterating

Per-project (runtime):
.deeploop/<tag>/
├── worktree/             # Git worktree (isolated code checkout)
├── state.json            # Session state (iteration, best value, session_id)
├── results.tsv           # Structured experiment ledger for hook/resume logic
└── journal.md            # Experiment journal — hypotheses, observations, learnings
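
For illustration, here is how the structured ledger might be consumed during resume. The column names below are assumptions for this sketch; the real schema lives in references/protocol.md:

```python
import csv
import io

# Hypothetical results.tsv layout (the real schema is in references/protocol.md).
# Columns assumed here: iteration, hypothesis, value, kept.
SAMPLE = (
    "iteration\thypothesis\tvalue\tkept\n"
    "1\tcache token lookups\t41.2\tyes\n"
    "2\tswap list for set\t38.9\tno\n"
    "3\tbatch the queries\t55.0\tyes\n"
)

def best_iteration(tsv_text):
    """Return the kept row with the highest metric value."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in rows if row["kept"] == "yes"]
    return max(kept, key=lambda row: float(row["value"]))
```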

Key design decisions

| Decision | Choice | Why |
| --- | --- | --- |
| Isolation | Git worktree | Branch + code + state unified under `.deeploop/<tag>/` |
| Loop driver | Stop Hook | Truly continuous — not limited by turn length |
| Multi-session | `session_id` matching | Each Claude terminal matches its own worktree |
| Code/metadata separation | `worktree/` subdir | `state.json` outside git — zero pollution on merge |
| Experiment rollback | Anchor commit restore | Discard resets the whole allowed scope to the last known-good state |
| Recovery data | `results.tsv` | Machine-readable iteration history for hook/resume logic |
| Experiment log | `journal.md` | Captures reasoning, surprises, and learnings per iteration |
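
The Stop Hook's keep-iterating decision can be sketched as follows. The `state.json` field names used here (`goal_met`, `iteration`, `budget`) are illustrative assumptions, with the real schema defined in references/protocol.md; the `{"decision": "block"}` output shape follows Claude Code's Stop Hook protocol:

```python
def stop_decision(state):
    """Decide whether Claude may stop, given the parsed state.json.
    Field names here are assumptions for illustration; the real schema
    is defined in references/protocol.md."""
    if state.get("goal_met"):
        return {}                     # empty output: allow the stop
    if state.get("iteration", 0) >= state.get("budget", 0):
        return {}                     # budget exhausted: allow the stop
    return {                          # otherwise block the stop and keep iterating
        "decision": "block",
        "reason": "Goal not met and budget remains; continue the loop.",
    }
```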

Requirements

  • Python 3.9+
  • bash
  • jq

No pip dependencies — uses only the Python standard library.

Install

# Clone into Claude Code's skills directory
git clone https://github.com/VOIDXAI/deeploop.git ~/.claude/skills/deeploop

Restart Claude Code to activate. The /deeploop command will appear in the skill menu.

Uninstall

rm -rf ~/.claude/skills/deeploop

Usage

# Free-form goal
/deeploop make the search function faster
/deeploop improve test coverage for auth module to 80%
/deeploop iterate on the system prompt until accuracy > 90%

# Session management
ls .deeploop/*/state.json          # list sessions
git worktree list                  # list worktrees

Evaluation modes

| Mode | When | How Claude measures |
| --- | --- | --- |
| Command | Benchmark script exists | Run command, parse metric |
| Test suite | Tests exist | Run tests, track pass/fail |
| LLM-judge | Subjective quality | Claude scores against criteria |
| Diff-check | Structural goal | Claude analyzes the diff |
| User-checkpoint | Human judgment needed | Pause and ask |

Modes can be combined, but one metric should be declared primary and the rest treated as guardrails.
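
In Command mode, extracting a metric from benchmark output might look like this; the `score` pattern is an illustrative assumption, since the real parse rules live in references/evals.md:

```python
import re

def parse_metric(output, pattern=r"score[:=]\s*([0-9.]+)"):
    """Extract a numeric metric from eval command output.
    The default pattern is illustrative; the real parse rules
    are described in references/evals.md."""
    match = re.search(pattern, output, re.IGNORECASE)
    if match is None:
        raise ValueError("no metric found in eval output")
    return float(match.group(1))
```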

Multiple sessions

Each session runs in its own worktree with a unique session_id. Multiple Claude Code terminals can run different optimization loops on the same repo simultaneously.

# Terminal 1
/deeploop optimize search performance     # → .deeploop/search-perf/

# Terminal 2
/deeploop improve prompt accuracy          # → .deeploop/prompt-accuracy/
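
Matching a terminal to its session can be sketched by scanning each `state.json` for that terminal's `session_id` (the field name is assumed here for illustration):

```python
import json
from pathlib import Path

def find_session_dir(repo_root, session_id):
    """Return the .deeploop/<tag>/ directory whose state.json matches
    session_id, or None. Assumes state.json stores a `session_id` field."""
    for state_file in Path(repo_root, ".deeploop").glob("*/state.json"):
        state = json.loads(state_file.read_text())
        if state.get("session_id") == session_id:
            return state_file.parent
    return None
```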

Testing

Three test layers, all in Python:

# Layer 1+2: Interaction contract + golden transcripts (fast, no LLM)
python3 -m unittest discover -s tests -p "test_*.py"

# Layer 3: Live E2E — real Claude CLI calls (slow, opt-in)
DEEPLOOP_LIVE_E2E=1 python3 -m unittest tests.test_live_e2e -v

Live E2E requires the claude CLI and is skipped by default. Set DEEPLOOP_LIVE_E2E_MODEL to override the model, and DEEPLOOP_LIVE_E2E_TIMEOUT to adjust the timeout (default: 600s).

Benchmarks

python3 benchmarks/capability_benchmark.py

This benchmark is optional and is not part of the default test suite.

Credits

License

MIT
