Step-by-step instructions for getting CuriosityEngine running. Aimed at someone who hasn't run a Docker-based research tool before and just wants the engine working on their laptop.
If you've installed similar projects before, the README's Setup section is a faster ten-line summary.
- A computer with Docker Desktop (recommended path) or Python 3.11+ (alternative path).
  - Docker Desktop: free for personal use. Download it for macOS / Windows / Linux. Open it once after install — Docker won't run until the desktop app is open.
  - Python 3.11+: only needed if you don't want to use Docker. Ships with recent macOS versions, or install it from python.org / your package manager.
- An API key from at least one LLM provider. Cheapest options that work:
  - Anthropic (Claude) — supports server-side `web_search` and `code_execution` natively. ~$5-10/month for moderate use.
  - OpenAI (GPT) — works as a verifier for cross-family adversarial review. ~$5-10/month for moderate use.
  - Recommendation: get both. Use one as primary and the other as verifier. Cross-family verification is one of CE's main design choices and only works with two providers.
  - Other supported: Google Gemini, OpenRouter, Ollama (local), xAI, Groq, Together, DeepSeek, Moonshot, LM Studio. Full list and base URLs are in the README's OpenAI-compat shortcuts.
- A terminal. Mac: open Terminal.app. Windows: PowerShell. Linux: whatever you use.
- Git for cloning the repo. `git --version` should print something; if not, install it.
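To sanity-check the prerequisites before going further, each of these should print a version string rather than "command not found" (exact versions will vary by machine):

```bash
git --version
docker --version     # Docker path only
python3 --version    # Python path only; needs 3.11+
```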
Docker fully isolates the `code_execution` tool from your host machine. This matters: the engine can run Python it generates, and you don't want that running directly on your laptop without isolation.
Clone the repo and run the launcher:

```bash
cd ~/    # or wherever you keep code
git clone <this-repo-url> CuriosityEngine
cd CuriosityEngine
./curiosity --show-journal
```

The first run does three things:
- Builds the Docker image (~5-10 minutes — pulls Python 3.13, the scientific stack, and dependencies).
- Runs the interactive setup wizard inside the container — asks for your primary model, verifier model, and API keys.
- Saves config to `~/.CuriosityEngine/engine.toml` on your host machine (persists across rebuilds).
The wizard prompts:
- Primary provider (`anthropic` | `openai_compat`). Pick `anthropic` if you have a Claude key.
- Primary model name (e.g. `claude-sonnet-4-6`).
- API key — paste it. Stored locally only.
- Verifier provider/model — a different family from the primary. If the primary is Claude, pick OpenAI here (e.g. `gpt-5.1`).
- Advanced settings — accept the defaults; you can tune them later via the web UI.
If something goes wrong: delete `~/.CuriosityEngine/engine.toml` and run the command again to re-trigger the wizard.
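In terminal form, that recovery looks like this (the path and command come straight from the steps above):

```bash
ls -l ~/.CuriosityEngine/engine.toml   # confirm the config file exists
rm ~/.CuriosityEngine/engine.toml      # wipe the saved config
./curiosity --show-journal             # re-triggers the setup wizard
```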
Now run your first cycle:

```bash
./curiosity --cycles 1 --domain "your topic of interest"
```

Replace "your topic of interest" with whatever you actually want to research. Examples: "AI alignment", "battery chemistry", "consumer credit risk modelling". The engine introspects, generates questions, runs investigations with web search, and saves state to `./data/research_journal.json`.

A single cycle takes ~5-15 minutes depending on your model speed and how deep the investigation goes.
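If you want separate journals per topic from the start, the `--journal` flag described in the web UI section below works here too; for example:

```bash
# Writes state to data/batteries.json instead of the default journal.
./curiosity --cycles 1 --domain "battery chemistry" --journal data/batteries.json
```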
To browse the results, start the web UI:

```bash
./curiosity web
```

Then open http://localhost:8000 in your browser. The web UI gives you:
- A list of journals (each `data/*.json` is a journal).
- Per-journal tabs: Overview, Entries, Insights, Register, Predictions, Focus & Queue, Graph, Runs, Coverage, Admin.
- A Settings page for editing every config knob.
- A Run page to start cycles via the browser instead of the terminal.
To stop the web server:

```bash
./curiosity web stop
```

To see what the web server is doing:

```bash
./curiosity web logs
```

To restart it (e.g. after editing the engine code):

```bash
./curiosity web restart
```

If you `git pull` and CE's `requirements.txt` changes:

```bash
./curiosity --rebuild --show-journal
```

This rebuilds the Docker image. Takes ~5-10 minutes. Your config and journal data are unaffected (they live outside the container).
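Put together, a typical update after upstream changes might look like this (each command is described above; skip the rebuild if `requirements.txt` didn't change):

```bash
git pull
./curiosity --rebuild --show-journal   # only when requirements.txt changed
./curiosity web restart                # pick up the new code in the web container
```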
The alternative path (plain Python, no Docker) means less isolation: the `code_execution` tool will run subprocess Python directly on your machine. Use the E2B sandbox for isolation if you take this path. Only recommended if you can't use Docker.
```bash
git clone <this-repo-url> CuriosityEngine
cd CuriosityEngine
python3 -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip install numpy scipy pandas scikit-learn matplotlib   # optional, for code_execution
python curiosity_engine.py --show-journal                # triggers the setup wizard
```

The wizard works the same as on the Docker path. From here, all CLI commands are `python curiosity_engine.py ...` instead of `./curiosity ...`.
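One standard venv caveat (not CE-specific): the activation only lasts for the current shell session, so each new terminal needs it again before any command will work:

```bash
cd CuriosityEngine
source .venv/bin/activate    # Windows: .venv\Scripts\activate
python curiosity_engine.py --cycles 1 --domain "your topic of interest"
```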
Web UI on this path:

```bash
uvicorn web.main:app --host 127.0.0.1 --port 8000
```
Once `./curiosity web` is running and you've opened http://localhost:8000, the landing page lists every journal under `data/*.json`. Click a journal to enter its dashboard. To create a new journal, just run `./curiosity --cycles 1 --domain "..." --journal data/<your-name>.json`; the first cycle creates the file. Inside a journal, the tabs:
- Overview — counts, recent entries, surprise summary. Start here to see what the journal looks like.
- Entries — every investigation cycle's record. Filter by domain tag.
- Insights — synthesized cross-references that haven't been verified yet.
- Register — the durable artifact: validated insights with full audit trail. Per-component novelty badges (rose = restatement, amber = extension, emerald = new_synthesis), Pareto-axis values, predictions attached, "Export directive →" button per qualifying entry.
- Predictions — falsifiable claims with status tracking. Run `./curiosity --check-predictions` to revisit due ones.
- Focus & Queue — set the journal's research focus, add your own questions, drag-and-drop priority within source buckets.
- Graph — interactive force-directed knowledge graph. Click any node for detail.
- Runs — per-journal run history. Live SSE log streaming on active runs. Stop button on running cycles.
- Coverage — negative-space gap matrix. Method × problem cells; click any cell for detail. Diff vs prior scan shows what filled / what's still open / what newly emerged.
- Admin — maintenance operations. The cards you'll use most:
- Re-verify register entries (audit) — re-runs the verifier over existing register under updated rules. Append-only; never overwrites.
- Scan for unexplored gaps — builds the (method × problem) matrix.
- Export research directives bundle — runs the directive pipeline across every qualifying entry.
- Pareto admission frontier (read-only) — shows which active entries set the admission bar.
- Known prior art anchors — add human-curated peer systems the verifier MUST evaluate.
The Settings page edits `~/.CuriosityEngine/engine.toml` in place. Every engine knob is exposed:
- Model profiles (primary, verifier, optional extras)
- Per-phase model routing (cross-ref, directive_primary, directive_primary_fast, directive_verifier, gap-scan extract/classify, investigation_assessor)
- Loop knobs (cycles per cross-ref, novelty threshold, register confidence floor)
- Self-evolving verifier knobs:
  - `register_admission_mode` (scalar | pareto)
  - `synthesis_candidate_count` (Phase 6)
  - `introspection_persona_count` (Phase 7)
  - `idea_evolution_enabled` + floor + max_depth (Phase 8)
  - `hypothesis_variant_count` (Phase 9)
- Held pipeline + gap-scan thresholds + parallelism
Saving doesn't restart anything — the next cycle picks up the new config.
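If you'd rather inspect the file directly before editing anything, something like this works; the knob names are the ones listed above, but the section layout of `engine.toml` depends on your generated config, so treat it as a sketch:

```bash
# Show where a few of the verifier knobs currently sit in the config.
grep -nE 'register_admission_mode|synthesis_candidate_count|hypothesis_variant_count' \
  ~/.CuriosityEngine/engine.toml
```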
The Run page is a form for starting a cycle. Pick the journal, set the domain (or keep the journal's last domain), and set the cycle count. A live SSE log streams while the cycle runs.
"Docker is not running" → Open Docker Desktop. Wait for the whale icon to stop animating.
Web UI 502 / 504 → the container probably crashed. `docker compose logs web` shows why. Common causes: stale config (delete `~/.CuriosityEngine/engine.toml` and re-run `./curiosity --show-journal`), an invalid API key, or a wrong model name.
"unterminated triple-quoted string literal" or other syntax errors after a `git pull` → Docker bind-mount staleness on macOS. Restart the web container: `./curiosity web restart`.
"rate limited" on `academic_search` → expected on long runs. The engine has staged cooldowns (2/4/8/16/30s, cycling back), so just wait. If it persists: set `SEMANTIC_SCHOLAR_API_KEY` (free at semanticscholar.org) for a private rate bucket, and `OPENALEX_MAILTO` for OpenAlex's polite pool.
Cycle takes forever → some reasoning models (Kimi K2.x, GPT-5 thinking, o-series) hang on big prompts in non-streaming mode. Increase `timeout_seconds` in the model profile, or switch to a smaller / non-reasoning model for the `directive_primary_fast` role.
API key not found → `engine.toml` stores keys, or you can set `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` in your shell environment. Both work.
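For the environment-variable route, a minimal example (values are placeholders):

```bash
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENAI_API_KEY="your-openai-key"
```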
Want to migrate an old `research_journal.json` →

```bash
mkdir -p ./data
cp /old/path/research_journal.json ./data/
cp /old/path/register.md ./data/   # if you have it
```

Once everything runs, a sensible first-session workflow:

- Set a focus — `./curiosity --set-focus "TOPIC — specific lens"` so every prompt knows what you actually care about.
- Run cycles — start with `--cycles 3` and review the register afterwards (a combined sketch follows this list).
- Review the register — `./curiosity --review-register` lets you approve / reject-with-reason / defer each new entry. Rejection reasons feed into future verifier prompts.
- Add known prior art anchors — Admin tab → Known prior art. If you spot a peer system the verifier missed, add it; the verifier will then be required to evaluate it on every future claim in that domain.
- Export directives — once you have validated register entries, the "Export directive →" button generates a publication-shaped research plan from each entry.
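Concretely, a first session might look like this; the topic strings are just examples:

```bash
./curiosity --set-focus "battery chemistry — solid-state electrolytes"
./curiosity --cycles 3 --domain "battery chemistry"
./curiosity --review-register    # approve / reject / defer the new entries
```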
For deeper context on what each piece does and why, read the README's Data model section and Self-evolving verifier section.