
feat(adapters): E2BAgentAdapter — cloud execution in isolated Linux VM (#534)#540

Merged
frankbria merged 3 commits into main from feat/e2b-agent-adapter-534 on Apr 4, 2026

Conversation

@frankbria
Owner

@frankbria frankbria commented Apr 4, 2026

Summary

Implements --engine cloud routing through an E2B Linux sandbox for fully isolated task execution without touching the local filesystem. Closes #534.

  • codeframe/adapters/e2b/adapter.py — E2BAgentAdapter implementing AgentAdapter protocol: credential scan → sandbox create → workspace upload → codeframe install → agent run → git-diff download → AgentResult with cloud_metadata
  • codeframe/adapters/e2b/credential_scanner.py — scan_path() blocks .env files, private keys, AWS/OpenAI/GitHub secret patterns before upload
  • codeframe/adapters/e2b/budget.py — record_cloud_run/get_cloud_run for cloud_run_metadata SQLite table tracking sandbox minutes and cost
  • codeframe/core/engine_registry.py — adds "cloud" to VALID_ENGINES and EXTERNAL_ENGINES; wires factory functions
  • codeframe/core/adapters/agent_adapter.py — adds cloud_metadata: dict | None = None to AgentResult (non-breaking)
  • codeframe/core/runtime.py — adds cloud_timeout_minutes param forwarded to get_external_adapter
  • codeframe/core/workspace.py — adds cloud_run_metadata table to schema (init + migration)
  • codeframe/core/conductor.py — forwards cloud_timeout_minutes through start_batch and _execute_task_subprocess
  • codeframe/cli/validators.py — adds require_e2b_api_key() following existing validator pattern
  • codeframe/cli/app.py — --cloud-timeout option on work start and batch run; E2B key validation; new cf work show command with Cloud Execution Cost table
  • tests/adapters/test_e2b_adapter.py — 35 tests: credential scanner (11), budget persistence (3), adapter lifecycle with mocked e2b.Sandbox (7), engine registry (8), AgentResult field (2), validators (2), workspace schema (2)

Test plan

  • uv run pytest tests/adapters/test_e2b_adapter.py — 35/35 pass
  • uv run pytest tests/core/test_agent_adapter.py tests/core/test_engine_registry.py — 44/44 pass
  • uv run pytest tests/core/test_workspace.py — 17/17 pass
  • uv run pytest tests/adapters/test_llm.py — all pass (no regressions)
  • cd web-ui && npm test — 680/680 pass
  • uv run ruff check on all changed files — clean
  • Manual: cf work start <id> --execute --engine cloud --cloud-timeout 30 (requires E2B_API_KEY)
  • Manual: cf work show <id> displays Cloud Execution Cost table after cloud run

Summary by CodeRabbit

  • New Features

    • Added a "cloud" execution engine with configurable sandbox timeout (--engine cloud --cloud-timeout)
    • Automatic workspace credential scanning that blocks runs if secrets detected
    • work show now displays cloud run metrics (estimated sandbox minutes, cost, uploads/downloads)
    • Agent results may include cloud-specific metadata (cost/minutes/files)
  • Chores

    • Added optional "cloud" dependency group for the cloud integration
  • Tests

    • New tests for cloud adapter, scanner, and budget persistence

feat(adapters): E2BAgentAdapter — cloud execution in isolated Linux VM (#534)

Implements --engine cloud routing through an E2B sandbox for fully isolated
task execution without touching the local filesystem.

- codeframe/adapters/e2b/adapter.py: E2BAgentAdapter implementing AgentAdapter
  protocol — credential scan, sandbox create, workspace upload, pip install,
  agent run, git-diff-based file download, AgentResult with cloud_metadata
- codeframe/adapters/e2b/credential_scanner.py: scan_path() blocks .env files,
  private keys, AWS/OpenAI/GitHub secret patterns before upload
- codeframe/adapters/e2b/budget.py: record_cloud_run/get_cloud_run for SQLite
  cloud_run_metadata table (sandbox_minutes, cost_usd_estimate, file counts)
- codeframe/core/engine_registry.py: adds "cloud" to VALID_ENGINES and
  EXTERNAL_ENGINES; wires get_external_adapter and _get_adapter_class
- codeframe/core/adapters/agent_adapter.py: adds cloud_metadata: dict|None
  field to AgentResult (None by default, no existing code broken)
- codeframe/core/runtime.py: adds cloud_timeout_minutes param to execute_agent
  and passes to get_external_adapter for cloud engine
- codeframe/core/workspace.py: adds cloud_run_metadata table to schema (init
  and migration paths)
- codeframe/core/conductor.py: forwards cloud_timeout_minutes through
  start_batch and _execute_task_subprocess (--cloud-timeout CLI flag)
- codeframe/cli/validators.py: adds require_e2b_api_key() following same
  pattern as require_anthropic/openai_api_key
- codeframe/cli/app.py: adds --cloud-timeout option to work start and batch
  run; adds E2B_API_KEY validation branch; adds `cf work show` command
  displaying run details and Cloud Execution Cost table for cloud engine
- tests/adapters/test_e2b_adapter.py: 35 tests covering credential scanner,
  budget persistence, adapter lifecycle (mock e2b.Sandbox), engine registry
  integration, AgentResult field, validators, and workspace schema
@coderabbitai
Contributor

coderabbitai bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c4199f4e-e130-4ed8-8535-c2426d838e40

📥 Commits

Reviewing files that changed from the base of the PR and between a9b877f and 7abc8fe.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml

Walkthrough

Adds a new "cloud" engine backed by an E2B adapter: scans workspaces for credentials, uploads files to an E2B sandbox, installs and runs CodeFrame there, streams execution events, downloads changed files, persists per-run budget metadata, wires CLI flags/validation, and adds tests and DB schema for cloud runs.

Changes

Cohort / File(s) — Summary

  • E2B Adapter Core — codeframe/adapters/e2b/__init__.py, codeframe/adapters/e2b/adapter.py
    Adds E2BAgentAdapter (lazy-exported). Creates E2B sandbox, clamps timeout (1–60m), scans workspace for credentials, uploads files, installs codeframe, runs agent command while streaming events, downloads modified/created files (via git porcelain), returns AgentResult with cloud_metadata, and ensures sandbox cleanup.
  • E2B Support Modules — codeframe/adapters/e2b/budget.py, codeframe/adapters/e2b/credential_scanner.py
    Adds persistent budget tracking (record_cloud_run, get_cloud_run) using workspace SQLite and a credential scanner (scan_path) that detects blocked files by filename and content, returning ScanResult.
  • Core Framework Integration — codeframe/core/adapters/agent_adapter.py, codeframe/core/engine_registry.py, codeframe/core/runtime.py, codeframe/core/conductor.py
    Adds AgentResult.cloud_metadata field. Registers "cloud" engine and resolves/instantiates E2BAgentAdapter with timeout_minutes. Threads cloud_timeout_minutes through execute_agent, start_batch, and the subprocess executor.
  • CLI Extensions — codeframe/cli/app.py, codeframe/cli/validators.py
    Adds --cloud-timeout option and documents --engine cloud. Adds work show display of cloud run cost/metrics. Adds require_e2b_api_key() validator (loads env/.env and exits if missing).
  • Workspace DB & Project Metadata — codeframe/core/workspace.py, pyproject.toml
    Adds cloud_run_metadata table to DB init/upgrade and adds optional cloud dependency group (e2b>=2.0.0) in pyproject.
  • Tests — tests/adapters/test_e2b_adapter.py
    New comprehensive tests: credential scanning, budget persistence, E2B adapter execution flows with mocked e2b.Sandbox, engine registry, CLI validator, workspace schema presence, and AgentResult.cloud_metadata behavior.
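The 1–60 minute timeout clamp mentioned in the adapter summary reduces to a one-liner; the function name here is hypothetical, not the adapter's actual helper:

```python
def clamp_timeout_minutes(requested: int, lo: int = 1, hi: int = 60) -> int:
    """Clamp a requested sandbox timeout into the supported 1-60 minute window."""
    return max(lo, min(hi, requested))
```

Values below the floor snap to 1, values above the ceiling snap to 60, and anything in range passes through unchanged.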

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI
    participant Scanner as CredentialScanner
    participant E2B as E2B_Sandbox
    participant WS as LocalWorkspace

    User->>CLI: cf work start --engine cloud
    CLI->>Scanner: scan_path(workspace_path)
    alt secrets detected
        Scanner-->>CLI: ScanResult(blocked)
        CLI-->>User: return failed AgentResult (credential_scan_blocked)
    else clean
        CLI->>E2B: create Sandbox(timeout_minutes)
        E2B-->>CLI: sandbox instance
        CLI->>E2B: upload workspace files
        CLI->>E2B: pip install codeframe
        CLI->>E2B: run "cf work start <task_id> --execute"
        E2B->>CLI: stream stdout/stderr (progress events)
        alt execution success
            CLI->>E2B: git status --porcelain
            E2B-->>CLI: list modified/created files
            CLI->>E2B: download changed files
            E2B-->>CLI: file contents
            CLI->>WS: write files back
            CLI-->>User: return success AgentResult + cloud_metadata
        else execution failure
            CLI-->>User: return failed AgentResult + error
        end
        CLI->>E2B: sbx.kill()
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hopped into clouds with a careful scan,

I kept secrets safe before they ran,
I pip-installed, executed, fetched what changed,
Counted minutes, logged the cost arranged,
A joyful hop for sandboxed plans.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 37.10%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately summarizes the main change: introducing E2BAgentAdapter for cloud execution in isolated Linux VMs via the --engine cloud option.
  • Linked Issues Check (✅ Passed): all objectives from Issue #534 are addressed: E2BAgentAdapter implemented with credential scanning, timeout management, budget tracking, CLI integration, and comprehensive tests.
  • Out of Scope Changes Check (✅ Passed): all changes are directly aligned with implementing cloud execution: adapter code, credential scanning, budget tracking, CLI/runtime integration, schema updates, and tests.



@claude

claude bot commented Apr 4, 2026

Review: E2BAgentAdapter

Well-structured PR. The layered design (credential scan -> sandbox -> upload -> run -> download) is clear, and test coverage is solid at 35 tests. A few issues worth fixing before merge:

Critical: API key leaked in process list

adapter.py, agent command construction (~line 196):

agent_cmd = (
    f"cd {_SANDBOX_WORKSPACE} && "
    f"ANTHROPIC_API_KEY={os.environ.get('ANTHROPIC_API_KEY', '')} "
    f"cf work start {task_id} --execute"
)

The key is visible in /proc/&lt;pid&gt;/cmdline on the sandbox host, in shell history, and in any logging of the command string. Use the E2B SDK's environment dict parameter (if supported), or write the key to a temp env file inside the sandbox that is cleaned up afterward. At minimum, avoid interpolating secrets into shell strings.
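A minimal sketch of that fix, assuming the sandbox SDK accepts an environment dict alongside the command (verify the exact E2B parameter name before relying on it):

```python
import os

def build_agent_invocation(task_id: str, workdir: str) -> tuple[str, dict[str, str]]:
    """Build the agent command without interpolating the secret into the shell string."""
    cmd = f"cd {workdir} && cf work start {task_id} --execute"
    envs = {"ANTHROPIC_API_KEY": os.environ.get("ANTHROPIC_API_KEY", "")}
    return cmd, envs

cmd, envs = build_agent_invocation("abc123", "/workspace")
# e.g. sbx.commands.run(cmd, envs=envs)  # hedged: param name may differ per SDK version
```

The secret travels out-of-band in the env dict, so it never appears in the process command line or in any log of the command string.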

Bug: record_cloud_run() is never called — cf work show table always empty

budget.py provides record_cloud_run(), and cf work show reads from it via get_cloud_run(). But there is no call to record_cloud_run() anywhere in the runtime.py diff after the adapter completes. The AgentResult.cloud_metadata dict is returned by the adapter but never persisted to the DB. The Cloud Execution Cost table in cf work show will always be empty for real runs.

Fix: after the external adapter returns in runtime.py's execute path, check agent_result.cloud_metadata and call record_cloud_run() if present.
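The shape of that fix can be sketched as a small helper; record_cloud_run's real signature may differ, so the writer function is injected here rather than imported:

```python
def persist_cloud_metadata(record_cloud_run, workspace, run_id, cloud_metadata) -> bool:
    """Persist in-memory cloud metadata after the adapter returns; no-op when absent."""
    if not cloud_metadata:
        return False  # local-engine runs (and scan-blocked runs) carry no metadata
    record_cloud_run(
        workspace=workspace,
        run_id=run_id,
        sandbox_minutes=float(cloud_metadata.get("sandbox_minutes", 0.0)),
        cost_usd=float(cloud_metadata.get("cost_usd_estimate", 0.0)),
    )
    return True
```

Calling this right after the adapter returns is what gives cf work show a cost row to read.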

Bug: New files created by agent are missed

_download_changed_files() uses:

git diff --name-only HEAD

This only shows modifications to already-tracked files. New files the agent creates are untracked and silently skipped. Replace with git status --porcelain (or chain git ls-files --others --exclude-standard) to capture both modified and new files.
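Parsing the porcelain output is straightforward; this sketch (helper name hypothetical) captures modified, untracked, and renamed entries:

```python
def changed_paths(porcelain_output: str) -> list[str]:
    """Extract relative paths from `git status --porcelain` output.

    Each line is two status characters, a space, then the path; untracked
    files appear as `?? path`, renames as `R  old -> new` (keep the new path).
    """
    paths = []
    for line in porcelain_output.splitlines():
        if not line.rstrip():
            continue
        path = line[3:]  # skip the two status chars and the separator space
        paths.append(path.split(" -> ")[-1])
    return paths
```

Unlike `git diff --name-only HEAD`, the `??` entries surface files the agent created but never staged.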

Design: e2b should be an optional dependency

e2b>=2.0.0 is added to core dependencies in pyproject.toml, pulling in protobuf, dockerfile-parse, wcmatch, etc. for all users regardless of whether they use cloud execution. It should go under [project.optional-dependencies] (e.g., cloud = ["e2b>=2.0.0"]) with a lazy import guard in the adapter. The lazy import in engine_registry.py is already structured correctly — the pyproject entry is the inconsistency.

Minor: Credential scanner false-positive risk

The content pattern (?i)(api_key|secret|password)\s*=\s*['"][^'"]{8,}['"] will block config files or test fixtures that set a local dev password (e.g., password = "localdev123"). Consider scoping content scanning to specific file types (.py, .yaml, .json, .toml) or documenting the false-positive risk.
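Reproducing the quoted pattern shows the false positive directly:

```python
import re

# The content pattern quoted above, reproduced verbatim.
SECRET_ASSIGNMENT = re.compile(r"(?i)(api_key|secret|password)\s*=\s*['\"][^'\"]{8,}['\"]")

dev_fixture = 'password = "localdev123"'    # harmless local-dev value, still blocked
real_leak = "api_key = 'sk-live-abcdefgh'"  # the intended catch
short_value = 'password = "short"'          # under 8 characters, passes the scan
```

The length threshold alone cannot distinguish a dev fixture from a real credential, which is why scoping by file type (or documenting the risk) is the safer mitigation.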

What's solid

  • Credential scanner design is clean; filename + content dual check is the right approach
  • Timeout clamping and _EXCLUDED_DIRS are consistent between credential_scanner.py and _upload_workspace
  • cloud_metadata: dict | None = None addition to AgentResult is non-breaking
  • require_e2b_api_key() follows the existing validator pattern faithfully
  • DB schema migration uses CREATE TABLE IF NOT EXISTS pattern consistent with workspace.py
  • 35 unit tests with mocked sandbox are comprehensive; clamped-timeout and requirements checks are a nice touch

The three bugs above (API key leakage, missing record_cloud_run() call, untracked file gap) need fixes before this is production-ready.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
codeframe/core/runtime.py (1)

740-751: ⚠️ Potential issue | 🟠 Major

Persist result.cloud_metadata before returning.

E2BAgentAdapter.run() only hands budget data back in-memory. In the direct cf work start --engine cloud path, nothing writes that payload to cloud_run_metadata, so the new cf work show command has no cost row to display.

💾 Minimal fix
             result = wrapper.run(
                 run.task_id, packaged.prompt, effective_repo_path,
                 on_event=on_adapter_event,
             )
+            if engine == "cloud" and result.cloud_metadata:
+                from codeframe.adapters.e2b.budget import record_cloud_run
+
+                record_cloud_run(
+                    workspace=workspace,
+                    run_id=run.id,
+                    sandbox_minutes=float(result.cloud_metadata.get("sandbox_minutes", 0.0)),
+                    cost_usd=float(result.cloud_metadata.get("cost_usd_estimate", 0.0)),
+                    files_uploaded=int(result.cloud_metadata.get("files_uploaded", 0)),
+                    files_downloaded=int(result.cloud_metadata.get("files_downloaded", 0)),
+                    scan_blocked=int(result.cloud_metadata.get("credential_scan_blocked", 0)),
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/runtime.py` around lines 740 - 751, The wrapper.run() result
may include in-memory budget info in result.cloud_metadata that never gets
persisted; after calling wrapper.run(...) check if result.cloud_metadata is
present and store it on the run object as run.cloud_run_metadata =
result.cloud_metadata, then persist that change to the workspace (use the
repository's existing run-save/update helper on workspace, e.g.,
workspace.update_run or workspace.save_run) before returning so cf work show can
read the cost row.
🧹 Nitpick comments (1)
codeframe/adapters/e2b/credential_scanner.py (1)

33-55: Consider expanding secret detection patterns for broader coverage.

The current patterns cover AWS, OpenAI, and GitHub, but miss other common credential types that could leak:

  • Anthropic API keys (sk-ant-*)
  • Azure/Google Cloud credentials
  • Generic bearer tokens
  • .envrc files (used by direnv)
♻️ Suggested additions
 _BLOCKED_FILENAME_PATTERNS = (
     re.compile(r"^\.env$"),
     re.compile(r"^\.env\."),
+    re.compile(r"^\.envrc$"),
     re.compile(r"\.pem$"),
     ...
 )

 _SECRET_CONTENT_PATTERNS = (
     re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key
     re.compile(r"sk-[a-zA-Z0-9]{48}"),                     # OpenAI API key
+    re.compile(r"sk-ant-[a-zA-Z0-9-]{80,}"),               # Anthropic API key
     re.compile(r"ghp_[a-zA-Z0-9]{36}"),                    # GitHub PAT
     re.compile(r"ghs_[a-zA-Z0-9]{36}"),                    # GitHub app token
+    re.compile(r"Bearer\s+[a-zA-Z0-9_-]{20,}"),            # Generic bearer token
     re.compile(r"(?i)(api_key|secret|password)\s*=\s*['\"][^'\"]{8,}['\"]"),
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/credential_scanner.py` around lines 33 - 55, Update
the credential_scanner patterns: extend _BLOCKED_FILENAME_PATTERNS to include a
re for ".envrc" (so direnv files are blocked) and extend
_SECRET_CONTENT_PATTERNS to add regexes for Anthropic keys (e.g., sk-ant-
prefix), Azure/Google Cloud credential indicators (e.g., "azure_.*",
"GOOGLE_APPLICATION_CREDENTIALS", service account JSON patterns with "type\":
\"service_account\""), and generic bearer tokens (e.g.,
"Bearer\s+[A-Za-z0-9\-\._~\+\/]+=*"). Locate and modify the tuples named
_BLOCKED_FILENAME_PATTERNS and _SECRET_CONTENT_PATTERNS in credential_scanner.py
to append these compiled patterns ensuring case-insensitivity where appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/adapters/e2b/adapter.py`:
- Around line 241-248: The _EXCLUDED set used by _upload_workspace() is missing
entries that scan_path() excludes, causing mismatched behavior; update the
_EXCLUDED frozenset (used in _upload_workspace()) to include the same
directories as scan_path() (add ".tox", "dist", "build", and ".eggs") so uploads
mirror scanning exclusions and keep the two functions in lockstep.
- Around line 276-310: The code currently runs sbx.commands.run("git diff
--name-only HEAD") and always attempts sbx.files.read on every path, which fails
to delete files that were removed upstream; change the git invocation to "git
diff --name-status HEAD" then parse each output line into (status, rel_path),
and in the for loop handle deletions: if status == "D" remove the local file
(local.unlink() if exists), append rel_path to modified_files and increment
downloaded (or count deletions separately), log the deletion and skip
sbx.files.read; for non-deleted statuses continue to read and write as now
(using sbx.files.read), keeping existing logger.warning on read errors and the
emit("progress", ...) behavior. Ensure you reference sbx.commands.run,
sbx.files.read, modified_files, downloaded, and logger when implementing.

In `@codeframe/adapters/e2b/credential_scanner.py`:
- Around line 123-134: The _contains_secret function currently treats OSError as
safe by returning False; change its behavior to conservatively treat unreadable
files as containing secrets by returning True on OSError (or otherwise flagging
the file), so callers will block or further inspect unreadable files; update the
except OSError block in _contains_secret (which reads path.read_bytes() and
references _MAX_CONTENT_SAMPLE and _SECRET_CONTENT_PATTERNS) to return True and
optionally record/log the error for auditing.

In `@codeframe/cli/app.py`:
- Around line 2677-2679: The code currently searches only the last 20 engine log
entries via engine_stats.get_run_log(workspace, limit=20) and then uses a
one-letter iterator l to find run_log, which silently drops data for older runs
and triggers Ruff E741; change the lookup to fetch or query logs by run_id
(e.g., call engine_stats.get_run_log with a run_id/workspace filter if the API
supports it, or page through results until a log with run.id is found) and
replace the ambiguous iterator name l with a descriptive name like log or entry
when computing run_log to fix the E741 lint issue.
- Around line 2641-2666: The CLI handler uses the user-provided task_id directly
with runtime.get_latest_run(), which fails for short/prefix IDs; after obtaining
workspace via get_workspace(path) check for a run match and if none, expand the
prefix by locating the full ID (e.g., call runtime.list_runs(workspace) or
workspace.list_runs() to find the first run.id that startswith the provided
task_id) and then call runtime.get_latest_run(workspace, full_task_id) using
that resolved full ID so short IDs like "abc123" work the same as other
commands.

In `@codeframe/core/conductor.py`:
- Line 559: The BatchRun cloud timeout is accepted by start_batch but never
used; add a cloud_timeout_minutes:int = 30 field to the BatchRun dataclass, set
that field when constructing the BatchRun instance inside start_batch, and
thread that value through to task execution by passing
batch_run.cloud_timeout_minutes into every call to _execute_task_subprocess (and
update signatures where necessary, e.g., in _execute_serial and any other
_execute_* helpers that invoke _execute_task_subprocess) so the CLI
--cloud-timeout value is honored during batch execution.

In `@tests/adapters/test_e2b_adapter.py`:
- Around line 9-13: The test module imports include unused names causing lint
failures: remove tempfile from the top-level imports and remove call from the
unittest.mock import list in tests/adapters/test_e2b_adapter.py so only actually
used imports remain (e.g., keep os, sqlite3, Path, MagicMock, patch). Update the
import statements in that file accordingly to eliminate the unused-tempfile and
unused-call warnings.

---

Outside diff comments:
In `@codeframe/core/runtime.py`:
- Around line 740-751: The wrapper.run() result may include in-memory budget
info in result.cloud_metadata that never gets persisted; after calling
wrapper.run(...) check if result.cloud_metadata is present and store it on the
run object as run.cloud_run_metadata = result.cloud_metadata, then persist that
change to the workspace (use the repository's existing run-save/update helper on
workspace, e.g., workspace.update_run or workspace.save_run) before returning so
cf work show can read the cost row.

---

Nitpick comments:
In `@codeframe/adapters/e2b/credential_scanner.py`:
- Around line 33-55: Update the credential_scanner patterns: extend
_BLOCKED_FILENAME_PATTERNS to include a re for ".envrc" (so direnv files are
blocked) and extend _SECRET_CONTENT_PATTERNS to add regexes for Anthropic keys
(e.g., sk-ant- prefix), Azure/Google Cloud credential indicators (e.g.,
"azure_.*", "GOOGLE_APPLICATION_CREDENTIALS", service account JSON patterns with
"type\": \"service_account\""), and generic bearer tokens (e.g.,
"Bearer\s+[A-Za-z0-9\-\._~\+\/]+=*"). Locate and modify the tuples named
_BLOCKED_FILENAME_PATTERNS and _SECRET_CONTENT_PATTERNS in credential_scanner.py
to append these compiled patterns ensuring case-insensitivity where appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fffd8df3-b429-4a98-ba4f-5d6e9e02346e

📥 Commits

Reviewing files that changed from the base of the PR and between be8e0a4 and 5905b51.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • codeframe/adapters/e2b/__init__.py
  • codeframe/adapters/e2b/adapter.py
  • codeframe/adapters/e2b/budget.py
  • codeframe/adapters/e2b/credential_scanner.py
  • codeframe/cli/app.py
  • codeframe/cli/validators.py
  • codeframe/core/adapters/agent_adapter.py
  • codeframe/core/conductor.py
  • codeframe/core/engine_registry.py
  • codeframe/core/runtime.py
  • codeframe/core/workspace.py
  • pyproject.toml
  • tests/adapters/test_e2b_adapter.py

Comment on lines +241 to +248
_EXCLUDED = frozenset({
    "__pycache__", ".git", ".mypy_cache", ".pytest_cache",
    ".ruff_cache", "node_modules", ".venv", "venv",
})

uploaded = 0
for path in sorted(workspace_path.rglob("*")):
    if any(part in _EXCLUDED for part in path.parts):

⚠️ Potential issue | 🟠 Major

Keep upload exclusions in lockstep with the credential scanner.

scan_path() skips .tox, dist, build, and .eggs, but _upload_workspace() still uploads them. That breaks the “scan before upload” guarantee and can also balloon transfer time and sandbox cost.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/adapter.py` around lines 241 - 248, The _EXCLUDED set
used by _upload_workspace() is missing entries that scan_path() excludes,
causing mismatched behavior; update the _EXCLUDED frozenset (used in
_upload_workspace()) to include the same directories as scan_path() (add ".tox",
"dist", "build", and ".eggs") so uploads mirror scanning exclusions and keep the
two functions in lockstep.

Comment on lines +276 to +310
diff_result = sbx.commands.run(
    f"cd {_SANDBOX_WORKSPACE} && git diff --name-only HEAD",
    timeout=30,
)

if diff_result.exit_code != 0 or not diff_result.stdout.strip():
    return [], 0

changed = [
    line.strip()
    for line in diff_result.stdout.splitlines()
    if line.strip()
]

downloaded = 0
modified_files: list[str] = []

for rel_path in changed:
    remote = f"{_SANDBOX_WORKSPACE}/{rel_path}"
    local = workspace_path / rel_path

    try:
        content = sbx.files.read(remote)
        local.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(content, str):
            local.write_text(content, encoding="utf-8")
        else:
            local.write_bytes(bytes(content))
        modified_files.append(rel_path)
        downloaded += 1
        logger.debug("Downloaded: %s", rel_path)
    except Exception as exc:
        logger.warning("Failed to download %s: %s", rel_path, exc)

emit("progress", f"Downloaded {downloaded} changed file(s)")

⚠️ Potential issue | 🟠 Major

Mirror deletions back to the local workspace.

git diff --name-only HEAD includes deleted paths, but this loop always tries to read() them. A cloud run that deletes files will return success while leaving stale files behind locally.

🧹 Minimal fix
         diff_result = sbx.commands.run(
-            f"cd {_SANDBOX_WORKSPACE} && git diff --name-only HEAD",
+            f"cd {_SANDBOX_WORKSPACE} && git diff --name-status HEAD",
             timeout=30,
         )
@@
-        changed = [
-            line.strip()
-            for line in diff_result.stdout.splitlines()
-            if line.strip()
-        ]
-
         downloaded = 0
         modified_files: list[str] = []
 
-        for rel_path in changed:
+        for line in diff_result.stdout.splitlines():
+            if not line.strip():
+                continue
+            status, *paths = line.split("\t")
+            rel_path = paths[-1]
             remote = f"{_SANDBOX_WORKSPACE}/{rel_path}"
             local = workspace_path / rel_path
 
             try:
+                if status == "D":
+                    if local.exists():
+                        local.unlink()
+                    modified_files.append(rel_path)
+                    downloaded += 1
+                    continue
+
                 content = sbx.files.read(remote)
                 local.parent.mkdir(parents=True, exist_ok=True)
                 if isinstance(content, str):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/adapter.py` around lines 276 - 310, The code currently
runs sbx.commands.run("git diff --name-only HEAD") and always attempts
sbx.files.read on every path, which fails to delete files that were removed
upstream; change the git invocation to "git diff --name-status HEAD" then parse
each output line into (status, rel_path), and in the for loop handle deletions:
if status == "D" remove the local file (local.unlink() if exists), append
rel_path to modified_files and increment downloaded (or count deletions
separately), log the deletion and skip sbx.files.read; for non-deleted statuses
continue to read and write as now (using sbx.files.read), keeping existing
logger.warning on read errors and the emit("progress", ...) behavior. Ensure you
reference sbx.commands.run, sbx.files.read, modified_files, downloaded, and
logger when implementing.

Comment on lines +123 to +134
def _contains_secret(path: Path) -> bool:
    """Return True if the file content matches any known secret pattern."""
    try:
        content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
        text = content.decode("utf-8", errors="replace")
    except OSError:
        return False

    for pattern in _SECRET_CONTENT_PATTERNS:
        if pattern.search(text):
            return True
    return False

⚠️ Potential issue | 🟡 Minor

OSError should be treated as potentially containing secrets, not as safe.

When a file cannot be read (OSError), returning False treats it as "no secrets found." For a security-focused scanner, the safer default is to assume unreadable files might contain secrets and block them.

🛡️ Proposed fix
     try:
         content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
         text = content.decode("utf-8", errors="replace")
     except OSError:
-        return False
+        # Fail-safe: treat unreadable files as potentially containing secrets
+        return True
 
     for pattern in _SECRET_CONTENT_PATTERNS:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _contains_secret(path: Path) -> bool:
    """Return True if the file content matches any known secret pattern."""
    try:
        content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
        text = content.decode("utf-8", errors="replace")
    except OSError:
        return False

    for pattern in _SECRET_CONTENT_PATTERNS:
        if pattern.search(text):
            return True
    return False

def _contains_secret(path: Path) -> bool:
    """Return True if the file content matches any known secret pattern."""
    try:
        content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
        text = content.decode("utf-8", errors="replace")
    except OSError:
        # Fail-safe: treat unreadable files as potentially containing secrets
        return True

    for pattern in _SECRET_CONTENT_PATTERNS:
        if pattern.search(text):
            return True
    return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/credential_scanner.py` around lines 123 - 134, The
_contains_secret function currently treats OSError as safe by returning False;
change its behavior to conservatively treat unreadable files as containing
secrets by returning True on OSError (or otherwise flagging the file), so
callers will block or further inspect unreadable files; update the except
OSError block in _contains_secret (which reads path.read_bytes() and references
_MAX_CONTENT_SAMPLE and _SECRET_CONTENT_PATTERNS) to return True and optionally
record/log the error for auditing.

Comment on lines +2641 to +2666
task_id: str = typer.Argument(..., help="Task ID to show run details for"),
workspace_path: Optional[Path] = typer.Option(
None,
"--workspace",
"-w",
help="Workspace path (defaults to current directory)",
),
) -> None:
"""Show detailed run information for a task, including cloud execution cost.

Example:
codeframe work show abc123
codeframe work show abc123 --workspace /path/to/project
"""
from codeframe.core.workspace import get_workspace
from codeframe.core import runtime, engine_stats
from rich.table import Table

path = workspace_path or Path.cwd()

try:
workspace = get_workspace(path)

run = runtime.get_latest_run(workspace, task_id)
if not run:
console.print(f"[dim]No run found for task {task_id}[/dim]")

⚠️ Potential issue | 🟠 Major

Resolve partial task IDs before calling get_latest_run().

runtime.get_latest_run() matches on the full task ID, so the common short-ID flow (cf work show abc123) finds no run, even though the other work commands accept prefixes.

🔎 Minimal fix
-    from codeframe.core import runtime, engine_stats
+    from codeframe.core import runtime, engine_stats, tasks as tasks_module
@@
-        run = runtime.get_latest_run(workspace, task_id)
+        matching = [
+            task for task in tasks_module.list_tasks(workspace)
+            if task.id.startswith(task_id)
+        ]
+        if not matching:
+            console.print(f"[dim]No task found matching {task_id}[/dim]")
+            raise typer.Exit(0)
+        if len(matching) > 1:
+            console.print(f"[red]Error:[/red] Multiple tasks match '{task_id}'")
+            raise typer.Exit(1)
+
+        run = runtime.get_latest_run(workspace, matching[0].id)
         if not run:
             console.print(f"[dim]No run found for task {task_id}[/dim]")
             raise typer.Exit(0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 2641 - 2666, The CLI handler uses the
user-provided task_id directly with runtime.get_latest_run(), which fails for
short/prefix IDs; after obtaining workspace via get_workspace(path) check for a
run match and if none, expand the prefix by locating the full ID (e.g., call
runtime.list_runs(workspace) or workspace.list_runs() to find the first run.id
that startswith the provided task_id) and then call
runtime.get_latest_run(workspace, full_task_id) using that resolved full ID so
short IDs like "abc123" work the same as other commands.

Comment on lines +2677 to +2679
# Engine stats from run_engine_log
logs = engine_stats.get_run_log(workspace, limit=20)
run_log = next((l for l in logs if l.get("run_id") == run.id), None)

⚠️ Potential issue | 🟠 Major

Look up the engine log by run_id instead of scanning the last 20 rows.

Once the selected run is older than the 20 most recent engine records, this command silently drops engine/tokens/cloud cost data. The current generator expression also keeps CI red on Ruff E741 because of the one-letter variable name l.

🛠️ Stopgap inside this command
-        logs = engine_stats.get_run_log(workspace, limit=20)
-        run_log = next((l for l in logs if l.get("run_id") == run.id), None)
+        # Prefer a direct engine_stats lookup by run_id; a larger limit is only a stopgap.
+        logs = engine_stats.get_run_log(workspace, limit=1000)
+        run_log = next(
+            (log_entry for log_entry in logs if log_entry.get("run_id") == run.id),
+            None,
+        )
📝 Committable suggestion


Suggested change

Before:
        # Engine stats from run_engine_log
        logs = engine_stats.get_run_log(workspace, limit=20)
        run_log = next((l for l in logs if l.get("run_id") == run.id), None)

After:
        # Engine stats from run_engine_log
        # Prefer a direct engine_stats lookup by run_id; a larger limit is only a stopgap.
        logs = engine_stats.get_run_log(workspace, limit=1000)
        run_log = next(
            (log_entry for log_entry in logs if log_entry.get("run_id") == run.id),
            None,
        )
🧰 Tools
🪛 GitHub Actions: Test Suite (Unit + E2E)

[error] 2679-2679: ruff check failed (E741): Ambiguous variable name l in generator expression next((l for l in logs if l.get("run_id") == run.id), None).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 2677 - 2679, The code currently searches
only the last 20 engine log entries via engine_stats.get_run_log(workspace,
limit=20) and then uses a one-letter iterator l to find run_log, which silently
drops data for older runs and triggers Ruff E741; change the lookup to fetch or
query logs by run_id (e.g., call engine_stats.get_run_log with a
run_id/workspace filter if the API supports it, or page through results until a
log with run.id is found) and replace the ambiguous iterator name l with a
descriptive name like log or entry when computing run_log to fix the E741 lint
issue.

concurrency_by_status: Optional[dict[str, int]] = None,
isolate: bool = True,
isolation: str = "none",
cloud_timeout_minutes: int = 30,

⚠️ Potential issue | 🟠 Major

cloud_timeout_minutes parameter is accepted but never propagated to task execution.

The start_batch function accepts cloud_timeout_minutes but doesn't store it in the BatchRun dataclass or pass it to _execute_task_subprocess. All call sites use the default value (30).

This means the CLI --cloud-timeout option won't have any effect during batch execution.

🐛 Proposed fix
  1. Add cloud_timeout_minutes to the BatchRun dataclass:
     @dataclass
     class BatchRun:
         ...
         cloud_timeout_minutes: int = 30
  2. Store it when creating the batch in start_batch:
     batch = BatchRun(
         ...
         isolation=isolation,
+        cloud_timeout_minutes=cloud_timeout_minutes,
     )
  3. Pass it in all _execute_task_subprocess calls (example from _execute_serial):
     result_status = _execute_task_subprocess(
         workspace, task_id, batch.id, engine=batch.engine,
         stall_timeout_s=batch.stall_timeout_s, stall_action=batch.stall_action,
         worktree_path=...,
+        cloud_timeout_minutes=batch.cloud_timeout_minutes,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/conductor.py` at line 559, The BatchRun cloud timeout is
accepted by start_batch but never used; add a cloud_timeout_minutes:int = 30
field to the BatchRun dataclass, set that field when constructing the BatchRun
instance inside start_batch, and thread that value through to task execution by
passing batch_run.cloud_timeout_minutes into every call to
_execute_task_subprocess (and update signatures where necessary, e.g., in
_execute_serial and any other _execute_* helpers that invoke
_execute_task_subprocess) so the CLI --cloud-timeout value is honored during
batch execution.

…le capture

- Security: move ANTHROPIC_API_KEY to envs= dict in commands.run() — no longer
  interpolated into shell string where it would appear in process list/logs
- Fix: call record_cloud_run() in runtime.py after adapter completes so
  cloud_run_metadata table is populated and cf work show displays cost data
- Fix: switch _download_changed_files to git status --porcelain — captures
  both modified tracked files and newly created untracked files (git diff
  only sees tracked changes)
- Design: move e2b to optional-dependencies [cloud] extra; add ImportError
  guard with helpful install message in adapter
- Add lazy ImportError guard: import e2b inside run() with clear message
  directing users to pip install 'codeframe[cloud]'
- Test: add test_new_files_downloaded_via_porcelain verifying ?? files are
  included; fix mock stdout to use porcelain format; fix lint (E741, F401)
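As a standalone sanity check on the porcelain switch described above, the sketch below (a hypothetical helper, not the adapter's actual code) shows how `git status --porcelain` output can be parsed so that `??` untracked entries and renames are captured — entries that `git diff` alone would miss for new files:

```python
# Hypothetical helper mirroring the parsing approach discussed in this PR:
# turn `git status --porcelain` output into (status, path) pairs, keeping
# untracked files ("??") and resolving renames to the new name.
def parse_porcelain(output: str) -> list[tuple[str, str]]:
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # porcelain format: XY filename (or "XY old -> new" for renames)
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        xy, path = parts
        if " -> " in path:  # rename: keep the new name
            path = path.split(" -> ", 1)[1]
        entries.append((xy, path.strip()))
    return entries

sample = " M src/app.py\n?? new_file.py\nR  old.py -> renamed.py\n"
print(parse_porcelain(sample))
# → [('M', 'src/app.py'), ('??', 'new_file.py'), ('R', 'renamed.py')]
```

Note the trade-off: stripping the line collapses the X/Y (staged vs. unstaged) distinction, which is fine for "what changed?" but not for staging-aware logic.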
@claude
claude bot commented Apr 4, 2026

WorktreeRegistry — Orphan Cleanup & get_base_branch (Issue #533)

2026-04-04T00:39:21Z

Issue #533 adds three things to the worktree isolation system:

  1. get_base_branch() — reads the actual current branch from git instead of hardcoding "main"
  2. WorktreeRegistry — atomically tracks live worktrees (PID + task_id) so orphaned ones can be detected
  3. Auto-cleanup — _execute_parallel() cleans stale worktrees at batch start; cf env doctor reports them

These complete the worktree isolation loop: create → register → execute → cleanup → unregister.

Acceptance Criterion 1: get_base_branch()

Returns the current branch name from git. Falls back to "main" on failure or in detached HEAD state.

uv run python -c "
from codeframe.core.worktrees import get_base_branch
from pathlib import Path

# Returns real branch name in a git repo
branch = get_base_branch(Path(\".\"))
print(f\"Current branch: {branch!r}\")

# Returns main for non-git directory
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    fallback = get_base_branch(Path(tmp))
    print(f\"Non-git dir fallback: {fallback!r}\")

# Returns main for detached HEAD (simulated)
from unittest.mock import patch, MagicMock
mock = MagicMock(returncode=0, stdout=\"HEAD\n\")
with patch(\"subprocess.run\", return_value=mock):
    detached = get_base_branch(Path(\".\"))
    print(f\"Detached HEAD fallback: {detached!r}\")
"
Current branch: 'feat/worktree-registry-533'
Non-git dir fallback: 'main'
Detached HEAD fallback: 'main'

Acceptance Criterion 2: WorktreeRegistry — register / unregister / list_worktrees

Atomically writes PID + task_id + batch_id to .codeframe/worktrees.json using a threading.Lock and os.replace for crash safety.
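The atomic-write pattern described above can be sketched in isolation (illustrative names only, not the actual WorktreeRegistry internals): os.replace swaps the file into place atomically on POSIX, so a concurrent reader never observes a half-written registry, and the lock serializes writers within one process.

```python
# Sketch of lock + temp-file + os.replace atomic JSON writes; the helper
# name and layout are assumptions for illustration, not codeframe's API.
import json
import os
import tempfile
import threading
from pathlib import Path

_registry_lock = threading.Lock()

def atomic_write_json(path: Path, entries: list[dict]) -> None:
    with _registry_lock:  # serialize writers in this process
        fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as fh:
                json.dump(entries, fh)
            os.replace(tmp, path)  # atomic swap: readers see old or new, never partial
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)  # don't leave temp debris on failure
            raise

with tempfile.TemporaryDirectory() as tmp_dir:
    reg = Path(tmp_dir) / "worktrees.json"
    atomic_write_json(reg, [{"task_id": "task-001", "pid": os.getpid()}])
    print(json.loads(reg.read_text()))
```

The temp file is created in the same directory as the target so the rename stays on one filesystem, which is what makes os.replace atomic.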

uv run python -c "
import tempfile, os, json
from pathlib import Path
from codeframe.core.worktrees import WorktreeRegistry, list_worktrees

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp)
    reg = WorktreeRegistry()

    # Register two tasks
    reg.register(p, \"task-001\", \"batch-abc\")
    reg.register(p, \"task-002\", \"batch-abc\")

    entries = list_worktrees(p)
    print(f\"Registered {len(entries)} entries:\")
    for e in entries:
        print(f\"  task={e['task_id']!r}  batch={e['batch_id']!r}  pid={e['pid']}\")

    # Unregister one
    reg.unregister(p, \"task-001\")
    after = list_worktrees(p)
    print(f\"After unregister: {[e['task_id'] for e in after]}\")

    # Idempotent — re-registering same task does not duplicate
    reg.register(p, \"task-002\", \"batch-abc\")
    dup = list_worktrees(p)
    print(f\"After re-register (idempotent): {len(dup)} entries (expected 1)\")
"
Registered 2 entries:
  task='task-001'  batch='batch-abc'  pid=64664
  task='task-002'  batch='batch-abc'  pid=64664
After unregister: ['task-002']
After re-register (idempotent): 1 entries (expected 1)

Acceptance Criterion 3: Stale detection — list_stale() and cleanup_stale()

A "stale" entry is one whose PID is no longer alive. Checked via os.kill(pid, 0). PermissionError (process alive, different owner) is correctly excluded.

uv run python /tmp/demo_stale.py
Stale entries: ['dead-task']
  (live-task excluded — its pid 64794 is our own process)
After cleanup_stale: ['live-task']

Acceptance Criterion 4: register() is wired into create_execution_context()

When IsolationLevel.WORKTREE is used, the ExecutionContext now registers the PID on creation and unregisters it in the cleanup callback — making the orphan detection actually functional.

grep -A 12 "if isolation == IsolationLevel.WORKTREE:" codeframe/core/sandbox/context.py
    if isolation == IsolationLevel.WORKTREE:
        from codeframe.core.worktrees import TaskWorktree, WorktreeRegistry, get_base_branch

        worktree = TaskWorktree()
        registry = WorktreeRegistry()
        base_branch = get_base_branch(repo_path)
        worktree_path = worktree.create(repo_path, task_id, base_branch=base_branch)
        registry.register(repo_path, task_id, batch_id="unknown")

        def cleanup() -> None:
            worktree.cleanup(repo_path, task_id)
            registry.unregister(repo_path, task_id)

Acceptance Criterion 5: Auto-cleanup in _execute_parallel()

cleanup_stale() is called at the start of every parallel batch run when isolation=worktree — clearing any orphaned worktrees from previously crashed workers before starting new ones.

grep -A 4 "Clean up orphaned worktrees" codeframe/core/conductor.py
    # Clean up orphaned worktrees from crashed workers on previous runs
    from codeframe.core.sandbox.context import IsolationLevel as _IL
    if batch.isolation == _IL.WORKTREE.value:
        from codeframe.core.worktrees import WorktreeRegistry
        WorktreeRegistry().cleanup_stale(workspace.repo_path)

Acceptance Criterion 6: cf env doctor reports stale worktrees

grep -A 14 "Stale worktree check" codeframe/cli/env_commands.py
    # Stale worktree check
    try:
        from codeframe.core.worktrees import WorktreeRegistry
        stale = WorktreeRegistry().list_stale(project_path)
        if stale:
            console.print()
            console.print("[bold yellow]Stale Worktrees:[/bold yellow]")
            for entry in stale:
                console.print(
                    f"  [yellow]⚠[/yellow] task [cyan]{entry['task_id']}[/cyan] "
                    f"(pid {entry.get('pid', '?')} no longer running)"
                )
            console.print()
            console.print(
                "[dim]To clean up, run:[/dim] codeframe work batch run --all-ready "

Acceptance Criterion 7: All 27 tests pass

uv run pytest tests/core/test_worktrees.py -q --tb=short 2>&1 | tail -5
0.02s call     tests/core/test_worktrees.py::TestTaskWorktreeCreate::test_returns_correct_path
0.01s call     tests/core/test_worktrees.py::TestTaskWorktreeCleanup::test_cleanup_nonexistent_does_not_raise
0.01s call     tests/core/test_worktrees.py::TestGetBaseBranch::test_returns_current_branch
0.01s call     tests/core/test_worktrees.py::TestBatchRunIsolate::test_defaults_to_true
============================== 27 passed in 1.02s ==============================

All 27 tests pass. The worktree isolation loop is now complete: create → register (PID) → execute → cleanup → unregister. Stale orphans from crashed workers are detected by PID liveness check and cleaned up at next batch run or reported by cf env doctor.

1 similar comment

@claude

claude bot commented Apr 4, 2026

Review: E2B Cloud Execution Adapter (#534)

This is a well-structured implementation that correctly follows the existing adapter pattern (matches KilocodeAdapter, claude-code, codex). The security-first credential scan before upload, lazy e2b import, and the switch from git diff to git status --porcelain (catching new untracked files) are all good design choices. Test coverage is comprehensive.

A few issues worth addressing:


Critical: engine_stats import in work show

codeframe.core.engine_stats is not in the repo module list (CLAUDE.md). If this module does not exist, cf work show will fail with ImportError at runtime for every invocation, not just cloud runs. Please verify this module exists or add a guard using try/except around the import.


Bug: prompt parameter is accepted but never used

The run() method accepts a prompt parameter. The docstring says the prompt is written to sandbox as a file but the implementation never does this. The prompt is silently dropped. Either write it to the sandbox before running the agent or remove it from the signature to avoid confusion.


Security: shell command injection via task_id

task_id is interpolated directly into a shell string. Task IDs are typically UUIDs generated internally so the practical risk is low, but a defensive fix costs nothing: wrap with shlex.quote(task_id).
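The one-line hardening looks like the following sketch (the command string is illustrative, not the adapter's actual invocation): shlex.quote leaves benign IDs untouched and wraps anything shell-significant in single quotes.

```python
# Defensive quoting for values interpolated into a shell string, as the
# review suggests. build_agent_command is a hypothetical stand-in.
import shlex

def build_agent_command(task_id: str) -> str:
    return f"codeframe run-task {shlex.quote(task_id)}"

print(build_agent_command("abc123"))       # → codeframe run-task abc123
print(build_agent_command("x; rm -rf /"))  # → codeframe run-task 'x; rm -rf /'
```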


Minor: unused emit parameter in _upload_workspace

emit is never called inside _upload_workspace. Either remove the parameter or use it to surface individual upload warnings instead of just logging them.


Minor: cloud_timeout_minutes threading in start_batch

The diff shows cloud_timeout_minutes added to the start_batch signature and to _execute_task_subprocess, but the diff does not show the call site inside start_batch where it invokes _execute_task_subprocess. Please confirm the parameter is actually threaded through that call or batch runs will silently use the 30-minute default regardless of the CLI flag.


Nit: hardcoded cost constant

_COST_PER_MINUTE = 0.002 — E2B pricing changes occasionally. The comment marks it as an estimate and the column is named cost_usd_estimate so users should understand it is approximate. Acceptable as-is; a future improvement would be a config override.
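Such a config override could be as small as an environment-variable fallback; the variable name below is an assumption for illustration, not an existing codeframe setting.

```python
# Hedged sketch of a pricing override: an env var (hypothetical name)
# takes precedence over the hardcoded estimate, with safe fallback on
# malformed input so a typo never breaks cost reporting.
import os

_DEFAULT_COST_PER_MINUTE = 0.002  # E2B estimate; pricing may drift

def cost_per_minute() -> float:
    raw = os.environ.get("CODEFRAME_E2B_COST_PER_MINUTE")
    if raw is None:
        return _DEFAULT_COST_PER_MINUTE
    try:
        return float(raw)
    except ValueError:
        return _DEFAULT_COST_PER_MINUTE  # ignore malformed overrides

print(cost_per_minute())
```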


Summary: The two items most worth fixing before merge are the engine_stats import (breaks cf work show for all users if the module is missing) and the unused prompt parameter (misleading interface). The shell quoting fix is quick and worth doing. Everything else is polish.

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
codeframe/adapters/e2b/adapter.py (2)

260-263: ⚠️ Potential issue | 🟠 Major

Sync upload exclusions with credential scanner exclusions.

The _EXCLUDED set is missing directories that scan_path() skips (.tox, dist, build, .eggs). These directories bypass credential scanning but still get uploaded, potentially leaking secrets. Additionally, uploading build artifacts inflates transfer time and sandbox cost.

🔒 Proposed fix
         _EXCLUDED = frozenset({
             "__pycache__", ".git", ".mypy_cache", ".pytest_cache",
-            ".ruff_cache", "node_modules", ".venv", "venv",
+            ".ruff_cache", "node_modules", ".venv", "venv",
+            ".tox", "dist", "build", ".eggs",
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/adapter.py` around lines 260 - 263, The _EXCLUDED
frozenset in adapter.py is missing directories that scan_path() skips, causing
those dirs to be uploaded; update the _EXCLUDED definition (the frozenset near
the top of the file) to include ".tox", "dist", "build", and ".eggs" so it
matches scan_path()'s exclusions and prevents uploading build artifacts and
skipped scan directories.

306-339: ⚠️ Potential issue | 🟠 Major

Handle deleted files in porcelain output.

The code parses git status --porcelain but treats all status codes uniformly. When a file is deleted in the sandbox (status " D" or "D "), sbx.files.read() will fail and the local file remains—leaving stale files that the agent intentionally removed.

🧹 Proposed fix to handle deletions
         for line in status_result.stdout.splitlines():
             line = line.strip()
             if not line:
                 continue
             # porcelain format: XY filename (or "XY old -> new" for renames)
             parts = line.split(None, 1)
             if len(parts) < 2:
                 continue
             xy, filepath = parts
             # Handle renames: "R old -> new" — take the new name after " -> "
             if " -> " in filepath:
                 filepath = filepath.split(" -> ", 1)[1]
-            changed.append(filepath.strip())
+            changed.append((xy.strip(), filepath.strip()))

         downloaded = 0
+        deleted = 0
         modified_files: list[str] = []

-        for rel_path in changed:
+        for status, rel_path in changed:
             remote = f"{_SANDBOX_WORKSPACE}/{rel_path}"
             local = workspace_path / rel_path

+            # Handle deletions: remove local file
+            if "D" in status:
+                if local.exists():
+                    local.unlink()
+                    deleted += 1
+                modified_files.append(rel_path)
+                logger.debug("Deleted: %s", rel_path)
+                continue
+
             try:
                 content = sbx.files.read(remote)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/adapter.py` around lines 306 - 339, The porcelain
parsing currently throws away the XY status so deletions are treated like normal
files; preserve the XY code when building the changed list (e.g., append tuples
like (xy, filepath) or maintain a set of deleted paths by checking if "D"
appears in xy) and in the subsequent loop over changed use that XY info to
handle deletions: if the status indicates deletion (xy contains "D"), do not
call sbx.files.read(), instead remove the local file (use
local.unlink(missing_ok=True) or check exists() and unlink()) and add the path
to modified_files/log a debug message; otherwise continue to read via
sbx.files.read and write with local.write_text/local.write_bytes as before.
Ensure symbols referenced: the parsing that produces changed, variable xy,
rel_path/local, sbx.files.read, modified_files, and
local.write_text/local.write_bytes.
🧹 Nitpick comments (4)
tests/adapters/test_e2b_adapter.py (2)

474-491: Minor: redundant os imports.

Lines 476 and 486 import os locally when it's already imported at module level (line 9). This works but is unnecessary.

💡 Remove redundant imports
     def test_check_requirements_cloud(self):
         from codeframe.core.engine_registry import check_requirements
-        import os

         with patch.dict(os.environ, {}, clear=True):
             result = check_requirements("cloud")
@@
     def test_check_requirements_cloud_with_key(self):
         from codeframe.core.engine_registry import check_requirements
-        import os

         with patch.dict(os.environ, {"E2B_API_KEY": "test-key"}):
             result = check_requirements("cloud")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/adapters/test_e2b_adapter.py` around lines 474 - 491, Remove the
redundant local imports of os inside the test functions
test_check_requirements_cloud and test_check_requirements_cloud_with_key and
rely on the module-level os import already present; simply delete the lines
"import os" inside those functions so the tests continue to use os (e.g., for
patch.dict) without re-importing it locally.

393-398: Consider adding test for minimum timeout clamping.

The test verifies the upper bound (999 → 60) but not the lower bound. The adapter clamps to _MIN_TIMEOUT_MINUTES = 1, which could be tested similarly.

💡 Optional test addition
def test_timeout_clamped_to_min_1(self):
    from codeframe.adapters.e2b.adapter import E2BAgentAdapter

    adapter = E2BAgentAdapter(timeout_minutes=0)
    assert adapter._timeout_minutes == 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/adapters/test_e2b_adapter.py` around lines 393 - 398, Add a unit test
mirroring the max-clamping test to verify the minimum clamp: instantiate
E2BAgentAdapter with timeout_minutes=0 and assert adapter._timeout_minutes
equals the minimum constant (referenced as _MIN_TIMEOUT_MINUTES or the literal
1) so the adapter clamps lower values to the minimum; place the new test (e.g.,
test_timeout_clamped_to_min_1) alongside the existing
test_timeout_clamped_to_max_60 and use the same patching pattern for
e2b.Sandbox.create.
codeframe/adapters/e2b/adapter.py (2)

59-62: Consider documenting optional dependency on ANTHROPIC_API_KEY.

The adapter passes ANTHROPIC_API_KEY to the sandboxed agent (lines 183-185), but requirements() doesn't mention it. While the inner agent may have its own validation, adding it here would surface the requirement earlier and improve UX.

💡 Optional enhancement
     @classmethod
     def requirements(cls) -> dict[str, str]:
         """Return required environment variables."""
-        return {"E2B_API_KEY": "E2B API key for cloud sandbox execution"}
+        return {
+            "E2B_API_KEY": "E2B API key for cloud sandbox execution",
+            "ANTHROPIC_API_KEY": "Anthropic API key for the agent inside the sandbox (optional if using other providers)",
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/adapter.py` around lines 59 - 62, Update the
requirements() classmethod in adapter.py to document the optional
ANTHROPIC_API_KEY environment variable: add an entry "ANTHROPIC_API_KEY":
"Optional: Anthropic API key passed to sandboxed agent for model access" (or
similar) alongside the existing E2B_API_KEY entry so the adapter surfaces this
optional dependency; keep the key name exact and make clear it's optional to
avoid changing runtime validation logic in other methods that pass it into the
sandboxed agent.

164-168: Consider checking git init exit code.

If the git initialization fails (e.g., permission issues), the code continues silently. The subsequent git status --porcelain at download time will fail, but the error won't indicate the root cause was the init step.

💡 Optional improvement
             # Step 4: Initialize git baseline (for diff detection)
-            sbx.commands.run(
+            git_init_result = sbx.commands.run(
                 f"cd {_SANDBOX_WORKSPACE} && git init -q && git add -A && "
                 f"git -c user.email=agent@e2b.local -c user.name=agent commit -q -m init",
                 timeout=30,
             )
+            if git_init_result.exit_code != 0:
+                logger.warning("Git init failed: %s", git_init_result.stderr[:200])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/e2b/adapter.py` around lines 164 - 168, The git init
invocation using sbx.commands.run in the sandbox setup (the call that runs "cd
{_SANDBOX_WORKSPACE} && git init -q && git add -A && git -c
user.email=agent@e2b.local -c user.name=agent commit -q -m init") should check
the command result and surface failures: capture the exit status/output from
sbx.commands.run when calling git init (or run init as a separate command), log
or include stderr in a clear error message, and raise/abort if the init step
fails so subsequent git operations don't obscure the root cause (refer to
sbx.commands.run and the _SANDBOX_WORKSPACE init/commit sequence).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@codeframe/adapters/e2b/adapter.py`:
- Around line 260-263: The _EXCLUDED frozenset in adapter.py is missing
directories that scan_path() skips, causing those dirs to be uploaded; update
the _EXCLUDED definition (the frozenset near the top of the file) to include
".tox", "dist", "build", and ".eggs" so it matches scan_path()'s exclusions and
prevents uploading build artifacts and skipped scan directories.
- Around line 306-339: The porcelain parsing currently throws away the XY status
so deletions are treated like normal files; preserve the XY code when building
the changed list (e.g., append tuples like (xy, filepath) or maintain a set of
deleted paths by checking if "D" appears in xy) and in the subsequent loop over
changed use that XY info to handle deletions: if the status indicates deletion
(xy contains "D"), do not call sbx.files.read(), instead remove the local file
(use local.unlink(missing_ok=True) or check exists() and unlink()) and add the
path to modified_files/log a debug message; otherwise continue to read via
sbx.files.read and write with local.write_text/local.write_bytes as before.
Ensure symbols referenced: the parsing that produces changed, variable xy,
rel_path/local, sbx.files.read, modified_files, and
local.write_text/local.write_bytes.
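
The deletion-aware handling described in that prompt can be sketched as below. Variable names like `xy`, `rel_path`, and `modified_files` follow the prompt; `parse_porcelain`, `apply_changes`, and the `read_remote` callable (standing in for `sbx.files.read`) are illustrative assumptions, not the PR's exact code.

```python
from pathlib import Path


def parse_porcelain(output: str) -> list[tuple[str, str]]:
    """Split `git status --porcelain` output into (xy, path) pairs,
    preserving the two-character XY status instead of discarding it."""
    changed = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        xy, rel_path = line[:2], line[3:]
        changed.append((xy, rel_path))
    return changed


def apply_changes(changed, workspace_path: Path, read_remote) -> list[str]:
    """Mirror sandbox changes locally; `read_remote` stands in for sbx.files.read."""
    modified_files = []
    for xy, rel_path in changed:
        local = workspace_path / rel_path
        if "D" in xy:
            # Deletion: remove the local copy instead of reading a file
            # that no longer exists in the sandbox.
            local.unlink(missing_ok=True)
        else:
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_text(read_remote(rel_path))
        modified_files.append(rel_path)
    return modified_files
```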

---

Nitpick comments:
In `@codeframe/adapters/e2b/adapter.py`:
- Around line 59-62: Update the requirements() classmethod in adapter.py to
document the optional ANTHROPIC_API_KEY environment variable: add an entry
"ANTHROPIC_API_KEY": "Optional: Anthropic API key passed to sandboxed agent for
model access" (or similar) alongside the existing E2B_API_KEY entry so the
adapter surfaces this optional dependency; keep the key name exact and make
clear it's optional to avoid changing runtime validation logic in other methods
that pass it into the sandboxed agent.
- Around line 164-168: The git init invocation using sbx.commands.run in the
sandbox setup (the call that runs "cd {_SANDBOX_WORKSPACE} && git init -q && git
add -A && git -c user.email=agent@e2b.local -c user.name=agent commit -q -m
init") should check the command result and surface failures: capture the exit
status/output from sbx.commands.run when calling git init (or run init as a
separate command), log or include stderr in a clear error message, and
raise/abort if the init step fails so subsequent git operations don't obscure
the root cause (refer to sbx.commands.run and the _SANDBOX_WORKSPACE init/commit
sequence).

In `@tests/adapters/test_e2b_adapter.py`:
- Around line 474-491: Remove the redundant local imports of os inside the test
functions test_check_requirements_cloud and
test_check_requirements_cloud_with_key and rely on the module-level os import
already present; simply delete the lines "import os" inside those functions so
the tests continue to use os (e.g., for patch.dict) without re-importing it
locally.
- Around line 393-398: Add a unit test mirroring the max-clamping test to verify
the minimum clamp: instantiate E2BAgentAdapter with timeout_minutes=0 and assert
adapter._timeout_minutes equals the minimum constant (referenced as
_MIN_TIMEOUT_MINUTES or the literal 1) so the adapter clamps lower values to the
minimum; place the new test (e.g., test_timeout_clamped_to_min_1) alongside the
existing test_timeout_clamped_to_max_60 and use the same patching pattern for
e2b.Sandbox.create.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9e2d220a-5034-4147-8104-5d5544f6cee9

📥 Commits

Reviewing files that changed from the base of the PR and between 5905b51 and a9b877f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • codeframe/adapters/e2b/adapter.py
  • codeframe/cli/app.py
  • codeframe/core/runtime.py
  • pyproject.toml
  • tests/adapters/test_e2b_adapter.py
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
  • codeframe/core/runtime.py
  • codeframe/cli/app.py

CI runs uv sync --extra dev. Moving e2b to optional [cloud] broke the test
suite because @patch('e2b.Sandbox.create') requires the module to be importable.
Adding e2b to [dev] extras restores test coverage in CI without changing the
production install surface (e2b is still not a core dependency).
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Follow-up Review: E2BAgentAdapter

All four issues flagged in my initial review have been addressed in this revision. Quick confirmation:

Fixed ✅

  • API key injection — ANTHROPIC_API_KEY is now passed via envs=agent_envs dict, not interpolated into the shell string. Clean.
  • record_cloud_run() never called — The runtime.py block now persists cloud_metadata after the adapter returns. cf work show will populate correctly.
  • New files missed — _download_changed_files now uses git status --porcelain, capturing both modified tracked and new untracked files (the test test_new_files_downloaded_via_porcelain covers this).
  • e2b as core dependency — Moved to [project.optional-dependencies] under cloud, with a lazy import guard in the adapter. Optional dev pin keeps CI green.

Two remaining gaps worth noting before merge:

1. Exclusion set mismatch between upload and scan

_upload_workspace in adapter.py excludes 8 directories; credential_scanner.py excludes 12 (adds .tox, dist, build, .eggs). A workspace with a dist/ folder will be scanned-clean but still uploaded. Low severity, but the two sets should stay in sync. Easiest fix: import _EXCLUDED_DIRS from credential_scanner and reuse it in _upload_workspace.
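
A single shared set, defined once and imported by the adapter, would keep the two from drifting. This is a sketch under assumed names (`EXCLUDED_DIRS`, `should_skip`) and an assumed membership list, not the PR's exact layout:

```python
# One exclusion set, owned by credential_scanner and reused by
# _upload_workspace, so scan and upload always agree on what to skip.
EXCLUDED_DIRS = frozenset({
    ".git", ".venv", "node_modules", "__pycache__",
    ".pytest_cache", ".mypy_cache", ".ruff_cache", ".idea",
    ".tox", "dist", "build", ".eggs",
})


def should_skip(rel_parts: tuple[str, ...]) -> bool:
    """True if any component of the relative path is an excluded directory."""
    return any(part in EXCLUDED_DIRS for part in rel_parts)
```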

2. Deleted files are silently ignored in _download_changed_files

git status --porcelain reports deleted files with status D. The current loop appends the path to changed unconditionally, then tries sbx.files.read(remote) — which will raise since the file no longer exists in the sandbox. The except swallows the error, so the local file is never removed. Not a correctness blocker for the common case, but if an agent deletes a file as part of its task the local workspace will be left in a stale state.

Minimal fix:

if "D" in xy:  # deletion: staged "D ", unstaged " D", or both
    local = workspace_path / rel_path
    local.unlink(missing_ok=True)
    modified_files.append(rel_path)
    continue

Both are incremental improvements — the core workflow is sound. If there's a follow-on issue for these, that's acceptable. Otherwise, fixing the exclusion mismatch is a one-liner and worth squeezing in.

@frankbria frankbria merged commit 8d35a21 into main Apr 4, 2026
11 checks passed
@frankbria frankbria deleted the feat/e2b-agent-adapter-534 branch April 4, 2026 06:27
Successfully merging this pull request may close these issues:

[Phase 4.B] E2BAgentAdapter: cloud execution in isolated Linux VM