feat(adapters): E2BAgentAdapter — cloud execution in isolated Linux VM (#534)#540
Implements `--engine cloud` routing through an E2B sandbox for fully isolated task execution without touching the local filesystem.

- `codeframe/adapters/e2b/adapter.py`: `E2BAgentAdapter` implementing the `AgentAdapter` protocol — credential scan, sandbox create, workspace upload, pip install, agent run, git-diff-based file download, `AgentResult` with `cloud_metadata`
- `codeframe/adapters/e2b/credential_scanner.py`: `scan_path()` blocks `.env` files, private keys, and AWS/OpenAI/GitHub secret patterns before upload
- `codeframe/adapters/e2b/budget.py`: `record_cloud_run`/`get_cloud_run` for the SQLite `cloud_run_metadata` table (sandbox_minutes, cost_usd_estimate, file counts)
- `codeframe/core/engine_registry.py`: adds "cloud" to `VALID_ENGINES` and `EXTERNAL_ENGINES`; wires `get_external_adapter` and `_get_adapter_class`
- `codeframe/core/adapters/agent_adapter.py`: adds `cloud_metadata: dict | None` field to `AgentResult` (`None` by default, no existing code broken)
- `codeframe/core/runtime.py`: adds `cloud_timeout_minutes` param to `execute_agent` and passes it to `get_external_adapter` for the cloud engine
- `codeframe/core/workspace.py`: adds the `cloud_run_metadata` table to the schema (init and migration paths)
- `codeframe/core/conductor.py`: forwards `cloud_timeout_minutes` through `start_batch` and `_execute_task_subprocess` (`--cloud-timeout` CLI flag)
- `codeframe/cli/validators.py`: adds `require_e2b_api_key()` following the same pattern as `require_anthropic/openai_api_key`
- `codeframe/cli/app.py`: adds `--cloud-timeout` option to `work start` and `batch run`; adds an `E2B_API_KEY` validation branch; adds a `cf work show` command displaying run details and a Cloud Execution Cost table for the cloud engine
- `tests/adapters/test_e2b_adapter.py`: 35 tests covering credential scanner, budget persistence, adapter lifecycle (mock `e2b.Sandbox`), engine registry integration, `AgentResult` field, validators, and workspace schema
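The lifecycle described above (scan, create, upload, install, run, download) can be condensed into a control-flow sketch. Every name below is an illustrative stand-in, not the real adapter or E2B SDK surface; only the scan-first ordering and the `cloud_metadata` field mirror the PR description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResult:
    # Mirrors the PR's new optional field: None unless the cloud engine ran.
    status: str
    cloud_metadata: dict | None = None


def run_cloud_task(
    workspace_files: dict[str, str],
    has_secret: Callable[[str], bool],
) -> AgentResult:
    """Illustrative control flow only: scan first, refuse to upload on a hit."""
    blocked = [path for path, text in workspace_files.items() if has_secret(text)]
    if blocked:
        return AgentResult(
            status="failed",
            cloud_metadata={"credential_scan_blocked": len(blocked)},
        )
    # (real adapter: create sandbox, upload, pip install, run, download diff)
    return AgentResult(
        status="success",
        cloud_metadata={"files_uploaded": len(workspace_files)},
    )


result = run_cloud_task({".env": "API_KEY=abc12345678"}, lambda t: "API_KEY" in t)
ok = run_cloud_task({"app.py": "print('hi')"}, lambda t: "API_KEY" in t)
```

The key property is that a scan hit short-circuits before any upload happens, which is the "scan before upload" guarantee the review later examines.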
**Walkthrough**

Adds a new "cloud" engine backed by an E2B adapter: scans workspaces for credentials, uploads files to an E2B sandbox, installs and runs CodeFrame there, streams execution events, downloads changed files, persists per-run budget metadata, wires CLI flags/validation, and adds tests and DB schema for cloud runs.
**Sequence Diagram**

```mermaid
sequenceDiagram
    actor User
    participant CLI
    participant Scanner as CredentialScanner
    participant E2B as E2B_Sandbox
    participant WS as LocalWorkspace
    User->>CLI: cf work start --engine cloud
    CLI->>Scanner: scan_path(workspace_path)
    alt secrets detected
        Scanner-->>CLI: ScanResult(blocked)
        CLI-->>User: return failed AgentResult (credential_scan_blocked)
    else clean
        CLI->>E2B: create Sandbox(timeout_minutes)
        E2B-->>CLI: sandbox instance
        CLI->>E2B: upload workspace files
        CLI->>E2B: pip install codeframe
        CLI->>E2B: run "cf work start <task_id> --execute"
        E2B->>CLI: stream stdout/stderr (progress events)
        alt execution success
            CLI->>E2B: git status --porcelain
            E2B-->>CLI: list modified/created files
            CLI->>E2B: download changed files
            E2B-->>CLI: file contents
            CLI->>WS: write files back
            CLI-->>User: return success AgentResult + cloud_metadata
        else execution failure
            CLI-->>User: return failed AgentResult + error
        end
        CLI->>E2B: sbx.kill()
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
**Review: E2BAgentAdapter**

Well-structured PR. The layered design (credential scan → sandbox → upload → run → download) is clear, and test coverage is solid at 35 tests. A few issues worth fixing before merge:

**Critical: API key leaked in process list**

`adapter.py`, agent command construction (~line 196): the key is visible in `/proc/<pid>/cmdline` on the sandbox host, in shell history, and in any logging of the command string. Use the E2B SDK's environment dict parameter (if supported), or write the key to a temp env file inside the sandbox that is cleaned up afterward. At minimum, avoid interpolating secrets into shell strings.

**Bug: `record_cloud_run()` is never called — `cf work show` table always empty**

`budget.py` provides `record_cloud_run()`, and `cf work show` reads from it via `get_cloud_run()`. But there is no call to `record_cloud_run()` anywhere in the `runtime.py` diff after the adapter completes. The `AgentResult.cloud_metadata` dict is returned by the adapter but never persisted to the DB, so the Cloud Execution Cost table in `cf work show` will always be empty for real runs. Fix: after the external adapter returns in `runtime.py`'s execute path, check `agent_result.cloud_metadata` and call `record_cloud_run()` if present.

**Bug: new files created by the agent are missed**

`_download_changed_files()` uses `git diff`, which only shows modifications to already-tracked files. New files the agent creates are untracked and silently skipped. Replace with `git status --porcelain` (or chain `git ls-files --others --exclude-standard`) to capture both modified and new files.

**Design: e2b should be an optional dependency**

`e2b>=2.0.0` is added to core dependencies in `pyproject.toml`, pulling in protobuf, dockerfile-parse, wcmatch, etc. for all users regardless of whether they use cloud execution. It should go under `[project.optional-dependencies]` (e.g., `cloud = ["e2b>=2.0.0"]`) with a lazy import guard in the adapter. The lazy import in `engine_registry.py` is already structured correctly — the `pyproject` entry is the inconsistency.

**Minor: credential scanner false-positive risk**

The content pattern `(?i)(api_key|secret|password)\s*=\s*['"][^'"]{8,}['"]` will block config files or test fixtures that set a local dev password (e.g., `password = "localdev123"`). Consider scoping content scanning to specific file types (`.py`, `.yaml`, `.json`, `.toml`) or documenting the false-positive risk.

**What's solid**

The three bugs above (API key leakage, missing `record_cloud_run()` call, untracked file gap) need fixes before this is production-ready.
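The recommended remedy for the API-key leak is to pass secrets through the child process environment, never through the command string. The E2B `envs=` parameter is the reviewer's suggestion; the sketch below shows the local standard-library analogue of the same idea.

```python
import os
import subprocess
import sys


def run_with_secret(args: list[str], key: str) -> str:
    # The secret rides in the child's environment, never in argv:
    # argv is world-readable via /proc/<pid>/cmdline; /proc/<pid>/environ
    # is readable only by the process owner.
    env = {**os.environ, "ANTHROPIC_API_KEY": key}
    done = subprocess.run(args, env=env, capture_output=True, text=True, check=True)
    return done.stdout


# The child can read the key even though no shell string ever contained it.
stdout = run_with_secret(
    [sys.executable, "-c", "import os; print(len(os.environ['ANTHROPIC_API_KEY']))"],
    "sk-ant-example",
)
```

Note that `args` is a list, so no shell is involved at all; the same argument applies to the sandbox-side command the adapter constructs.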
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
codeframe/core/runtime.py (1)
Lines 740-751: ⚠️ Potential issue | 🟠 Major

**Persist `result.cloud_metadata` before returning.**

`E2BAgentAdapter.run()` only hands budget data back in-memory. In the direct `cf work start --engine cloud` path, nothing writes that payload to `cloud_run_metadata`, so the new `cf work show` command has no cost row to display.

💾 Minimal fix:

```diff
 result = wrapper.run(
     run.task_id,
     packaged.prompt,
     effective_repo_path,
     on_event=on_adapter_event,
 )
+if engine == "cloud" and result.cloud_metadata:
+    from codeframe.adapters.e2b.budget import record_cloud_run
+
+    record_cloud_run(
+        workspace=workspace,
+        run_id=run.id,
+        sandbox_minutes=float(result.cloud_metadata.get("sandbox_minutes", 0.0)),
+        cost_usd=float(result.cloud_metadata.get("cost_usd_estimate", 0.0)),
+        files_uploaded=int(result.cloud_metadata.get("files_uploaded", 0)),
+        files_downloaded=int(result.cloud_metadata.get("files_downloaded", 0)),
+        scan_blocked=int(result.cloud_metadata.get("credential_scan_blocked", 0)),
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/core/runtime.py` around lines 740 - 751, The wrapper.run() result may include in-memory budget info in result.cloud_metadata that never gets persisted; after calling wrapper.run(...) check if result.cloud_metadata is present and store it on the run object as run.cloud_run_metadata = result.cloud_metadata, then persist that change to the workspace (use the repository's existing run-save/update helper on workspace, e.g., workspace.update_run or workspace.save_run) before returning so cf work show can read the cost row.
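The persistence step described above amounts to one SQLite upsert. A minimal sketch, assuming a simplified `cloud_run_metadata` schema (the PR description says the real table also tracks file counts):

```python
import sqlite3


def record_cloud_run(conn: sqlite3.Connection, run_id: str, meta: dict) -> None:
    """Persist the adapter's in-memory cloud_metadata so `cf work show` has a row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cloud_run_metadata ("
        " run_id TEXT PRIMARY KEY,"
        " sandbox_minutes REAL,"
        " cost_usd_estimate REAL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO cloud_run_metadata VALUES (?, ?, ?)",
        (
            run_id,
            float(meta.get("sandbox_minutes", 0.0)),
            float(meta.get("cost_usd_estimate", 0.0)),
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
record_cloud_run(conn, "run-1", {"sandbox_minutes": 2.5, "cost_usd_estimate": 0.03})
row = conn.execute("SELECT sandbox_minutes FROM cloud_run_metadata").fetchone()
```

`INSERT OR REPLACE` keyed on `run_id` makes the call idempotent, so re-running the persistence step after a retry cannot duplicate cost rows.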
🧹 Nitpick comments (1)
codeframe/adapters/e2b/credential_scanner.py (1)
Lines 33-55: Consider expanding secret detection patterns for broader coverage.

The current patterns cover AWS, OpenAI, and GitHub, but miss other common credential types that could leak:

- Anthropic API keys (`sk-ant-*`)
- Azure/Google Cloud credentials
- Generic bearer tokens
- `.envrc` files (used by direnv)

♻️ Suggested additions:

```diff
 _BLOCKED_FILENAME_PATTERNS = (
     re.compile(r"^\.env$"),
     re.compile(r"^\.env\."),
+    re.compile(r"^\.envrc$"),
     re.compile(r"\.pem$"),
     ...
 )

 _SECRET_CONTENT_PATTERNS = (
     re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key
     re.compile(r"sk-[a-zA-Z0-9]{48}"),  # OpenAI API key
+    re.compile(r"sk-ant-[a-zA-Z0-9-]{80,}"),  # Anthropic API key
     re.compile(r"ghp_[a-zA-Z0-9]{36}"),  # GitHub PAT
     re.compile(r"ghs_[a-zA-Z0-9]{36}"),  # GitHub app token
+    re.compile(r"Bearer\s+[a-zA-Z0-9_-]{20,}"),  # Generic bearer token
     re.compile(r"(?i)(api_key|secret|password)\s*=\s*['\"][^'\"]{8,}['\"]"),
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/adapters/e2b/credential_scanner.py` around lines 33 - 55, Update the credential_scanner patterns: extend _BLOCKED_FILENAME_PATTERNS to include a re for ".envrc" (so direnv files are blocked) and extend _SECRET_CONTENT_PATTERNS to add regexes for Anthropic keys (e.g., sk-ant- prefix), Azure/Google Cloud credential indicators (e.g., "azure_.*", "GOOGLE_APPLICATION_CREDENTIALS", service account JSON patterns with "type\": \"service_account\""), and generic bearer tokens (e.g., "Bearer\s+[A-Za-z0-9\-\._~\+\/]+=*"). Locate and modify the tuples named _BLOCKED_FILENAME_PATTERNS and _SECRET_CONTENT_PATTERNS in credential_scanner.py to append these compiled patterns ensuring case-insensitivity where appropriate.
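To gauge hit rate and false-positive surface, patterns like these can be exercised standalone. The regexes below are a subset copied from the review; the harness around them is illustrative.

```python
import re

# Subset of the reviewed patterns; the last one is the known false-positive risk.
SECRET_PATTERNS = (
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key id
    re.compile(r"ghp_[a-zA-Z0-9]{36}"),  # GitHub PAT
    re.compile(r"(?i)(api_key|secret|password)\s*=\s*['\"][^'\"]{8,}['\"]"),
)


def contains_secret(text: str) -> bool:
    """Return True if any pattern matches anywhere in the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


aws_hit = contains_secret("key=AKIAABCDEFGHIJKLMNOP")       # real-looking AWS key
dev_hit = contains_secret('password = "localdev123"')       # local-dev false positive
clean = contains_secret("no credentials here")
```

The `dev_hit` case is exactly the scenario the review flags: a harmless fixture that still trips the generic `password = "..."` pattern.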
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/adapters/e2b/adapter.py`:
- Around line 241-248: The _EXCLUDED set used by _upload_workspace() is missing
entries that scan_path() excludes, causing mismatched behavior; update the
_EXCLUDED frozenset (used in _upload_workspace()) to include the same
directories as scan_path() (add ".tox", "dist", "build", and ".eggs") so uploads
mirror scanning exclusions and keep the two functions in lockstep.
- Around line 276-310: The code currently runs sbx.commands.run("git diff
--name-only HEAD") and always attempts sbx.files.read on every path, which fails
to delete files that were removed upstream; change the git invocation to "git
diff --name-status HEAD" then parse each output line into (status, rel_path),
and in the for loop handle deletions: if status == "D" remove the local file
(local.unlink() if exists), append rel_path to modified_files and increment
downloaded (or count deletions separately), log the deletion and skip
sbx.files.read; for non-deleted statuses continue to read and write as now
(using sbx.files.read), keeping existing logger.warning on read errors and the
emit("progress", ...) behavior. Ensure you reference sbx.commands.run,
sbx.files.read, modified_files, downloaded, and logger when implementing.
In `@codeframe/adapters/e2b/credential_scanner.py`:
- Around line 123-134: The _contains_secret function currently treats OSError as
safe by returning False; change its behavior to conservatively treat unreadable
files as containing secrets by returning True on OSError (or otherwise flagging
the file), so callers will block or further inspect unreadable files; update the
except OSError block in _contains_secret (which reads path.read_bytes() and
references _MAX_CONTENT_SAMPLE and _SECRET_CONTENT_PATTERNS) to return True and
optionally record/log the error for auditing.
In `@codeframe/cli/app.py`:
- Around line 2677-2679: The code currently searches only the last 20 engine log
entries via engine_stats.get_run_log(workspace, limit=20) and then uses a
one-letter iterator l to find run_log, which silently drops data for older runs
and triggers Ruff E741; change the lookup to fetch or query logs by run_id
(e.g., call engine_stats.get_run_log with a run_id/workspace filter if the API
supports it, or page through results until a log with run.id is found) and
replace the ambiguous iterator name l with a descriptive name like log or entry
when computing run_log to fix the E741 lint issue.
- Around line 2641-2666: The CLI handler uses the user-provided task_id directly
with runtime.get_latest_run(), which fails for short/prefix IDs; after obtaining
workspace via get_workspace(path) check for a run match and if none, expand the
prefix by locating the full ID (e.g., call runtime.list_runs(workspace) or
workspace.list_runs() to find the first run.id that startswith the provided
task_id) and then call runtime.get_latest_run(workspace, full_task_id) using
that resolved full ID so short IDs like "abc123" work the same as other
commands.
In `@codeframe/core/conductor.py`:
- Line 559: The BatchRun cloud timeout is accepted by start_batch but never
used; add a cloud_timeout_minutes:int = 30 field to the BatchRun dataclass, set
that field when constructing the BatchRun instance inside start_batch, and
thread that value through to task execution by passing
batch_run.cloud_timeout_minutes into every call to _execute_task_subprocess (and
update signatures where necessary, e.g., in _execute_serial and any other
_execute_* helpers that invoke _execute_task_subprocess) so the CLI
--cloud-timeout value is honored during batch execution.
In `@tests/adapters/test_e2b_adapter.py`:
- Around line 9-13: The test module imports include unused names causing lint
failures: remove tempfile from the top-level imports and remove call from the
unittest.mock import list in tests/adapters/test_e2b_adapter.py so only actually
used imports remain (e.g., keep os, sqlite3, Path, MagicMock, patch). Update the
import statements in that file accordingly to eliminate the unused-tempfile and
unused-call warnings.
---
Outside diff comments:
In `@codeframe/core/runtime.py`:
- Around line 740-751: The wrapper.run() result may include in-memory budget
info in result.cloud_metadata that never gets persisted; after calling
wrapper.run(...) check if result.cloud_metadata is present and store it on the
run object as run.cloud_run_metadata = result.cloud_metadata, then persist that
change to the workspace (use the repository's existing run-save/update helper on
workspace, e.g., workspace.update_run or workspace.save_run) before returning so
cf work show can read the cost row.
---
Nitpick comments:
In `@codeframe/adapters/e2b/credential_scanner.py`:
- Around line 33-55: Update the credential_scanner patterns: extend
_BLOCKED_FILENAME_PATTERNS to include a re for ".envrc" (so direnv files are
blocked) and extend _SECRET_CONTENT_PATTERNS to add regexes for Anthropic keys
(e.g., sk-ant- prefix), Azure/Google Cloud credential indicators (e.g.,
"azure_.*", "GOOGLE_APPLICATION_CREDENTIALS", service account JSON patterns with
"type\": \"service_account\""), and generic bearer tokens (e.g.,
"Bearer\s+[A-Za-z0-9\-\._~\+\/]+=*"). Locate and modify the tuples named
_BLOCKED_FILENAME_PATTERNS and _SECRET_CONTENT_PATTERNS in credential_scanner.py
to append these compiled patterns ensuring case-insensitivity where appropriate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fffd8df3-b429-4a98-ba4f-5d6e9e02346e
⛔ Files ignored due to path filters (1)
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (13)
- codeframe/adapters/e2b/__init__.py
- codeframe/adapters/e2b/adapter.py
- codeframe/adapters/e2b/budget.py
- codeframe/adapters/e2b/credential_scanner.py
- codeframe/cli/app.py
- codeframe/cli/validators.py
- codeframe/core/adapters/agent_adapter.py
- codeframe/core/conductor.py
- codeframe/core/engine_registry.py
- codeframe/core/runtime.py
- codeframe/core/workspace.py
- pyproject.toml
- tests/adapters/test_e2b_adapter.py
```python
_EXCLUDED = frozenset({
    "__pycache__", ".git", ".mypy_cache", ".pytest_cache",
    ".ruff_cache", "node_modules", ".venv", "venv",
})

uploaded = 0
for path in sorted(workspace_path.rglob("*")):
    if any(part in _EXCLUDED for part in path.parts):
```
**Keep upload exclusions in lockstep with the credential scanner.**

`scan_path()` skips `.tox`, `dist`, `build`, and `.eggs`, but `_upload_workspace()` still uploads them. That breaks the "scan before upload" guarantee and can also balloon transfer time and sandbox cost.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/adapters/e2b/adapter.py` around lines 241 - 248, The _EXCLUDED set
used by _upload_workspace() is missing entries that scan_path() excludes,
causing mismatched behavior; update the _EXCLUDED frozenset (used in
_upload_workspace()) to include the same directories as scan_path() (add ".tox",
"dist", "build", and ".eggs") so uploads mirror scanning exclusions and keep the
two functions in lockstep.
codeframe/adapters/e2b/adapter.py (outdated)
```python
diff_result = sbx.commands.run(
    f"cd {_SANDBOX_WORKSPACE} && git diff --name-only HEAD",
    timeout=30,
)

if diff_result.exit_code != 0 or not diff_result.stdout.strip():
    return [], 0

changed = [
    line.strip()
    for line in diff_result.stdout.splitlines()
    if line.strip()
]

downloaded = 0
modified_files: list[str] = []

for rel_path in changed:
    remote = f"{_SANDBOX_WORKSPACE}/{rel_path}"
    local = workspace_path / rel_path

    try:
        content = sbx.files.read(remote)
        local.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(content, str):
            local.write_text(content, encoding="utf-8")
        else:
            local.write_bytes(bytes(content))
        modified_files.append(rel_path)
        downloaded += 1
        logger.debug("Downloaded: %s", rel_path)
    except Exception as exc:
        logger.warning("Failed to download %s: %s", rel_path, exc)

emit("progress", f"Downloaded {downloaded} changed file(s)")
```
**Mirror deletions back to the local workspace.**

`git diff --name-only HEAD` includes deleted paths, but this loop always tries to `read()` them. A cloud run that deletes files will return success while leaving stale files behind locally.
🧹 Minimal fix:

```diff
 diff_result = sbx.commands.run(
-    f"cd {_SANDBOX_WORKSPACE} && git diff --name-only HEAD",
+    f"cd {_SANDBOX_WORKSPACE} && git diff --name-status HEAD",
     timeout=30,
 )
@@
-changed = [
-    line.strip()
-    for line in diff_result.stdout.splitlines()
-    if line.strip()
-]
-
 downloaded = 0
 modified_files: list[str] = []
-for rel_path in changed:
+for line in diff_result.stdout.splitlines():
+    if not line.strip():
+        continue
+    status, *paths = line.split("\t")
+    rel_path = paths[-1]
     remote = f"{_SANDBOX_WORKSPACE}/{rel_path}"
     local = workspace_path / rel_path
     try:
+        if status == "D":
+            if local.exists():
+                local.unlink()
+            modified_files.append(rel_path)
+            downloaded += 1
+            continue
+
         content = sbx.files.read(remote)
         local.parent.mkdir(parents=True, exist_ok=True)
         if isinstance(content, str):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/adapters/e2b/adapter.py` around lines 276 - 310, The code currently
runs sbx.commands.run("git diff --name-only HEAD") and always attempts
sbx.files.read on every path, which fails to delete files that were removed
upstream; change the git invocation to "git diff --name-status HEAD" then parse
each output line into (status, rel_path), and in the for loop handle deletions:
if status == "D" remove the local file (local.unlink() if exists), append
rel_path to modified_files and increment downloaded (or count deletions
separately), log the deletion and skip sbx.files.read; for non-deleted statuses
continue to read and write as now (using sbx.files.read), keeping existing
logger.warning on read errors and the emit("progress", ...) behavior. Ensure you
reference sbx.commands.run, sbx.files.read, modified_files, downloaded, and
logger when implementing.
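Both fixes discussed for this loop revolve around parsing status-prefixed git output. A standalone sketch of parsing `git status --porcelain` (sample output hardcoded; rename entries with `->` are not handled):

```python
def parse_porcelain(output: str) -> list[tuple[str, str]]:
    """Parse `git status --porcelain` output into (status, path) pairs.

    The first two columns are the XY status code; the path starts at
    column 4. Untracked files show as '??', deletions as ' D' / 'D '.
    """
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        entries.append((line[:2], line[3:]))
    return entries


sample = " M src/app.py\n?? src/new_module.py\n D old/removed.py\n"
changes = parse_porcelain(sample)

# Untracked files, invisible to `git diff`, are exactly the '??' entries.
new_files = [path for status, path in changes if status == "??"]
deleted = [path for status, path in changes if "D" in status]
```

This is why the porcelain switch fixes the missing-new-files bug: `??` entries simply never appear in `git diff --name-only HEAD` output.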
```python
def _contains_secret(path: Path) -> bool:
    """Return True if the file content matches any known secret pattern."""
    try:
        content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
        text = content.decode("utf-8", errors="replace")
    except OSError:
        return False

    for pattern in _SECRET_CONTENT_PATTERNS:
        if pattern.search(text):
            return True
    return False
```
**`OSError` should be treated as potentially containing secrets, not as safe.**

When a file cannot be read (`OSError`), returning `False` treats it as "no secrets found." For a security-focused scanner, the safer default is to assume unreadable files might contain secrets and block them.
🛡️ Proposed fix:

```diff
     try:
         content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
         text = content.decode("utf-8", errors="replace")
     except OSError:
-        return False
+        # Fail-safe: treat unreadable files as potentially containing secrets
+        return True

     for pattern in _SECRET_CONTENT_PATTERNS:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _contains_secret(path: Path) -> bool:
    """Return True if the file content matches any known secret pattern."""
    try:
        content = path.read_bytes()[:_MAX_CONTENT_SAMPLE]
        text = content.decode("utf-8", errors="replace")
    except OSError:
        # Fail-safe: treat unreadable files as potentially containing secrets
        return True

    for pattern in _SECRET_CONTENT_PATTERNS:
        if pattern.search(text):
            return True
    return False
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/adapters/e2b/credential_scanner.py` around lines 123 - 134, The
_contains_secret function currently treats OSError as safe by returning False;
change its behavior to conservatively treat unreadable files as containing
secrets by returning True on OSError (or otherwise flagging the file), so
callers will block or further inspect unreadable files; update the except
OSError block in _contains_secret (which reads path.read_bytes() and references
_MAX_CONTENT_SAMPLE and _SECRET_CONTENT_PATTERNS) to return True and optionally
record/log the error for auditing.
```python
    task_id: str = typer.Argument(..., help="Task ID to show run details for"),
    workspace_path: Optional[Path] = typer.Option(
        None,
        "--workspace",
        "-w",
        help="Workspace path (defaults to current directory)",
    ),
) -> None:
    """Show detailed run information for a task, including cloud execution cost.

    Example:
        codeframe work show abc123
        codeframe work show abc123 --workspace /path/to/project
    """
    from codeframe.core.workspace import get_workspace
    from codeframe.core import runtime, engine_stats
    from rich.table import Table

    path = workspace_path or Path.cwd()

    try:
        workspace = get_workspace(path)

        run = runtime.get_latest_run(workspace, task_id)
        if not run:
            console.print(f"[dim]No run found for task {task_id}[/dim]")
```
**Resolve partial task IDs before calling `get_latest_run()`.**

`runtime.get_latest_run()` matches on the full task ID, so the common short-ID flow (`cf work show abc123`) misses even though the other `work` commands accept prefixes.
🔎 Minimal fix:

```diff
-from codeframe.core import runtime, engine_stats
+from codeframe.core import runtime, engine_stats, tasks as tasks_module
@@
-        run = runtime.get_latest_run(workspace, task_id)
+        matching = [
+            task for task in tasks_module.list_tasks(workspace)
+            if task.id.startswith(task_id)
+        ]
+        if not matching:
+            console.print(f"[dim]No task found matching {task_id}[/dim]")
+            raise typer.Exit(0)
+        if len(matching) > 1:
+            console.print(f"[red]Error:[/red] Multiple tasks match '{task_id}'")
+            raise typer.Exit(1)
+
+        run = runtime.get_latest_run(workspace, matching[0].id)
         if not run:
             console.print(f"[dim]No run found for task {task_id}[/dim]")
             raise typer.Exit(0)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/cli/app.py` around lines 2641 - 2666, The CLI handler uses the
user-provided task_id directly with runtime.get_latest_run(), which fails for
short/prefix IDs; after obtaining workspace via get_workspace(path) check for a
run match and if none, expand the prefix by locating the full ID (e.g., call
runtime.list_runs(workspace) or workspace.list_runs() to find the first run.id
that startswith the provided task_id) and then call
runtime.get_latest_run(workspace, full_task_id) using that resolved full ID so
short IDs like "abc123" work the same as other commands.
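The prefix-expansion step described here can be isolated into a small helper. Everything below is an illustrative sketch, not the actual codeframe API: the point is the three outcomes (no match, unique match, ambiguous prefix).

```python
def resolve_task_id(prefix: str, all_ids: list[str]) -> str:
    """Expand a short ID prefix to exactly one full ID, or raise LookupError."""
    matches = [task_id for task_id in all_ids if task_id.startswith(prefix)]
    if not matches:
        raise LookupError(f"no task matching {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"ambiguous prefix {prefix!r}: {matches}")
    return matches[0]


ids = ["abc123deadbeef", "def456cafe"]
full = resolve_task_id("abc", ids)

# An empty prefix matches everything, so it must be rejected as ambiguous.
try:
    resolve_task_id("", ids)
    ambiguous_raised = False
except LookupError:
    ambiguous_raised = True
```

Resolving before the `get_latest_run()` call keeps the exact-match lookup unchanged while giving the CLI the same prefix ergonomics as the other `work` commands.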
codeframe/cli/app.py (outdated)
```python
# Engine stats from run_engine_log
logs = engine_stats.get_run_log(workspace, limit=20)
run_log = next((l for l in logs if l.get("run_id") == run.id), None)
```
**Look up the engine log by `run_id`, not by scanning the last 20 rows.**

Once the selected run is older than the 20 most recent engine records, this command silently drops engine/tokens/cloud cost data. The current generator also keeps CI red on Ruff E741 because of the ambiguous variable name `l`.
🛠️ Stopgap inside this command:

```diff
-logs = engine_stats.get_run_log(workspace, limit=20)
-run_log = next((l for l in logs if l.get("run_id") == run.id), None)
+# Prefer a direct engine_stats lookup by run_id; a larger limit is only a stopgap.
+logs = engine_stats.get_run_log(workspace, limit=1000)
+run_log = next(
+    (log_entry for log_entry in logs if log_entry.get("run_id") == run.id),
+    None,
+)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Engine stats from run_engine_log
# Prefer a direct engine_stats lookup by run_id; a larger limit is only a stopgap.
logs = engine_stats.get_run_log(workspace, limit=1000)
run_log = next(
    (log_entry for log_entry in logs if log_entry.get("run_id") == run.id),
    None,
)
```
🧰 Tools
🪛 GitHub Actions: Test Suite (Unit + E2E)
[error] 2679-2679: ruff check failed (E741): Ambiguous variable name l in generator expression next((l for l in logs if l.get("run_id") == run.id), None).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/cli/app.py` around lines 2677 - 2679, The code currently searches
only the last 20 engine log entries via engine_stats.get_run_log(workspace,
limit=20) and then uses a one-letter iterator l to find run_log, which silently
drops data for older runs and triggers Ruff E741; change the lookup to fetch or
query logs by run_id (e.g., call engine_stats.get_run_log with a
run_id/workspace filter if the API supports it, or page through results until a
log with run.id is found) and replace the ambiguous iterator name l with a
descriptive name like log or entry when computing run_log to fix the E741 lint
issue.
```python
    concurrency_by_status: Optional[dict[str, int]] = None,
    isolate: bool = True,
    isolation: str = "none",
    cloud_timeout_minutes: int = 30,
```
**`cloud_timeout_minutes` parameter is accepted but never propagated to task execution.**

The `start_batch` function accepts `cloud_timeout_minutes` but doesn't store it in the `BatchRun` dataclass or pass it to `_execute_task_subprocess`. All call sites use the default value (30). This means the CLI `--cloud-timeout` option won't have any effect during batch execution.
🐛 Proposed fix:

1. Add `cloud_timeout_minutes` to the `BatchRun` dataclass:

```python
@dataclass
class BatchRun:
    ...
    cloud_timeout_minutes: int = 30
```

2. Store it when creating the batch in `start_batch`:

```diff
 batch = BatchRun(
     ...
     isolation=isolation,
+    cloud_timeout_minutes=cloud_timeout_minutes,
 )
```

3. Pass it in all `_execute_task_subprocess` calls (example from `_execute_serial`):

```diff
 result_status = _execute_task_subprocess(
     workspace, task_id, batch.id, engine=batch.engine,
     stall_timeout_s=batch.stall_timeout_s, stall_action=batch.stall_action,
     worktree_path=...,
+    cloud_timeout_minutes=batch.cloud_timeout_minutes,
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/conductor.py` at line 559, The BatchRun cloud timeout is
accepted by start_batch but never used; add a cloud_timeout_minutes:int = 30
field to the BatchRun dataclass, set that field when constructing the BatchRun
instance inside start_batch, and thread that value through to task execution by
passing batch_run.cloud_timeout_minutes into every call to
_execute_task_subprocess (and update signatures where necessary, e.g., in
_execute_serial and any other _execute_* helpers that invoke
_execute_task_subprocess) so the CLI --cloud-timeout value is honored during
batch execution.
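The threading described in this fix, store the flag on the batch record and read it back at execution time, can be sketched with a stripped-down dataclass. All names beyond `cloud_timeout_minutes` are simplified stand-ins for the real conductor API.

```python
from dataclasses import dataclass


@dataclass
class BatchRun:
    # Review's proposal: carry the CLI flag on the batch record itself.
    id: str
    engine: str
    cloud_timeout_minutes: int = 30


def start_batch(engine: str, cloud_timeout_minutes: int = 30) -> BatchRun:
    # Store the flag at construction so every task execution can read it back.
    return BatchRun(
        id="batch-1",
        engine=engine,
        cloud_timeout_minutes=cloud_timeout_minutes,
    )


def execute_task(batch: BatchRun) -> dict:
    # Thread the stored value into the per-task subprocess call.
    return {"engine": batch.engine, "timeout": batch.cloud_timeout_minutes}


batch = start_batch("cloud", cloud_timeout_minutes=45)
task_args = execute_task(batch)
```

Without the dataclass field, `execute_task` would have no way to recover the CLI value, which is exactly the bug: the parameter dies inside `start_batch` and every task falls back to the default of 30.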
…le capture

- Security: move `ANTHROPIC_API_KEY` to the `envs=` dict in `commands.run()` — no longer interpolated into the shell string where it would appear in the process list/logs
- Fix: call `record_cloud_run()` in `runtime.py` after the adapter completes so the `cloud_run_metadata` table is populated and `cf work show` displays cost data
- Fix: switch `_download_changed_files` to `git status --porcelain` — captures both modified tracked files and newly created untracked files (`git diff` only sees tracked changes)
- Design: move `e2b` to the optional-dependencies `[cloud]` extra; add an `ImportError` guard with a helpful install message in the adapter
- Add lazy `ImportError` guard: import `e2b` inside `run()` with a clear message directing users to `pip install 'codeframe[cloud]'`
- Test: add `test_new_files_downloaded_via_porcelain` verifying `??` files are included; fix mock stdout to use porcelain format; fix lint (E741, F401)
WorktreeRegistry — Orphan Cleanup & get_base_branch (Issue #533)

2026-04-04T00:39:21Z

Issue #533 adds three things to the worktree isolation system:
These complete the worktree isolation loop: create → register → execute → cleanup → unregister. Acceptance Criterion 1: get_base_branch()Returns the current branch name from git. Falls back to "main" on failure or in detached HEAD state. uv run python -c "
from codeframe.core.worktrees import get_base_branch
from pathlib import Path
# Returns real branch name in a git repo
branch = get_base_branch(Path(\".\"))
print(f\"Current branch: {branch!r}\")
# Returns main for non-git directory
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    fallback = get_base_branch(Path(tmp))
    print(f\"Non-git dir fallback: {fallback!r}\")
# Returns main for detached HEAD (simulated)
from unittest.mock import patch, MagicMock
mock = MagicMock(returncode=0, stdout=\"HEAD\n\")
with patch(\"subprocess.run\", return_value=mock):
    detached = get_base_branch(Path(\".\"))
    print(f\"Detached HEAD fallback: {detached!r}\")
"

Acceptance Criterion 2: WorktreeRegistry — register / unregister / list_worktrees

Atomically writes PID + task_id + batch_id to .codeframe/worktrees.json using a threading.Lock and os.replace for crash safety.

uv run python -c "
import tempfile, os, json
from pathlib import Path
from codeframe.core.worktrees import WorktreeRegistry, list_worktrees
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp)
    reg = WorktreeRegistry()
    # Register two tasks
    reg.register(p, \"task-001\", \"batch-abc\")
    reg.register(p, \"task-002\", \"batch-abc\")
    entries = list_worktrees(p)
    print(f\"Registered {len(entries)} entries:\")
    for e in entries:
        print(f\"  task={e['task_id']!r} batch={e['batch_id']!r} pid={e['pid']}\")
    # Unregister one
    reg.unregister(p, \"task-001\")
    after = list_worktrees(p)
    print(f\"After unregister: {[e['task_id'] for e in after]}\")
    # Idempotent — re-registering the same task does not duplicate
    reg.register(p, \"task-002\", \"batch-abc\")
    dup = list_worktrees(p)
    print(f\"After re-register (idempotent): {len(dup)} entries (expected 1)\")
"

uv run python /tmp/demo_registry.py

Acceptance Criterion 3: Stale detection — list_stale() and cleanup_stale()

A "stale" entry is one whose PID is no longer alive, checked via os.kill(pid, 0). PermissionError (process alive, different owner) is correctly excluded.

uv run python /tmp/demo_stale.py

Acceptance Criterion 4: register() is wired into create_execution_context()

When IsolationLevel.WORKTREE is used, the ExecutionContext now registers the PID on creation and unregisters it in the cleanup callback — making the orphan detection actually functional.

grep -A 12 "if isolation == IsolationLevel.WORKTREE:" codeframe/core/sandbox/context.py

Acceptance Criterion 5: Auto-cleanup in _execute_parallel()

cleanup_stale() is called at the start of every parallel batch run when isolation=worktree — clearing any orphaned worktrees from previously crashed workers before starting new ones.

grep -A 4 "Clean up orphaned worktrees" codeframe/core/conductor.py

Acceptance Criterion 6: cf env doctor reports stale worktrees

grep -A 14 "Stale worktree check" codeframe/cli/env_commands.py

Acceptance Criterion 7: All 27 tests pass

uv run pytest tests/core/test_worktrees.py -q --tb=short 2>&1 | tail -5

All 27 tests pass. The worktree isolation loop is now complete: create → register (PID) → execute → cleanup → unregister. Stale orphans from crashed workers are detected by a PID liveness check and cleaned up at the next batch run or reported by cf env doctor.
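The PID liveness check behind list_stale() can be sketched as a minimal standalone function (the real implementation lives in codeframe/core/worktrees.py; this is just the os.kill(pid, 0) idiom it relies on):

```python
import os


def is_pid_alive(pid: int) -> bool:
    """True if a process with this PID exists, even if owned by another user."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, sends nothing
    except ProcessLookupError:
        return False  # no such process: the registry entry is stale
    except PermissionError:
        return True   # process exists but is owned by another user: alive
    return True


print(is_pid_alive(os.getpid()))  # the current process is always alive
```

Treating PermissionError as "alive" is the key subtlety: signal 0 fails with EPERM for a live process owned by someone else, and misreading that as "stale" would delete another user's active worktree.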
Review: E2B Cloud Execution Adapter (#534)

This is a well-structured implementation that correctly follows the existing adapter pattern (matches KilocodeAdapter, claude-code, codex). The security-first credential scan before upload, lazy e2b import, and the switch from …

A few issues worth addressing:

Critical:
- Bug: The …
- Security: shell command injection via …

Minor: unused …

Minor: The diff shows …

Nit: hardcoded cost constant …

Summary: The two items most worth fixing before merge are the …
♻️ Duplicate comments (2)
codeframe/adapters/e2b/adapter.py (2)
260-263: ⚠️ Potential issue | 🟠 Major — Sync upload exclusions with credential scanner exclusions.

The _EXCLUDED set is missing directories that scan_path() skips (.tox, dist, build, .eggs). These directories bypass credential scanning but still get uploaded, potentially leaking secrets. Additionally, uploading build artifacts inflates transfer time and sandbox cost.

🔒 Proposed fix

 _EXCLUDED = frozenset({
     "__pycache__", ".git", ".mypy_cache", ".pytest_cache",
-    ".ruff_cache", "node_modules", ".venv", "venv",
+    ".ruff_cache", "node_modules", ".venv", "venv",
+    ".tox", "dist", "build", ".eggs",
 })

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/adapters/e2b/adapter.py` around lines 260 - 263, The _EXCLUDED frozenset in adapter.py is missing directories that scan_path() skips, causing those dirs to be uploaded; update the _EXCLUDED definition (the frozenset near the top of the file) to include ".tox", "dist", "build", and ".eggs" so it matches scan_path()'s exclusions and prevents uploading build artifacts and skipped scan directories.
306-339: ⚠️ Potential issue | 🟠 Major — Handle deleted files in porcelain output.

The code parses git status --porcelain but treats all status codes uniformly. When a file is deleted in the sandbox (status " D" or "D "), sbx.files.read() will fail and the local file remains — leaving stale files that the agent intentionally removed.

🧹 Proposed fix to handle deletions

 for line in status_result.stdout.splitlines():
     line = line.strip()
     if not line:
         continue
     # porcelain format: XY filename (or "XY old -> new" for renames)
     parts = line.split(None, 1)
     if len(parts) < 2:
         continue
     xy, filepath = parts
     # Handle renames: "R old -> new" — take the new name after " -> "
     if " -> " in filepath:
         filepath = filepath.split(" -> ", 1)[1]
-    changed.append(filepath.strip())
+    changed.append((xy.strip(), filepath.strip()))

 downloaded = 0
+deleted = 0
 modified_files: list[str] = []
-for rel_path in changed:
+for status, rel_path in changed:
     remote = f"{_SANDBOX_WORKSPACE}/{rel_path}"
     local = workspace_path / rel_path
+    # Handle deletions: remove local file
+    if "D" in status:
+        if local.exists():
+            local.unlink()
+        deleted += 1
+        modified_files.append(rel_path)
+        logger.debug("Deleted: %s", rel_path)
+        continue
+
     try:
         content = sbx.files.read(remote)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/adapters/e2b/adapter.py` around lines 306 - 339, The porcelain parsing currently throws away the XY status so deletions are treated like normal files; preserve the XY code when building the changed list (e.g., append tuples like (xy, filepath) or maintain a set of deleted paths by checking if "D" appears in xy) and in the subsequent loop over changed use that XY info to handle deletions: if the status indicates deletion (xy contains "D"), do not call sbx.files.read(), instead remove the local file (use local.unlink(missing_ok=True) or check exists() and unlink()) and add the path to modified_files/log a debug message; otherwise continue to read via sbx.files.read and write with local.write_text/local.write_bytes as before. Ensure symbols referenced: the parsing that produces changed, variable xy, rel_path/local, sbx.files.read, modified_files, and local.write_text/local.write_bytes.
🧹 Nitpick comments (4)
tests/adapters/test_e2b_adapter.py (2)
474-491: Minor: redundant os imports.

Lines 476 and 486 import os locally when it's already imported at module level (line 9). This works but is unnecessary.

💡 Remove redundant imports

 def test_check_requirements_cloud(self):
     from codeframe.core.engine_registry import check_requirements
-    import os

     with patch.dict(os.environ, {}, clear=True):
         result = check_requirements("cloud")
@@
 def test_check_requirements_cloud_with_key(self):
     from codeframe.core.engine_registry import check_requirements
-    import os

     with patch.dict(os.environ, {"E2B_API_KEY": "test-key"}):
         result = check_requirements("cloud")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/adapters/test_e2b_adapter.py` around lines 474 - 491, Remove the redundant local imports of os inside the test functions test_check_requirements_cloud and test_check_requirements_cloud_with_key and rely on the module-level os import already present; simply delete the lines "import os" inside those functions so the tests continue to use os (e.g., for patch.dict) without re-importing it locally.
393-398: Consider adding a test for minimum timeout clamping.

The test verifies the upper bound (999 → 60) but not the lower bound. The adapter clamps to _MIN_TIMEOUT_MINUTES = 1, which could be tested similarly.

💡 Optional test addition

 def test_timeout_clamped_to_min_1(self):
     from codeframe.adapters.e2b.adapter import E2BAgentAdapter

     adapter = E2BAgentAdapter(timeout_minutes=0)
     assert adapter._timeout_minutes == 1

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/adapters/test_e2b_adapter.py` around lines 393 - 398, Add a unit test mirroring the max-clamping test to verify the minimum clamp: instantiate E2BAgentAdapter with timeout_minutes=0 and assert adapter._timeout_minutes equals the minimum constant (referenced as _MIN_TIMEOUT_MINUTES or the literal 1) so the adapter clamps lower values to the minimum; place the new test (e.g., test_timeout_clamped_to_min_1) alongside the existing test_timeout_clamped_to_max_60 and use the same patching pattern for e2b.Sandbox.create.

codeframe/adapters/e2b/adapter.py (2)
59-62: Consider documenting the optional dependency on ANTHROPIC_API_KEY.

The adapter passes ANTHROPIC_API_KEY to the sandboxed agent (lines 183-185), but requirements() doesn't mention it. While the inner agent may have its own validation, adding it here would surface the requirement earlier and improve UX.

💡 Optional enhancement

 @classmethod
 def requirements(cls) -> dict[str, str]:
     """Return required environment variables."""
-    return {"E2B_API_KEY": "E2B API key for cloud sandbox execution"}
+    return {
+        "E2B_API_KEY": "E2B API key for cloud sandbox execution",
+        "ANTHROPIC_API_KEY": "Anthropic API key for the agent inside the sandbox (optional if using other providers)",
+    }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/adapters/e2b/adapter.py` around lines 59 - 62, Update the requirements() classmethod in adapter.py to document the optional ANTHROPIC_API_KEY environment variable: add an entry "ANTHROPIC_API_KEY": "Optional: Anthropic API key passed to sandboxed agent for model access" (or similar) alongside the existing E2B_API_KEY entry so the adapter surfaces this optional dependency; keep the key name exact and make clear it's optional to avoid changing runtime validation logic in other methods that pass it into the sandboxed agent.
164-168: Consider checking the git init exit code.

If the git initialization fails (e.g., permission issues), the code continues silently. The subsequent git status --porcelain at download time will fail, but the error won't indicate the root cause was the init step.

💡 Optional improvement

 # Step 4: Initialize git baseline (for diff detection)
-sbx.commands.run(
+git_init_result = sbx.commands.run(
     f"cd {_SANDBOX_WORKSPACE} && git init -q && git add -A && "
     f"git -c user.email=agent@e2b.local -c user.name=agent commit -q -m init",
     timeout=30,
 )
+if git_init_result.exit_code != 0:
+    logger.warning("Git init failed: %s", git_init_result.stderr[:200])

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/adapters/e2b/adapter.py` around lines 164 - 168, The git init invocation using sbx.commands.run in the sandbox setup (the call that runs "cd {_SANDBOX_WORKSPACE} && git init -q && git add -A && git -c user.email=agent@e2b.local -c user.name=agent commit -q -m init") should check the command result and surface failures: capture the exit status/output from sbx.commands.run when calling git init (or run init as a separate command), log or include stderr in a clear error message, and raise/abort if the init step fails so subsequent git operations don't obscure the root cause (refer to sbx.commands.run and the _SANDBOX_WORKSPACE init/commit sequence).
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- codeframe/adapters/e2b/adapter.py
- codeframe/cli/app.py
- codeframe/core/runtime.py
- pyproject.toml
- tests/adapters/test_e2b_adapter.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
- codeframe/core/runtime.py
- codeframe/cli/app.py
CI runs uv sync --extra dev. Moving e2b to optional [cloud] broke the test suite because @patch('e2b.Sandbox.create') requires the module to be importable. Adding e2b to [dev] extras restores test coverage in CI without changing the production install surface (e2b is still not a core dependency).
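A plausible pyproject.toml shape for this arrangement (the exact dependency lists are assumptions; only the e2b / [cloud] / [dev] relationship comes from the discussion above):

```toml
[project.optional-dependencies]
# Production users opt in explicitly: pip install 'codeframe[cloud]'
cloud = ["e2b"]
# CI runs `uv sync --extra dev`, so dev also pulls e2b in for the mocked tests
dev = ["pytest", "ruff", "e2b"]
```

This keeps e2b out of the core install while letting `@patch('e2b.Sandbox.create')` import the module in CI.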
Follow-up Review: E2BAgentAdapter

All four issues flagged in my initial review have been addressed in this revision. Quick confirmation: Fixed ✅
Two remaining gaps worth noting before merge:

1. Exclusion set mismatch between upload and scan

2. Deleted files are silently ignored in …

Minimal fix:

if xy.strip() in ('D', ' D', 'D '):
    local = workspace_path / rel_path
    if local.exists():
        local.unlink()
    modified_files.append(rel_path)
    continue

Both are incremental improvements — the core workflow is sound. If there's a follow-on issue for these, that's acceptable. Otherwise, fixing the exclusion mismatch is a one-liner and worth squeezing in.
Summary
Implements
--engine cloud routing through an E2B Linux sandbox for fully isolated task execution without touching the local filesystem. Closes #534.

- codeframe/adapters/e2b/adapter.py — E2BAgentAdapter implementing the AgentAdapter protocol: credential scan → sandbox create → workspace upload → codeframe install → agent run → git-diff download → AgentResult with cloud_metadata
- codeframe/adapters/e2b/credential_scanner.py — scan_path() blocks .env files, private keys, AWS/OpenAI/GitHub secret patterns before upload
- codeframe/adapters/e2b/budget.py — record_cloud_run/get_cloud_run for the cloud_run_metadata SQLite table tracking sandbox minutes and cost
- codeframe/core/engine_registry.py — adds "cloud" to VALID_ENGINES and EXTERNAL_ENGINES; wires factory functions
- codeframe/core/adapters/agent_adapter.py — adds cloud_metadata: dict | None = None to AgentResult (non-breaking)
- codeframe/core/runtime.py — adds cloud_timeout_minutes param forwarded to get_external_adapter
- codeframe/core/workspace.py — adds cloud_run_metadata table to schema (init + migration)
- codeframe/core/conductor.py — forwards cloud_timeout_minutes through start_batch and _execute_task_subprocess
- codeframe/cli/validators.py — adds require_e2b_api_key() following the existing validator pattern
- codeframe/cli/app.py — --cloud-timeout option on work start and batch run; E2B key validation; new cf work show command with Cloud Execution Cost table
- tests/adapters/test_e2b_adapter.py — 35 tests: credential scanner (11), budget persistence (3), adapter lifecycle with mocked e2b.Sandbox (7), engine registry (8), AgentResult field (2), validators (2), workspace schema (2)

Test plan
- uv run pytest tests/adapters/test_e2b_adapter.py — 35/35 pass
- uv run pytest tests/core/test_agent_adapter.py tests/core/test_engine_registry.py — 44/44 pass
- uv run pytest tests/core/test_workspace.py — 17/17 pass
- uv run pytest tests/adapters/test_llm.py — all pass (no regressions)
- cd web-ui && npm test — 680/680 pass
- uv run ruff check on all changed files — clean
- cf work start <id> --execute --engine cloud --cloud-timeout 30 (requires E2B_API_KEY)
- cf work show <id> displays the Cloud Execution Cost table after a cloud run