Description
Problem
SWT-bench benchmark images (source-minimal target) bundle three heavy dependency groups that benchmarks never use:
| Dependency | Where declared | Install cost | Used by benchmarks? |
|---|---|---|---|
| `@zed-industries/claude-agent-acp`, `@zed-industries/codex-acp` | Dockerfile L88-96 (npm) | ~38s/image | No |
| boto3 → botocore | Dockerfile L34 (`--extra boto3`) | ~5-10s/image + large install size | No |
| browser-use → playwright | openhands-tools/pyproject.toml L14 (hard dep) | ~15-30s/image + large install size | No |
These add build time, disk footprint (3+ GiB/image), and push time to every benchmark image — for functionality benchmarks don't exercise.
However, these dependencies are critical for other OpenHands users (ACP for Claude Code/Codex agent support, boto3 for Bedrock model discovery, browser-use for browser automation). We cannot simply remove them.
Proposal
Add build-time flags with safe defaults that preserve current behavior for all existing users, while allowing benchmarks to opt out of unused dependencies:
# New build args (in base-image-minimal stage)
ARG INSTALL_ACP=true
ARG INSTALL_BOTO3=true
ARG INSTALL_BROWSER=true

- Default `true` = identical to today. No user sees any change.
- Benchmarks pass `false` = lighter images, faster builds.
Dependency-by-dependency analysis
1. npm ACP packages: trivial
The ACP npm packages are installed unconditionally in base-image-minimal (Dockerfile L88-96):
npm install -g @zed-industries/claude-agent-acp @zed-industries/codex-acp

ACP is architecturally isolated: it is only loaded when running in ACP server mode. The agent server and benchmark evaluation paths never import it.
Fix: Wrap in a conditional:
ARG INSTALL_ACP=true
RUN set -eux; \
    if ! command -v npm >/dev/null 2>&1; then \
        curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
        apt-get install -y --no-install-recommends nodejs && \
        rm -rf /var/lib/apt/lists/*; \
    fi; \
    if [ "$INSTALL_ACP" = "true" ]; then \
        npm install -g @zed-industries/claude-agent-acp @zed-industries/codex-acp; \
    fi

2. boto3/botocore: trivial
boto3 is already an optional extra in openhands-sdk/pyproject.toml L29-30:
[project.optional-dependencies]
boto3 = ["boto3>=1.35.0"]

And the runtime already handles its absence gracefully via lazy import in unverified_models.py:
def _get_boto3():
    try:
        return importlib.import_module("boto3")
    except ModuleNotFoundError:
        return None

If boto3 isn't installed, Bedrock model listing is skipped with a warning. Everything else works fine.
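As a rough illustration of the skip-with-a-warning behavior described above, here is a minimal sketch. The names `load_optional` and `list_optional_models`, and the `discover` callback, are hypothetical and are not taken from `unverified_models.py`; only the lazy-import pattern itself mirrors the snippet above.

```python
from __future__ import annotations

import importlib
import logging
from types import ModuleType
from typing import Callable

logger = logging.getLogger(__name__)


def load_optional(module_name: str) -> ModuleType | None:
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None


def list_optional_models(
    module_name: str,
    discover: Callable[[ModuleType], list[str]],
) -> list[str]:
    """Hypothetical caller: run discovery only if the optional module exists."""
    module = load_optional(module_name)
    if module is None:
        # Missing optional dependency: warn and return nothing instead of failing.
        logger.warning("%s not installed; skipping model listing", module_name)
        return []
    return discover(module)
```

With this shape, an image built without boto3 would simply get an empty list from the Bedrock listing path while every other code path is unaffected.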
The only reason it's always installed is that the Dockerfile unconditionally passes --extra boto3 (Dockerfile L34):
uv sync --frozen --no-editable --managed-python --extra boto3

Fix: Conditionally include the extra:
ARG INSTALL_BOTO3=true
RUN ... uv sync --frozen --no-editable --managed-python $([ "$INSTALL_BOTO3" = "true" ] && echo "--extra boto3")

3. browser-use: moderate (but well-positioned)
browser-use>=0.8.0 is currently a hard dependency of openhands-tools (pyproject.toml L14):
dependencies = [
    ...
    "browser-use>=0.8.0",
    ...
]

However, the runtime already treats it as optional. Browser tools are conditionally loaded behind an enable_browser flag in preset/default.py:
if enable_browser:
    from openhands.tools.browser_use import BrowserToolSet

CLI mode explicitly disables browser tools (enable_browser=not cli_mode). Benchmarks also don't use them.
Fix (three parts):
- Move `browser-use` to an optional extra in `openhands-tools/pyproject.toml`:

      dependencies = [
          "openhands-sdk",
          "bashlex>=0.18",
          "binaryornot>=0.4.4",
          "cachetools",
          "libtmux>=0.53.0",
          "pydantic>=2.11.7",
          "func-timeout>=4.3.5",
          "tom-swe>=1.0.3",
      ]

      [project.optional-dependencies]
      browser = ["browser-use>=0.8.0"]

- Add a try/except guard in `preset/default.py` for when the package isn't installed:

      if enable_browser:
          try:
              from openhands.tools.browser_use import BrowserToolSet
              logger.debug(f"Tool: {BrowserToolSet.name} registered.")
          except ImportError:
              logger.warning("browser-use not installed; browser tools unavailable")

- Add a corresponding Dockerfile build arg and conditionally include `--extra browser` in `uv sync`.
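The extras selection for `uv sync` can be modeled in a few lines. This is only a sketch of the logic, written in Python for readability; in the Dockerfile it would remain a shell conditional. The function name `uv_sync_command` is hypothetical, while the flags are the ones quoted from the Dockerfile above plus the proposed `--extra browser`.

```python
from __future__ import annotations


def uv_sync_command(
    install_boto3: bool = True,
    install_browser: bool = True,
) -> list[str]:
    """Build the uv sync argv, adding each optional extra only when enabled."""
    cmd = ["uv", "sync", "--frozen", "--no-editable", "--managed-python"]
    if install_boto3:
        cmd += ["--extra", "boto3"]
    if install_browser:
        # "browser" is the extra name proposed for openhands-tools
        cmd += ["--extra", "browser"]
    return cmd
```

With both flags left at their defaults, the command includes both extras, matching the goal of keeping today's behavior; benchmark builds would pass `False` for both and get the bare `uv sync` invocation.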
Changes required
SDK repo (software-agent-sdk)
| File | Change | Effort |
|---|---|---|
| `openhands-agent-server/.../Dockerfile` | Add `INSTALL_ACP`, `INSTALL_BOTO3`, `INSTALL_BROWSER` build args with `true` defaults; wrap npm ACP install in conditional; conditionally pass `--extra boto3` and `--extra browser` to `uv sync` | Small |
| `openhands-tools/pyproject.toml` | Move `browser-use` from `dependencies` to `[project.optional-dependencies] browser = [...]` | Small |
| `openhands-tools/.../preset/default.py` | Add `ImportError` guard around `BrowserToolSet` import | Small |
| `openhands-agent-server/.../docker/build.py` | Accept and forward new build args | Small |
Benchmarks repo
| File | Change | Effort |
|---|---|---|
| `benchmarks/utils/build_utils.py` | Pass `--build-arg INSTALL_ACP=false --build-arg INSTALL_BOTO3=false --build-arg INSTALL_BROWSER=false` for benchmark builds | Small |
| `.github/workflows/build-swtbench-images.yml` | Optionally expose the flags as workflow inputs | Small |
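The forwarding described in both tables could look roughly like the sketch below. The helper name `docker_build_args` and its keyword parameters are assumptions for illustration; the actual signatures in `build.py` and `build_utils.py` are not shown in this issue. Only the `--build-arg NAME=value` form and the three arg names come from the proposal.

```python
from __future__ import annotations


def docker_build_args(
    install_acp: bool = True,
    install_boto3: bool = True,
    install_browser: bool = True,
) -> list[str]:
    """Translate boolean options into docker --build-arg flags."""
    flags = {
        "INSTALL_ACP": install_acp,
        "INSTALL_BOTO3": install_boto3,
        "INSTALL_BROWSER": install_browser,
    }
    args: list[str] = []
    for name, enabled in flags.items():
        # The Dockerfile compares against the literal string "true"
        args += ["--build-arg", f"{name}={'true' if enabled else 'false'}"]
    return args


# Benchmark builds would opt out of all three groups (hypothetical call site):
benchmark_args = docker_build_args(
    install_acp=False, install_boto3=False, install_browser=False
)
```

Keeping the defaults at `True` in the helper mirrors the Dockerfile defaults, so callers that never mention the new options keep producing the same images.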
Expected impact
| Savings | Per image | At 433 images |
|---|---|---|
| Skip npm ACP install | ~38s | ~4.5 hours |
| Skip browser-use + playwright | ~15-30s install + smaller image | ~2-3 hours |
| Skip boto3/botocore | ~5-10s | ~0.5-1 hour |
| Smaller image → faster export/push | ~10-20s | ~1-2 hours |
Combined with the ARG cache fix from #531 (SDK PR #2522), cold builds for 433 images could drop below 4 hours.
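The totals in the table follow from multiplying the per-image estimate by 433 images and converting seconds to hours. The sketch below reproduces that arithmetic; it assumes the per-image costs simply add up across images (that is, it ignores any build parallelism), and the helper name `total_hours` is illustrative.

```python
IMAGES = 433  # image count from the table above


def total_hours(seconds_per_image: float, images: int = IMAGES) -> float:
    """Convert a per-image cost in seconds to total hours across all images."""
    return seconds_per_image * images / 3600


# (low, high) per-image estimates in seconds, taken from the table
savings = {
    "npm ACP install": (38, 38),
    "browser-use + playwright": (15, 30),
    "boto3/botocore": (5, 10),
    "export/push": (10, 20),
}

for name, (low, high) in savings.items():
    print(f"{name}: {total_hours(low):.1f}-{total_hours(high):.1f} h")
```

For example, 38 s × 433 ≈ 16,454 s, or about 4.6 hours, in line with the ~4.5 hours listed for the ACP install.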
Non-breaking guarantee
- All build args default to `true`: existing `docker build` invocations produce identical images
- `pip install openhands-tools` continues to work (browser-use becomes an extra, but the Dockerfile includes it by default)
- Runtime code already handles missing browser tools and missing boto3 gracefully
- Only benchmark builds explicitly opt out via `--build-arg`
Related
- SWT-Bench image building slowness: root cause analysis and fix plan #531 — Root cause analysis of SWT-bench build slowness (ARG ordering fix)
- SDK #2522 — ARG ordering fix (restores registry cache)
- SDK #2465 — PR that added npm ACP install to every image
- Set SWT-bench cache-mode default to max #547 — Re-enable cache-mode=max after ARG fix