
Proposal: lightweight benchmark images via optional dependency flags #537

@simonrosenberg


Problem

SWT-bench benchmark images (source-minimal target) bundle three heavy dependency groups that benchmarks never use:

| Dependency | Where declared | Install cost | Used by benchmarks? |
|---|---|---|---|
| @zed-industries/claude-agent-acp, @zed-industries/codex-acp | Dockerfile L88-96 (npm) | ~38s/image | No |
| boto3, botocore | Dockerfile L34 (--extra boto3) | ~5-10s/image + large install size | No |
| browser-use, playwright | openhands-tools/pyproject.toml L14 (hard dep) | ~15-30s/image + large install size | No |

These add build time, disk footprint (3+ GiB/image), and push time to every benchmark image, all for functionality that benchmarks don't exercise.

However, these dependencies are critical for other OpenHands users (ACP for Claude Code/Codex agent support, boto3 for Bedrock model discovery, browser-use for browser automation). We cannot simply remove them.

Proposal

Add build-time flags with safe defaults that preserve current behavior for all existing users, while allowing benchmarks to opt out of unused dependencies:

# New build args (in base-image-minimal stage)
ARG INSTALL_ACP=true
ARG INSTALL_BOTO3=true
ARG INSTALL_BROWSER=true
  • Default true = identical to today. No user sees any change.
  • Benchmarks pass false = lighter images, faster builds.

Dependency-by-dependency analysis

1. npm ACP packages — trivial

The ACP npm packages are installed unconditionally in base-image-minimal (Dockerfile L88-96):

npm install -g @zed-industries/claude-agent-acp @zed-industries/codex-acp

ACP is architecturally isolated: the packages are only used when running in ACP server mode, and neither the agent server nor the benchmark evaluation path invokes them.

Fix: Wrap in a conditional:

ARG INSTALL_ACP=true
RUN set -eux; \
    if ! command -v npm >/dev/null 2>&1; then \
        curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
        apt-get install -y --no-install-recommends nodejs && \
        rm -rf /var/lib/apt/lists/*; \
    fi; \
    if [ "$INSTALL_ACP" = "true" ]; then \
        npm install -g @zed-industries/claude-agent-acp @zed-industries/codex-acp; \
    fi
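One detail worth spelling out: the shell test in this conditional is an exact, case-sensitive string comparison, so only the literal value true enables the install, and anything else (True, 1, yes, an empty string) silently skips it. The function below is a hypothetical Python mirror of that check, written only to make the semantics explicit; it is not part of the Dockerfile or build.py.

```python
def flag_enabled(value: str) -> bool:
    """Hypothetical mirror of the Dockerfile test [ "$INSTALL_ACP" = "true" ].

    The shell test is an exact, case-sensitive comparison, so only the
    literal string "true" turns the install on.
    """
    return value == "true"


for candidate in ["true", "false", "True", "1", ""]:
    print(f"INSTALL_ACP={candidate!r}: install={flag_enabled(candidate)}")
```

If a typo should fall back to today's behavior rather than to the lighter image, the Dockerfile could instead test [ "$INSTALL_ACP" != "false" ], so that only an explicit false opts out.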

2. boto3/botocore — trivial

boto3 is already an optional extra in openhands-sdk/pyproject.toml L29-30:

[project.optional-dependencies]
boto3 = ["boto3>=1.35.0"]

And the runtime already handles its absence gracefully via lazy import in unverified_models.py:

import importlib

def _get_boto3():
    try:
        return importlib.import_module("boto3")
    except ModuleNotFoundError:
        return None

If boto3 isn't installed, Bedrock model listing is skipped with a warning. Everything else works fine.
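To illustrate how a caller can build on that lazy import, here is a minimal sketch assuming a generic optional_import helper (the same idea as _get_boto3, parameterized by module name) and a hypothetical list_bedrock_models caller; neither name is taken from the SDK. The loader is passed in so the fallback path can be exercised without uninstalling boto3.

```python
import importlib
import logging
from types import ModuleType
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, else None (same idea as _get_boto3)."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None


def list_bedrock_models(
    loader: Callable[[], Optional[ModuleType]] = lambda: optional_import("boto3"),
) -> list[str]:
    """Hypothetical caller: skip Bedrock discovery with a warning if boto3 is absent."""
    boto3 = loader()
    if boto3 is None:
        logger.warning("boto3 not installed; skipping Bedrock model listing")
        return []
    # Real boto3 API; needs AWS credentials and a configured region to succeed.
    client = boto3.client("bedrock")
    response = client.list_foundation_models()
    return [summary["modelId"] for summary in response.get("modelSummaries", [])]
```

Calling list_bedrock_models(lambda: None) logs the warning and returns an empty list, which is the behavior an image built without boto3 would see.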

The only reason it's always installed is that the Dockerfile unconditionally passes --extra boto3 (Dockerfile L34):

uv sync --frozen --no-editable --managed-python --extra boto3

Fix: Conditionally include the extra:

ARG INSTALL_BOTO3=true
RUN ... uv sync --frozen --no-editable --managed-python $([ "$INSTALL_BOTO3" = "true" ] && echo "--extra boto3")
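The $(...) substitution relies on the shell word-splitting its unquoted output into two arguments (--extra and boto3), and it expands to nothing when the flag is off. If build tooling ever needs to assemble the same command itself, the logic can be written as an explicit argument list. uv_sync_command is a hypothetical helper; the base flags are copied from the uv sync line quoted above.

```python
import shlex


def uv_sync_command(extras: dict[str, bool]) -> list[str]:
    """Hypothetical helper: build the uv sync argv, adding --extra only for enabled extras."""
    cmd = ["uv", "sync", "--frozen", "--no-editable", "--managed-python"]
    for name, enabled in extras.items():
        if enabled:
            cmd += ["--extra", name]  # two separate argv entries, no word splitting needed
    return cmd


print(shlex.join(uv_sync_command({"boto3": True})))
# uv sync --frozen --no-editable --managed-python --extra boto3
print(shlex.join(uv_sync_command({"boto3": False})))
# uv sync --frozen --no-editable --managed-python
```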

3. browser-use — moderate (but well-positioned)

browser-use>=0.8.0 is currently a hard dependency of openhands-tools (pyproject.toml L14):

dependencies = [
    ...
    "browser-use>=0.8.0",
    ...
]

However, the runtime already treats it as optional. Browser tools are conditionally loaded behind an enable_browser flag in preset/default.py:

if enable_browser:
    from openhands.tools.browser_use import BrowserToolSet

CLI mode explicitly disables browser tools (enable_browser=not cli_mode). Benchmarks also don't use them.

Fix (three parts):

  1. Move browser-use to an optional extra in openhands-tools/pyproject.toml:

    dependencies = [
        "openhands-sdk",
        "bashlex>=0.18",
        "binaryornot>=0.4.4",
        "cachetools",
        "libtmux>=0.53.0",
        "pydantic>=2.11.7",
        "func-timeout>=4.3.5",
        "tom-swe>=1.0.3",
    ]
    
    [project.optional-dependencies]
    browser = ["browser-use>=0.8.0"]
  2. Add a try/except guard in preset/default.py for when the package isn't installed:

    if enable_browser:
        try:
            from openhands.tools.browser_use import BrowserToolSet
            logger.debug(f"Tool: {BrowserToolSet.name} registered.")
        except ImportError:
            logger.warning("browser-use not installed — browser tools unavailable")
  3. Add a corresponding INSTALL_BROWSER build arg to the Dockerfile and conditionally pass --extra browser to uv sync.

Changes required

SDK repo (software-agent-sdk)

| File | Change | Effort |
|---|---|---|
| openhands-agent-server/.../Dockerfile | Add INSTALL_ACP, INSTALL_BOTO3, INSTALL_BROWSER build args with true defaults; wrap npm ACP install in conditional; conditionally pass --extra boto3 and --extra browser to uv sync | Small |
| openhands-tools/pyproject.toml | Move browser-use from dependencies to [project.optional-dependencies] browser = [...] | Small |
| openhands-tools/.../preset/default.py | Add ImportError guard around BrowserToolSet import | Small |
| openhands-agent-server/.../docker/build.py | Accept and forward new build args | Small |
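For the build.py row, one way to accept and forward the flags is argparse's BooleanOptionalAction (Python 3.9+), which generates paired --install-acp / --no-install-acp options. The option names and parse_build_flags are hypothetical, since the proposal does not specify build.py's CLI; the build-arg names are the ones proposed above, and every flag defaults to on so existing invocations are unchanged.

```python
import argparse

# Hypothetical CLI suffix -> Dockerfile build arg proposed in this issue.
OPTIONAL_DEPS = {
    "acp": "INSTALL_ACP",
    "boto3": "INSTALL_BOTO3",
    "browser": "INSTALL_BROWSER",
}


def parse_build_flags(argv: list[str]) -> dict[str, str]:
    """Return {build_arg: "true" or "false"} for forwarding to docker build."""
    parser = argparse.ArgumentParser(description="sketch of build.py flag handling")
    for suffix, build_arg in OPTIONAL_DEPS.items():
        parser.add_argument(
            f"--install-{suffix}",
            action=argparse.BooleanOptionalAction,
            default=True,  # default on: identical image to today
            help=f"set {build_arg} (use --no-install-{suffix} to opt out)",
        )
    args = parser.parse_args(argv)
    return {
        build_arg: "true" if getattr(args, f"install_{suffix}") else "false"
        for suffix, build_arg in OPTIONAL_DEPS.items()
    }
```

For example, parse_build_flags(["--no-install-browser"]) sets INSTALL_BROWSER to "false" and leaves the other two at "true".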

Benchmarks repo

| File | Change | Effort |
|---|---|---|
| benchmarks/utils/build_utils.py | Pass --build-arg INSTALL_ACP=false --build-arg INSTALL_BOTO3=false --build-arg INSTALL_BROWSER=false for benchmark builds | Small |
| .github/workflows/build-swtbench-images.yml | Optionally expose the flags as workflow inputs | Small |
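On the benchmarks side, the build_utils.py change amounts to appending three build args. A minimal sketch, assuming a docker buildx build invocation against the source-minimal target; benchmark_build_command, the tag and the build context are hypothetical placeholders, not the actual build_utils.py code.

```python
import shlex

# Build args that benchmark builds opt out of (names from this proposal).
BENCHMARK_OPT_OUTS = ("INSTALL_ACP", "INSTALL_BOTO3", "INSTALL_BROWSER")


def benchmark_build_command(
    tag: str, context: str = ".", target: str = "source-minimal"
) -> list[str]:
    """Hypothetical: docker argv for a lightweight benchmark image."""
    cmd = ["docker", "buildx", "build", "--target", target]
    for name in BENCHMARK_OPT_OUTS:
        cmd += ["--build-arg", f"{name}=false"]
    cmd += ["--tag", tag, context]
    return cmd


print(shlex.join(benchmark_build_command("swtbench-example:latest")))
```

Keeping the opt-outs in one tuple means a future optional dependency only needs one new entry here.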

Expected impact

| Savings | Per image | At 433 images |
|---|---|---|
| Skip npm ACP install | ~38s | ~4.5 hours |
| Skip browser-use + playwright | ~15-30s install + smaller image | ~2-3 hours |
| Skip boto3/botocore | ~5-10s | ~0.5-1 hour |
| Smaller image → faster export/push | ~10-20s | ~1-2 hours |

Combined with the ARG cache fix from #531 (SDK PR #2522), cold builds for 433 images could drop below 4 hours.
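The 433-image column in the savings table is the per-image saving multiplied by the image count. A quick calculation with the per-image ranges copied as-is from the table (the browser row counts install time only) roughly reproduces each row and sums them:

```python
IMAGES = 433

# (row, low, high) per-image savings in seconds, copied from the table.
SAVINGS_S = [
    ("Skip npm ACP install", 38, 38),
    ("Skip browser-use + playwright install", 15, 30),
    ("Skip boto3/botocore", 5, 10),
    ("Smaller image -> faster export/push", 10, 20),
]


def hours(seconds_per_image: float, images: int = IMAGES) -> float:
    """Cumulative hours saved across all images, assuming sequential builds."""
    return seconds_per_image * images / 3600


for label, lo, hi in SAVINGS_S:
    print(f"{label:40} {hours(lo):4.1f}-{hours(hi):4.1f} h")

total_lo = sum(lo for _, lo, _ in SAVINGS_S)
total_hi = sum(hi for _, _, hi in SAVINGS_S)
print(f"{'Total':40} {hours(total_lo):4.1f}-{hours(total_hi):4.1f} h")
```

This gives about 4.6, 1.8-3.6, 0.6-1.2 and 1.2-2.4 hours per row, or roughly 8-12 hours of cumulative build time in total. These are sequential totals; when images are built in parallel, the wall-clock saving is correspondingly smaller.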

Non-breaking guarantee

  • All build args default to true — existing docker build invocations produce identical images
  • pip install openhands-tools continues to work, but no longer pulls in browser-use; users who need browser tools install openhands-tools[browser] (the Dockerfile includes the extra by default)
  • Runtime code already handles missing browser tools and missing boto3 gracefully
  • Only benchmark builds explicitly opt out via --build-arg
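To confirm the guarantee in practice, a smoke test can run inside a built image (for example via docker run --rm IMAGE python check.py, where IMAGE and check.py are placeholders) and compare which optional Python packages are importable with the flags used for that build. A sketch using importlib.util.find_spec, which locates a module without importing it; check_image and the module-to-flag mapping are hypothetical, and the npm ACP packages are not covered here.

```python
import importlib.util

# Import names of the optional Python packages (browser-use imports as browser_use),
# mapped to the build arg that controls them.
OPTIONAL_MODULES = {"boto3": "INSTALL_BOTO3", "browser_use": "INSTALL_BROWSER"}


def check_image(expected: dict[str, bool]) -> list[str]:
    """Return mismatch messages; an empty list means the image matches its flags."""
    problems = []
    for module, flag in OPTIONAL_MODULES.items():
        present = importlib.util.find_spec(module) is not None
        if present != expected[flag]:
            problems.append(f"{module}: present={present}, expected {flag}={expected[flag]}")
    return problems


# Default build (all flags true): every optional package should be present.
mismatches = check_image({"INSTALL_BOTO3": True, "INSTALL_BROWSER": True})
print("OK" if not mismatches else "\n".join(mismatches))
```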
