fix(ci): restore SWT-Bench build parallelism and disable cache export #525
simonrosenberg wants to merge 5 commits into main
Restore build performance by:
1. Increasing max-workers: 4 → 16 (4x more parallelism)
2. Increasing build-batch-size: 15 → 50 (~3x larger batches)
3. Setting OPENHANDS_BUILDKIT_CACHE_MODE=off to disable cache export

The cache export contention was causing ~4x slowdown when 433 images all tried to export to the shared cache registry simultaneously. Disabling cache export eliminates this contention while still benefiting from cached layer reads via --cache-from.

Expected results:
- Build time: ~10+ hours → ~2-3 hours
- GHCR bandwidth: 433 cache exports → 0 cache exports
- Cache reads: Still preserved via --cache-from

Requires: OpenHands/software-agent-sdk#2479
Fixes #524
Related: #510, #504

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
all-hands-bot
left a comment
🟡 Acceptable - Pragmatic fix for a real problem, but shipping blind.
The approach is sound: restore parallelism and kill the cache export contention. Simple, direct, solves the actual problem. But you're committing to production without proof it works, and the dependency chain is sketchy.
- Update vendor/software-agent-sdk to include OPENHANDS_BUILDKIT_CACHE_MODE feature (commit 54d9c165 from PR #2479)
- Remove redundant OPENHANDS_BUILDKIT_CACHE_MODE from step env since it's already set at job level (line 110)

Addresses review feedback:
- SDK submodule now includes the cache mode feature
- Removed duplicate env var definition

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
all-hands-bot
left a comment
🟡 Acceptable approach, risky execution - The fix is pragmatic (restore parallelism + disable cache export), but you're depending on an unmerged SDK PR. Ship it properly or ship it later.
vendor/software-agent-sdk
Outdated
@@ -1 +1 @@
-Subproject commit fc9e8fce2d5a6b5dee4543ff740d0c295aa968a3
+Subproject commit 54d9c165a952d56360f5f3f33fed677beb0377e0
🔴 Critical - Unmerged Dependency: This submodule points to commit 54d9c165 on branch origin/fix/cache-export-contention, which is SDK PR #2479 that has not been merged yet (SDK main is at 58e7ff3b).
Your own PR description says: "Requires: OpenHands/software-agent-sdk#2479 to be merged first"
But you're not following that requirement. This creates several problems:
- Unstable dependency: The SDK PR branch could be rebased, force-pushed, or rejected
- Broken checkouts: Anyone checking out benchmarks main will have a submodule pointing to a commit that might not exist on SDK main
- No guarantee of stability: Unmerged code hasn't passed the full review/merge process
What you should do:
- Merge SDK PR #2479 to SDK main first
- Update this submodule to point to the merged commit on SDK main
- THEN merge this PR
Don't ship dependencies on unmerged branches. That's not how you maintain stable infrastructure.
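The ordering above could be enforced mechanically before merge. The following is a minimal sketch, not part of this PR: it assumes the SDK checkout has already fetched `origin/main`, and the function name `is_pinned_commit_merged` and its `run` parameter are hypothetical. It relies only on `git merge-base --is-ancestor`, which exits 0 when the commit is reachable from the target ref and 1 when it is not.

```python
import subprocess
from typing import Callable


def is_pinned_commit_merged(
    commit: str,
    target_ref: str = "origin/main",
    repo_dir: str = "vendor/software-agent-sdk",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `commit` is an ancestor of `target_ref` in `repo_dir`.

    `run` is injectable so the check can be exercised without a real repo
    (a hypothetical convenience, not something the PR provides).
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, target_ref],
        capture_output=True,
    )
    if result.returncode == 0:
        return True   # commit is reachable from target_ref
    if result.returncode == 1:
        return False  # commit exists but is not on target_ref
    # Any other exit code means git itself failed (unknown commit, bad ref, ...).
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

A CI step could call this with the submodule's pinned SHA and fail the job when it returns `False`, which would catch a pin to an unmerged branch such as the one flagged here.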
MAX_WORKERS: '16'
BUILD_BATCH_SIZE: '50'
N_LIMIT: '0'
🟢 Acceptable - Pragmatic Values: Restoring max-workers: 16 and build-batch-size: 50 based on historical run #23043936501 is reasonable. The math checks out: 4→16 workers gives ~4x speedup, which should bring 10h down to 2-3h.
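For reference, the arithmetic behind that estimate can be written out. This is a back-of-the-envelope sketch that assumes build time scales linearly with worker count, which is an idealization: registry bandwidth and cache contention can break it. The helper names are illustrative only; the figures (433 images, 4 → 16 workers, batch size 15 → 50, ~10 hours) come from the PR description.

```python
import math

TOTAL_IMAGES = 433  # unique SWT-Bench base images, per the PR description


def batch_count(total_images: int, batch_size: int) -> int:
    """Number of batches needed to cover every image."""
    return math.ceil(total_images / batch_size)


def ideal_hours(baseline_hours: float, old_workers: int, new_workers: int) -> float:
    """Projected build time under the assumption of perfectly linear scaling."""
    return baseline_hours * old_workers / new_workers


old_batches = batch_count(TOTAL_IMAGES, 15)   # 29 batches at the reduced setting
new_batches = batch_count(TOTAL_IMAGES, 50)   # 9 batches at the restored setting
projected = ideal_hours(10.0, old_workers=4, new_workers=16)  # 2.5 hours
```

The 2.5 hour figure sits inside the 2-3 hour range quoted in the PR, but only a real workflow run can confirm it.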
The approach is simple and solves the real problem (parallelism + cache contention). No complaints about the technical solution.
# Disable cache export to avoid contention when building many images in parallel
# (SWT-Bench has 433 unique base images, so shared cache provides minimal benefit)
OPENHANDS_BUILDKIT_CACHE_MODE: 'off'
🟠 Important - No Evidence This Works: You're adding a new environment variable that depends on unmerged SDK code. The PR description claims 2-3 hour build time, but there's zero evidence this specific configuration actually works.
Your test plan checkboxes are unchecked:
- [ ] Merge SDK PR first
- [ ] Update vendor/software-agent-sdk submodule to include the fix
- [ ] Run SWT-Bench build and verify ~2-3 hour completion time
The historical run you reference (run #23043936501) didn't have OPENHANDS_BUILDKIT_CACHE_MODE because that feature didn't exist yet. So you're extrapolating, not proving.
Before merging:
- Merge the SDK PR
- Update submodule to merged commit
- Run ONE full SWT-Bench build workflow
- Add a link to that successful run in the PR description
Don't guess. Prove it works.
Update vendor/software-agent-sdk to commit 50a57fbe which fixes the default cache mode to "max" (preserving existing behavior).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use minimal cache export (mode=min) instead of disabling entirely:
- Exports only final stage layers (not all intermediate layers)
- Reduces bandwidth contention while keeping some cache benefit
- Final SDK layers are shared across all 433 images

This is a better balance than mode=off - less aggressive but still addresses the bandwidth contention issue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
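To make the three modes concrete, here is a rough sketch of how a cache-mode setting could translate into `docker buildx build` cache flags. It is an assumption about the mapping, not the SDK's actual implementation: the function names and the cache reference are hypothetical. Only the `--cache-from` / `--cache-to` registry options and their `mode=min|max` values are standard BuildKit, and the `max` default follows the earlier commit that preserves existing behavior.

```python
import os
from typing import List, Mapping

VALID_EXPORT_MODES = ("min", "max")


def cache_mode_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Read the cache mode, defaulting to "max" as described in this PR."""
    return env.get("OPENHANDS_BUILDKIT_CACHE_MODE", "max")


def buildx_cache_flags(cache_ref: str, mode: str) -> List[str]:
    """Build the cache-related flags for one image build (illustrative only)."""
    # Cache reads are kept in every mode, matching the PR description.
    flags = [f"--cache-from=type=registry,ref={cache_ref}"]
    if mode == "off":
        return flags  # no export at all, so no export contention
    if mode not in VALID_EXPORT_MODES:
        raise ValueError(f"unsupported cache mode: {mode!r}")
    # mode=min exports only the final-stage layers; mode=max exports all of them.
    flags.append(f"--cache-to=type=registry,ref={cache_ref},mode={mode}")
    return flags


# Hypothetical cache reference, used purely for illustration.
example = buildx_cache_flags("ghcr.io/example/buildcache:swtbench", "min")
```

With `mode="off"` each of the 433 builds only reads from the shared cache; with `mode="min"` each build still writes, but a smaller set of layers.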
Update vendor/software-agent-sdk to c34cb27b which includes the merged OPENHANDS_BUILDKIT_CACHE_MODE feature from PR #2479.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Motivation
SWT-Bench image builds were timing out after 10+ hours due to two issues:
- max-workers was reduced from 16 to 4, causing 4x longer build times
Changes
build-swtbench-images.yml
- max-workers: '4' → '16'
- build-batch-size: '15' → '50'
- OPENHANDS_BUILDKIT_CACHE_MODE: 'off' to disable cache export
Expected Results
Dependencies
Requires: OpenHands/software-agent-sdk#2479 to be merged first, which adds the OPENHANDS_BUILDKIT_CACHE_MODE environment variable.
Test plan
Related Issues
🤖 Generated with Claude Code