fix(ci): restore SWT-Bench build parallelism and disable cache export #525

Open
simonrosenberg wants to merge 5 commits into main from fix/swtbench-build-parallelism
@simonrosenberg
Collaborator

Summary

  • Restore SWT-Bench build parallelism (4 → 16 workers)
  • Increase batch size (15 → 50)
  • Disable cache export to prevent contention

Motivation

SWT-Bench image builds were timing out after 10+ hours due to two issues:

  1. Reduced parallelism: max-workers was reduced from 16 to 4, causing 4x longer build times
  2. Cache export contention: 433 images all exporting to the shared cache registry caused massive GHCR bandwidth contention (~4x slowdown)

Changes

build-swtbench-images.yml

  • max-workers: '4' → '16'
  • build-batch-size: '15' → '50'
  • Added OPENHANDS_BUILDKIT_CACHE_MODE: 'off' to disable cache export
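For illustration, here is a minimal Python sketch of how a build script might translate OPENHANDS_BUILDKIT_CACHE_MODE into `docker buildx build` cache flags. The helper name `cache_flags` and the example cache ref are hypothetical, and the actual SDK implementation in OpenHands/software-agent-sdk#2479 may differ. The sketch assumes the modes `max`, `min`, and `off`, with `max` as the default, and that cache reads via `--cache-from` are kept in every mode.

```python
import os
from typing import List, Mapping, Optional

# Assumed set of modes; the real SDK may accept a different set.
VALID_MODES = {"max", "min", "off"}


def cache_flags(cache_ref: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build docker buildx cache flags for a registry cache ref (hypothetical helper)."""
    env = os.environ if env is None else env
    mode = env.get("OPENHANDS_BUILDKIT_CACHE_MODE", "max").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported cache mode: {mode!r}")

    # Cache reads are always kept, so layers exported earlier can still be reused.
    flags = ["--cache-from", f"type=registry,ref={cache_ref}"]
    if mode != "off":
        # Cache export is the step that competes for registry bandwidth.
        flags += ["--cache-to", f"type=registry,ref={cache_ref},mode={mode}"]
    return flags


if __name__ == "__main__":
    # Placeholder ref, not the project's real cache location.
    print(cache_flags("ghcr.io/example/buildcache:swtbench",
                      {"OPENHANDS_BUILDKIT_CACHE_MODE": "off"}))
```

With the mode set to `off`, the sketch returns only the `--cache-from` pair, which matches the intent of disabling export while preserving reads.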

Expected Results

| Metric | Before | After |
|---|---|---|
| Build time | ~10+ hours | ~2-3 hours |
| GHCR bandwidth | 433 cache exports | 0 cache exports |
| Cache reads | ✅ Preserved | ✅ Preserved |
| Parallelism | 4 workers | 16 workers |
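As a rough sanity check on these numbers, the arithmetic can be sketched in Python. This is an idealized estimate, not a measurement: it assumes build time scales linearly with worker count, takes 10 hours as the baseline (the description says "10+"), and ignores any additional effect from cache export contention. The function names are illustrative only.

```python
import math

IMAGES = 433  # image count taken from the PR description


def batches(batch_size: int) -> int:
    """Number of batches needed to cover all images."""
    return math.ceil(IMAGES / batch_size)


def ideal_hours(baseline_hours: float, old_workers: int, new_workers: int) -> float:
    """Build time under perfectly linear scaling with worker count."""
    return baseline_hours * old_workers / new_workers


if __name__ == "__main__":
    print(batches(15), batches(50))  # 29 9
    print(ideal_hours(10, 4, 16))    # 2.5
```

Under those assumptions the estimate lands inside the ~2-3 hour range quoted above, but only a real workflow run can confirm it.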

Dependencies

Requires: OpenHands/software-agent-sdk#2479 to be merged first, which adds the OPENHANDS_BUILDKIT_CACHE_MODE environment variable.

Test plan

  • Merge SDK PR first
  • Update vendor/software-agent-sdk submodule to include the fix
  • Run SWT-Bench build and verify ~2-3 hour completion time

Related Issues

🤖 Generated with Claude Code

Restore build performance by:
1. Increasing max-workers: 4 → 16 (4x more parallelism)
2. Increasing build-batch-size: 15 → 50 (~3x larger batches)
3. Setting OPENHANDS_BUILDKIT_CACHE_MODE=off to disable cache export

The cache export contention was causing ~4x slowdown when 433 images
all tried to export to the shared cache registry simultaneously.
Disabling cache export eliminates this contention while still
benefiting from cached layer reads via --cache-from.

Expected results:
- Build time: ~10+ hours → ~2-3 hours
- GHCR bandwidth: 433 cache exports → 0 cache exports
- Cache reads: Still preserved via --cache-from

Requires: OpenHands/software-agent-sdk#2479

Fixes #524
Related: #510, #504

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Pragmatic fix for a real problem, but shipping blind.

The approach is sound: restore parallelism and kill the cache export contention. Simple, direct, solves the actual problem. But you're committing to production without proof it works, and the dependency chain is sketchy.

- Update vendor/software-agent-sdk to include OPENHANDS_BUILDKIT_CACHE_MODE
  feature (commit 54d9c165 from PR #2479)
- Remove redundant OPENHANDS_BUILDKIT_CACHE_MODE from step env since it's
  already set at job level (line 110)

Addresses review feedback:
- SDK submodule now includes the cache mode feature
- Removed duplicate env var definition

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@simonrosenberg simonrosenberg self-assigned this Mar 17, 2026
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable approach, risky execution - The fix is pragmatic (restore parallelism + disable cache export), but you're depending on an unmerged SDK PR. Ship it properly or ship it later.

@@ -1 +1 @@
-Subproject commit fc9e8fce2d5a6b5dee4543ff740d0c295aa968a3
+Subproject commit 54d9c165a952d56360f5f3f33fed677beb0377e0
Collaborator

🔴 Critical - Unmerged Dependency: This submodule points to commit 54d9c165 on branch origin/fix/cache-export-contention, which is SDK PR #2479 that has not been merged yet (SDK main is at 58e7ff3b).

Your own PR description says: "Requires: OpenHands/software-agent-sdk#2479 to be merged first"

But you're not following that requirement. This creates several problems:

  1. Unstable dependency: The SDK PR branch could be rebased, force-pushed, or rejected
  2. Broken checkouts: Anyone checking out benchmarks main will have a submodule pointing to a commit that might not exist on SDK main
  3. No guarantee of stability: Unmerged code hasn't passed the full review/merge process

What you should do:

  1. Merge SDK PR #2479 to SDK main first
  2. Update this submodule to point to the merged commit on SDK main
  3. THEN merge this PR

Don't ship dependencies on unmerged branches. That's not how you maintain stable infrastructure.
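One way to check the condition described in this comment is to test whether the pinned submodule commit is reachable from SDK main. The Python sketch below wraps `git merge-base --is-ancestor`. The function names are hypothetical, the branch name `origin/main` is an assumption about the SDK remote, and the check only means something after a fresh `git fetch` in the submodule checkout.

```python
import subprocess
from typing import Callable, List


def _git_exit_code(argv: List[str]) -> int:
    """Run a git command and return only its exit code."""
    return subprocess.run(argv, capture_output=True).returncode


def is_merged(
    commit: str,
    branch: str = "origin/main",
    repo: str = ".",
    run: Callable[[List[str]], int] = _git_exit_code,
) -> bool:
    """Return True if `commit` is reachable from `branch` in the checkout at `repo`."""
    # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B,
    # 1 when it is not, and with another status on errors such as an unknown commit.
    code = run(["git", "-C", repo, "merge-base", "--is-ancestor", commit, branch])
    if code == 0:
        return True
    if code == 1:
        return False
    raise RuntimeError(f"git merge-base failed with exit code {code}")
```

For example, `is_merged("54d9c165", repo="vendor/software-agent-sdk")` could be called from the benchmarks checkout. The `run` parameter exists only so the logic can be exercised without a real repository.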

Comment on lines +103 to 105
MAX_WORKERS: '16'
BUILD_BATCH_SIZE: '50'
N_LIMIT: '0'
Collaborator

🟢 Acceptable - Pragmatic Values: Restoring max-workers: 16 and build-batch-size: 50 based on historical run #23043936501 is reasonable. The math checks out: 4→16 workers gives ~4x speedup, which should bring 10h down to 2-3h.

The approach is simple and solves the real problem (parallelism + cache contention). No complaints about the technical solution.

Comment on lines +108 to +110
# Disable cache export to avoid contention when building many images in parallel
# (SWT-Bench has 433 unique base images, so shared cache provides minimal benefit)
OPENHANDS_BUILDKIT_CACHE_MODE: 'off'
Collaborator

🟠 Important - No Evidence This Works: You're adding a new environment variable that depends on unmerged SDK code. The PR description claims 2-3 hour build time, but there's zero evidence this specific configuration actually works.

Your test plan checkboxes are unchecked:

- [ ] Merge SDK PR first
- [ ] Update vendor/software-agent-sdk submodule to include the fix  
- [ ] Run SWT-Bench build and verify ~2-3 hour completion time

The historical run you reference (run #23043936501) didn't have OPENHANDS_BUILDKIT_CACHE_MODE because that feature didn't exist yet. So you're extrapolating, not proving.

Before merging:

  1. Merge the SDK PR
  2. Update submodule to merged commit
  3. Run ONE full SWT-Bench build workflow
  4. Add a link to that successful run in the PR description

Don't guess. Prove it works.

Debug Agent and others added 3 commits March 17, 2026 10:53
Update vendor/software-agent-sdk to commit 50a57fbe which fixes the
default cache mode to "max" (preserving existing behavior).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use minimal cache export (mode=min) instead of disabling entirely:
- Exports only final stage layers (not all intermediate layers)
- Reduces bandwidth contention while keeping some cache benefit
- Final SDK layers are shared across all 433 images

This is a better balance than mode=off - less aggressive but still
addresses the bandwidth contention issue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update vendor/software-agent-sdk to c34cb27b which includes the merged
OPENHANDS_BUILDKIT_CACHE_MODE feature from PR #2479.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

SWT-Bench image build exceeded 10-hour timeout (run 23164396524)