Description
This issue serves as the historical source of truth for SWT-bench image build throughput. It tracks every workflow run that built at least 50 images, with careful accounting of truly built vs. skipped images.
Build throughput over time
| Date | Status | SDK SHA | Benchmarks SHA | Images Built | Images Skipped | Build Duration | Throughput (img/h) | Run |
|---|---|---|---|---|---|---|---|---|
| 2026-02-25 | ❌ | bde31c4 | 7d81e21 | 225 | 0 | 5h57m | 37.8 | #22410410373 |
| 2026-02-27 | ✅ | cefaebf | 744df22 | 364 | 69 | 5h29m | 66.4 | #22508380588 |
| 2026-02-27 | ✅ | cefaebf | 744df22 | 379 | 54 | 5h12m | 72.9 | #22508424484 |
| 2026-02-27 | ✅ | cefaebf | 744df22 | 382 | 51 | 5h03m | 75.6 | #22508430612 |
| 2026-03-04 | ✅ | eab666f | 09c9fb1 | 50 | 0 | 55m18s | 54.2 | #22670483404 |
| 2026-03-04 | ✅ | 3c2bce2 | 863df2a | 50 | 0 | 1h00m | 50.0 | #22671307073 |
| 2026-03-05 | 🟡 cancelled | 63565b9 | 981a95a | 219 | 18 | 5h58m | 36.7 | #22715409658 |
| 2026-03-06 | 🟡 cancelled | 8e3e306 | 1f29020 | 275 | 0 | 5h59m | 46.0 | #22747565501 |
| 2026-03-11 | 🟡 cancelled | 3081956 | 4d786be | 246 | 34 | 5h55m | 41.6 | #22951682281 |
| 2026-03-11 | ✅ | 3081956 | 4d786be | 145 | 288 | 4h08m | 35.1 | #22967516108 |
| 2026-03-11 | 🟡 cancelled | 46f3d78 | e6f6da4 | 262 | 56 | 5h59m | 43.8 | #22974200370 |
| 2026-03-12 | 🟡 cancelled | b498a69 | 36cb585 | 196 | 14 | 3h02m | 64.6 | #22985176798 |
| 2026-03-12 | 🟡 cancelled | aa9df69 | 49a94d1 | 112 | 1 | 1h32m | 73.0 | #23001334237 |
| 2026-03-12 | 🟡 cancelled | aa9df69 | c614f83 | 59 | 119 | 56m40s | 62.5 | #23005078511 |
| 2026-03-12 | 🟡 cancelled | d0c1a39 | 355829e | 158 | 12 | 2h48m | 56.4 | #23008709458 |
| 2026-03-12 | 🟡 cancelled | 447aa91 | 22ea7b5 | 68 | 0 | 2h55m | 23.3 | #23028385721 |
| 2026-03-13 | 🟡 cancelled | 8e8223b | 277cb51 | 201 | 0 | 2h54m | 69.3 | #23033839708 |
| 2026-03-13 | ✅ | d94cdf8 | 2579041 | 161 | 272 | 2h55m | 55.2 | #23035485709 |
| 2026-03-13 | ✅ | 8e8223b | 039aebd | 230 | 203 | 4h28m | 51.5 | #23043936501 |
| 2026-03-16 | 🟡 cancelled | d129025 | e326a67 | 409 | 0 | 9h56m | 41.2 | #23164396524 |
| 2026-03-17 | 🟡 cancelled | c34cb27 | bedaad9 | 343 | 7 | 9h58m | 34.4 | #23198836788 |
| 2026-03-17 | 🟡 cancelled | c34cb27 | 76b2dbc | 348 | 1 | 9h57m | 35.0 | #23198839061 |
| 2026-03-17 | 🟡 cancelled | d2d5d47 | e326a67 | 314 | 35 | 9h57m | 31.6 | #23202392035 |
| 2026-03-18 | ✅ | 2d027b4 | e2132d1 | 392 | 41 | 9h20m | 42.0 | #23226296458 |
Reading the table
- Images Built = images that were actually built and pushed to GHCR by this run (excludes skipped).
- Images Skipped = images that already existed in GHCR and were not rebuilt.
- Build Duration = wall-clock time spent building (from tqdm elapsed or the build summary timer; excludes job setup).
- Throughput (img/h) = images_built / build_duration_hours: the measured build rate during this run.
- Status: ✅ completed, ❌ failed (hit timeout), 🟡 cancelled (hit timeout or manually cancelled).
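The throughput formula can be sanity-checked against any row of the table, e.g. the 2026-03-18 run (392 images in 9h20m):

```python
# Reproduce the throughput figure for the 2026-03-18 row of the table.
built = 392
duration_seconds = 9 * 3600 + 20 * 60  # 9h20m
throughput = built / (duration_seconds / 3600)
print(round(throughput, 1))  # 42.0, matching the table
```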
Caveats on throughput comparisons across runs:
- Throughput is non-linear at scale: runs that build fewer images may not hit disk pressure or BuildKit prune events, inflating their measured throughput relative to full 433-image runs.
- Follow-up runs with high skip counts (e.g., 272/288 skips) only built images missed by prior cancelled runs. They build a biased subset (often the slower-to-build images left over), so their throughput may not be representative.
- Throughput varies within a single run: profiling of run #23382357696 showed 61.3 img/h pre-prune dropping to 25-28 img/h during prune batches.
Key observations
- Throughput regressed significantly: late-February runs (SDK cefaebf) measured 66-76 img/h on 364-382 images, while mid-March runs measured 31-42 img/h on 314-409 images.
- Many cancelled runs: the job timeout (6h, later increased to 10h) is routinely hit before all images finish, requiring follow-up runs to complete the remaining images.
- One extreme outlier: run #23028385721 on 2026-03-12 (SDK 447aa91) measured only 23.3 img/h on 68 images.
- Feb 27 parallel runs: three runs with the same SDK/benchmarks SHAs launched concurrently, so the "skipped" images in each were built first by a sibling run. Each independently built 364–382 images.
Methodology
Why counting "real builds" is non-trivial
The build workflow checks GHCR for each image before building it (remote_image_exists() check). If the image already exists, it's skipped. A run can report "Built: 433/433" while only truly building a fraction — the rest were skipped because they were already pushed by a prior or concurrent run. If we don't distinguish, a mostly-skipping run looks blazingly fast when it isn't.
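The 2026-03-11 follow-up run #22967516108 from the table illustrates the distortion: it finished 433 images in 4h08m, but 288 of them were skips.

```python
# Run #22967516108: 433 images "done" in 4h08m, but only 145 real builds.
reported_total = 433
skipped = 288
duration_hours = (4 * 3600 + 8 * 60) / 3600  # 4h08m

apparent = reported_total / duration_hours          # naive reading of the log
real = (reported_total - skipped) / duration_hours  # skip-corrected

print(round(apparent, 1))  # 104.8 img/h -- looks blazingly fast
print(round(real, 1))      # 35.1 img/h -- the table's figure
```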
How real build counts were determined
The log format changed over time, so two different extraction strategies are needed:
Newer runs (mid-March+): The build summary line directly reports separate counts:
Done in 33633.7s. Built=392 Skipped=41 Failed=0 Retried=6 Throughput=46.3 images/hour
The progress bar also distinguishes built from skipped: 🛠 392 ⏭ 41 ❌ 0 🏃 0
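That summary line (from run #23226296458) can be parsed with a short regex. Note that recomputing throughput from Built alone reproduces the table's 42.0 img/h, while the log's own 46.3 figure matches (Built + Skipped) over the same duration, which is exactly the distortion this methodology corrects for:

```python
import re

summary = ("Done in 33633.7s. Built=392 Skipped=41 Failed=0 "
           "Retried=6 Throughput=46.3 images/hour")
m = re.search(r"Done in ([\d.]+)s\.\s+Built=(\d+)\s+Skipped=(\d+)\s+Failed=(\d+)",
              summary)
secs = float(m.group(1))
built, skipped, failed = (int(g) for g in m.groups()[1:])

print(round(built / (secs / 3600), 1))              # 42.0 -> real throughput
print(round((built + skipped) / (secs / 3600), 1))  # 46.3 -> the log's own figure
```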
Older runs (Feb–early March): The summary lumps everything together:
Done. Built=433 Failed=0 Manifest=...
The progress bar shows ✅ 433 ❌ 0 🏃 0 where ✅ includes both built AND skipped. However, each skipped image produces a distinct log line:
Image ghcr.io/openhands/eval-agent-server:010e847-sweb.eval.x86_64.sympy_1776_sympy-24213-source-minimal already exists. Skipping build.
So for old-format runs: real_built = reported_total - count("already exists. Skipping build."). This is verified: the count of skip messages equals the count of unique images skipped (no double-counting).
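The old-format correction can be exercised on a synthetic log fragment (the regexes mirror the ones in the script below):

```python
import re

# Synthetic old-format log: the summary lumps built and skipped together.
log = (
    "Image ghcr.io/openhands/eval-agent-server:a already exists. Skipping build.\n"
    "Image ghcr.io/openhands/eval-agent-server:b already exists. Skipping build.\n"
    "Done. Built=433 Failed=0 Manifest=...\n"
)

reported = int(re.search(r"Done\.\s+Built=(\d+)", log).group(1))
skips = len(re.findall(r"already exists\. Skipping build\.", log))
real_built = reported - skips
print(real_built)  # 431
```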
Cancelled runs: The last tqdm progress bar line gives the final built/skipped counts at the time of cancellation.
How build duration was measured
- New-format runs: the exact build-process time from the summary (Done in Xs.).
- Old-format and cancelled runs: the tqdm elapsed time from the last progress bar line (e.g., [5:29:27<00:00, 45.65s/it]), which excludes job setup overhead (checkout, docker login, uv install; typically ~1–2 minutes).
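Converting the tqdm elapsed stamp to seconds uses the same logic as parse_tqdm_elapsed in the script; a compact variant:

```python
import re

def tqdm_elapsed_seconds(line: str) -> int:
    """Extract the elapsed stamp from a tqdm line such as
    '433/433 [5:29:27<00:00, 45.65s/it]' and convert it to seconds."""
    elapsed = re.search(r"\[(\d+(?::\d+)+)<", line).group(1)
    seconds = 0
    for part in elapsed.split(":"):  # handles both M:SS and H:MM:SS
        seconds = seconds * 60 + int(part)
    return seconds

print(tqdm_elapsed_seconds("433/433 [5:29:27<00:00, 45.65s/it]"))  # 19767
```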
Script
Prerequisites: gh CLI authenticated with access to OpenHands/benchmarks. Run with:
```shell
python3 analyze_swtbench_builds.py 2>stderr.log > results.json
```
analyze_swtbench_builds.py:
```python
#!/usr/bin/env python3
"""Analyze swtbench image build workflow runs to determine real build counts vs skipped.

Downloads CI logs for each run via GitHub API, parses them to distinguish
truly-built images from skipped ones (already in GHCR), and computes
per-image build time and throughput.

Handles two log formats:
- New (mid-March+): summary line has separate Built/Skipped counts
- Old (Feb-early March): summary lumps built+skipped; we count skip log messages
"""
import json
import re
import subprocess
import sys
from datetime import datetime

REPO = "OpenHands/benchmarks"
TOTAL_IMAGES = 433


def run_cmd(cmd, timeout=120):
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True,
                            timeout=timeout)
    return result.stdout.strip()


def get_all_runs():
    cmd = f'''gh run list --repo {REPO} --workflow build-swtbench-images.yml --limit 100 \
        --json databaseId,conclusion,createdAt,updatedAt,headSha,displayTitle,status'''
    output = run_cmd(cmd)
    all_runs = json.loads(output)
    return [r for r in all_runs
            if r['conclusion'] not in ('skipped', '') and r['status'] == 'completed']


def get_job_id(run_id):
    cmd = f'gh run view {run_id} --repo {REPO} --json jobs'
    output = run_cmd(cmd)
    if not output:
        return None, None, None
    data = json.loads(output)
    for job in data.get('jobs', []):
        if 'build' in job['name'].lower():
            return job['databaseId'], job.get('startedAt'), job.get('completedAt')
    if data.get('jobs'):
        j = data['jobs'][0]
        return j['databaseId'], j.get('startedAt'), j.get('completedAt')
    return None, None, None


def get_job_logs(job_id):
    cmd = f'gh api repos/{REPO}/actions/jobs/{job_id}/logs'
    return run_cmd(cmd, timeout=60)


def extract_sdk_sha(title):
    m = re.search(r'SDK:\s*([a-f0-9]+)', title)
    return m.group(1) if m else "unknown"


def parse_duration_seconds(start_str, end_str):
    if not start_str or not end_str:
        return None
    try:
        start = datetime.fromisoformat(start_str.replace('Z', '+00:00'))
        end = datetime.fromisoformat(end_str.replace('Z', '+00:00'))
        return (end - start).total_seconds()
    except Exception:
        return None


def parse_tqdm_elapsed(elapsed_str):
    """Parse tqdm elapsed time like '2:55:21' or '44:16' to seconds."""
    parts = elapsed_str.split(':')
    if len(parts) == 3:
        return int(parts[0]) * 3600 + int(parts[1]) * 60 + int(parts[2])
    elif len(parts) == 2:
        return int(parts[0]) * 60 + int(parts[1])
    return None


def analyze_logs(logs):
    """Analyze logs to determine real built count, skipped count, and build timing.

    Tries four methods in order of preference:
    1. New summary line: "Done in Xs. Built=N Skipped=M Failed=K"
    2. Old summary line + skip message count: "Done. Built=N Failed=M" minus skip count
    3. New progress bar (for cancelled runs): 🛠 N ⏭ M ❌ K 🏃 R
    4. Old progress bar (for cancelled runs): ✅ N ❌ K 🏃 R minus skip count
    """
    result = {
        'real_built': None, 'skipped': None, 'failed': None,
        'build_seconds': None, 'tqdm_elapsed': None, 'method': 'unknown',
    }
    if not logs:
        return result
    # Count "already exists. Skipping build." messages (needed for old format)
    skip_count = len(re.findall(r'already exists\. Skipping build\.', logs))
    # Extract tqdm elapsed time from the LAST progress bar line
    tqdm_matches = re.findall(r'\[(\d+:\d+(?::\d+)?)<', logs)
    if tqdm_matches:
        result['tqdm_elapsed'] = parse_tqdm_elapsed(tqdm_matches[-1])
    # METHOD 1: New format summary
    new_summary = re.search(
        r'Done in ([\d.]+)s\.\s+Built=(\d+)\s+Skipped=(\d+)\s+Failed=(\d+)', logs)
    if new_summary:
        result['build_seconds'] = float(new_summary.group(1))
        result['real_built'] = int(new_summary.group(2))
        result['skipped'] = int(new_summary.group(3))
        result['failed'] = int(new_summary.group(4))
        result['method'] = 'new_summary'
        return result
    # METHOD 2: Old format summary + skip count
    old_summary = re.search(r'Done\.\s+Built=(\d+)\s+Failed=(\d+)', logs)
    if old_summary:
        total_reported = int(old_summary.group(1))
        result['real_built'] = total_reported - skip_count
        result['skipped'] = skip_count
        result['failed'] = int(old_summary.group(2))
        result['method'] = 'old_summary_minus_skips'
        if result['tqdm_elapsed']:
            result['build_seconds'] = result['tqdm_elapsed']
        return result
    # METHOD 3: New progress bar (cancelled runs)
    new_progress = re.findall(r'🛠\s*(\d+)\s+⏭\s*(\d+)\s+❌\s*(\d+)\s+🏃\s*(\d+)', logs)
    if new_progress:
        last = new_progress[-1]
        result['real_built'] = int(last[0])
        result['skipped'] = int(last[1])
        result['failed'] = int(last[2])
        result['method'] = 'new_progress_bar'
        if result['tqdm_elapsed']:
            result['build_seconds'] = result['tqdm_elapsed']
        return result
    # METHOD 4: Old progress bar (cancelled runs)
    old_progress = re.findall(r'✅\s*(\d+)\s+❌\s*(\d+)\s+🏃\s*(\d+)', logs)
    if old_progress:
        last = old_progress[-1]
        result['real_built'] = int(last[0]) - skip_count
        result['skipped'] = skip_count
        result['failed'] = int(last[1])
        result['method'] = 'old_progress_bar_minus_skips'
        if result['tqdm_elapsed']:
            result['build_seconds'] = result['tqdm_elapsed']
        return result
    return result


def format_duration(seconds):
    if seconds is None:
        return "N/A"
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h}h{m:02d}m" if h > 0 else f"{m}m{s:02d}s"


def main():
    print("Fetching workflow runs...", file=sys.stderr)
    runs = get_all_runs()
    print(f"Found {len(runs)} non-skipped completed runs", file=sys.stderr)
    results = []
    for i, run in enumerate(runs):
        run_id = run['databaseId']
        title = run['displayTitle']
        conclusion = run['conclusion']
        created = run['createdAt']
        # Skip instant failures (< 2 min)
        duration_total = parse_duration_seconds(run['createdAt'], run['updatedAt'])
        if duration_total is not None and duration_total < 120:
            continue
        print(f"  [{i+1}/{len(runs)}] Analyzing run {run_id} ({conclusion})...",
              file=sys.stderr)
        sdk_sha = extract_sdk_sha(title)
        benchmarks_sha = run['headSha'][:7]
        job_id, job_start, job_end = get_job_id(run_id)
        if not job_id:
            continue
        logs = get_job_logs(job_id)
        if not logs:
            continue
        analysis = analyze_logs(logs)
        if analysis['real_built'] is None or analysis['real_built'] < 50:
            print(f"    Built {analysis['real_built']} images (< 50), skipping",
                  file=sys.stderr)
            continue
        build_secs = analysis.get('build_seconds')
        if build_secs is None:
            build_secs = parse_duration_seconds(job_start, job_end)
        real_built = analysis['real_built']
        throughput = None
        if real_built > 0 and build_secs:
            throughput = real_built / (build_secs / 3600)
        results.append({
            'run_id': run_id,
            'date': created[:10],
            'conclusion': conclusion,
            'sdk_sha': sdk_sha[:7],
            'benchmarks_sha': benchmarks_sha,
            'real_built': real_built,
            'skipped': analysis['skipped'],
            'failed': analysis.get('failed', 0),
            'build_seconds': build_secs,
            'build_duration_str': format_duration(build_secs),
            'throughput_img_per_hour': round(throughput, 1) if throughput else None,
            'method': analysis['method'],
        })
        print(f"    Built={real_built} Skipped={analysis['skipped']} "
              f"Duration={format_duration(build_secs)} "
              f"Throughput={round(throughput, 1) if throughput else 'N/A'} img/h "
              f"Method={analysis['method']}", file=sys.stderr)
    results.sort(key=lambda x: x['date'] + str(x['run_id']))
    json.dump(results, sys.stdout, indent=2)


if __name__ == '__main__':
    main()
```

Verifying skip counts
To independently verify that skip counts are not double-counted for any run:
```shell
JOB_ID=$(gh run view <RUN_ID> --repo OpenHands/benchmarks --json jobs --jq '.jobs[0].databaseId')
LOGS=$(gh api repos/OpenHands/benchmarks/actions/jobs/$JOB_ID/logs)

# Total skip messages
echo "$LOGS" | grep -c 'already exists. Skipping build.'

# Unique images skipped (should match the above)
echo "$LOGS" | grep 'already exists. Skipping build.' | sed 's/.*Image //' | sed 's/ already.*//' | sort -u | wc -l
```
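The same check can be done in Python when the shell pipeline is inconvenient (the log text here is a synthetic sample; real input would be whatever gh api returned):

```python
import re

def skip_counts_consistent(logs: str) -> bool:
    """True when the number of 'already exists' skip messages equals the
    number of distinct images named in them, i.e. no double-counting."""
    images = re.findall(r"Image (\S+) already exists\. Skipping build\.", logs)
    return len(images) == len(set(images))

sample = (
    "Image ghcr.io/openhands/eval-agent-server:a already exists. Skipping build.\n"
    "Image ghcr.io/openhands/eval-agent-server:b already exists. Skipping build.\n"
)
print(skip_counts_consistent(sample))      # True: two messages, two images
print(skip_counts_consistent(sample * 2))  # False: each image counted twice
```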