SWE-Bench image builds stuck for 5+ hours, blocking evaluation jobs #476

## Problem

Three GitHub Actions builds for SWE-Bench images have been stuck for **5+ hours** on the "Build and push SWE-Bench images" step, blocking evaluation jobs from running.

## Affected Builds

All three builds are for SDK commit `217b218272499aa5b17335d82627e090b67cf9aa` (Update orjson to 3.11.7):

| Run ID | Status | Elapsed (reported) | Actual Time Stuck |
|--------|--------|-------------------|-------------------|
| [22627544120](https://github.com/OpenHands/benchmarks/actions/runs/22627544120) | In Progress | 13s | 5+ hours |
| [22627606528](https://github.com/OpenHands/benchmarks/actions/runs/22627606528) | In Progress | 12s | 5+ hours |
| [22627616885](https://github.com/OpenHands/benchmarks/actions/runs/22627616885) | In Progress | 12s | 5+ hours |
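
The "Elapsed (reported)" column does not match wall-clock time. The sketch below is a rough cross-check computed from timestamps. It assumes ISO-8601 UTC timestamps in the format GitHub reports, for example the `createdAt` field from `gh run view <run-id> --json createdAt`. The start time used here is the approximate "First stuck" time from the Environment section, so the result is only an estimate.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as '2026-03-03T14:30:00Z'."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def elapsed(start: str, end: Optional[str] = None) -> str:
    """Wall-clock time from `start` to `end` (default: now), formatted as 'XhYmZs'."""
    stop = parse_utc(end) if end else datetime.now(timezone.utc)
    total = int((stop - parse_utc(start)).total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h{minutes}m{seconds}s"


# Approximate "First stuck" time vs. the timestamp of the polling log line in Impact.
print(elapsed("2026-03-03T14:30:00Z", "2026-03-03T19:52:47Z"))  # 5h22m47s
```

This lines up with the ~5h22m wait reported for the evaluation pods, not with the 12-13s shown in the UI.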

## Impact

**Evaluation pods blocked:** Multiple evaluation pods have been waiting 5+ hours for the build to complete:
- `eval-22627492274-gemini-3-1-p9tlk` - swebench (5h22m waiting)
- `eval-22627517186-glm-5-k5xn4` - swebench (5h21m waiting)
- `eval-22627517186-qwen3-5-fl-n9vvw` - swebench (5h21m waiting)

Logs show the run being polled continuously, with no change in status:
```
[2026-03-03 19:52:47 UTC] Benchmarks build run 22627544120: status=in_progress, conclusion=None
```
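
The poller has no upper bound on how long it waits. Here is a minimal sketch of a bounded poll loop. `get_status` is a hypothetical callable that returns the `(status, conclusion)` pair printed in the log. The clock and sleep functions are injectable so the deadline logic can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, Tuple

# (status, conclusion), e.g. ("in_progress", None) or ("completed", "success")
Status = Tuple[str, Optional[str]]


def wait_for_run(
    run_id: int,
    get_status: Callable[[int], Status],
    timeout_s: float,
    poll_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll a build run until it completes; raise TimeoutError once the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status, conclusion = get_status(run_id)
        print(f"Benchmarks build run {run_id}: status={status}, conclusion={conclusion}")
        if status == "completed":
            return status, conclusion
        if clock() >= deadline:
            # Fail loudly instead of waiting indefinitely on a hung build.
            raise TimeoutError(f"run {run_id} still {status} after {timeout_s:.0f}s")
        sleep(poll_s)


# Example with a fake clock: a run that never leaves "in_progress" times out.
fake_now = [0.0]
try:
    wait_for_run(
        22627544120,
        lambda run_id: ("in_progress", None),
        timeout_s=180,
        poll_s=60,
        clock=lambda: fake_now[0],
        sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
    )
except TimeoutError as exc:
    print(exc)  # run 22627544120 still in_progress after 180s
```

With a deadline like this, the evaluation pod would have failed with a clear error instead of waiting 5+ hours.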

## Expected Behavior

Normal SWE-Bench image builds usually complete in **5-40 minutes**:
- Recent successful builds: 5m40s, 8m8s, 10m28s, 24m10s, 40m38s
- Longest recent build: 1h9m20s
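
To make "stuck" concrete, the durations above can be turned into a threshold. The sketch below assumes a hypothetical rule: flag any build that runs longer than twice the longest recent build. The 2x factor is an illustrative choice, not an agreed policy.

```python
import re

_PART = re.compile(r"(\d+)([hms])")
_UNIT = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a GitHub-style duration such as '1h9m20s' to seconds."""
    parts = _PART.findall(duration)
    # Reject anything that is not purely <number><h|m|s> groups.
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return sum(int(n) * _UNIT[u] for n, u in parts)


recent = ["5m40s", "8m8s", "10m28s", "24m10s", "40m38s", "1h9m20s"]
longest = max(to_seconds(d) for d in recent)  # 4160 s
threshold = 2 * longest                       # hypothetical 2x margin
waiting = to_seconds("5h22m")                 # eval-22627492274-gemini-3-1-p9tlk
print(longest, threshold, waiting > threshold)  # 4160 8320 True
print(round(waiting / longest, 1))              # 4.6
```

With these figures, a 2x threshold is about 2h19m. The 5h22m wait is roughly 4.6 times the longest recent build, so the runs are well outside normal variance.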

## Build Details

**Workflow:** `build-swebench-images.yml`

**Stuck step:**
```yaml
- name: Build and push SWE-Bench images
  run: |
    uv run benchmarks/swebench/build_images.py \
      --dataset "${DATASET}" \
      --split "${SPLIT}" \
      --image ghcr.io/openhands/eval-agent-server \
      --push \
      --max-workers "${MAX_WORKERS}" \
      --max-retries "${MAX_RETRIES}"
```

**Job status shows:**
```
* Build and push SWE-Bench images (still running after 5 hours)
* Archive build logs (queued)
* Upload build logs (queued)
* Display build summary (queued)
```

## Possible Causes

1. **GitHub Actions runner hung/deadlocked** - all three builds stalling at once points to a runner/infrastructure issue
2. **Docker BuildKit deadlock** - parallel image building with `--max-workers` may have deadlocked
3. **Network/registry timeout** - pushing to `ghcr.io` may be timing out silently
4. **Resource exhaustion** - runner out of disk space or memory during build
5. **Silent failure** - build process crashed but runner didn't detect failure
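
For cause 2, a hang in one parallel worker is invisible if the driver simply waits on all of them. The sketch below shows a bounded wait that reports which tasks never finished. It assumes a hypothetical mapping of image names to build callables and does not reflect how `build_images.py` actually schedules work.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, List, Tuple


def run_with_deadline(
    tasks: Dict[str, Callable[[], object]], max_workers: int, timeout_s: float
) -> Tuple[List[str], List[str]]:
    """Run tasks in parallel; return (stuck, failed) task names once the deadline passes."""
    pool = cf.ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(fn): name for name, fn in tasks.items()}
    # Wait for everything, but never longer than timeout_s in total.
    done, not_done = cf.wait(futures, timeout=timeout_s)
    failed = sorted(futures[f] for f in done if f.exception() is not None)
    stuck = sorted(futures[f] for f in not_done)
    # Return without joining hung workers; still-queued tasks are cancelled (Python 3.9+).
    # Running threads cannot be killed, so this only *detects* a hang.
    pool.shutdown(wait=False, cancel_futures=True)
    return stuck, failed


# Example: one task finishes, one hangs past the deadline, one fails.
stuck, failed = run_with_deadline(
    {"ok": lambda: time.sleep(0.01), "hung": lambda: time.sleep(1.0), "boom": lambda: 1 / 0},
    max_workers=3,
    timeout_s=0.3,
)
print("stuck:", stuck, "failed:", failed)  # stuck: ['hung'] failed: ['boom']
```

Logging the `stuck` list would distinguish a single deadlocked image build (cause 2) from a runner that stopped making progress entirely (cause 1).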

## Investigation Needed

- [ ] Check GitHub Actions runner logs (requires admin access)
- [ ] Verify Docker BuildKit is not deadlocked
- [ ] Check ghcr.io push operations for timeouts
- [ ] Review disk space and memory usage on runners
- [ ] Check if SDK commit 217b2182 introduced any dependency issues
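
For the disk and memory item, the sketch below takes a snapshot that could be logged before and during the build. It assumes a Linux runner (for `/proc/meminfo`) and uses a hypothetical 10 GiB free-disk floor. Docker's own image and build-cache usage can be checked separately with `docker system df`.

```python
import shutil
from typing import Dict, Optional


def free_disk_gib(path: str = "/") -> float:
    """Free space on the filesystem that contains `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def available_mem_gib(meminfo: str = "/proc/meminfo") -> Optional[float]:
    """MemAvailable in GiB, read from /proc/meminfo (Linux only); None elsewhere."""
    try:
        with open(meminfo) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) / 2**20  # value is reported in kB
    except OSError:
        pass
    return None


def resource_report(path: str = "/", min_free_disk_gib: float = 10.0) -> Dict[str, object]:
    """Snapshot of disk and memory headroom; the 10 GiB floor is a hypothetical default."""
    disk = free_disk_gib(path)
    mem = available_mem_gib()
    return {
        "free_disk_gib": round(disk, 1),
        "mem_available_gib": None if mem is None else round(mem, 1),
        "disk_ok": disk >= min_free_disk_gib,
    }


print(resource_report())
```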

## Recommended Actions

1. **Immediate:** Cancel stuck builds:
   ```bash
   gh run cancel 22627544120
   gh run cancel 22627606528
   gh run cancel 22627616885
   ```

2. **Short-term:** Re-trigger the build for SDK commit 217b2182 and monitor it closely

3. **Long-term:** 
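   - Add a timeout to the build step with `timeout-minutes` (e.g., `timeout-minutes: 90`; a 60-minute limit would be shorter than the longest recent build of 1h9m20s, and without an explicit value GitHub cancels the job only at its 360-minute default)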
   - Add progress logging to `build_images.py`
   - Add health checks during long-running builds
   - Consider build step telemetry/monitoring
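
For the progress-logging and health-check items, similar protection could also live in whatever launches the build. The sketch below assumes a hypothetical Python wrapper around the build command. It prints a heartbeat at a fixed interval so that a silent hang is visible in the logs, and it kills the process once a deadline passes. This complements a step-level timeout in the workflow. Note that `proc.kill()` only signals the direct child, so Docker builds started by the script could outlive it.

```python
import subprocess
import sys
import threading
import time
from typing import List


def run_with_heartbeat(cmd: List[str], timeout_s: float, heartbeat_s: float = 60.0) -> int:
    """Run `cmd`, print a heartbeat while it runs, and kill it after `timeout_s`."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    stop = threading.Event()

    def heartbeat() -> None:
        # Event.wait() returns False on timeout and True once `stop` is set.
        while not stop.wait(heartbeat_s):
            print(f"[heartbeat] still running after {time.monotonic() - start:.0f}s", flush=True)

    threading.Thread(target=heartbeat, daemon=True).start()
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # direct child only; grandchildren are not signalled
        proc.wait()
        print(f"[timeout] killed after {timeout_s:.0f}s", file=sys.stderr, flush=True)
        return 124  # the exit code GNU `timeout` uses for a timed-out command
    finally:
        stop.set()


# Hypothetical usage with a deadline of about twice the longest recent build
# (other build_images.py flags omitted):
# code = run_with_heartbeat(["uv", "run", "benchmarks/swebench/build_images.py", "--push"], timeout_s=2 * 4160)
```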

## Environment

- **SDK Commit:** 217b2182 (Update orjson to 3.11.7 to address CVE-2025-67221)
- **Workflow:** build-swebench-images.yml
- **Runner:** GitHub-hosted (specific runner unknown due to stuck status)
- **First stuck:** ~2026-03-03 14:30 UTC
- **Affected evaluations:** swebench benchmark

## Additional Context

This is blocking the investigation in evaluation issue #287, where we are trying to determine whether recent SDK changes caused OOM issues. The stuck builds prevent us from running controlled tests with the latest SDK commit.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SWE-Bench image builds stuck for 5+ hours, blocking evaluation jobs #476

Problem

Affected Builds

Impact

Expected Behavior

Build Details

Possible Causes

Investigation Needed

Recommended Actions

Environment

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Run ID	Status	Elapsed (reported)	Actual Time Stuck
22627544120	In Progress	13s	5+ hours
22627606528	In Progress	12s	5+ hours
22627616885	In Progress	12s	5+ hours

SWE-Bench image builds stuck for 5+ hours, blocking evaluation jobs #476

Description

Problem

Affected Builds

Impact

Expected Behavior

Build Details

Possible Causes

Investigation Needed

Recommended Actions

Environment

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions