Description
Problem
Three GitHub Actions builds for SWE-Bench images have been stuck for 5+ hours on the "Build and push SWE-Bench images" step, blocking evaluation jobs from running.
Affected Builds
All three builds are for SDK commit 217b218272499aa5b17335d82627e090b67cf9aa (Update orjson to 3.11.7):
| Run ID | Status | Elapsed (reported) | Actual Time Stuck |
|---|---|---|---|
| 22627544120 | In Progress | 13s | 5+ hours |
| 22627606528 | In Progress | 12s | 5+ hours |
| 22627616885 | In Progress | 12s | 5+ hours |
Impact
Evaluation pods blocked: Multiple evaluation pods have been waiting 5+ hours for the build to complete:
- eval-22627492274-gemini-3-1-p9tlk - swebench (5h22m waiting)
- eval-22627517186-glm-5-k5xn4 - swebench (5h21m waiting)
- eval-22627517186-qwen3-5-fl-n9vvw - swebench (5h21m waiting)
Logs show continuous polling:
[2026-03-03 19:52:47 UTC] Benchmarks build run 22627544120: status=in_progress, conclusion=None
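The log above suggests the poller waits with no upper bound. A minimal sketch of a bounded wait follows, assuming the run status is read with `gh run view --json status,conclusion`. The names `fetch_status` and `wait_for_run` and the 90-minute deadline are hypothetical and not taken from the existing evaluation code.

```python
import json
import subprocess
import time
from typing import Callable, Optional, Tuple

Status = Tuple[str, Optional[str]]  # (status, conclusion)


def fetch_status(run_id: str) -> Status:
    """Read a workflow run's status via the GitHub CLI.

    Assumes `gh` is authenticated and invoked inside the repository
    (otherwise add `-R owner/repo` to the command).
    """
    out = subprocess.run(
        ["gh", "run", "view", run_id, "--json", "status,conclusion"],
        check=True, capture_output=True, text=True,
    ).stdout
    data = json.loads(out)
    # The conclusion may be empty while the run is still in progress.
    return data["status"], data.get("conclusion") or None


def wait_for_run(
    run_id: str,
    deadline_s: float = 90 * 60,  # hypothetical limit, above the longest recent build
    poll_s: float = 30.0,
    fetch: Callable[[str], Status] = fetch_status,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the run's conclusion, or raise TimeoutError once the deadline passes."""
    start = clock()
    while True:
        status, conclusion = fetch(run_id)
        if status == "completed":
            return conclusion or "unknown"
        if clock() - start >= deadline_s:
            raise TimeoutError(f"run {run_id} still {status} after {deadline_s:.0f}s")
        sleep(poll_s)
```

With a deadline like this, a waiting evaluation pod would fail loudly instead of polling for hours; `fetch`, `clock`, and `sleep` are injectable only so the loop can be exercised without calling GitHub.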
Expected Behavior
Normal SWE-Bench image builds typically complete in 5-40 minutes, and the longest recent build took just over an hour:
- Recent successful builds: 5m40s, 8m8s, 10m28s, 24m10s, 40m38s
- Longest recent build: 1h9m20s
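The durations above can be turned into a concrete stuck-build threshold. The sketch below assumes a threshold of 1.5 times the longest recent build; `parse_duration` and the 1.5 factor are hypothetical choices for illustration, not an existing project convention.

```python
import math
import re

# Matches strings such as "5m40s" or "1h9m20s"; every unit is optional.
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_duration(text: str) -> int:
    """Convert a duration string such as '1h9m20s' to seconds."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Durations quoted in this issue.
recent = ["5m40s", "8m8s", "10m28s", "24m10s", "40m38s", "1h9m20s"]

longest_s = max(parse_duration(d) for d in recent)
threshold_min = math.ceil(longest_s * 1.5 / 60)  # assumed safety factor of 1.5

print(longest_s, threshold_min)  # prints: 4160 104
```

Under that assumption any build running longer than 104 minutes would be treated as stuck, which the three runs above exceeded several times over.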
Build Details
Workflow: build-swebench-images.yml
Stuck step:
    - name: Build and push SWE-Bench images
      run: |
        uv run benchmarks/swebench/build_images.py \
          --dataset '${DATASET}' \
          --split '${SPLIT}' \
          --image ghcr.io/openhands/eval-agent-server \
          --push \
          --max-workers '${MAX_WORKERS}' \
          --max-retries '${MAX_RETRIES}'

Job status shows:
* Build and push SWE-Bench images (still running after 5 hours)
* Archive build logs (queued)
* Upload build logs (queued)
* Display build summary (queued)
Possible Causes
- GitHub Actions runner hung/deadlocked - all 3 builds stuck suggests runner/infrastructure issues
- Docker BuildKit deadlock - parallel image building with --max-workers may have deadlocked
- Network/registry timeout - pushing to ghcr.io may be timing out silently
- Resource exhaustion - runner out of disk space or memory during build
- Silent failure - build process crashed but runner didn't detect failure
Investigation Needed
- Check GitHub Actions runner logs (requires admin access)
- Verify Docker BuildKit is not deadlocked
- Check ghcr.io push operations for timeouts
- Review disk space and memory usage on runners
- Check if SDK commit 217b2182 introduced any dependency issues
Recommended Actions
- Immediate: Cancel stuck builds:

      gh run cancel 22627544120
      gh run cancel 22627606528
      gh run cancel 22627616885

- Short-term: Re-trigger build for SDK commit 217b2182 and monitor closely
- Long-term:
  - Add timeout to build step (e.g., timeout-minutes: 60)
  - Add progress logging to build_images.py
  - Add health checks during long-running builds
  - Consider build step telemetry/monitoring
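As one possible shape for the progress logging suggested above, the sketch below emits a heartbeat line while parallel builds are running. It assumes image builds are dispatched through a thread pool; `run_with_heartbeat` and `build_one` are hypothetical names, and the real structure of build_images.py may differ.

```python
import logging
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List

log = logging.getLogger("build_images")


def run_with_heartbeat(
    build_one: Callable[[str], None],  # hypothetical per-image build function
    images: Iterable[str],
    max_workers: int = 4,
    heartbeat_s: float = 60.0,
) -> List[str]:
    """Build images in parallel and log progress at least every heartbeat_s seconds.

    Returns the images whose build raised an exception.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(build_one, img): img for img in images}
        total = len(pending)
        start = time.monotonic()
        while pending:
            # Wake up when a build finishes or when the heartbeat interval elapses.
            done, _ = wait(list(pending), timeout=heartbeat_s, return_when=FIRST_COMPLETED)
            for fut in done:
                img = pending.pop(fut)
                if fut.exception() is not None:
                    failed.append(img)
                    log.error("build failed: %s (%s)", img, fut.exception())
            log.info(
                "progress: %d/%d done, %d failed, %.0fs elapsed",
                total - len(pending), total, len(failed), time.monotonic() - start,
            )
    return failed
```

A heartbeat only makes a hang visible in the logs; it does not unblock a deadlocked worker, so a step-level timeout would still be needed as the hard stop. Note also that a 60-minute limit is shorter than the longest recent build (1h9m20s), so a somewhat larger value may be safer.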
Environment
- SDK Commit: 217b2182 (Update orjson to 3.11.7 to address CVE-2025-67221)
- Workflow: build-swebench-images.yml
- Runner: GitHub-hosted (specific runner unknown due to stuck status)
- First stuck: ~2026-03-03 14:30 UTC
- Affected evaluations: swebench benchmark
Additional Context
This is blocking the investigation of evaluation issue #287, where we're trying to determine whether recent SDK changes caused OOM issues. The stuck builds prevent us from running controlled tests with the latest SDK commit.