fix: cache SDK sdist across image builds#507
Conversation
Build the SDK sdist once per batch and reuse it through the SDK's native build() path by intercepting only the internal uv sdist subprocess call. This removes the per-image sdist overhead while preserving tag, cache, and build-arg behavior for current and future SDK versions. Also switch build-related modules to stdlib logging so worker forks do not inherit SDK Rich console state. Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Solves a real performance problem, but the solution is more complex than it needs to be.
[CRITICAL ISSUES]
The core optimization is sound (build sdist once instead of N times), but the implementation adds unnecessary indirection that will make this harder to maintain.
[IMPROVEMENT OPPORTUNITIES]
See inline comments for specific architectural improvements that would simplify the data flow and reduce coupling to SDK internals.
VERDICT: ✅ Worth merging - Performance improvement is real and valuable. Fallback behavior prevents breaking changes. The complexity is manageable but could be simplified in follow-up work.
KEY INSIGHT: Monkey-patching SDK internals works but creates maintenance coupling. Consider upstreaming sdist caching to the SDK itself, or pass the pre-built sdist path explicitly through the call chain instead of via environment variable + runtime patching.
|
Addressed the review feedback in Changes made:
I kept the patch-at-the-SDK-build-seam design because it still avoids duplicating the SDK build path in benchmarks, while making the data flow more explicit and defensive. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Solves a real performance problem with pragmatic engineering. The patching approach is more complex than ideal, but given the constraint of preserving SDK build logic, it's a reasonable trade-off.
|
@OpenHands /codereview-roasted |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
🔥 Linus-Style Code Review: PR #507 vs PR #505Taste Rating: 🟢 Good taste (PR #507)PR #507 demonstrates "good taste" - it makes the minimal change to solve the problem while keeping the SDK in control of its own build behavior. Let me explain why this matters. The Core DifferencePR #507 (this PR): Patches the SDK's internal PR #505: Completely replaces the SDK's Why PR #507 is the Right Design1. "Don't Duplicate; Delegate"PR #507: with _patch_sdk_sdist_build(sdk_build_module, cached_sdist):
tags = build(opts) # SDK still owns all build logicPR #505: # Replicates 100 lines of SDK internals:
args = ["docker", "buildx", "build", "--file", ...]
args += ["--build-arg", f"BASE_IMAGE={opts.base_image}"]
args += ["--build-arg", f"OPENHANDS_BUILD_GIT_SHA={opts.git_sha}"]
# ... and so onWhen the SDK adds a new build arg (say, 2. Future-Proofing via Graceful FallbackPR #507 fails gracefully: original_run = getattr(sdk_build_module, "_run", None)
if original_run is None:
logger.warning("SDK build module has no _run helper; falling back to native build()")
yield
returnIf the SDK refactors away 3. Explicit Parameter Passing > Hidden Environment VariablesPR #507 passes def build_image(..., cached_sdist: Path | None = None):PR #505 uses a hidden env var: os.environ["OPENHANDS_CACHED_SDIST"] = str(cached_sdist_path)
# ... and later reads it implicitly in build_image()
cached_sdist = os.environ.get("OPENHANDS_CACHED_SDIST")Explicit is better than implicit. The function signature documents the interface; environment variables create hidden coupling. 4. TestabilityPR #507 has focused tests:
PR #505 has no tests for Strong Features from PR #505 Worth Porting✅ Explanatory Comments for the Logging SwitchPR #505 has inline comments explaining why the stdlib logging switch matters: # Use stdlib logging instead of openhands.sdk.get_logger to avoid initializing
# Rich console state before ProcessPoolExecutor forks (causes deadlocks).
import loggingPR #507 just switches the import without explaining the deadlock context. This comment should be added for future maintainers. Recommendation: Add a brief comment to ❌ Environment Variable Approach (Don't Port)The ❌ Manual buildx Command Construction (Definitely Don't Port)This is exactly what creates the maintenance burden. Keep the SDK in control. Minor Nits (Skip These - Linters Exist)I don't care about import ordering or spacing. The code passes VERDICT: ✅ Worth merging (PR #507)The core logic is sound. PR #507 makes the minimal, surgical change to eliminate the repeated KEY INSIGHT: The difference between "good taste" and "bad taste" here is whether you trust the SDK to own its build behavior. PR #507 says "yes, just let me cache the expensive bit." PR #505 says "no, I'll reimplement your build logic myself." The first approach scales; the second creates a fork. Suggested Follow-up (Non-Blocking)Add this comment at the top of # Use stdlib logging instead of openhands.sdk.get_logger to avoid initializing
# Rich console state before ProcessPoolExecutor forks (causes deadlocks).This documents the "why" for future maintainers. |
Ported from PR #505 - explains why openhands.sdk.get_logger is not used in build-related modules to avoid Rich console state deadlocks when ProcessPoolExecutor forks worker processes. Co-authored-by: openhands <openhands@all-hands.dev>
|
Ported the explanatory comments from PR #505 in All six affected files now document why stdlib |
SummaryI completed the code review comparing PR #507 (this PR) with PR #505, and ported the one strong feature from #505. Code Review Posted
Changes Made (Already Pushed)Commit # Use stdlib logging instead of openhands.sdk.get_logger to avoid initializing
# Rich console state before ProcessPoolExecutor forks (causes deadlocks).Files updated:
Checklist
|
|
Triggered fresh full image-build validation runs from the current feature branch tip Monitor here:
Run details:
These were dispatched after the latest review fixes. They are currently pending behind the older branch-scoped runs while GitHub finishes cancelling those earlier jobs. |
|
Current validation runs for the SDK-native prebuilt-sdist API change:
Current refs:
Older branch-scoped workflow runs were cancelled so these are the active runs to track. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Pragmatic fix for a real performance problem. The solution is clean, preserves SDK build logic, and handles failures gracefully.
[KEY INSIGHT]
This solves the actual problem (SDK rebuilding the same sdist N times) with straightforward engineering: build once, pass the path through, clean up after. Previous review concerns about env var indirection have been addressed by making the data flow explicit.
[MINOR OBSERVATIONS]
See inline comments. The workflow hardening changes feel like scope creep but are acceptable since they all relate to build robustness. The 10-parameter function is extracting complexity to reduce nesting, which is the right trade-off.
VERDICT: ✅ Worth merging - Core logic is sound, no blocking issues.
|
|
||
| runs-on: | ||
| labels: blacksmith-32vcpu-ubuntu-2204 | ||
| timeout-minutes: 180 |
There was a problem hiding this comment.
🟡 Suggestion - Scope: Workflow improvements (timeout, build timing, disk reporting) are good defensive engineering, but feel like scope creep for a PR focused on caching the SDK sdist.
Acceptable because they all relate to making builds more reliable. Just be aware that bundling unrelated improvements makes bisecting regressions harder.
If splitting: the workflow hardening could be a separate PR titled "ci: add build timeouts and disk monitoring".
| raise RuntimeError("Unreachable code reached in _build_with_logging") | ||
|
|
||
|
|
||
| def _iter_batch_results( |
There was a problem hiding this comment.
🟢 Acceptable - Pragmatism: Ten parameters is a lot, but this function is extracting the ProcessPoolExecutor loop from build_many() to reduce nesting depth.
The alternative would be bundling these into a config object, but that adds indirection without reducing actual complexity. Since all parameters are needed for the build process, this is a reasonable trade-off.
The nesting reduction in build_many() is more valuable than the parameter count smell.
| except Exception as e: | ||
| logger.warning( | ||
| "Failed to pre-build SDK sdist; each image will build its own: %s", e | ||
| ) |
There was a problem hiding this comment.
🟢 Good taste: Clean fallback behavior. If prebuild fails, log a warning and continue without caching—each image builds its own sdist.
This is robust: don't fail the entire batch because of an optimization failure. The warning ensures visibility without breaking the build.
| # Use stdlib logging instead of openhands.sdk.get_logger to avoid initializing | ||
| # Rich console state before ProcessPoolExecutor forks (causes deadlocks). | ||
| import logging |
There was a problem hiding this comment.
🟢 Acceptable - Pragmatism: Switching to stdlib logging to avoid Rich console state issues in multiprocessing is the right fix.
Rich's console state doesn't fork well across ProcessPoolExecutor workers—this was causing deadlocks. The comment clearly explains the constraint for future maintainers.
|
Triggered fresh full validation runs on
Both runs are on benchmarks |
Update the software-agent-sdk submodule from fc962c4 to 447aa91 so the issue-504 SWT branch uses the SDK build path that keeps shared cache reads but disables shared cache writes by default. This preserves the prebuilt-sdist work while removing the parallel registry export contention identified in issue #510. Co-authored-by: openhands <openhands@all-hands.dev>
|
Reposting this here so it doesn't get lost in the slack. https://github.com/OpenHands/benchmarks/actions/runs/23043936501/job/66928367859 succeeded because it relied on build |
Summary
uv build --sdistsubprocess call so the SDK still owns tag, cache, and Docker build behavior8e8223b24dfd041875fe845b13326b0a1238b793OPENHANDS_BUILDKIT_CACHE_MODE=offfor the SWT image-build workflowtimeout-minutes: 180limit from both SWE-Bench and SWT-Bench image-build workflowsWhy this is the real fix
The first regression in this area was repeated SDK sdist construction across hundreds of image builds. This PR removes that repeated work in benchmarks without forking SDK behavior.
While validating that change on full SWT-Bench runs, the dominant remaining slowdown turned out to be registry cache export during cold fan-out image builds. The SDK PR adds cache export controls; this PR wires those controls into the SWT workflow and removes the workflow timeout that was cancelling otherwise healthy long-running image builds.
Because benchmarks still calls the SDK's native
build()path, the repo keeps using SDK-owned tagging, Docker args, and cache semantics instead of duplicating them here.Validation
make builduv run pytest tests/test_image_utils.py tests/test_llm_config.pyuv run pre-commit run --files benchmarks/utils/build_utils.py benchmarks/utils/buildx_utils.py benchmarks/utils/image_utils.py benchmarks/swebench/build_images.py benchmarks/swtbench/build_eval_env_images.py benchmarks/swtbench/image_utils.py tests/test_image_utils.py8e8223b24dfd041875fe845b13326b0a1238b793:23043936501433/433images built,❌ 0, total image-build time4:10:33Fixes #504