Skip to content

BREAKING: Rename --max-attempts to --n-critic-runs#325

Draft
juanmichelini wants to merge 4 commits intomainfrom
rename-max-attempts-to-n-critic-runs
Draft

BREAKING: Rename --max-attempts to --n-critic-runs#325
juanmichelini wants to merge 4 commits intomainfrom
rename-max-attempts-to-n-critic-runs

Conversation

@juanmichelini
Copy link
Collaborator

Summary

This PR renames the --max-attempts parameter to --n-critic-runs across the benchmarks codebase to better reflect its purpose: controlling the number of critic evaluation runs in iterative mode.

Changes

  • CLI argument: --max-attempts--n-critic-runs
  • Model field: max_attemptsn_critic_runs (EvalMetadata)
  • Updated files: 17 files total
    • 7 run_infer.py files (all benchmarks)
    • Core logic files (evaluation.py, iterative.py)
    • 4 test files
    • 2 README documentation files

Breaking Changes

This is a breaking change for users. Existing scripts and workflows must be updated.

Migration Required

  • CLI usage: --max-attempts 3--n-critic-runs 3
  • Python API: EvalMetadata(max_attempts=3)EvalMetadata(n_critic_runs=3)

Related PRs

This PR is part of a coordinated change. A corresponding PR will be created for the evaluation repo.

Testing

  • All test files updated
  • Changes verified across all benchmarks

This breaking change renames the parameter across the codebase to better
reflect its purpose: controlling the number of critic evaluation runs in
iterative mode, not general retry attempts.

Changes:
- CLI argument: --max-attempts → --n-critic-runs
- Model field: max_attempts → n_critic_runs (EvalMetadata)
- Updated all run_infer.py files (7 benchmarks)
- Updated core logic (evaluation.py, iterative.py)
- Updated test files (4 files)
- Updated documentation (2 README files)

Migration guide:
- Update CLI usage: --max-attempts 3 → --n-critic-runs 3
- Update EvalMetadata construction: max_attempts= → n_critic_runs=

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Jan 16, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #325 at branch `rename-max-attempts-to-n-critic-runs`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

@juanmichelini
Copy link
Collaborator Author

1 similar comment
@juanmichelini
Copy link
Collaborator Author

@juanmichelini
Copy link
Collaborator Author

@juanmichelini
Copy link
Collaborator Author

@OpenHands bring all changes from main to this branch. resolve any conflicts like this:
main gets priority, then we make sure that we are respecting the "Rename --max-attempts to --n-critic-runs" even if we have to re do it.

@openhands-ai
Copy link

openhands-ai bot commented Mar 14, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

Merged latest main into branch. Re-applied the rename for:
- New benchmarks from main (swebenchmultilingual, swefficiency)
- Tests that were reverted by auto-merge (test_iterative_resume, test_workspace_cleanup)
- commit0/config.py default key
- swtbench README documentation
- evaluation.py log message
- models.py formatting fix

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Mar 14, 2026

All changes have already been pushed to the remote branch rename-max-attempts-to-n-critic-runs. Here's the summary:

Summary

I merged all changes from main into the PR branch and re-applied the max_attemptsn_critic_runs rename, following the conflict resolution strategy: main gets priority, then re-apply the rename.

What was done

  1. Merged origin/main (31 new commits) — no conflicts, auto-merged cleanly.
  2. Re-applied the rename in 8 files that either came from main or were reverted by the merge:
File Change
benchmarks/swebenchmultilingual/run_infer.py New benchmark from main — renamed args.max_attemptsargs.n_critic_runs, max_attempts=n_critic_runs=
benchmarks/swefficiency/run_infer.py New benchmark from main — same rename
tests/test_iterative_resume.py max_attempts=3n_critic_runs=3 (2 occurrences reverted by merge)
tests/test_workspace_cleanup.py max_attempts=1n_critic_runs=1 (3 occurrences reverted by merge)
benchmarks/commit0/config.py Config dict key "max_attempts""n_critic_runs" + comment update
benchmarks/swtbench/README.md --max-attempts--n-critic-runs in CLI examples
benchmarks/utils/evaluation.py Log string "max attempts" → "critic runs"
benchmarks/utils/models.py Ruff auto-formatting fix

Checklist

  • ✅ Main merged into branch (main gets priority)
  • ✅ Rename re-applied to all new/reverted files
  • ✅ No remaining max_attempts references (except unrelated retry logic in modal_patches.py)
  • ✅ All pre-commit checks pass (ruff format, ruff lint, pycodestyle, pyright)
  • ✅ All 157 tests pass
  • ✅ Changes are concise — only the necessary renames, no extraneous modifications
  • ✅ Pushed to origin/rename-max-attempts-to-n-critic-runs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants