BREAKING: Rename --max-attempts to --n-critic-runs #325
juanmichelini wants to merge 4 commits into main
This breaking change renames the parameter across the codebase to better reflect its purpose: controlling the number of critic evaluation runs in iterative mode, not general retry attempts.

Changes:
- CLI argument: --max-attempts → --n-critic-runs
- Model field: max_attempts → n_critic_runs (EvalMetadata)
- Updated all run_infer.py files (7 benchmarks)
- Updated core logic (evaluation.py, iterative.py)
- Updated test files (4 files)
- Updated documentation (2 README files)

Migration guide:
- Update CLI usage: --max-attempts 3 → --n-critic-runs 3
- Update EvalMetadata construction: max_attempts= → n_critic_runs=

Co-authored-by: openhands <openhands@all-hands.dev>
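The rename described above can be pictured with a small sketch. This is not the code from the PR: the `EvalMetadata` class below is a hypothetical dataclass stand-in (the real model has more fields and may be defined differently), and the default value of 1 and the helper names `build_parser` and `parse_metadata` are assumptions made for illustration.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EvalMetadata:
    """Hypothetical stand-in for the real model, which has more fields."""

    n_critic_runs: int = 1  # formerly max_attempts; the default here is assumed


def build_parser() -> argparse.ArgumentParser:
    # Illustrative run_infer.py-style parser exposing only the renamed flag.
    parser = argparse.ArgumentParser(description="illustrative benchmark CLI")
    parser.add_argument(
        "--n-critic-runs",
        type=int,
        default=1,
        help="number of critic evaluation runs in iterative mode",
    )
    return parser


def parse_metadata(argv: list[str]) -> EvalMetadata:
    # argparse maps --n-critic-runs to the attribute n_critic_runs.
    args = build_parser().parse_args(argv)
    return EvalMetadata(n_critic_runs=args.n_critic_runs)
```

In this sketch the old spellings fail loudly rather than being silently ignored: argparse exits on the unknown `--max-attempts` flag, and the dataclass rejects the `max_attempts=` keyword with a `TypeError`.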
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
@OpenHands bring all changes from main to this branch. resolve any conflicts like this:
I'm on it! juanmichelini can track my progress at all-hands.dev
Merged latest main into branch.

Re-applied the rename for:
- New benchmarks from main (swebenchmultilingual, swefficiency)
- Tests that were reverted by auto-merge (test_iterative_resume, test_workspace_cleanup)
- commit0/config.py default key
- swtbench README documentation
- evaluation.py log message
- models.py formatting fix

Co-authored-by: openhands <openhands@all-hands.dev>
All changes have already been pushed to the remote branch

Summary
I merged all changes from

What was done
Checklist
Summary
This PR renames the --max-attempts parameter to --n-critic-runs across the benchmarks codebase to better reflect its purpose: controlling the number of critic evaluation runs in iterative mode.

Changes
- --max-attempts → --n-critic-runs
- max_attempts → n_critic_runs (EvalMetadata)

Breaking Changes
This is a breaking change for users. Existing scripts and workflows must be updated.
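For scripts and workflows that still use the old spelling, a text rewrite along these lines may help locate and update them. It is only a sketch: the function name `migrate_text` and both regular expressions are assumptions, not part of this PR, and the keyword pattern is naive (it also rewrites unrelated assignments such as `self.max_attempts = 3`), so any result should be reviewed as a diff before committing.

```python
import re

# Old flag, not followed by further name characters (so --max-attempts-total is left alone).
_FLAG = re.compile(r"--max-attempts(?![\w-])")
# Old keyword or assignment target followed by a single '=' (comparisons with '==' are skipped).
_KWARG = re.compile(r"\bmax_attempts(?=\s*=(?!=))")


def migrate_text(text: str) -> str:
    """Return text with the old flag and keyword replaced by the new names."""
    text = _FLAG.sub("--n-critic-runs", text)
    return _KWARG.sub("n_critic_runs", text)


if __name__ == "__main__":
    print(migrate_text("run_infer.py --max-attempts 3"))
    print(migrate_text("EvalMetadata(max_attempts=3)"))
```

The function works on strings rather than files, which makes it straightforward to preview the change on a single script before applying it more widely.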
Migration Required
- --max-attempts 3 → --n-critic-runs 3
- EvalMetadata(max_attempts=3) → EvalMetadata(n_critic_runs=3)

Related PRs
This PR is part of a coordinated change. A corresponding PR will be created for the evaluation repo.
Testing