feat(rollout):support least request priority for choosing rollout instances in sing…#955
PrometheusComing wants to merge 1 commit into inclusionAI:main from
Conversation
Summary of Changes
This pull request enhances the task scheduling capabilities of the rollout controller by introducing a least-request-priority algorithm. The new policy improves the distribution of tasks across multiple instances, particularly in environments where a few long-running tasks can disproportionately affect overall completion time. By dynamically selecting the worker with the fewest active requests, the system achieves better load balancing and more consistent performance, preventing the bottlenecks that arise from uneven task distribution.
Activity
Code Review
This pull request introduces a least_request scheduling policy to better handle long-tail tasks in multi-instance rollout scenarios, which is a great improvement for load balancing. The implementation introduces a new schedule_policy.py file with different scheduling strategies. My review focuses on the correctness and performance of this new scheduling logic. I've identified a critical bug in the RoundRobinSchedulePolicy that would cause a runtime error, along with some performance inefficiencies in the LeastRequestPrioritySchedulePolicy that could be improved. Additionally, there's some dead code that could be cleaned up.
areal/infra/schedule_policy.py (123)
The choose_worker method is a coroutine and is awaited by its caller. When no_block is true, it calls _choose_worker_no_block, which is synchronous and returns a worker object directly rather than an awaitable. This will cause a TypeError at runtime. To fix this, make the function a coroutine:
async def _choose_worker_no_block(self):
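To make the failure mode concrete, here is a minimal, self-contained sketch (the Scheduler class and worker values below are hypothetical, not AReaL code): awaiting the return value of a plain synchronous method raises a TypeError, while the coroutine version can be awaited safely.

```python
import asyncio


class Scheduler:
    """Hypothetical sketch of the dispatch pattern described in the review."""

    def __init__(self, workers):
        self.workers = workers

    def _choose_worker_no_block_broken(self):
        # Synchronous: returns a worker object directly, not an awaitable.
        # `await self._choose_worker_no_block_broken()` would raise
        # TypeError: object str can't be used in 'await' expression.
        return self.workers[0]

    async def _choose_worker_no_block(self):
        # Coroutine: returns an awaitable, so the caller's `await` works.
        return self.workers[0]

    async def choose_worker(self, no_block: bool):
        if no_block:
            return await self._choose_worker_no_block()
        return None  # blocking path elided in this sketch


async def main():
    s = Scheduler(["worker-0", "worker-1"])
    print(await s.choose_worker(no_block=True))  # prints "worker-0"


asyncio.run(main())
```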
areal/infra/controller/rollout_controller.py (588-590)
With the introduction of the new scheduling policies, the call to _choose_worker() has been removed from _create_submit_callback and agenerate. This makes the _choose_worker() method and its corresponding instance variable self._current_worker_idx dead code. To improve code clarity and maintainability, it's recommended to remove them. This would also require removing the associated test test_choose_worker_round_robin in areal/tests/test_rollout_controller.py.
areal/infra/schedule_policy.py (56-61)
Using heapq.heapify() after modifying the heap's root element is inefficient, as it rebuilds the entire heap with O(n) complexity. A more performant approach with O(log n) complexity is to use heapq.heappop() followed by heapq.heappush() to maintain the heap property after updating the request count.
if self.current_process_requests_state and self.current_process_requests_state[0][0] < self.max_concurrent_per_worker:
    item = heapq.heappop(self.current_process_requests_state)
    chosen_worker = item[1][1]
    logger.info(f"{asyncio.current_task().get_name()} chooses worker: {chosen_worker.id}")
    item[0] += 1
    heapq.heappush(self.current_process_requests_state, item)
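The heappop/heappush pattern above can be demonstrated with a standalone sketch (worker names, the state layout, and MAX_CONCURRENT_PER_WORKER are illustrative assumptions, not AReaL code): each heap entry is a mutable list [in_flight_count, (worker_id, worker)], so the least-loaded worker always sits at the root, and each selection costs O(log n).

```python
import heapq

# Hypothetical state: min-heap of [in_flight_count, (worker_id, worker)].
MAX_CONCURRENT_PER_WORKER = 2
state = [
    [0, ("w0", "worker-0")],
    [0, ("w1", "worker-1")],
    [0, ("w2", "worker-2")],
]
heapq.heapify(state)


def choose_worker():
    """Pop the least-loaded entry, bump its count, push it back: O(log n)."""
    if not state or state[0][0] >= MAX_CONCURRENT_PER_WORKER:
        return None  # every worker is at its concurrency limit
    item = heapq.heappop(state)
    item[0] += 1
    heapq.heappush(state, item)
    return item[1][1]


picked = [choose_worker() for _ in range(6)]
print(picked)
# → ['worker-0', 'worker-1', 'worker-2', 'worker-0', 'worker-1', 'worker-2']
```

Note how the counts stay balanced: each worker is chosen twice before any worker would be chosen a third time, and a seventh call returns None because all workers have hit the limit.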
areal/infra/schedule_policy.py (71-78)
This method has a performance issue and a minor bug in its error message.
- Performance: it performs a linear O(n) scan to find the worker to release. For better performance, consider using a dictionary that maps worker IDs to their state for O(1) lookup. This would require a change in __init__ to create the map.
- Error message: the RuntimeError on line 78 uses the loop variable worker_id, which is confusing because it refers to the last worker in the iteration when the target worker is not found. The error message should clearly state which worker ID was not found.
Here is a suggested fix for the error-message part:
for i, (process_requests_count, (worker_id, _)) in enumerate(self.current_process_requests_state):
    if worker.id == worker_id:
        self.current_process_requests_state[i][0] = max(0, process_requests_count - 1)
        heapq.heapify(self.current_process_requests_state)
        self._idle_event.set()
        logger.info(f"{asyncio.current_task().get_name()} has released worker {worker.id=} ...")
        return
raise RuntimeError(f"Worker with id {worker.id} not found to release.")
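The dictionary-based O(1) lookup the review suggests could look roughly like this standalone sketch (the entries/heap layout and worker ids are illustrative assumptions, not AReaL code): the dict and the heap share the same mutable list objects, so release() can find and edit an entry without scanning.

```python
import heapq

# Hypothetical shared state: dict for O(1) lookup by worker id, heap for
# least-request selection. Both structures hold the SAME list objects.
entries = {wid: [0, (wid, f"worker-{i}")] for i, wid in enumerate(["w0", "w1", "w2"])}
heap = list(entries.values())
heapq.heapify(heap)


def release(worker_id):
    try:
        entry = entries[worker_id]  # O(1) lookup instead of a linear scan
    except KeyError:
        # Name the missing id explicitly, avoiding the loop-variable bug.
        raise RuntimeError(f"Worker with id {worker_id} not found to release.")
    entry[0] = max(0, entry[0] - 1)
    heapq.heapify(heap)  # restore heap order after the in-place edit
```

Decreasing the count of an arbitrary heap entry still needs the O(n) heapify here, since heapq has no decrease-key operation; the dict only removes the O(n) scan to locate the entry.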
force-pushed 931b608 → 54d0444
force-pushed 1c1461a → 89aabcf
force-pushed 89aabcf → 24331eb
/gemini review
Code Review
This pull request introduces a new "least request priority" scheduling policy for rollout instances, which is a valuable addition for improving resource utilization, especially with long-tail tasks. The implementation is well-structured, using a policy pattern to encapsulate scheduling logic. The changes are consistently applied across different parts of the codebase, including configuration, controller logic, and tests. My review includes a few suggestions to improve the efficiency of the heap-based scheduler and fix a minor bug in error reporting. Overall, this is a solid feature enhancement.
force-pushed 24331eb → 5c5c94b
/gemini review
Code Review
This pull request introduces a 'least request priority' scheduling policy as an alternative to round-robin, aiming to improve performance in multi-instance rollout scenarios by better handling long-tail tasks. The implementation is well-structured, using a strategy pattern for scheduling policies and an async context manager for worker acquisition and release. My review focuses on potential improvements in the new scheduling policy implementation and test robustness. I've identified a potential memory leak, an area for performance optimization in the heap management, and a potentially flaky test case.
force-pushed 95691d6 → 376b837
@rchardx Please review this PR
force-pushed 0c08f3f → 0f1ab70
force-pushed 0f1ab70 → 1927dec
Description
In multi-instance rollout scenarios, a single process is easily affected by a small number of long-tail tasks, which can block the remaining tasks in its queue and lengthen the overall completion time. Unlike the default round_robin algorithm, the more efficient least-request-priority algorithm selects instances for executing tasks based on their current load.
case1:
In the on-policy scenario, it is common to have rollout and actor share the same GPU. During a batch of inference, the default round-robin method of selecting rollout instances to handle rollout requests can easily result in some instances completing their tasks early and entering an idle state, while other instances are still busy handling long-tail requests. By using the least request priority approach, combined with the concurrency constraints of the rollout instances themselves, the selection of instances can be better optimized to address this issue.
case2:
In the off-policy scenario, since AReaL itself allocates more capacity to inference during each training round and immediately replenishes the remaining capacity after training is completed, this issue is somewhat alleviated, though not entirely eliminated. For example, in extreme cases, some instances may complete their assigned requests during the current training phase and have to wait until the end of the training round to receive new requests, which can still cause this problem.
case3:
During the evaluation phase, in-flight requests to the instances are interrupted and the evaluation requests are issued anew. This is similar to the synchronous scenario, i.e., case 1.
The least request priority approach, based on the concurrency constraints of the rollout instances (for VLLM, it is max_num_seqs; for SGLang, it is max_running_requests), can achieve at least the same performance as the round-robin algorithm, and often performs even better.
Since in non-single-controller scenarios the target instances of each requester are unrestricted and difficult to control, worker selection currently supports only the single-controller scenario.
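The long-tail effect described above can be illustrated with a toy, deterministic sketch (not AReaL code; greedy least-loaded assignment with known task durations stands in for least-request scheduling, whose decisions use in-flight counts rather than durations): round-robin pins short tasks behind the long-tail task on the same worker, while least-loaded spreads them away from it.

```python
# One long-tail task (10 units) among short tasks (1 unit each).
durations = [10, 1, 1, 1, 1, 1]


def round_robin(durations, n_workers=2):
    """Assign tasks to workers in fixed rotation; return the makespan."""
    loads = [0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)


def least_loaded(durations, n_workers=2):
    """Greedily assign each task to the least-loaded worker; return the makespan."""
    loads = [0] * n_workers
    for d in durations:
        loads[loads.index(min(loads))] += d
    return max(loads)


print(round_robin(durations), least_loaded(durations))  # prints "12 10"
```

Round-robin makes worker 0 serve the long-tail task plus two short ones (makespan 12), while least-loaded isolates the long-tail task on one worker and routes all short tasks to the other (makespan 10). With the per-worker concurrency cap mentioned above (max_num_seqs for vLLM, max_running_requests for SGLang), the real policy bounds how far any one instance can fall behind.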
Related Issue
Fixes #(issue)
Type of Change
Checklist
Breaking Change Details (if applicable):
Additional Context