
feat(rollout):support least request priority for choosing rollout instances in sing…#955

Open
PrometheusComing wants to merge 1 commit into inclusionAI:main from PrometheusComing:xxj_balance

Conversation

@PrometheusComing PrometheusComing commented Mar 2, 2026

Description

In multi-instance rollout scenarios, a single process is easily blocked by a small number of long-tail tasks, which stall the remaining tasks in its queue and lengthen the overall completion time. Unlike the default round_robin algorithm, this change uses a more efficient least-request-priority algorithm to select instances for executing tasks.

Case 1:
In the on-policy scenario, it is common for the rollout and actor to share the same GPU. During a batch of inference, the default round-robin method of selecting rollout instances can easily result in some instances finishing their tasks early and sitting idle while other instances are still busy handling long-tail requests. Using least-request priority, combined with the concurrency constraints of the rollout instances themselves, instance selection can be optimized to address this issue.

Case 2:
In the off-policy scenario, this issue is somewhat alleviated, though not entirely eliminated, because AReaL allocates more capacity to inference in each training round and immediately replenishes the remaining capacity after training completes. In extreme cases, some instances may finish their assigned requests during the current training phase and then have to wait until the end of the training round to receive new requests, which can still cause this problem.

Case 3:
During the evaluation phase, the in-flight requests on the instances are interrupted and the evaluation requests are issued anew. This scenario behaves like the synchronous scenario, i.e., Case 1.

The least-request-priority approach, bounded by each rollout instance's concurrency limit (max_num_seqs for vLLM, max_running_requests for SGLang), achieves at least the same performance as the round-robin algorithm and often performs better.
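The approach described above can be sketched with Python's heapq: a min-heap keyed on each worker's in-flight request count, capped by the engine's concurrency limit. This is an illustrative sketch only, not the PR's actual implementation; LeastRequestSelector, acquire, and release are hypothetical names.

```python
import heapq


class LeastRequestSelector:
    """Illustrative least-request selection with a per-worker concurrency cap."""

    def __init__(self, worker_ids, max_concurrent_per_worker):
        self.max_concurrent = max_concurrent_per_worker
        # Min-heap of [active_request_count, worker_id]; lists (not tuples)
        # so the count can be mutated in place before re-pushing.
        self.heap = [[0, wid] for wid in worker_ids]
        heapq.heapify(self.heap)

    def acquire(self):
        """Pick the worker with the fewest in-flight requests, or None
        if every worker is already at its concurrency cap."""
        if not self.heap or self.heap[0][0] >= self.max_concurrent:
            return None
        item = heapq.heappop(self.heap)  # worker with the smallest count
        item[0] += 1
        heapq.heappush(self.heap, item)  # O(log n) repair after the update
        return item[1]

    def release(self, worker_id):
        """Decrement the in-flight count for worker_id when a request finishes."""
        for item in self.heap:
            if item[1] == worker_id:
                item[0] = max(0, item[0] - 1)
                heapq.heapify(self.heap)  # restore heap order after mutation
                return
        raise RuntimeError(f"Worker {worker_id} not found to release.")
```

With two workers and a cap of 2, acquire alternates between them until both are saturated, then returns None until a release frees capacity.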

Because in non-single-controller scenarios the target instances of each requester are unconstrained and difficult to control, worker selection currently supports only the single-controller scenario.

Related Issue

Fixes #(issue)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the task scheduling capabilities within the rollout controller by introducing a 'least request priority' algorithm. This new policy is designed to improve the distribution of tasks across multiple instances, particularly in environments where a few long-running tasks can disproportionately affect overall completion times. By dynamically selecting the worker with the fewest active requests, the system can achieve better load balancing and more consistent performance, preventing bottlenecks that arise from uneven task distribution.

Highlights

  • New Scheduling Policy: Introduced a 'least_request' scheduling policy to efficiently distribute tasks among rollout instances, aiming to mitigate issues caused by long-tail tasks in multi-instance scenarios.
  • Refactored Worker Selection: The RolloutController's worker selection logic has been refactored to use a new SchedulePolicy abstraction, allowing for dynamic policy switching between 'round_robin' and 'least_request'.
  • Configuration Update: The InferenceEngineConfig now includes 'least_request' as a valid choice for the schedule_policy CLI argument.
  • New Scheduling Policy Module: A new module areal/infra/schedule_policy.py was added to define the abstract SchedulePolicy class and its concrete implementations: LeastRequestPrioritySchedulePolicy and RoundRobinSchedulePolicy.
  • Enhanced Test Coverage: New unit tests were added for the LeastRequestPrioritySchedulePolicy and RoundRobinSchedulePolicy, along with an integration test in RolloutController to verify the 'least_request' behavior, including a simulated long-tail task.


Changelog
  • areal/api/cli_args.py
    • Added 'least_request' as an option for the schedule_policy in InferenceEngineConfig metadata.
  • areal/infra/controller/rollout_controller.py
    • Imported LeastRequestPrioritySchedulePolicy and RoundRobinSchedulePolicy.
    • Initialized scheduler_policy_context and worker_rank_dict in the constructor.
    • Configured the appropriate scheduling policy (LeastRequestPrioritySchedulePolicy or RoundRobinSchedulePolicy) based on config.schedule_policy during initialization.
    • Modified _create_submit_callback to use the scheduler_policy_context for worker selection.
    • Modified agenerate method to use the scheduler_policy_context for worker selection.
  • areal/infra/remote_inf_engine.py
    • Imported is_single_controller from areal.utils.environ.
    • Updated choose_server to use round-robin scheduling if is_single_controller() is true, even if the explicit policy is not 'round_robin'.
  • areal/infra/schedule_policy.py
    • Added a new file defining SchedulePolicy (an abstract base class), LeastRequestPrioritySchedulePolicy, and RoundRobinSchedulePolicy.
  • areal/tests/test_rollout_controller.py
    • Added a simulated long-tail task to the agenerate method of MockScheduler.
    • Decorated TestRolloutControllerAgenerate with @pytest.mark.asyncio.
    • Added test_agenerate_least_request to verify the behavior of the least request scheduling policy.
  • areal/tests/test_schedule_policy.py
    • Added a new file containing unit tests for LeastRequestPrioritySchedulePolicy and RoundRobinSchedulePolicy.
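Based on the changelog above, the policy abstraction might look roughly like the following sketch. Only SchedulePolicy and RoundRobinSchedulePolicy are names taken from the PR; the method signature and internals shown here are assumptions, not the actual code.

```python
from abc import ABC, abstractmethod


class SchedulePolicy(ABC):
    """Abstract base class for worker-selection policies (sketch)."""

    @abstractmethod
    def choose_worker(self, workers):
        """Return the worker that should handle the next request."""
        ...


class RoundRobinSchedulePolicy(SchedulePolicy):
    """Cycle through workers in order, ignoring their current load."""

    def __init__(self):
        self._idx = 0

    def choose_worker(self, workers):
        worker = workers[self._idx % len(workers)]
        self._idx += 1
        return worker
```

A least-request policy would plug into the same interface, differing only in how choose_worker ranks the candidates.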
Activity
  • No specific activity (comments, reviews, etc.) was provided in the context for this pull request.

@PrometheusComing PrometheusComing changed the title support least request priority for choosing rollout instances in sing… [WIP]support least request priority for choosing rollout instances in sing… Mar 2, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a least_request scheduling policy to better handle long-tail tasks in multi-instance rollout scenarios, which is a great improvement for load balancing. The implementation introduces a new schedule_policy.py file with different scheduling strategies. My review focuses on the correctness and performance of this new scheduling logic. I've identified a critical bug in the RoundRobinSchedulePolicy that would cause a runtime error, along with some performance inefficiencies in the LeastRequestPrioritySchedulePolicy that could be improved. Additionally, there's some dead code that could be cleaned up.

I am having trouble creating individual review comments, so my feedback is included below.

areal/infra/schedule_policy.py (123)

critical

The choose_worker method is a coroutine and is awaited by its caller. When no_block is true, it calls this function, _choose_worker_no_block, which is synchronous and returns a worker object directly rather than an awaitable. This will cause a TypeError at runtime. To fix this, the function should be a coroutine:

    async def _choose_worker_no_block(self):
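This failure mode can be reproduced standalone: awaiting the return value of a plain synchronous function raises TypeError at runtime. choose_sync and caller below are illustrative names, not from the PR.

```python
import asyncio


def choose_sync():
    # A plain synchronous function: returns an object, not an awaitable.
    return "worker-0"


async def caller():
    # Awaiting a non-awaitable (here, a str) raises TypeError at runtime.
    return await choose_sync()


try:
    asyncio.run(caller())
except TypeError as e:
    print("caught:", e)
```

Declaring the callee as `async def` makes its call expression return a coroutine, which the `await` then handles correctly.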

areal/infra/controller/rollout_controller.py (588-590)

medium

With the introduction of the new scheduling policies, the call to _choose_worker() has been removed from _create_submit_callback and agenerate. This makes the _choose_worker() method and its corresponding instance variable self._current_worker_idx dead code. To improve code clarity and maintainability, it's recommended to remove them. This would also require removing the associated test test_choose_worker_round_robin in areal/tests/test_rollout_controller.py.

areal/infra/schedule_policy.py (56-61)

medium

Using heapq.heapify() after modifying the heap's root element is inefficient, as it rebuilds the entire heap with O(n) complexity. A more performant approach with O(log n) complexity is to use heapq.heappop() followed by heapq.heappush() to maintain the heap property after updating the request count.

                if (
                    self.current_process_requests_state
                    and self.current_process_requests_state[0][0] < self.max_concurrent_per_worker
                ):
                    item = heapq.heappop(self.current_process_requests_state)
                    chosen_worker = item[1][1]
                    logger.info(
                        f"{asyncio.current_task().get_name()} chooses worker: {chosen_worker.id}"
                    )
                    item[0] += 1
                    heapq.heappush(self.current_process_requests_state, item)

areal/infra/schedule_policy.py (71-78)

medium

This method has a performance issue and a minor bug in the error message.

  1. Performance: It performs a linear scan (O(n)) to find the worker to release. For better performance, consider using a dictionary to map worker IDs to their state for O(1) lookup. This would require a change in __init__ to create the map.
  2. Error Message: The RuntimeError on line 78 uses the loop variable worker_id, which is confusing as it will refer to the last worker in the iteration if the target worker is not found. The error message should clearly state which worker ID was not found.

Here is a suggested fix for the error message part:

            for i, (process_requests_count, (worker_id, _)) in enumerate(
                self.current_process_requests_state
            ):
                if worker.id == worker_id:
                    self.current_process_requests_state[i][0] = max(0, process_requests_count - 1)
                    heapq.heapify(self.current_process_requests_state)
                    self._idle_event.set()
                    logger.info(
                        f"{asyncio.current_task().get_name()} has released worker {worker.id=} ..."
                    )
                    return
            raise RuntimeError(f"Worker with id {worker.id} not found to release.")
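The O(1)-lookup suggestion above could be sketched as follows: keep a dict from worker id to its heap entry, sharing the same mutable lists as the heap. HeapWithIndex is a hypothetical name, and note that the heap itself still needs repairing after the in-place decrement.

```python
import heapq


class HeapWithIndex:
    """Sketch of O(1) worker lookup on release via an id -> entry map."""

    def __init__(self, worker_ids):
        # dict values and heap entries are the SAME list objects, so an
        # in-place count update is visible to both structures.
        self.entries = {wid: [0, wid] for wid in worker_ids}
        self.heap = list(self.entries.values())
        heapq.heapify(self.heap)

    def release(self, worker_id):
        entry = self.entries.get(worker_id)  # O(1) lookup instead of a scan
        if entry is None:
            raise RuntimeError(f"Worker with id {worker_id} not found to release.")
        entry[0] = max(0, entry[0] - 1)
        # The decrement may violate heap order, so the heap still needs
        # repairing; heapify is O(n), a lazier scheme could defer this.
        heapq.heapify(self.heap)
```

This removes the O(n) scan on the lookup path and, as a side effect, makes the "not found" error message unambiguous.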

@PrometheusComing PrometheusComing force-pushed the xxj_balance branch 4 times, most recently from 931b608 to 54d0444 Compare March 9, 2026 01:42
@PrometheusComing PrometheusComing force-pushed the xxj_balance branch 3 times, most recently from 1c1461a to 89aabcf Compare March 13, 2026 01:48
@PrometheusComing PrometheusComing changed the title [WIP]support least request priority for choosing rollout instances in sing… [feat]support least request priority for choosing rollout instances in sing… Mar 13, 2026
@PrometheusComing

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new "least request priority" scheduling policy for rollout instances, which is a valuable addition for improving resource utilization, especially with long-tail tasks. The implementation is well-structured, using a policy pattern to encapsulate scheduling logic. The changes are consistently applied across different parts of the codebase, including configuration, controller logic, and tests. My review includes a few suggestions to improve the efficiency of the heap-based scheduler and fix a minor bug in error reporting. Overall, this is a solid feature enhancement.

@PrometheusComing

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a 'least request priority' scheduling policy as an alternative to round-robin, aiming to improve performance in multi-instance rollout scenarios by better handling long-tail tasks. The implementation is well-structured, using a strategy pattern for scheduling policies and an async context manager for worker acquisition and release. My review focuses on potential improvements in the new scheduling policy implementation and test robustness. I've identified a potential memory leak, an area for performance optimization in the heap management, and a potentially flaky test case.

@PrometheusComing PrometheusComing force-pushed the xxj_balance branch 3 times, most recently from 95691d6 to 376b837 Compare March 13, 2026 06:34
@PrometheusComing

@rchardx Please review this PR

@PrometheusComing PrometheusComing changed the title [feat]support least request priority for choosing rollout instances in sing… feat(rollout):support least request priority for choosing rollout instances in sing… Mar 14, 2026