feat(rollout):support least request priority for choosing rollout instances in sing…#955
PrometheusComing wants to merge 1 commit into inclusionAI:main from
Conversation
Summary of Changes
This pull request enhances the task scheduling capabilities of the rollout controller by introducing a least-request-priority algorithm. The new policy improves the distribution of tasks across multiple instances, particularly in environments where a few long-running tasks can disproportionately affect overall completion time. By dynamically selecting the worker with the fewest active requests, the system achieves better load balancing and more consistent performance, preventing the bottlenecks that arise from uneven task distribution.
Activity
Code Review
This pull request introduces a least_request scheduling policy to better handle long-tail tasks in multi-instance rollout scenarios, which is a great improvement for load balancing. The implementation introduces a new schedule_policy.py file with different scheduling strategies. My review focuses on the correctness and performance of this new scheduling logic. I've identified a critical bug in the RoundRobinSchedulePolicy that would cause a runtime error, along with some performance inefficiencies in the LeastRequestPrioritySchedulePolicy that could be improved. Additionally, there's some dead code that could be cleaned up.
areal/infra/schedule_policy.py (123)
The choose_worker method is a coroutine and is awaited by its caller. When no_block is true, it calls _choose_worker_no_block, which is synchronous and returns a worker object directly rather than an awaitable. This will cause a TypeError at runtime. To fix this, make the function a coroutine:
async def _choose_worker_no_block(self):
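To make the failure mode concrete, here is a minimal, self-contained sketch (the Scheduler class and worker values below are hypothetical, not AReaL code): awaiting the return value of a plain synchronous method raises a TypeError, while the coroutine version can be awaited safely.

```python
import asyncio


class Scheduler:
    """Hypothetical sketch of the dispatch pattern described in the review."""

    def __init__(self, workers):
        self.workers = workers

    def _choose_worker_no_block_broken(self):
        # Synchronous: returns a worker object directly, not an awaitable.
        # `await self._choose_worker_no_block_broken()` would raise
        # TypeError: object str can't be used in 'await' expression.
        return self.workers[0]

    async def _choose_worker_no_block(self):
        # Coroutine: returns an awaitable, so the caller's `await` works.
        return self.workers[0]

    async def choose_worker(self, no_block: bool):
        if no_block:
            return await self._choose_worker_no_block()
        return None  # blocking path elided in this sketch


async def main():
    s = Scheduler(["worker-0", "worker-1"])
    print(await s.choose_worker(no_block=True))  # prints "worker-0"


asyncio.run(main())
```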
areal/infra/controller/rollout_controller.py (588-590)
With the introduction of the new scheduling policies, the call to _choose_worker() has been removed from _create_submit_callback and agenerate. This makes the _choose_worker() method and its corresponding instance variable self._current_worker_idx dead code. To improve code clarity and maintainability, it's recommended to remove them. This would also require removing the associated test test_choose_worker_round_robin in areal/tests/test_rollout_controller.py.
areal/infra/schedule_policy.py (56-61)
Using heapq.heapify() after modifying the heap's root element is inefficient, as it rebuilds the entire heap with O(n) complexity. A more performant approach with O(log n) complexity is to use heapq.heappop() followed by heapq.heappush() to maintain the heap property after updating the request count.
if self.current_process_requests_state and self.current_process_requests_state[0][0] < self.max_concurrent_per_worker:
    item = heapq.heappop(self.current_process_requests_state)
    chosen_worker = item[1][1]
    logger.info(f"{asyncio.current_task().get_name()} chooses worker: {chosen_worker.id}")
    item[0] += 1
    heapq.heappush(self.current_process_requests_state, item)
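The heappop/heappush pattern above can be demonstrated with a standalone sketch (worker names, the state layout, and MAX_CONCURRENT_PER_WORKER are illustrative assumptions, not AReaL code): each heap entry is a mutable list [in_flight_count, (worker_id, worker)], so the least-loaded worker always sits at the root, and each selection costs O(log n).

```python
import heapq

# Hypothetical state: min-heap of [in_flight_count, (worker_id, worker)].
MAX_CONCURRENT_PER_WORKER = 2
state = [
    [0, ("w0", "worker-0")],
    [0, ("w1", "worker-1")],
    [0, ("w2", "worker-2")],
]
heapq.heapify(state)


def choose_worker():
    """Pop the least-loaded entry, bump its count, push it back: O(log n)."""
    if not state or state[0][0] >= MAX_CONCURRENT_PER_WORKER:
        return None  # every worker is at its concurrency limit
    item = heapq.heappop(state)
    item[0] += 1
    heapq.heappush(state, item)
    return item[1][1]


picked = [choose_worker() for _ in range(6)]
print(picked)
# → ['worker-0', 'worker-1', 'worker-2', 'worker-0', 'worker-1', 'worker-2']
```

Note how the counts stay balanced: each worker is chosen twice before any worker would be chosen a third time, and a seventh call returns None because all workers have hit the limit.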
areal/infra/schedule_policy.py (71-78)
This method has a performance issue and a minor bug in its error message.
- Performance: it performs a linear O(n) scan to find the worker to release. For better performance, consider using a dictionary that maps worker IDs to their state for O(1) lookup. This would require a change in __init__ to create the map.
- Error message: the RuntimeError on line 78 uses the loop variable worker_id, which is confusing because it refers to the last worker in the iteration when the target worker is not found. The error message should clearly state which worker ID was not found.
Here is a suggested fix for the error-message part:
for i, (process_requests_count, (worker_id, _)) in enumerate(self.current_process_requests_state):
    if worker.id == worker_id:
        self.current_process_requests_state[i][0] = max(0, process_requests_count - 1)
        heapq.heapify(self.current_process_requests_state)
        self._idle_event.set()
        logger.info(f"{asyncio.current_task().get_name()} has released worker {worker.id=} ...")
        return
raise RuntimeError(f"Worker with id {worker.id} not found to release.")
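The dictionary-based O(1) lookup the review suggests could look roughly like this standalone sketch (the entries/heap layout and worker ids are illustrative assumptions, not AReaL code): the dict and the heap share the same mutable list objects, so release() can find and edit an entry without scanning.

```python
import heapq

# Hypothetical shared state: dict for O(1) lookup by worker id, heap for
# least-request selection. Both structures hold the SAME list objects.
entries = {wid: [0, (wid, f"worker-{i}")] for i, wid in enumerate(["w0", "w1", "w2"])}
heap = list(entries.values())
heapq.heapify(heap)


def release(worker_id):
    try:
        entry = entries[worker_id]  # O(1) lookup instead of a linear scan
    except KeyError:
        # Name the missing id explicitly, avoiding the loop-variable bug.
        raise RuntimeError(f"Worker with id {worker_id} not found to release.")
    entry[0] = max(0, entry[0] - 1)
    heapq.heapify(heap)  # restore heap order after the in-place edit
```

Decreasing the count of an arbitrary heap entry still needs the O(n) heapify here, since heapq has no decrease-key operation; the dict only removes the O(n) scan to locate the entry.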
force-pushed 931b608 → 54d0444
force-pushed 1c1461a → 89aabcf
force-pushed 89aabcf → 24331eb
/gemini review
Code Review
This pull request introduces a new "least request priority" scheduling policy for rollout instances, which is a valuable addition for improving resource utilization, especially with long-tail tasks. The implementation is well-structured, using a policy pattern to encapsulate scheduling logic. The changes are consistently applied across different parts of the codebase, including configuration, controller logic, and tests. My review includes a few suggestions to improve the efficiency of the heap-based scheduler and fix a minor bug in error reporting. Overall, this is a solid feature enhancement.
force-pushed 24331eb → 5c5c94b
/gemini review
Code Review
This pull request introduces a 'least request priority' scheduling policy as an alternative to round-robin, aiming to improve performance in multi-instance rollout scenarios by better handling long-tail tasks. The implementation is well-structured, using a strategy pattern for scheduling policies and an async context manager for worker acquisition and release. My review focuses on potential improvements in the new scheduling policy implementation and test robustness. I've identified a potential memory leak, an area for performance optimization in the heap management, and a potentially flaky test case.
force-pushed 95691d6 → 376b837
@rchardx Please review this PR
force-pushed 0c08f3f → 0f1ab70
force-pushed 0f1ab70 → 1927dec
Description
In multi-instance rollout scenarios, a single process is easily affected by a small number of long-tail tasks, which can block the remaining tasks in its queue and lengthen the overall completion time. Unlike the default round_robin algorithm, the more efficient least-request-priority algorithm selects instances for executing tasks based on their current load.
case1:
In the on-policy scenario, it is common to have rollout and actor share the same GPU. During a batch of inference, the default round-robin method of selecting rollout instances to handle rollout requests can easily result in some instances completing their tasks early and entering an idle state, while other instances are still busy handling long-tail requests. By using the least request priority approach, combined with the concurrency constraints of the rollout instances themselves, the selection of instances can be better optimized to address this issue.
case2:
In the off-policy scenario, since AReaL itself allocates more capacity to inference during each training round and immediately replenishes the remaining capacity after training is completed, this issue is somewhat alleviated, though not entirely eliminated. For example, in extreme cases, some instances may complete their assigned requests during the current training phase and have to wait until the end of the training round to receive new requests, which can still cause this problem.
case3:
During the evaluation phase, in-flight requests to the instances are interrupted and the evaluation requests are issued anew. This is similar to the synchronous scenario, i.e., case 1.
The least request priority approach, based on the concurrency constraints of the rollout instances (for VLLM, it is max_num_seqs; for SGLang, it is max_running_requests), can achieve at least the same performance as the round-robin algorithm, and often performs even better.
Since in non-single-controller scenarios the target instances of each requester are unrestricted and difficult to control, worker selection currently supports only the single-controller scenario.
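The long-tail effect described above can be illustrated with a toy, deterministic sketch (not AReaL code; greedy least-loaded assignment with known task durations stands in for least-request scheduling, whose decisions use in-flight counts rather than durations): round-robin pins short tasks behind the long-tail task on the same worker, while least-loaded spreads them away from it.

```python
# One long-tail task (10 units) among short tasks (1 unit each).
durations = [10, 1, 1, 1, 1, 1]


def round_robin(durations, n_workers=2):
    """Assign tasks to workers in fixed rotation; return the makespan."""
    loads = [0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)


def least_loaded(durations, n_workers=2):
    """Greedily assign each task to the least-loaded worker; return the makespan."""
    loads = [0] * n_workers
    for d in durations:
        loads[loads.index(min(loads))] += d
    return max(loads)


print(round_robin(durations), least_loaded(durations))  # prints "12 10"
```

Round-robin makes worker 0 serve the long-tail task plus two short ones (makespan 12), while least-loaded isolates the long-tail task on one worker and routes all short tasks to the other (makespan 10). With the per-worker concurrency cap mentioned above (max_num_seqs for vLLM, max_running_requests for SGLang), the real policy bounds how far any one instance can fall behind.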
Related Issue
Fixes #(issue)
Type of Change
Checklist
Breaking Change Details (if applicable):
Additional Context