[Example] Clip_B and Clip_V from entropy dynamics by hiyuchang · Pull Request #509 · agentscope-ai/Trinity-RFT

hiyuchang · 2026-02-13T09:33:39Z

Description

This PR adds support for the algorithms from the paper "On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models". Contact: @shuminwang-ai.

Clip_B
Clip_V

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2026-02-13T09:34:00Z

Summary of Changes

Hello @hiyuchang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds entropy-dynamics algorithms for reinforcement fine-tuning of Large Language Models to the framework. It implements the 'Clip_B' method, which refines token-level advantage calculations, and introduces a new GRPO variant that runs these computations within the trainer. The stated aim is finer control over the training process, with improved model performance and stability.

Highlights

New Algorithm Integration: Integrated the 'Clip_B' algorithm for entropy dynamics in Reinforcement Learning (RL) fine-tuning of Large Language Models (LLMs), based on recent research.
GRPO Variant for Trainer-side Advantage: Introduced a new GRPOverlAlgorithm that delegates advantage computation to the trainer, enabling more flexible advantage function implementations like Clip_B.
Example Configuration and Documentation: Provided a comprehensive example including a README and a YAML configuration file to demonstrate how to set up and run experiments with the new Clip_B algorithm.
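To make the "trainer-side advantage" highlight concrete, here is a minimal sketch of group-relative reward normalization, the general idea behind GRPO-style advantages. It is not the code from this PR: the function names, the `eps` value, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds the scalar rewards of all responses sampled for one prompt.
    `eps` (an assumed value) guards against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not the PR's choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_tokens(advantage, response_length):
    """Give every token of a response the same sequence-level advantage."""
    return [advantage] * response_length


# Four responses to one prompt: two correct (reward 1), two incorrect (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
response_lengths = [3, 2, 4, 1]

advs = group_relative_advantages(rewards)  # roughly [1, -1, -1, 1]
token_advs = [broadcast_to_tokens(a, n) for a, n in zip(advs, response_lengths)]
```

A token-level method such as Clip_B would then adjust or filter these per-token values. How it does so is defined in the PR's `clipb_advantage.py`, not here.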

Changelog

examples/entropy/README.md
- Added a new README file detailing the setup and execution of Clip_B experiments, including data preparation and a note on future Clip_V implementation.
examples/entropy/clipb.yaml
- Added a new YAML configuration file for the Clip_B example, specifying model, algorithm, buffer, explorer, and trainer settings for a math-related dataset.
examples/entropy/clipb_trainer.patch
- Added a patch file designed to modify trinity/trainer/verl_trainer.py to prevent the 'entropys' field from being removed from the batch, ensuring it's available for advantage functions like Clip_B.
trinity/algorithm/__init__.py
- Registered the new grpo_verl algorithm type, making it available for use within the system.
trinity/algorithm/advantage_fn/__init__.py
- Registered the new clipb advantage function, allowing it to be selected and used in algorithms.
trinity/algorithm/advantage_fn/clipb_advantage.py
- Added the ClipBAdvantageFn class, implementing the Clip_B advantage computation logic, which includes token filtering based on entropy signals and rewards.
trinity/algorithm/algorithm.py
- Defined the GRPOverlAlgorithm class, a GRPO variant configured to compute advantages within the trainer, supporting algorithms like Clip_B.
trinity/common/verl_config.py
- Added an optional loss_scale_factor field to the Actor configuration class.
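The changelog says `grpo_verl` and `clipb` were registered so they can be selected by name. The sketch below shows a generic name-to-callable registry of that kind. It is not Trinity-RFT's registry API: `ADVANTAGE_FN_REGISTRY`, `register`, and the toy `entropy_mask_demo` rule are hypothetical names invented for this illustration, and the masking rule is deliberately simplistic rather than the Clip_B logic.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a config-level name to an implementation.
ADVANTAGE_FN_REGISTRY: Dict[str, Callable[..., List[float]]] = {}


def register(name: str):
    """Decorator that stores a function in the registry under `name`."""

    def decorator(fn):
        if name in ADVANTAGE_FN_REGISTRY:
            raise KeyError(f"advantage function {name!r} is already registered")
        ADVANTAGE_FN_REGISTRY[name] = fn
        return fn

    return decorator


@register("entropy_mask_demo")
def entropy_mask_demo(
    advantages: List[float], entropies: List[float], threshold: float
) -> List[float]:
    """Toy rule (an assumption): zero the advantage of high-entropy tokens."""
    return [a if h <= threshold else 0.0 for a, h in zip(advantages, entropies)]


# A config file would carry only the string; the lookup resolves it to code.
fn = ADVANTAGE_FN_REGISTRY["entropy_mask_demo"]
masked = fn([0.5, 0.5, 0.5], [0.1, 2.0, 0.3], threshold=1.0)
```

With this pattern, a YAML config only has to name the advantage function, which is presumably how an example such as `clipb.yaml` selects its algorithm.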

Activity

No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces support for the Clip_B algorithm, including a new advantage function and an example configuration. The implementation is largely correct, but there are opportunities for improvement. Specifically, the logic for token filtering in the advantage function can be simplified for better readability. There's also an inconsistency in the standard deviation calculation that should be addressed. A significant maintainability concern is the use of a patch file for modifying core trainer behavior; this change should be integrated directly into the source code. Finally, there are minor issues in the example's README and configuration file that should be fixed.
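The review does not say which standard deviation inconsistency it found. One common form is mixing the population estimator (divide by N) with the sample estimator (divide by N-1). The sketch below only illustrates how much that choice can change a normalized advantage on a small group. It is an assumption about the kind of issue meant, not a reproduction of the reviewed code.

```python
from statistics import mean, pstdev, stdev

# Rewards of four responses sampled for the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
mu = mean(rewards)

population_std = pstdev(rewards)  # divides by N
sample_std = stdev(rewards)       # divides by N - 1 (Bessel's correction)

# The same reward yields different advantages depending on the estimator.
adv_population = (rewards[0] - mu) / population_std
adv_sample = (rewards[0] - mu) / sample_std

print(f"population std = {population_std:.4f}, advantage = {adv_population:.4f}")
print(f"sample std     = {sample_std:.4f}, advantage = {adv_sample:.4f}")
```

For small groups the gap is large (here about 13%), so it helps to use one estimator consistently across advantage functions.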

examples/entropy/clipb_trainer.patch

examples/entropy/README.md

examples/entropy/clipb.yaml

trinity/algorithm/advantage_fn/clipb_advantage.py

trinity/trainer/verl/fsdp_workers.py

hiyuchang · 2026-02-26T02:12:12Z

/gemini-review

hiyuchang · 2026-02-26T02:30:32Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for the Clip_B and Clip_V algorithms, which is a valuable addition. However, the implementation has some significant structural issues that should be addressed. The reliance on manually applied patch files and code duplication (e.g., clipv_dp_actor.py) makes the changes difficult to integrate and maintain. A better approach would be to incorporate these changes directly into the core library, using configuration flags to enable or disable the new functionality. This would avoid code duplication and make the examples self-contained. Additionally, there are several areas in the code with duplication and opportunities for simplification that I've pointed out in the specific comments.
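The suggestion to replace patch files with configuration flags could look roughly like the sketch below. Everything in it is hypothetical: `TrainerOptions`, `keep_entropys`, and `prune_batch` are illustrative names, and only the `entropys` batch key comes from the PR summary. It is not the actual Trinity-RFT trainer code or the change that was merged.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TrainerOptions:
    # Hypothetical flag: when True, per-token entropies stay in the batch
    # so that an entropy-aware advantage function can read them.
    keep_entropys: bool = False


def prune_batch(batch: Dict[str, Any], options: TrainerOptions) -> Dict[str, Any]:
    """Return a copy of `batch` without fields that later stages do not need."""
    drop = set() if options.keep_entropys else {"entropys"}
    return {key: value for key, value in batch.items() if key not in drop}


batch = {"advantages": [0.1, -0.2], "entropys": [0.7, 1.3]}

default_batch = prune_batch(batch, TrainerOptions())
entropy_batch = prune_batch(batch, TrainerOptions(keep_entropys=True))
```

The default keeps existing behavior unchanged, and an example config opts in by setting the flag. That removes the need to apply a patch to the trainer by hand.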

examples/entropy/clipv_trainer.patch

trinity/trainer/verl/dp_actor.py

examples/entropy/README.md

examples/entropy/clipv_dp_actor.py

trinity/algorithm/advantage_fn/clipb_advantage.py

trinity/algorithm/advantage_fn/clipv_advantage.py

hiyuchang · 2026-02-26T05:52:28Z

/unittest-module-trainer

github-actions · 2026-02-26T06:43:33Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
27	3	21	3	0	0	46m 6s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	The test failed in the call phase due to an exception
❌ tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestOverRollout::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::ColocateModeTest::test_trainer	The test failed in the call phase due to an assertion error

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class	skipped ⏭️

Tests

Test Name	Status	Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	❌	2m 42s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	7m 19s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	❌	1m 40s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	❌	1m 17s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	❌	1m 11s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	❌	1m 16s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	❌	1m 15s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	❌	37.4s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	❌	37.2s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	❌	34.7s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	❌	2m 5s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	❌	2m 8s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	3m 19s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	❌	1m 41s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	7m 14s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	❌	56.5s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	❌	1m 19s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	❌	1m 42s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	❌	50.9s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	❌	1m 46s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	❌	1m 3s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	❌	40.8s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class	⏭️	1ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner	❌	1m 16s
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer	❌	1m 21s

Github Test Reporter by CTRF 💚

hiyuchang · 2026-02-26T08:19:11Z

/unittest-module-trainer

pan-x-c · 2026-02-26T09:58:46Z

/unittest-module-trainer

github-actions · 2026-02-26T10:49:18Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
27	24	0	3	0	0	47m 59s

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class	skipped ⏭️

Tests

Test Name	Status	Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	4m 17s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	5m 15s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	1m 43s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	1m 4s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	1m 4s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	1m 8s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m 12s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	33.5s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	32.0s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	32.0s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 37s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 37s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 25s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	3m 4s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	5m 44s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	2m 1s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	✅	1m 48s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	✅	2m 38s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	✅	1m 8s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	3m 13s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	1m
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	47.5s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class	⏭️	1ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner	✅	1m 20s
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer	✅	2m 5s

trinity/common/verl_config.py

Add ClipB example

c788485

hiyuchang changed the title from "[Example] Clip_B and Clip_V from entropy dynmics" to "[Example] Clip_B and Clip_V from entropy dynamics" Feb 13, 2026

gemini-code-assist bot reviewed Feb 13, 2026

hiyuchang mentioned this pull request Feb 13, 2026

Release code and artifacts for Entropy Dynamics RFT paper on Hugging Face #503

Open

hiyuchang added 6 commits February 24, 2026 12:27

add clipv

b8d67e3

fix typo

8e01fab

remove some comments

1bff739

add registry for compute_log_prob

92a23e5

update dp_actor

9df9355

update to patch version

373e606

pan-x-c reviewed Feb 25, 2026

trinity/trainer/verl/fsdp_workers.py (outdated, resolved)

trinity/trainer/verl/fsdp_workers.py (outdated, resolved)

fix typo

0ac9a63

gemini-code-assist bot reviewed Feb 26, 2026

hiyuchang added 4 commits February 26, 2026 10:37

fix gemini review

a12de7b

fix

b885222

tiny fix

1c8d2dd

update readme

63218b8

add _forward_micro_batch

1139db5

hiyuchang commented Feb 26, 2026

trinity/common/verl_config.py (outdated, resolved)

Apply suggestion from @hiyuchang

e2e229f

hiyuchang commented Feb 26, 2026

trinity/common/verl_config.py (resolved)

Apply suggestion from @hiyuchang

8398cc9

pan-x-c approved these changes Feb 26, 2026

pan-x-c merged commit e8be774 into agentscope-ai:main Feb 26, 2026
1 check passed
