[feat] Add distributed RLFT training framework #6376

YeAnbang · 2025-08-05T06:11:05Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

* [feature] fit rl style generation
* [doc] add docstr
* [doc] add docstr

for more information, see https://pre-commit.ci

* [fix] fix qwen VocabParallelLMHead1D and gather output
* fix tp bug
* fix consumer
* [feat] Support Distributed LogProb for GRPO Training
* [fix] fix loss func
* [fix] fix log prob plugin
* [fix] fix qwen modeling param
* [fix] rm comments
* [fix] rm hard-code;fix non-dist version
* [fix] fix test file param name and benchmark tp gather output=True/False
* [fix] rm non-dist version in dist log prob
* [fix] fix comments
* [fix] fix dis log prob plugin
* [fix] fix test case
* [fix] fix qwen VocabParallelLMHead1D and gather output
* [fix] fix DistLogProb comments
* [fix] restore tp size
* [fix] fix comments
* [fix] fix comment; fix LogSoftmax usage

---------

Co-authored-by: Tong Li <[email protected]>

for more information, see https://pre-commit.ci

Co-authored-by: Tong Li <[email protected]>

…6353)

* support n_behind, add profiling
* fix bugs
* fix visualization
* fix behind
* fix loop issue
* add profiling
* fix update
* update assert
* remove assert

---------

Co-authored-by: Tong Li <[email protected]>

* fix no L2 regularization error
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

for more information, see https://pre-commit.ci
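The commits above mention distributed log-probabilities for GRPO, a vocab-parallel LM head (`VocabParallelLMHead1D`) and a `LogSoftmax` fix, but the PR does not show how the log-prob is computed when the vocabulary is sharded across tensor-parallel ranks. Below is a minimal pure-Python sketch of the general technique, not the ColossalAI implementation: each list in `shards` stands for one rank's slice of the logits for a single token, and the cross-rank all-reduces (MAX, then SUM) are simulated with Python `max()`/`sum()`. The helper names `dist_log_prob` and `full_log_prob` are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helpers illustrating vocab-parallel log-softmax; not ColossalAI code.


def dist_log_prob(shards: Sequence[Sequence[float]], target: int) -> float:
    """log p(target) when the vocabulary is split across ranks (one list per rank).

    Collectives are simulated in-process; a real implementation would call
    torch.distributed.all_reduce with ReduceOp.MAX / ReduceOp.SUM instead.
    """
    # Step 1: each rank takes its local max; all-reduce(MAX) yields the global max.
    global_max = max(max(shard) for shard in shards)

    # Step 2: each rank sums exp(logit - global_max) locally; all-reduce(SUM).
    global_sum = sum(sum(math.exp(x - global_max) for x in shard) for shard in shards)

    # Step 3: only the rank owning `target` contributes its logit, the others
    # contribute 0, so an all-reduce(SUM) gives every rank the target logit.
    target_logit, found, offset = 0.0, False, 0
    for shard in shards:
        if offset <= target < offset + len(shard):
            target_logit += shard[target - offset]
            found = True
        offset += len(shard)
    if not found:
        raise IndexError(f"target {target} is outside a vocabulary of size {offset}")

    # log_softmax(x)[t] = x[t] - max - log(sum(exp(x - max)))
    return target_logit - global_max - math.log(global_sum)


def full_log_prob(logits: Sequence[float], target: int) -> float:
    """Reference log-softmax on the unsharded vocabulary."""
    m = max(logits)
    return logits[target] - m - math.log(sum(math.exp(x - m) for x in logits))


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0, 1.5, -0.5]
    shards = [logits[:3], logits[3:]]  # two hypothetical TP ranks
    print(dist_log_prob(shards, target=3), full_log_prob(logits, target=3))
```

Subtracting the global max before exponentiating keeps the sum numerically stable, and only three scalars per token cross ranks, so no rank ever needs the full vocabulary of logits.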

ver217 and others added 30 commits August 5, 2025 13:59

[chat] add distributed impl (#6210)

162bb42

[feature] fit RL style generation (#6213)

7a2d455

* [feature] fit rl style generation
* [doc] add docstr
* [doc] add docstr

add reward related function

fa1272f

add simple grpo

40d6018

update grpo

1f07b71

polish

718c4b7

modify data loader

b7842f8

grpo consumer

5f178a7

update loss

9754a11

update reward fn

f8899dd

update example

5c75d5b

update loader

cc4cc78

add algo selection

1f15dc7

add save

88eb6e5

update select algo

246f16d

[pre-commit.ci] auto fixes from pre-commit.com hooks

f71d422

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

bc538ba

for more information, see https://pre-commit.ci

update grpo

fe017d3

update reward fn

c8db826

update reward

a537aa1

fix reward score

a4862a2

add response length

b951d0b

detach

69a1a32

fix tp bug

b19355f

fix consumer

a2ae82a

convert to 8 generation

30c7ddd

print results

bfc4582

setup update

e224673

fix transformers backend

35dabd7
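The commits in this batch add GRPO ("add simple grpo", "update loss", "convert to 8 generation"), but the PR does not spell out the objective. As a hedged reference, GRPO (Group Relative Policy Optimization, from DeepSeekMath) normalizes each response's reward against the other responses sampled for the same prompt and plugs that advantage into a PPO-style clipped ratio. The sketch below assumes population standard deviation and a clip range of 0.2; both are assumptions, and the PR's loss may differ (e.g. a KL term or a different token-averaging scheme). `group_advantages` and `clipped_surrogate_loss` are hypothetical names.

```python
import math
import statistics
from typing import List

# Hypothetical helpers sketching the GRPO objective; not the ColossalAI implementation.


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean/std of its own group of responses
    (all responses generated for one prompt)."""
    mean = statistics.fmean(rewards)
    # Assumption: population std; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_loss(
    logp_new: float, logp_old: float, advantage: float, clip_eps: float = 0.2
) -> float:
    """PPO-style clipped objective for one token, negated so it can be minimized."""
    ratio = math.exp(logp_new - logp_old)  # importance-sampling ratio pi_new / pi_old
    unclipped = ratio * advantage
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps) * advantage
    return -min(unclipped, clipped)


if __name__ == "__main__":
    # e.g. 8 sampled responses for one prompt, reward 1.0 if judged correct
    rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    print(group_advantages(rewards))
```

If every response in a group gets the same reward, all advantages are zero and that prompt contributes no policy gradient, which is why group size and reward design matter in practice.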

YeAnbang and others added 21 commits August 5, 2025 14:04

fix bug, tested

de40c73

remove debug code

9dbb0ff

[pre-commit.ci] auto fixes from pre-commit.com hooks

72b2d98

for more information, see https://pre-commit.ci

move out evaluation func (#6343)

6ae54a6

Co-authored-by: Tong Li <[email protected]>

fix pp memory issue (#6344)

3a4681f

Co-authored-by: Tong Li <[email protected]>

Manually schedule resources and support auto master address assigning

3b3c48d

modify readme

6a0b809

update readme

79a7b99

add ray timeout handling instruction

80c576f

Update README.md

73384be

fix num_update_per_episode

0f71c79

optimize pp log_softmax OOM

a960990

implement memory efficient logprob

245c8c2

fix small bug

b314da1

add dp rank for multi-dp (#6351)

685e0bd

Co-authored-by: Tong Li <[email protected]>

fix code evaluation

352a8e0

fix style

eafbc89

add entropy (#6363)

3d9dd34

hotfix entropy calculation (#6364)

c782976

[Fix] Add L2 Regularization (#6372)

118a66f

* fix no L2 regularization error
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
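This batch also includes "optimize pp log_softmax OOM", "implement memory efficient logprob" and "add entropy", without describing the approach. A common way to avoid materializing a full `[seq_len, vocab]` log-softmax is to process the sequence in chunks and keep only per-token results; the pure-Python sketch below assumes that is the idea, which the PR does not confirm. It also computes the per-token entropy via the identity H = logsumexp(x) - sum(p * x). The helper names `_token_stats` and `chunked_log_probs_and_entropy` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers; the chunking strategy is an assumption, not the PR's code.


def _token_stats(logits: Sequence[float], target: int) -> Tuple[float, float]:
    """Return (log p(target), entropy) for one token position."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    log_z = m + math.log(z)  # logsumexp(logits), computed stably
    probs = [e / z for e in exps]
    log_prob = logits[target] - log_z
    # H = -sum(p * log p) = log_z - sum(p * logit), since log p = logit - log_z
    entropy = log_z - sum(p * x for p, x in zip(probs, logits))
    return log_prob, entropy


def chunked_log_probs_and_entropy(
    logits: Sequence[Sequence[float]], targets: Sequence[int], chunk_size: int = 2
) -> Tuple[List[float], List[float]]:
    """Handle `chunk_size` positions at a time so only one chunk's softmax
    intermediates (exps, probs) exist at once; only scalars are kept."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    log_probs: List[float] = []
    entropies: List[float] = []
    for start in range(0, len(targets), chunk_size):
        rows = logits[start:start + chunk_size]
        tgts = targets[start:start + chunk_size]
        for row, tgt in zip(rows, tgts):
            lp, h = _token_stats(row, tgt)
            log_probs.append(lp)
            entropies.append(h)
    return log_probs, entropies
```

In this toy version the input logits are already fully in memory, so the saving is only illustrative; in a tensor framework each chunk would be a slice of the hidden states or logits, and its intermediates would be freed before the next chunk is processed.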

YeAnbang changed the base branch from main to grpo-latest August 5, 2025 06:11

fix missing or wrong file during rebase

3746f73

YeAnbang force-pushed the grpo-latest-rebase-main branch from fb8202b to 3746f73 August 5, 2025 06:42

YeAnbang changed the base branch from grpo-latest to main August 5, 2025 06:48

YeAnbang marked this pull request as ready for review August 5, 2025 09:40

YeAnbang requested a review from a team as a code owner August 5, 2025 09:40

YeAnbang and others added 2 commits August 6, 2025 06:15

tested after rebasing, fix importance sampling bug

32b2148

[pre-commit.ci] auto fixes from pre-commit.com hooks

08a1244

for more information, see https://pre-commit.ci

YeAnbang requested a review from TongLi3701 August 11, 2025 06:50
