
[Cherry-Pick][Optimization] merge_allreduce (#7039)#7475

Draft
Copilot wants to merge 1 commit into release/2.5 from copilot/release-2-5-backport-pr-7039

Conversation

Contributor

Copilot AI commented Apr 17, 2026

Motivation

Cherry-pick of #7039 to release/2.5 branch.

Modifications

Merge the separate allreduce operations performed after the FFN computation of the regular experts and the shared experts into a single allreduce.

Specific changes:

  • In glm4_moe.py, add a merge_ffn_tp flag that detects pure TP mode (tp>1, ep=1)
  • In pure TP mode, set reduce_results of both FusedMoE and shared_experts to False, deferring the allreduce
  • In the forward method, sum the partial results of the two branches and then run a single unified tensor_model_parallel_all_reduce, saving one collective-communication call
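The deferred-and-merged allreduce pattern described above can be sketched as follows. This is illustrative only: MoELayerSketch, fake_all_reduce, and the per-branch scaling factors are hypothetical stand-ins, not the FastDeploy implementation.

```python
# Sketch of merging the routed-experts and shared-experts allreduces into one.
# All names here are hypothetical; fake_all_reduce logs calls instead of
# performing a real collective, so we can count communications.

calls = []  # records collective-communication invocations


def fake_all_reduce(x):
    """Stand-in for tensor_model_parallel_all_reduce: would sum partial
    results across TP ranks; here it just logs the call."""
    calls.append("all_reduce")
    return x


class MoELayerSketch:
    def __init__(self, tp_size, ep_size, has_shared_experts=True):
        # merge_ffn_tp: defer allreduce only in pure TP mode (tp > 1, ep == 1).
        self.merge_ffn_tp = tp_size > 1 and ep_size == 1
        # Mirrors the reduce_results flag: each branch reduces internally
        # only when the merged path is NOT taken.
        self.reduce_in_experts = not self.merge_ffn_tp
        self.has_shared_experts = has_shared_experts

    def routed_experts(self, x):
        out = x * 0.5  # pretend this rank's partial sum
        if self.reduce_in_experts:
            out = fake_all_reduce(out)
        return out

    def shared_experts(self, x):
        out = x * 0.25  # pretend this rank's partial sum
        if self.reduce_in_experts:
            out = fake_all_reduce(out)
        return out

    def forward(self, x):
        out = self.routed_experts(x)
        if self.has_shared_experts:
            out = out + self.shared_experts(x)
        if self.merge_ffn_tp:
            # Both branches are still partial sums: one merged allreduce.
            out = fake_all_reduce(out)
        return out


layer = MoELayerSketch(tp_size=4, ep_size=1)
layer.forward(1.0)
print(len(calls))  # → 1 (merged path: a single collective call)
```

With ep_size > 1 the sketch falls back to two internal allreduces, matching the original per-branch reduce behavior.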

Conflict Resolution

The baseline-path conflict in tests/e2e/utils/rollout_routing_replay_test_utils.py has been resolved, keeping the existing R3_BaseLine_25_uint8_0403 path from the release/2.5 branch.

Checklist

  • Add at least one tag in the PR title.
  • Format your code; run pre-commit before committing.
  • Add unit tests. This is a cherry-pick of an already merged PR; no new tests are needed.
  • Provide accuracy results. Same as the original PR, [Optimization] merge_allreduce #7039.
  • This PR targets the release branch and was first submitted to the develop branch ([Optimization] merge_allreduce #7039).

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

paddle-bot bot commented Apr 17, 2026

Thanks for your contribution!


PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-17 23:56 CST

📋 Review Summary

PR overview: Cherry-pick of #7039, which merges the separate allreduce operations of the regular experts and the shared experts in the GLM4-MoE model into one, reducing collective-communication overhead.
Scope of change: model_executor/models/glm4_moe.py
Impact tags: Models, Optimization

Issues

No blocking issues found.

Overall Assessment

The change logic is clear and correct. Verified points:

  1. The merge_ffn_tp condition is correct: self.use_tp and not self.use_ep precisely identifies pure TP mode (tp>1, ep=1), and the allreduce is deferred only in that mode. In EP mode or without parallelism, each branch keeps its original internal reduce behavior (reduce_results=True).

  2. The reduce_results parameter passing is correct:

    • FusedMoE skips its internal tensor_model_parallel_all_reduce when reduce_results=False (confirmed against the conditional logic at line 760 of moe.py).
    • In Glm4MoeMLP, the reduce_results parameter only affects RowParallelLinear (the non-use_sequence_parallel_moe branch); the use_sequence_parallel_moe branch uses ReplicatedLinear, which performs no TP reduce and is unaffected.
  3. The forward method is correct: the partial sums of the two branches are added first (out + self.shared_experts(x)), then a single unified tensor_model_parallel_all_reduce is executed. When n_shared_experts == 0, the allreduce is still correctly performed (since the FusedMoE output is still a partial sum).

  4. The function signature matches: tensor_model_parallel_all_reduce(out, self.tp_group) is consistent with the (input_, group_) signature defined in communication.py.

This is a valid communication optimization: in pure TP mode it merges two allreduces into one, cutting the number of collective-communication calls by roughly half, with a clear benefit for large-scale inference.
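As a rough sanity check on the "roughly half" figure, count collectives per forward pass for a model with some number of MoE layers (the layer count below is hypothetical, purely for illustration):

```python
# Back-of-envelope count of collective calls before/after the merge,
# in pure TP mode. num_moe_layers is a made-up illustrative value.
num_moe_layers = 40
before = 2 * num_moe_layers  # one allreduce each for routed and shared experts
after = 1 * num_moe_layers   # one merged allreduce per layer
print(before, after, 1 - after / before)  # → 80 40 0.5
```

The 50% saving applies only to these FFN-stage allreduces; other collectives in the model (e.g. around attention) are unchanged.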


Labels

contributor External developers
