add moe_align_fused #1054
Conversation
Summary of Changes
Hello @shihaobai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new fused Triton kernel designed to optimize the Mixture of Experts (MoE) alignment process. By consolidating previous separate alignment steps into a single, autotuned operation, this change aims to significantly enhance the performance and efficiency of MoE computations within the system.
Highlights
- New Fused MoE Alignment Kernel: A new Triton kernel, `moe_align_fused_kernel`, has been introduced to efficiently handle the Mixture of Experts (MoE) token-to-expert assignment process in a single fused operation.
- Performance Optimization with Autotuning: The new `moe_align_fused` function is wrapped with an `autotune` decorator, enabling dynamic optimization of kernel launch parameters such as `BLOCK_TOK`, `num_warps`, and `num_stages` for improved performance.
- Integration into MoE Pipeline: The existing `moe_align` and `moe_align1` calls within the `fused_experts_impl` function have been replaced by the new `moe_align_fused` function, streamlining the MoE processing pipeline.
Code Review
This pull request introduces a fused Triton kernel, `moe_align_fused`, to optimize the MoE expert alignment process by combining the logic of `moe_align` and `moe_align1`. This is a valuable performance enhancement. However, I've identified a critical issue in the new kernel's signature that will prevent it from executing, along with a few medium-severity issues related to code clarity and maintainability. My feedback includes suggestions to fix the critical bug, refactor a redundant function parameter, and improve code consistency.
```python
    expert_to_weight_ptr,  # [expert_num, token_num * topk]
    expert_token_num_ptr,  # [expert_num]
    token_num,
    topk: tl.constexpr,
```
The `topk` parameter is declared as `tl.constexpr`, but it is passed as a positional argument from `moe_align_fused`, which makes it a runtime value. According to the Triton documentation, `constexpr` arguments must be compile-time constants and passed as keyword arguments. This will cause a `TypeError` at runtime. Please remove the `tl.constexpr` annotation to treat it as a regular runtime argument. The autotuner already creates specializations for different `topk` values via `_get_moe_align_fused_static_key`, so the performance impact of this change should be minimal.
```diff
-    topk: tl.constexpr,
+    topk,
```
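To illustrate the distinction this suggestion relies on, here is a small, hypothetical Triton kernel (not the one in this PR): `tl.constexpr` parameters are compile-time constants that trigger a separate kernel specialization per value, while plain parameters are ordinary runtime scalars.

```python
import triton
import triton.language as tl

@triton.jit
def _scale_kernel(
    x_ptr,
    y_ptr,
    n,                    # runtime scalar: an ordinary argument
    scale,                # runtime scalar
    BLOCK: tl.constexpr,  # compile-time constant: a specialized kernel is built per value
):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask, other=0.0)
    tl.store(y_ptr + offs, x * scale, mask=mask)

# Typical launch: constexpr arguments are supplied as keywords, e.g.
# _scale_kernel[(triton.cdiv(n, 256),)](x, y, n, 2.0, BLOCK=256)
```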
```python
    # 遍历 topk
    # for k in range(topk):
    expert_ids = tl.load(topk_ids_ptr + offs, mask=mask, other=0)
    weights = tl.load(topk_weights_ptr + offs, mask=mask, other=0.0)

    # 用 atomic_add 给 expert 分配写位置
    write_pos = tl.atomic_add(expert_token_num_ptr + expert_ids, 1, mask=mask)

    # 按 token 顺序写 index 和 weight
```
The comments in this kernel are in Chinese, while the rest of the codebase uses English. For consistency and better maintainability for a wider audience, please translate these comments to English. For example:
- Line 237: `# 遍历 topk` -> `# Iterate over top-k experts`
- Line 242: `# 用 atomic_add 给 expert 分配写位置` -> `# Use atomic_add to allocate write positions for experts`
- Line 245: `# 按 token 顺序写 index 和 weight` -> `# Write index and weight in token order`
```python
    mutates_args=["expert_to_index", "expert_to_weight", "expert_token_num"],
)
def moe_align_fused(
    expert_to_index, expert_to_weight, expert_token_num, topk_ids, topk_weights, topk, run_config: Optional[dict] = None
```
The `topk` parameter in the function signature is redundant because its value is derived from `topk_ids.shape` within the function. The argument is also shadowed by a local variable, which is confusing and error-prone. Please remove this parameter from the function signature for clarity and simplicity. Note that this will require updating the call site in `fused_experts_impl`.
```diff
-    expert_to_index, expert_to_weight, expert_token_num, topk_ids, topk_weights, topk, run_config: Optional[dict] = None
+    expert_to_index, expert_to_weight, expert_token_num, topk_ids, topk_weights, run_config: Optional[dict] = None
```
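As a sketch of what the suggested signature implies (assuming `topk_ids` is a 2-D `[token_num, topk]` tensor, as the shape comments in the kernel indicate), the value can be recovered inside the function body instead of being passed in. The function name below is illustrative, not the PR's implementation.

```python
def moe_align_fused_signature_sketch(expert_to_index, expert_to_weight, expert_token_num,
                                     topk_ids, topk_weights, run_config=None):
    # Both sizes come from the routing tensor itself, so no separate `topk` argument is needed.
    token_num, topk = topk_ids.shape
    # ... the kernel launch would follow here, using `token_num` and `topk` as launch inputs.
    return token_num, topk
```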
```python
    moe_align_fused(
        expert_to_tokens, expert_to_weights, expert_to_token_num, curr_topk_ids, curr_topk_weights, topk=topk_num
    )
```
To accompany the removal of the redundant `topk` parameter from the `moe_align_fused` function signature, this call should be updated to no longer pass the `topk` argument.
```diff
-    moe_align_fused(
-        expert_to_tokens, expert_to_weights, expert_to_token_num, curr_topk_ids, curr_topk_weights, topk=topk_num
-    )
+    moe_align_fused(
+        expert_to_tokens, expert_to_weights, expert_to_token_num, curr_topk_ids, curr_topk_weights
+    )
```
No description provided.