Add customized renormalized moe routing kernel for moe cutlass backend #4955

ChristinaZ · 2025-06-05T12:28:55Z

Add customized renormalized moe routing kernel for moe cutlass backend

Description

Add the PyTorch op torch.ops.trtllm.renorm_moe_routing_op and use it when num_experts <= 128 and num_experts_per_token <= 8.
Related files: cpp/tensorrt_llm/thop/renormMoeRoutingOp.cpp
tensorrt_llm/_torch/modules/fused_moe/routing.py
(Hi @hlu1, since this change touches the MoE part, please help review it.)
Add the customized kernel renormMoeRoutingKernel
Related files: cpp/tensorrt_llm/kernels/renormMoeRoutingKernels.cu
Add the unit test test_customized_renormalize_moe_routing()
Related file: tests/unittest/_torch/modules/test_moe_routing.py
Besides adding this feature, I also made a small change to RoutingMethodType: the member RoutingMethodType.Qwen3 is renamed to RoutingMethodType.RenormalizeNaive. The more general name should make this routing method easier to reuse beyond Qwen3 in the future.
(Hi @djns99 and @rosenrodt, please help review this part.)
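To pin down the routing math the new op targets, here is a minimal stdlib-only Python sketch for a single token. It assumes "renormalized routing" means: select the top-k expert logits, then apply softmax over just those k values. The eligibility check mirrors the dispatch rule stated above, with top_k standing in for num_experts_per_token. The names use_custom_renorm_kernel, renormalize_routing, MAX_EXPERTS and MAX_TOP_K are hypothetical; this is not the code in routing.py or the signature of torch.ops.trtllm.renorm_moe_routing_op.

```python
import math
from typing import List, Tuple

# Hypothetical constants mirroring the dispatch rule in the PR description.
MAX_EXPERTS = 128
MAX_TOP_K = 8


def use_custom_renorm_kernel(num_experts: int, top_k: int) -> bool:
    """Assumed eligibility check: num_experts <= 128 and top_k <= 8."""
    return num_experts <= MAX_EXPERTS and top_k <= MAX_TOP_K


def renormalize_routing(logits: List[float], top_k: int) -> Tuple[List[int], List[float]]:
    """TopK over raw logits, then softmax over the selected logits (one token)."""
    if not 0 < top_k <= len(logits):
        raise ValueError("top_k must be in [1, len(logits)]")
    # Larger logit first; on ties, the smaller expert index comes first.
    experts = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:top_k]
    # Subtract the largest selected logit before exp() for numerical stability.
    peak = logits[experts[0]]
    exps = [math.exp(logits[e] - peak) for e in experts]
    total = sum(exps)
    return experts, [x / total for x in exps]
```

For example, renormalize_routing([1.0, 3.0, 2.0, 0.0], 2) selects experts 1 and 2, and the two returned weights sum to 1. The real op processes a whole batch of tokens on the GPU; the per-token loop here only fixes the semantics.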

Test Coverage

pytest -k test_customized_renormalize_moe_routing tests/unittest/_torch/modules/test_moe_routing.py
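A test like this can compare a fused kernel against a simple reference. Assuming RenormalizeNaive means softmax over all experts, then top-k, then renormalizing the kept probabilities, it yields the same weights as softmax over only the top-k logits: for the selected set S, p_i / sum_{j in S} p_j = exp(x_i) / sum_{j in S} exp(x_j), because the full-softmax normalizer cancels. The stdlib-only sketch below checks that equivalence numerically. All function names are hypothetical; it neither calls the PyTorch op nor reproduces test_customized_renormalize_moe_routing.

```python
import math
from typing import List, Tuple


def _softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # shift by the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def _topk(values: List[float], k: int) -> List[int]:
    # Larger value first; ties resolved toward the smaller index.
    return sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]


def renormalize(logits: List[float], k: int) -> Tuple[List[int], List[float]]:
    """TopK -> Softmax over the selected logits."""
    idx = _topk(logits, k)
    return idx, _softmax([logits[i] for i in idx])


def renormalize_naive(logits: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Softmax over all experts -> TopK -> renormalize the kept probabilities."""
    probs = _softmax(logits)
    idx = _topk(probs, k)
    kept = sum(probs[i] for i in idx)
    return idx, [probs[i] / kept for i in idx]


def routings_match(logits: List[float], k: int, tol: float = 1e-6) -> bool:
    """True if both orderings pick the same experts with (nearly) equal weights."""
    idx_a, w_a = renormalize(logits, k)
    idx_b, w_b = renormalize_naive(logits, k)
    return idx_a == idx_b and all(abs(a - b) <= tol for a, b in zip(w_a, w_b))
```

Because both paths agree up to rounding, a kernel can skip the full softmax and still match a RenormalizeNaive reference within a small tolerance.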

ChristinaZ · 2025-06-05T12:30:24Z

/bot run

tensorrt-cicd · 2025-06-05T12:37:05Z

PR_Github #7728 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-05T14:46:06Z

PR_Github #7728 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5601 completed with status: 'FAILURE'

hlu1

Please make sure to run the multi-GPU tests to check Qwen3 accuracy.

tensorrt_llm/_torch/modules/fused_moe/routing.py

djns99

LGTM, thanks @ChristinaZ

tensorrt_llm/_torch/modules/fused_moe/routing.py

cpp/tensorrt_llm/thop/renormMoeRoutingOp.cpp

ChristinaZ · 2025-06-06T06:08:59Z

/bot run

tensorrt-cicd · 2025-06-06T06:14:49Z

PR_Github #7845 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T06:16:06Z

PR_Github #7845 [ run ] completed with state FAILURE

ChristinaZ · 2025-06-06T06:38:45Z

/bot run

tensorrt-cicd · 2025-06-06T06:44:51Z

PR_Github #7850 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T08:22:15Z

PR_Github #7850 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5669 completed with status: 'FAILURE'

ChristinaZ · 2025-06-06T09:34:05Z

/bot run

tensorrt-cicd · 2025-06-06T09:41:17Z

PR_Github #7889 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T15:31:52Z

PR_Github #7889 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5699 completed with status: 'FAILURE'

ChristinaZ · 2025-06-08T05:49:23Z

/bot run


ChristinaZ · 2025-06-08T22:06:53Z

/bot run

tensorrt-cicd · 2025-06-08T22:12:43Z

PR_Github #8043 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-09T00:08:49Z

PR_Github #8043 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5829 completed with status: 'FAILURE'


ChristinaZ · 2025-06-09T03:46:51Z

/bot run

tensorrt-cicd · 2025-06-09T03:52:22Z

PR_Github #8085 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-09T08:54:28Z

PR_Github #8085 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5864 completed with status: 'SUCCESS'

ChristinaZ requested review from byshiue, hlu1, nekorobov and rosenrodt June 5, 2025 12:28

ChristinaZ self-assigned this Jun 5, 2025

ChristinaZ requested a review from a team as a code owner June 5, 2025 12:28

hlu1 reviewed Jun 5, 2025


tensorrt_llm/_torch/modules/fused_moe/routing.py (outdated, resolved)

tensorrt_llm/_torch/modules/fused_moe/routing.py (outdated, resolved)

tensorrt_llm/_torch/modules/fused_moe/routing.py (outdated, resolved)

djns99 approved these changes Jun 5, 2025


tensorrt_llm/_torch/modules/fused_moe/routing.py (outdated, resolved)

cpp/tensorrt_llm/thop/renormMoeRoutingOp.cpp (outdated, resolved)

ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch 3 times, most recently from 78cfd51 to 375308e June 6, 2025 06:08

ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 375308e to 95fdd89 June 6, 2025 06:34

ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 95fdd89 to 215001f June 6, 2025 09:32

ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 215001f to 2968d3d June 8, 2025 05:48

NVIDIA deleted a comment from tensorrt-cicd Jun 8, 2025

ChristinaZ added 2 commits June 8, 2025 15:05

7055d93 Add customized renormalized moe routing kernel for moe cutlass backend. Replace name Qwen3 with RenormalizeNaive.
Signed-off-by: Christina Zhang <[email protected]>

4c5e953 Add support for arbitrary expert number less than or equal to 128 and data type half
Signed-off-by: Christina Zhang <[email protected]>

ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 4a98e2d to 4c5e953 June 8, 2025 22:06

f137fd7 Prioritize elements with smaller indices
Signed-off-by: Christina Zhang <[email protected]>
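The commit title says that when expert scores tie, the element with the smaller index wins. One common way for a GPU top-k with small k to work is k rounds of a max-reduction; in that scheme the tie-break must live in the comparator so that every reduction order returns the same winner. The plain-Python sketch below models that idea. better, reduce_max and topk_iterative are hypothetical names, and the pairwise loop only loosely imitates a warp-level reduction; it is not renormMoeRoutingKernel.

```python
import math
from typing import List, Tuple

Candidate = Tuple[float, int]  # (score, expert index)


def better(a: Candidate, b: Candidate) -> bool:
    """True if a beats b: higher score, or equal score and smaller index."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])


def reduce_max(cands: List[Candidate]) -> Candidate:
    """Pairwise tree reduction (loosely like a shuffle-down) using better()."""
    while len(cands) > 1:
        half = (len(cands) + 1) // 2
        nxt = []
        for i in range(half):
            j = i + half
            # Keep the partner only if it strictly wins under the tie-break rule.
            nxt.append(cands[j] if j < len(cands) and better(cands[j], cands[i]) else cands[i])
        cands = nxt
    return cands[0]


def topk_iterative(scores: List[float], k: int) -> List[int]:
    """k rounds of argmax; each winner is masked with -inf before the next round."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be in [1, len(scores)]")
    work = [(s, i) for i, s in enumerate(scores)]
    chosen = []
    for _ in range(k):
        _, idx = reduce_max(work)
        chosen.append(idx)
        work[idx] = (-math.inf, idx)  # assumes all input scores are finite
    return chosen
```

With scores [0.5, 2.0, 2.0, 1.0, 2.0] and k = 3, this picks experts 1, 2 and 4 in that order. Since better() is a strict total order over (score, index) pairs, the result does not depend on how candidates are paired during the reduction.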

byshiue approved these changes Jun 9, 2025


byshiue merged commit f45aff2 into NVIDIA:main Jun 9, 2025
3 checks passed

tburt-nv mentioned this pull request Jun 9, 2025

[https://nvbugs/5332927] Waive new tests #5051

Merged


5 participants