@ChristinaZ
Collaborator

Add customized renormalized MoE routing kernel for the MoE CUTLASS backend

Description

  • Add the PyTorch op torch.ops.trtllm.renorm_moe_routing_op and use it when num_experts <= 128 and num_experts_per_token <= 8.
    Related files: cpp/tensorrt_llm/thop/renormMoeRoutingOp.cpp
    tensorrt_llm/_torch/modules/fused_moe/routing.py
    (Hi @hlu1, since this modification touches the MoE part, please help review it.)

  • Add the customized kernel renormMoeRoutingKernel.
    Related file: cpp/tensorrt_llm/kernels/renormMoeRoutingKernels.cu

  • Add the unit test test_customized_renormalize_moe_routing().
    Related file: tests/unittest/_torch/modules/test_moe_routing.py

  • In addition to this feature, I also made a small change to RoutingMethodType: the name RoutingMethodType.Qwen3 is replaced with RoutingMethodType.RenormalizeNaive, which we hope will facilitate broader usage in the future.
    (Hi @djns99 and @rosenrodt, please help review this part.)
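
For reviewers who want a reference point, here is a minimal pure-Python sketch of the routing math and the dispatch condition described above. It is not the code in this PR. It assumes that "renormalize" routing means selecting the top-k logits and applying softmax over only those, and that the "naive" variant means softmax over all experts, then top-k, then dividing by the sum of the kept probabilities. The names can_use_custom_routing, renormalize_routing and renormalize_naive_routing are hypothetical; only the limits 128 and 8 come from the description.

```python
import math
from typing import List, Tuple

# Limits quoted in the PR description; the constant names are hypothetical.
MAX_EXPERTS = 128
MAX_TOP_K = 8


def can_use_custom_routing(num_experts: int, top_k: int) -> bool:
    """Hypothetical predicate mirroring the condition in the description."""
    return num_experts <= MAX_EXPERTS and top_k <= MAX_TOP_K


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(values: List[float], top_k: int) -> List[int]:
    """Indices of the top_k largest values, ties broken by lower index."""
    return sorted(range(len(values)), key=lambda i: (-values[i], i))[:top_k]


def renormalize_routing(logits: List[float], top_k: int) -> Tuple[List[int], List[float]]:
    """Assumed 'renormalize' order: top-k first, then softmax over the kept logits."""
    ids = top_k_indices(logits, top_k)
    return ids, softmax([logits[i] for i in ids])


def renormalize_naive_routing(logits: List[float], top_k: int) -> Tuple[List[int], List[float]]:
    """Assumed 'naive' order: softmax over all experts, top-k, then rescale to sum to 1."""
    probs = softmax(logits)
    ids = top_k_indices(probs, top_k)
    kept = [probs[i] for i in ids]
    total = sum(kept)
    return ids, [p / total for p in kept]


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 0.5, 1.5]
    print(can_use_custom_routing(num_experts=len(logits), top_k=2))
    print(renormalize_routing(logits, top_k=2))
    print(renormalize_naive_routing(logits, top_k=2))
```

Under these assumptions the two orderings give the same expert indices and, up to floating-point rounding, the same weights, because the softmax denominator over all experts cancels when the kept probabilities are rescaled. A comparison of this kind is one way a unit test could check a fused kernel against a reference.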

Test Coverage

pytest -k test_customized_renormalize_moe_routing tests/unittest/_torch/modules/test_moe_routing.py

@ChristinaZ ChristinaZ self-assigned this Jun 5, 2025
@ChristinaZ ChristinaZ requested a review from a team as a code owner June 5, 2025 12:28
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7728 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7728 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5601 completed with status: 'FAILURE'

Collaborator

@hlu1 hlu1 left a comment


Please make sure to run the multi-GPU tests to check the accuracy of Qwen3.

Collaborator

@djns99 djns99 left a comment


LGTM, thanks @ChristinaZ

@ChristinaZ ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch 3 times, most recently from 78cfd51 to 375308e Compare June 6, 2025 06:08
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7845 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7845 [ run ] completed with state FAILURE

@ChristinaZ ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 375308e to 95fdd89 Compare June 6, 2025 06:34
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7850 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7850 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5669 completed with status: 'FAILURE'

@ChristinaZ ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 95fdd89 to 215001f Compare June 6, 2025 09:32
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7889 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7889 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5699 completed with status: 'FAILURE'

@ChristinaZ ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 215001f to 2968d3d Compare June 8, 2025 05:48
@NVIDIA NVIDIA deleted 6 comments from tensorrt-cicd Jun 8, 2025
@ChristinaZ
Collaborator Author

/bot run

…d. Replace name Qwen3 with RenormalizeNaive.

Signed-off-by: Christina Zhang <[email protected]>
@ChristinaZ ChristinaZ force-pushed the opt_Topk_Qwen3_onHopper branch from 4a98e2d to 4c5e953 Compare June 8, 2025 22:06
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8043 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8043 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5829 completed with status: 'FAILURE'

@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8085 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8085 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5864 completed with status: 'SUCCESS'

@byshiue byshiue merged commit f45aff2 into NVIDIA:main Jun 9, 2025
3 checks passed

5 participants