Add customized renormalized moe routing kernel for moe cutlass backend #4955
|
/bot run |
|
PR_Github #7728 [ run ] triggered by Bot |
|
PR_Github #7728 [ run ] completed with state |
hlu1
left a comment
Please make sure to run the multi-gpu tests to test accuracy of qwen3
djns99
left a comment
LGTM, thanks @ChristinaZ
78cfd51 to
375308e
Compare
|
/bot run |
|
PR_Github #7845 [ run ] triggered by Bot |
|
PR_Github #7845 [ run ] completed with state |
375308e to
95fdd89
Compare
|
/bot run |
|
PR_Github #7850 [ run ] triggered by Bot |
|
PR_Github #7850 [ run ] completed with state |
95fdd89 to
215001f
Compare
|
/bot run |
|
PR_Github #7889 [ run ] triggered by Bot |
|
PR_Github #7889 [ run ] completed with state |
215001f to
2968d3d
Compare
|
/bot run |
…d. Replace name Qwen3 with RenormalizeNaive. Signed-off-by: Christina Zhang <[email protected]>
… data type half Signed-off-by: Christina Zhang <[email protected]>
4a98e2d to
4c5e953
Compare
|
/bot run |
|
PR_Github #8043 [ run ] triggered by Bot |
|
PR_Github #8043 [ run ] completed with state |
Signed-off-by: Christina Zhang <[email protected]>
|
/bot run |
|
PR_Github #8085 [ run ] triggered by Bot |
|
PR_Github #8085 [ run ] completed with state |
Add customized renormalized moe routing kernel for moe cutlass backend
Description
Add the PyTorch op `torch.ops.trtllm.renorm_moe_routing_op` and use it when `num_experts <= 128 and num_experts_per_token <= 8`.
Related files: cpp/tensorrt_llm/thop/renormMoeRoutingOp.cpp
tensorrt_llm/_torch/modules/fused_moe/routing.py
(Hi @hlu1, since this modification is related to the MoE part, please help review it.)
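For reviewers, here is a minimal pure-Python sketch of what renormalized routing computes for a single token, together with the dispatch condition quoted above. It is only a reference for reasoning about the op, not the code in this PR: the names `can_use_custom_routing`, `renorm_routing_reference`, `MAX_EXPERTS`, and `MAX_TOP_K` are hypothetical, and the "top-k first, then softmax over the selected logits" ordering is an assumption about what "renormalized" means here.

```python
import math

# Hypothetical constants mirroring the condition stated in the description.
MAX_EXPERTS = 128
MAX_TOP_K = 8


def can_use_custom_routing(num_experts, num_experts_per_token):
    """Return True when the shape falls inside the range quoted in the description."""
    return num_experts <= MAX_EXPERTS and num_experts_per_token <= MAX_TOP_K


def renorm_routing_reference(logits, top_k):
    """Reference routing for one token (assumed semantics).

    Select the top_k experts by logit, then apply a softmax over only the
    selected logits so the returned weights sum to 1.
    """
    # Stable sort: ties keep the lower expert index first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    indices = order[:top_k]
    peak = logits[indices[0]]  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in indices]
    total = sum(exps)
    return indices, [e / total for e in exps]
```

A caller would presumably check `can_use_custom_routing(...)` first and fall back to the existing routing path for larger shapes; that fallback is outside the scope of this sketch.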
Add the customized kernel `renormMoeRoutingKernel`.
Related files: cpp/tensorrt_llm/kernels/renormMoeRoutingKernels.cu
Add unit test test_customized_renormalize_moe_routing()
Related file: tests/unittest/_torch/modules/test_moe_routing.py
Besides adding this feature, I also made a small change to `RoutingMethodType`: `RoutingMethodType.Qwen3` is renamed to `RoutingMethodType.RenormalizeNaive`, which we hope will facilitate broader usage in the future. (Hi @djns99 and @rosenrodt, please help review this part.)
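To illustrate why the more generic name fits, the sketch below compares two orderings of the same computation. It assumes, without confirmation from this description, that `RenormalizeNaive` means softmax over all experts, then top-k, then division by the sum of the kept probabilities, and that the non-naive variant means top-k on the logits followed by a softmax over the selected logits. All function names are hypothetical and the code is plain Python, not the library's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(xs, k):
    """Indices of the k largest values, largest first (ties keep lower index)."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


def renormalize_naive(logits, k):
    """Assumed naive ordering: softmax -> top-k -> renormalize."""
    probs = softmax(logits)
    idx = top_k_indices(probs, k)
    kept = [probs[i] for i in idx]
    total = sum(kept)
    return idx, [p / total for p in kept]


def renormalize(logits, k):
    """Assumed fused ordering: top-k -> softmax over the selected logits."""
    idx = top_k_indices(logits, k)
    return idx, softmax([logits[i] for i in idx])
```

Because softmax is monotonic and the full-softmax denominator cancels during renormalization, the two orderings agree up to floating-point rounding when the logits are distinct, so the behavior is not specific to any one model family.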
Test Coverage