[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 #28709
Conversation
Signed-off-by: Zhewen Li <[email protected]>
Code Review
This pull request introduces a new configuration file for Fused Mixture-of-Experts (MoE) kernels, specifically for the AMD Instinct MI325X GPU using the fp8_w8a8 data type. The added file, E=128,N=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json, is intended to resolve a CI failure caused by its absence. The configuration is stated to be identical to that of the MI300X, which is a reasonable approach for enabling support on a similar architecture. The file format and naming convention align with the existing structure for kernel configurations. The change appears correct and should resolve the reported issue.
Code Review
This pull request adds a new configuration file for the Fused MoE kernel to support the AMD Instinct MI325X GPU with fp8 precision. The change is straightforward and fixes a missing-configuration warning; using the tuned configuration should provide better performance than the default settings. The new configuration file's naming and structure are consistent with the existing implementation, and the kernel parameters within the file appear valid. I have not identified any high or critical severity issues in this pull request.
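For context, these fused-MoE config files map candidate token batch sizes to Triton kernel launch parameters. Below is a minimal sketch of what an entry might look like, written as a Python dict: the key names follow existing configs of this kind, but the values are placeholders rather than the tuned MI325X numbers, and ROCm configs may carry additional keys (e.g. waves_per_eu) not shown here.

```python
# Illustrative sketch only: the general shape of a fused-MoE kernel config.
# Top-level keys are token batch sizes; values are Triton launch parameters.
# The concrete numbers below are placeholders, not the values added in this PR.
example_moe_config = {
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
    },
    "64": {
        "BLOCK_SIZE_M": 64,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 8,
        "num_warps": 8,
        "num_stages": 2,
    },
}
```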
How is the config tuned? Is it from the autotune script or copied from MI300X? cc: @mxz297 @bradleyhd
@yeqcharlotte just copied from MI300 since they share the same architecture
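A separate copy is needed even with identical contents because the tuned config is looked up by a filename keyed on the reported GPU device name. The sketch below reconstructs that lookup in rough form; the function names and details are illustrative assumptions, not vLLM's actual helpers.

```python
import json
import os
from typing import Optional


def moe_config_filename(E: int, N: int, device_name: str, dtype: Optional[str]) -> str:
    # Filename convention inferred from the file added in this PR (assumption,
    # not vLLM's exact helper): E=...,N=...,device_name=...[,dtype=...].json
    dtype_part = f",dtype={dtype}" if dtype else ""
    return f"E={E},N={N},device_name={device_name}{dtype_part}.json"


def load_moe_config(config_dir: str, E: int, N: int,
                    device_name: str, dtype: Optional[str]):
    # Return the tuned config dict if a matching file exists, else None
    # (in which case an untuned default would be used and a warning emitted).
    path = os.path.join(config_dir, moe_config_filename(E, N, device_name, dtype))
    if not os.path.exists(path):
        return None  # the missing-file case this PR fixes for MI325X
    with open(path) as f:
        return json.load(f)


# The file added here would then be picked up via something like:
# load_moe_config(config_dir, E=128, N=1024,
#                 device_name="AMD_Instinct_MI325X", dtype="fp8_w8a8")
```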
…lm-project#28709)
Signed-off-by: Zhewen Li <[email protected]>
Signed-off-by: George D. Torres <[email protected]>
Purpose
AMD CI runs on MI325, but the corresponding MoE config has not been added, which causes the CI failure.
This PR adds the llama4 MoE config for MI325, which should be identical to the MI300 config added in #16847.
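Since the values are taken verbatim from the MI300X file, the new config can be produced simply by copying the existing file under the MI325X device name. A minimal sketch, assuming the usual config directory in the vLLM source tree and an MI300X source filename inferred from the naming convention:

```python
import shutil
from pathlib import Path

# Assumed location of the fused-MoE config files in the vLLM source tree.
CONFIG_DIR = Path("vllm/model_executor/layers/fused_moe/configs")

# Source filename inferred from the naming convention; destination is the
# file added by this PR.
src = CONFIG_DIR / "E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json"
dst = CONFIG_DIR / "E=128,N=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json"

# MI325X shares the MI300X architecture, so the tuned values carry over as-is.
shutil.copyfile(src, dst)
```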