Conversation

@zhewenl (Collaborator) commented Nov 14, 2025

Purpose

AMD CI is running on MI325X, but the corresponding MoE config has not been added:

WARNING [fused_moe.py:886] Using default MoE config. Performance might be sub-optimal!
Config file not found at ['/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json']

This PR adds the Llama 4 MoE config for MI325X, which should be identical to the MI300X config added in #16847.
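
For reference, vLLM's fused-MoE configs are JSON files whose top-level keys are token batch sizes and whose values are Triton tile parameters for that size. A minimal sketch of the expected shape — the field names match the existing config files, but the numeric values below are illustrative, not the tuned values landed in this PR (ROCm-tuned files may also carry extra fields such as `waves_per_eu`):

```json
{
  "1": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 32,
    "BLOCK_SIZE_K": 64,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 2
  },
  "64": {
    "BLOCK_SIZE_M": 32,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 8,
    "num_warps": 8,
    "num_stages": 2
  }
}
```

At runtime, fused_moe.py selects the entry whose batch-size key is closest to the actual number of tokens and passes those parameters to the Triton kernel.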

@mergify mergify bot added the llama Related to Llama models label Nov 14, 2025
@zhewenl zhewenl requested a review from houseroad November 14, 2025 07:05
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new configuration file for Fused Mixture-of-Experts (MoE) kernels, specifically for the AMD Instinct MI325X GPU using the fp8_w8a8 data type. The added file, E=128,N=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json, is intended to resolve a CI failure caused by its absence. The configuration is stated to be identical to that of the MI300X, which is a reasonable approach for enabling support on a similar architecture. The file format and naming convention align with the existing structure for kernel configurations. The change appears correct and should resolve the reported issue.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds a new configuration file for the Fused MoE kernel to support the AMD Instinct MI325X GPU with fp8 precision. The change is straightforward and aims to fix a missing configuration warning, which should provide better performance than the default settings. The new configuration file's naming and structure are consistent with the existing implementation. The kernel parameters within the file appear valid. I have not identified any high or critical severity issues in this pull request.

@yeqcharlotte (Collaborator) commented:

How was the config tuned? Is it from the autotune script, or copied from MI300X? cc: @mxz297 @bradleyhd

@zhewenl (Collaborator, Author) commented Nov 14, 2025

@yeqcharlotte Just copied from MI300X, since the two share the same architecture.
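
Copying works here because the config contents carry no device identity; the device only enters through the file name. vLLM resolves the config by an exact device-name match, so an MI300X file is never found on an MI325X host even when the tuned parameters would be identical. A rough Python sketch of that lookup — hypothetical helper names, simplified from what fused_moe.py actually does:

```python
import json
import os

import torch


def get_config_file_name(E: int, N: int, dtype: str | None) -> str:
    # The reported device name (e.g. "AMD Instinct MI325X") becomes part of
    # the file name, so each GPU model needs its own config file.
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    dtype_selector = "" if dtype is None else f",dtype={dtype}"
    return f"E={E},N={N},device_name={device_name}{dtype_selector}.json"


def load_moe_config(E: int, N: int, dtype: str | None) -> dict | None:
    path = os.path.join(os.path.dirname(__file__), "configs",
                        get_config_file_name(E, N, dtype))
    if not os.path.exists(path):
        # vLLM falls back to a default config here and logs the
        # "Using default MoE config" warning quoted in the PR description.
        return None
    with open(path) as f:
        return json.load(f)
```

If the copied parameters ever prove suboptimal on MI325X, the file can be regenerated by running vLLM's MoE tuning benchmark (benchmarks/kernels/benchmark_moe.py) on the target hardware.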

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 15, 2025
@tjtanaa tjtanaa enabled auto-merge (squash) November 15, 2025 00:20
@vllm-bot vllm-bot merged commit 1ec978c into vllm-project:main Nov 15, 2025
49 of 51 checks passed
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025