Moe bf16 ep #4144
base: main
Conversation
```python
# We don't need to read these here; they are passed to the ray workers.
# If Ray is launched from outside, it may fail to access the environment variables.
os.getenv('DEEPEP_MAX_BATCH_SIZE', None)
os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', None)
```
Do we need to set those envs manually?
DLBlas reads these vars to build the buffer.
https://github.com/DeepLink-org/DLBlas/blob/1710a860f654ddf50907251ec51670910368ee45/dlblas/layers/moe/token_dispatcher.py#L43
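For reference, a minimal sketch of the buffer-sizing pattern the linked DLBlas code implies: the dispatcher reads these variables with fallbacks, so they only need to be set to override the defaults. The fallback values below are assumptions for illustration, not DLBlas's actual defaults:

```python
import os

# Assumed fallback values for illustration; DLBlas's real defaults may differ.
max_batch_size = int(os.getenv('DEEPEP_MAX_BATCH_SIZE', '128'))
max_tokens_per_rank = int(os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', '4096'))

# The dispatch buffer is sized from these limits, which is why the variables
# must be visible inside the (Ray) worker process that builds the buffer.
print(f'buffer sizing: {max_batch_size=} {max_tokens_per_rank=}')
```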
Qwen3_30b_A3B_Thinking_2507 dp2 ep4
#4084 has been merged.
https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/metrics/metrics_processor.py#L20
Consider using a singleton to simplify this as well, or leave it to me; this is just a reminder.
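For concreteness, a minimal sketch of the singleton shape being suggested; the class body is an illustrative stub, not the actual `MetricsProcessor` from `metrics_processor.py`:

```python
import threading


class MetricsProcessor:
    """Collects and processes engine metrics (illustrative stub)."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls) -> 'MetricsProcessor':
        # Double-checked locking so concurrent first calls create one instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
```

Call sites would then use `MetricsProcessor.instance()` rather than importing a shared module-level object.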
```python
num_warps = 8
num_experts = num_recv_tokens_per_expert.shape[0]
hidden_size = recv_x.shape[1]
# grid = (triton.cdiv(hidden_size, BLOCK_D), num_experts)
```
Tiny nit: is this commented-out line an unused leftover?
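For context, this is roughly what the commented-out line would compute if re-enabled; the concrete sizes below are placeholders:

```python
import triton

# Illustrative shapes matching the snippet above (values are assumptions).
num_experts = 8
hidden_size = 4096
BLOCK_D = 128  # tile size along the hidden dimension

# One Triton program per (hidden-dim tile, expert) pair.
grid = (triton.cdiv(hidden_size, BLOCK_D), num_experts)
```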
```python
@triton.jit
def _fwd_kernel_ep_scatter_1(
```
Though this comes from dlblas, so it's their problem: `_fwd_kernel_ep_scatter_1` and `_fwd_kernel_ep_scatter_2` seem like confusing function names.
```python
class MoeType(Enum):
    """Batch exec type."""
    Default = auto()
    DSAsyncDecode = auto()
```
What's the meaning of DS?
DeepSeek, I guess?
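If DS does stand for DeepSeek, a small naming/doc tweak would make the enum self-explanatory. A sketch (a suggestion only, assuming the DeepSeek reading is right):

```python
from enum import Enum, auto


class MoeType(Enum):
    """Batch execution type for the MoE forward path."""
    Default = auto()
    # Spelled out to avoid the ambiguous 'DS' prefix; assumes DS = DeepSeek.
    DeepSeekAsyncDecode = auto()
```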
backends/moe.py and nn/moe.py have been refactored.
Reuse the token dispatcher from DLBlas.
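A rough sketch of what reusing an external token dispatcher usually looks like; the class and method names (`dispatch`, `combine`, `recv_meta`) are hypothetical placeholders, not the real DLBlas interface:

```python
# Hypothetical sketch: delegating expert-parallel token routing to an
# external dispatcher instead of reimplementing it in lmdeploy.
# All names below are placeholders, not the actual DLBlas API.
import torch


class MoEForward:

    def __init__(self, dispatcher, experts):
        self.dispatcher = dispatcher  # a DLBlas-style token dispatcher
        self.experts = experts        # local expert modules on this EP rank

    def forward(self, hidden_states: torch.Tensor, topk_ids: torch.Tensor,
                topk_weights: torch.Tensor) -> torch.Tensor:
        # 1. Scatter tokens to the ranks that own their selected experts.
        recv_x, recv_meta = self.dispatcher.dispatch(hidden_states, topk_ids, topk_weights)
        # 2. Run the local experts on the tokens routed to this rank.
        expert_out = self.experts(recv_x, recv_meta)
        # 3. Gather the expert outputs back and apply the top-k weights.
        return self.dispatcher.combine(expert_out, recv_meta)
```

Under this split, lmdeploy would keep only the expert computation while the scatter/combine communication lives in DLBlas.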