Conversation

@grimoire (Collaborator)

backends/moe.py and nn/moe.py have been refactored.
Reuse the token dispatcher from DLBlas.

@lvhan028 requested a review from @CUHKSZzxy on November 24, 2025 at 03:49.
# We don't need to read these values here; they will be passed on to the Ray workers.
# If Ray is launched from outside, it may fail to access the environment variables.
os.getenv('DEEPEP_MAX_BATCH_SIZE', None)
os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', None)
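
For context, the comment above implies these variables are only read in the driver so that they can reach the Ray workers. As a rough, illustrative sketch (not necessarily what this PR does), one way to forward them when the Ray cluster is launched externally is via `runtime_env`:

```python
import os

import ray

# Collect the DeepEP-related variables that are actually set in the driver
# environment. The variable names are taken from the snippet above.
deepep_env = {
    name: value
    for name in ('DEEPEP_MAX_BATCH_SIZE', 'DEEPEP_MAX_TOKENS_PER_RANK')
    if (value := os.getenv(name)) is not None
}

# Forward them to the workers; without this, a Ray cluster launched from
# outside the current process may not see the driver's environment.
ray.init(address='auto', runtime_env={'env_vars': deepep_env})
```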
Collaborator:

Do we need to set those envs manually?

@grimoire (Collaborator, Author)

Evaluation results for Qwen3_30b_A3B_Thinking_2507 with dp2 ep4:

| dataset | version | metric | mode | 4144 |
| --- | --- | --- | --- | --- |
| core_average | - | naive_average | gen | 66.25 |
| Instruction Following | - | - | - | - |
| IFEval | 353ae7 | Prompt-level-strict-accuracy | gen | 87.62 |
| General Reasoning | - | - | - | - |
| hle_llmjudge | 6ff468 | accuracy | gen | 10.80 |
| GPQA_diamond_repeat_4 | 772ea0 | accuracy (4 runs average) | gen | 70.83 |
| Math Calculation | - | - | - | - |
| aime2025_repeat_32 | 5e9f4f | accuracy (32 runs average) | gen | 85.62 |
| Knowledge | - | - | - | - |
| mmlu_pro | - | naive_average | gen | 79.93 |
| Code | - | - | - | - |
| lcb_code_generation_repeat_6 | b5b6c5 | pass@1 (6 runs average) | gen | 62.67 |

@lvhan028 added the enhancement (New feature or request) label on Nov 27, 2025.
@lvhan028 (Collaborator)

#4084 has been merged.
You may merge the latest main and improve lmdeploy/pytorch/backends/deepep_moe_checker.py with the singleton decorator.
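
The decorator itself is not shown in this thread; below is a minimal sketch of what a process-wide singleton class decorator could look like, with DeepEPMoEChecker as a hypothetical name taken from the file path rather than the actual class in the PR:

```python
def singleton(cls):
    """Class decorator: every call returns the same shared instance."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class DeepEPMoEChecker:
    """Hypothetical checker; the real class lives in deepep_moe_checker.py."""

    def __init__(self):
        self._available = None  # lazily cached result of the DeepEP check
```

With this pattern, DeepEPMoEChecker() returns the same object everywhere in the process, which is also the simplification suggested for metrics_processor.py below.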

@CUHKSZzxy (Collaborator) left a comment:

https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/metrics/metrics_processor.py#L20

Consider using the singleton to simplify this as well, or leave it to me; this is just a reminder.

num_warps = 8
num_experts = num_recv_tokens_per_expert.shape[0]
hidden_size = recv_x.shape[1]
# grid = (triton.cdiv(hidden_size, BLOCK_D), num_experts)
Collaborator:

Tiny nit: is this an unused leftover?



@triton.jit
def _fwd_kernel_ep_scatter_1(
Collaborator:

Though this comes from dlblas and should be their problem, _fwd_kernel_ep_scatter_1 and _fwd_kernel_ep_scatter_2 seem like confusing function names.

class MoeType(Enum):
    """Batch exec type."""
    Default = auto()
    DSAsyncDecode = auto()
Collaborator:

What's the meaning of DS?

Collaborator (Author):

DeepSeek, I guess?
