Moe bf16 ep #4144
base: main
Conversation
```python
# We don't need to read these here; they are passed to the ray workers.
# If Ray is launched from outside, it may fail to access the environment variables.
os.getenv('DEEPEP_MAX_BATCH_SIZE', None)
os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', None)
```
Do we need to set those envs manually?
DLBlas reads these vars to build the buffer.
https://github.com/DeepLink-org/DLBlas/blob/1710a860f654ddf50907251ec51670910368ee45/dlblas/layers/moe/token_dispatcher.py#L43
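For reference, a minimal sketch of the buffer-sizing pattern the linked DLBlas code implies: the dispatcher reads these variables with fallbacks, so they only need to be set to override the defaults. The fallback values below are assumptions for illustration, not DLBlas's actual defaults:

```python
import os

# Assumed fallback values for illustration; DLBlas's real defaults may differ.
max_batch_size = int(os.getenv('DEEPEP_MAX_BATCH_SIZE', '128'))
max_tokens_per_rank = int(os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', '4096'))

# The dispatch buffer is sized from these limits, which is why the variables
# must be visible inside the (Ray) worker process that builds the buffer.
print(f'buffer sizing: {max_batch_size=} {max_tokens_per_rank=}')
```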
Qwen3_30b_A3B_Thinking_2507 dp2 ep4
#4084 has been merged.
https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/metrics/metrics_processor.py#L20
Consider using a singleton to simplify this as well, or leave it to me; this is just a reminder.
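For concreteness, a minimal sketch of the singleton shape being suggested; the class body is an illustrative stub, not the actual `MetricsProcessor` from `metrics_processor.py`:

```python
import threading


class MetricsProcessor:
    """Collects and processes engine metrics (illustrative stub)."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls) -> 'MetricsProcessor':
        # Double-checked locking so concurrent first calls create one instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
```

Call sites would then use `MetricsProcessor.instance()` rather than importing a shared module-level object.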
```python
num_warps = 8
num_experts = num_recv_tokens_per_expert.shape[0]
hidden_size = recv_x.shape[1]
# grid = (triton.cdiv(hidden_size, BLOCK_D), num_experts)
```
Tiny nit: is this commented-out line an unused leftover?
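For context, this is roughly what the commented-out line would compute if re-enabled; the concrete sizes below are placeholders:

```python
import triton

# Illustrative shapes matching the snippet above (values are assumptions).
num_experts = 8
hidden_size = 4096
BLOCK_D = 128  # tile size along the hidden dimension

# One Triton program per (hidden-dim tile, expert) pair.
grid = (triton.cdiv(hidden_size, BLOCK_D), num_experts)
```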
```python
@triton.jit
def _fwd_kernel_ep_scatter_1(
```
Though this comes from dlblas, so it's their problem: `_fwd_kernel_ep_scatter_1` and `_fwd_kernel_ep_scatter_2` seem like confusing function names.
```python
class MoeType(Enum):
    """Batch exec type."""
    Default = auto()
    DSAsyncDecode = auto()
```
What's the meaning of DS?
DeepSeek, I guess?
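If DS does stand for DeepSeek, a small naming/doc tweak would make the enum self-explanatory. A sketch (a suggestion only, assuming the DeepSeek reading is right):

```python
from enum import Enum, auto


class MoeType(Enum):
    """Batch execution type for the MoE forward path."""
    Default = auto()
    # Spelled out to avoid the ambiguous 'DS' prefix; assumes DS = DeepSeek.
    DeepSeekAsyncDecode = auto()
```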
backends/moe.py and nn/moe.py have been refactored.
Reuse the token dispatcher from DLBlas.
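A rough sketch of what reusing an external token dispatcher usually looks like; the class and method names (`dispatch`, `combine`, `recv_meta`) are hypothetical placeholders, not the real DLBlas interface:

```python
# Hypothetical sketch: delegating expert-parallel token routing to an
# external dispatcher instead of reimplementing it in lmdeploy.
# All names below are placeholders, not the actual DLBlas API.
import torch


class MoEForward:

    def __init__(self, dispatcher, experts):
        self.dispatcher = dispatcher  # a DLBlas-style token dispatcher
        self.experts = experts        # local expert modules on this EP rank

    def forward(self, hidden_states: torch.Tensor, topk_ids: torch.Tensor,
                topk_weights: torch.Tensor) -> torch.Tensor:
        # 1. Scatter tokens to the ranks that own their selected experts.
        recv_x, recv_meta = self.dispatcher.dispatch(hidden_states, topk_ids, topk_weights)
        # 2. Run the local experts on the tokens routed to this rank.
        expert_out = self.experts(recv_x, recv_meta)
        # 3. Gather the expert outputs back and apply the top-k weights.
        return self.dispatcher.combine(expert_out, recv_meta)
```

Under this split, lmdeploy would keep only the expert computation while the scatter/combine communication lives in DLBlas.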