model : add GroveMoE support #15510
Conversation
Looks like ccache breaks the build (using cached files newer than this branch), not important right now though...
```c
ggml_tensor * weights = ggml_get_rows(ctx0,
        ggml_reshape_3d(ctx0, probs, 1, n_expert, n_tokens), selected_experts); // [1, n_expert_used, n_tokens]

if (arch == LLM_ARCH_GROVEMOE && n_expert != hparams.n_expert) {
    selected_experts = ggml_div_scalar_i32(ctx0, selected_experts, hparams.n_group_experts);
}
```
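For context, the integer division here appears to map each selected expert index to the index of the adjugate expert shared by its group (every `n_group_experts` ordinary experts share one adjugate expert, if I read the architecture right). A minimal illustration of the mapping, with an assumed group size:

```c
#include <stdint.h>
#include <stdio.h>

// Illustration of the assumed semantics: with n_group_experts = 4,
// ordinary experts 0..3 map to adjugate expert 0, experts 4..7 to
// adjugate expert 1, and so on. Integer division is exactly this mapping.
int main(void) {
    const int32_t n_group_experts = 4;             // assumed group size
    const int32_t selected[]      = {2, 5, 7, 11}; // example expert ids
    for (int i = 0; i < 4; ++i) {
        printf("expert %2d -> adjugate expert %d\n", selected[i], selected[i] / n_group_experts);
    }
    return 0;
}
```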
Given `A / n == A * (1/n)`, can we do this instead?

```diff
- selected_experts = ggml_div_scalar_i32(ctx0, selected_experts, hparams.n_group_experts);
+ selected_experts = ggml_scale(ctx0, selected_experts, 1.0f / float(hparams.n_group_experts));
```
No, `selected_experts` is an `I32`, and `ggml_scale` only operates on floating-point tensors.
In this case, an alternative could be to support i32 input values in `ggml_scale`, converting them internally to f32. One downside is that we should keep the output of `ggml_scale` as f32 for consistency, so we would also need to implement a `ggml_cast` from f32 to i32.
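For illustration, a sketch of what the proposed alternative graph could look like, assuming `ggml_cast` supported F32 <-> I32 conversion (it does not at the time of this discussion):

```c
// Hypothetical sketch: assumes ggml_cast can convert I32 <-> F32, which
// is exactly the missing piece discussed above.
ggml_tensor * idx_f32 = ggml_cast(ctx0, selected_experts, GGML_TYPE_F32);
idx_f32 = ggml_scale(ctx0, idx_f32, 1.0f / (float) hparams.n_group_experts);
// Casting back would need truncation semantics; note that scaling by a
// rounded 1.0f/n is not guaranteed to reproduce integer division for
// every divisor, so the float path needs care.
selected_experts = ggml_cast(ctx0, idx_f32, GGML_TYPE_I32);
```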
I already went down this rabbit-hole; it complicates things a lot. :)
Just mentioning that because I think the ability to `ggml_cast` between f32 and i32 could be useful in the future, and on most backends it would be quite trivial to add this conversion.
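As a rough indication of how small that conversion is, a sketch of the inner loop an F32 -> I32 cast would boil down to on the CPU backend (the helper name is hypothetical, not existing ggml code):

```c
#include <stdint.h>

// Hypothetical helper, not actual ggml code: element-wise F32 -> I32
// conversion as a CPU backend kernel might implement it.
static void cast_f32_to_i32(const float * src, int32_t * dst, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = (int32_t) src[i]; // C truncates toward zero
    }
}
```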
Adds support for inclusionAI/GroveMoE, a novel architecture that groups adjugate experts with ordinary experts (paper).

The PR is in a fully working state, but I am submitting it as a draft because it requires a scalar div implementation that was quickly hacked together just to get the model running. Only div is (very crudely) implemented, and only for the CPU backend (which doesn't matter much, since little computation is spent here), and I'm not satisfied that the API makes sense. In short, this needs more thought!

@slaren @ggerganov Ideas/input on how best to implement scalar div, or even alternative solutions, would be much appreciated!
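For readers unfamiliar with the op in question, a minimal sketch of an element-wise I32 divide-by-scalar kernel on the CPU (illustrative only; this is not the PR's actual implementation):

```c
#include <stdint.h>

// Illustrative sketch, not the PR's code: the core loop an op like
// ggml_div_scalar_i32 needs on the CPU backend.
static void div_scalar_i32(const int32_t * src, int32_t * dst, int64_t n, int32_t divisor) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = src[i] / divisor; // integer division, truncates toward zero
    }
}
```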