
model : add GroveMoE support #15510


Draft · wants to merge 2 commits into master

Conversation

CISC (Collaborator) commented Aug 22, 2025

Adds support for inclusionAI/GroveMoE, a novel architecture that groups adjugate experts with ordinary experts (paper).

The PR is in a fully working state, but I am submitting it as a draft because it requires a scalar div implementation that was quickly hacked together just to get the model running. Only div is (very crudely) implemented, and only for CPU (which doesn't matter much, since little computation is spent here), and I'm not satisfied that the API makes sense. In short, this requires more thought!

@slaren @ggerganov Ideas/input about how best to implement scalar div, or even alternate solutions would be much appreciated!
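To make the discussion concrete, here is a rough sketch of what such an op computes on the CPU. The name and signature are hypothetical, not the PR's actual code; the point is that the model only needs an elementwise i32-by-scalar division, which maps an expert index to the index of the group it belongs to:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not the PR's implementation): divide each element of
// an i32 buffer by a scalar. With divisor = number of experts per group,
// src[i] / divisor maps a flat expert index to its group index.
static void div_scalar_i32(const int32_t * src, int32_t * dst, size_t n, int32_t divisor) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] / divisor; // truncating integer division
    }
}
```

For example, with 3 experts per group, indices 0–2 map to group 0 and indices 3–5 map to group 1.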

@github-actions bot added labels: Nvidia GPU, Vulkan, python, ggml, SYCL, Apple Metal, Ascend NPU, OpenCL (Aug 22, 2025). @CISC later removed the backend labels, leaving python and ggml.
CISC (Author) commented Aug 22, 2025

Looks like ccache breaks the build (it uses cached files newer than this branch); not important right now, though...

ggml_tensor * weights = ggml_get_rows(ctx0,
        ggml_reshape_3d(ctx0, probs, 1, n_expert, n_tokens), selected_experts); // [1, n_expert_used, n_tokens]
if (arch == LLM_ARCH_GROVEMOE && n_expert != hparams.n_expert) {
    selected_experts = ggml_div_scalar_i32(ctx0, selected_experts, hparams.n_group_experts);
A collaborator commented:

Given that A / n == A * (1/n), can we do this instead?

Suggested change:
- selected_experts = ggml_div_scalar_i32(ctx0, selected_experts, hparams.n_group_experts);
+ selected_experts = ggml_scale(ctx0, selected_experts, 1.0f / float(hparams.n_group_experts));

CISC (Author) replied:

No, selected_experts is an I32 tensor.

A collaborator replied:

In this case, an alternative would be to support i32 input for ggml_scale; the i32 values can be converted to f32 internally.

One downside, though, is that we should keep the output of ggml_scale as f32 for consistency, in which case we would also need to implement ggml_cast from f32 to i32.
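A rough sketch of what such a conversion could look like on the CPU backend (hypothetical, not ggml code; the main design decision is the rounding mode, and plain truncation is shown here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of an f32 -> i32 cast kernel. A plain C cast truncates
// toward zero; indices produced by float math might instead want floor or
// round-to-nearest, which is exactly the API question raised above.
static void cast_f32_to_i32(const float * src, int32_t * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (int32_t) src[i]; // truncates toward zero
    }
}
```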

CISC (Author) replied:

I already went down this rabbit hole; it complicates things a lot. :)

The collaborator replied:

Just mentioning it because I think the ability to ggml_cast between f32 and i32 could be useful in the future, and on most backends this conversion would be quite trivial to add.

Labels: ggml (changes relating to the ggml tensor library for machine learning), python (python script changes)