model : add GroveMoE support #15510
Conversation
Looks like ccache breaks the build (using cached files newer than this branch), not important right now though...
```c
ggml_tensor * weights = ggml_get_rows(ctx0,
        ggml_reshape_3d(ctx0, probs, 1, n_expert, n_tokens), selected_experts); // [1, n_expert_used, n_tokens]

if (arch == LLM_ARCH_GROVEMOE && n_expert != hparams.n_expert) {
    selected_experts = ggml_div_scalar_i32(ctx0, selected_experts, hparams.n_group_experts);
}
```
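For context, the integer division here appears to map each selected expert index to the index of the adjugate expert shared by its group (every `n_group_experts` ordinary experts share one adjugate expert, if I read the architecture right). A minimal illustration of the mapping, with an assumed group size:

```c
#include <stdint.h>
#include <stdio.h>

// Illustration of the assumed semantics: with n_group_experts = 4,
// ordinary experts 0..3 map to adjugate expert 0, experts 4..7 to
// adjugate expert 1, and so on. Integer division is exactly this mapping.
int main(void) {
    const int32_t n_group_experts = 4;             // assumed group size
    const int32_t selected[]      = {2, 5, 7, 11}; // example expert ids
    for (int i = 0; i < 4; ++i) {
        printf("expert %2d -> adjugate expert %d\n", selected[i], selected[i] / n_group_experts);
    }
    return 0;
}
```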
Given `A / n == A * (1/n)`, can we do this instead?

```diff
- selected_experts = ggml_div_scalar_i32(ctx0, selected_experts, hparams.n_group_experts);
+ selected_experts = ggml_scale(ctx0, selected_experts, 1.0f / float(hparams.n_group_experts));
```
No, `selected_experts` is an `I32`, and `ggml_scale` only operates on floating-point tensors.
In this case, an alternative could be to support i32 input values in `ggml_scale`, converting them internally to f32. One downside is that we should keep the output of `ggml_scale` as f32 for consistency, so we would also need to implement a `ggml_cast` from f32 to i32.
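For illustration, a sketch of what the proposed alternative graph could look like, assuming `ggml_cast` supported F32 <-> I32 conversion (it does not at the time of this discussion):

```c
// Hypothetical sketch: assumes ggml_cast can convert I32 <-> F32, which
// is exactly the missing piece discussed above.
ggml_tensor * idx_f32 = ggml_cast(ctx0, selected_experts, GGML_TYPE_F32);
idx_f32 = ggml_scale(ctx0, idx_f32, 1.0f / (float) hparams.n_group_experts);
// Casting back would need truncation semantics; note that scaling by a
// rounded 1.0f/n is not guaranteed to reproduce integer division for
// every divisor, so the float path needs care.
selected_experts = ggml_cast(ctx0, idx_f32, GGML_TYPE_I32);
```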
I already went down this rabbit-hole; it complicates things a lot. :)
Just mentioning that because I think the ability to `ggml_cast` between f32 and i32 could be useful in the future, and on most backends it would be quite trivial to add this conversion.
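As a rough indication of how small that conversion is, a sketch of the inner loop an F32 -> I32 cast would boil down to on the CPU backend (the helper name is hypothetical, not existing ggml code):

```c
#include <stdint.h>

// Hypothetical helper, not actual ggml code: element-wise F32 -> I32
// conversion as a CPU backend kernel might implement it.
static void cast_f32_to_i32(const float * src, int32_t * dst, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = (int32_t) src[i]; // C truncates toward zero
    }
}
```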
Adds support for inclusionAI/GroveMoE, a novel architecture that groups adjugate experts with ordinary experts (paper).

The PR is in a fully working state, but I am submitting it as a draft because it requires a scalar div implementation that was quickly hacked together just to get the model running. Only div is (very crudely) implemented, and only for the CPU backend (which doesn't matter much, since little computation is spent here), and I'm not satisfied that the API makes sense. In short, this needs more thought!

@slaren @ggerganov Ideas/input on how best to implement scalar div, or even alternative solutions, would be much appreciated!
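For readers unfamiliar with the op in question, a minimal sketch of an element-wise I32 divide-by-scalar kernel on the CPU (illustrative only; this is not the PR's actual implementation):

```c
#include <stdint.h>

// Illustrative sketch, not the PR's code: the core loop an op like
// ggml_div_scalar_i32 needs on the CPU backend.
static void div_scalar_i32(const int32_t * src, int32_t * dst, int64_t n, int32_t divisor) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = src[i] / divisor; // integer division, truncates toward zero
    }
}
```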