[QUESTION] Setting num-attention-heads=0 for Mamba #1194
Unanswered
zixianwang2022
asked this question in Q&A
Replies: 1 comment
May I ask which branch of Megatron supports Mamba2? The README states that the main branch is no longer supported.
Your question
Hi, I trigger many assertion errors when trying to train a pure Mamba2 model without any attention by setting NUM_ATTENTION_HEADS=0. Can I just give NUM_ATTENTION_HEADS an arbitrary positive number to avoid triggering the assertions? I don't see the errors when I do that.
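For context, assertions like this typically come from config validation that derives per-head sizes by dividing by the head count, so a head count of 0 fails regardless of whether attention layers are actually used. A minimal sketch of that failure mode (the `ModelConfig` class and its fields below are illustrative, not Megatron's actual API):

```python
# Illustrative sketch of why num_attention_heads=0 breaks config
# validation even for a pure-Mamba model. Names are hypothetical,
# not Megatron's actual classes.
from dataclasses import dataclass


@dataclass
class ModelConfig:
    hidden_size: int
    num_attention_heads: int

    def __post_init__(self):
        # Derived sizes divide by the head count, so 0 fails immediately.
        assert self.num_attention_heads > 0, "num_attention_heads must be > 0"
        assert self.hidden_size % self.num_attention_heads == 0, (
            "hidden_size must be divisible by num_attention_heads"
        )
        self.kv_channels = self.hidden_size // self.num_attention_heads


# A dummy positive head count that divides hidden_size passes validation;
# if the layer pattern contains no attention layers, it is never used.
cfg = ModelConfig(hidden_size=4096, num_attention_heads=32)
print(cfg.kv_channels)  # 128
```

This suggests the dummy-value workaround passes validation only as long as the value is positive and divides the hidden size evenly.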