Changes to support latent MoEs #2296
base: main
Conversation
Force-pushed from d088236 to b0a2d8c
```python
# Project the output back from latent dimension to hidden dimension after combine
# in latent dimension.
if self.config.moe_latent_size:
    output, _ = self.fc2_latent_proj(output)
```
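The shape logic behind this hunk can be sketched with plain matrices. This is an illustrative toy, not the PR's implementation: the dimensions, the random weights, and the stand-in for `fc2_latent_proj` as a single linear map are all assumptions made for the example.

```python
import numpy as np

# Illustrative sizes only: experts run in a smaller latent dimension,
# and the final projection maps latent -> hidden.
tokens, latent, hidden = 2, 4, 8
rng = np.random.default_rng(0)

# Combined expert output, still in the latent dimension.
combined = rng.standard_normal((tokens, latent))

# fc2_latent_proj sketched as a plain linear layer (latent -> hidden).
w = rng.standard_normal((latent, hidden))
output = combined @ w

assert output.shape == (tokens, hidden)
```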
fc2_latent_proj might return a bias if self.config.add_bias_linear is set.
Good point. But what should the right behavior be here if this layer does have a bias?
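One possible answer, sketched below with toy matrices: fold the returned bias into the projected output instead of discarding it with `_`. The sizes, weights, and the stand-in `fc2_latent_proj` function are illustrative assumptions, not the project's actual layer; whether the real layer should carry a bias at all is the open question in this thread.

```python
import numpy as np

rng = np.random.default_rng(1)
tokens, latent, hidden = 2, 4, 8
combined = rng.standard_normal((tokens, latent))
w = rng.standard_normal((latent, hidden))
b = rng.standard_normal(hidden)  # bias lives in the hidden dimension

def fc2_latent_proj(x):
    # Mirrors the reviewed call shape: the layer returns (output, bias),
    # as a linear layer would when add_bias_linear is set.
    return x @ w, b

out, bias = fc2_latent_proj(combined)
if bias is not None:
    # Hypothetical handling: apply the bias immediately after projection.
    out = out + bias
```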
```python
if self.config.moe_latent_size and mlp_bias is not None:
    output = output + mlp_bias
    mlp_bias = None
output = self.combine(output, shared_expert_output)
```
output will be in latent dimension, while shared_expert_output will be in hidden dimension here. We may have to move self.fc2_latent_proj inside self.combine before the addition of output and shared_expert_output.
Good point, moved.
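The dimension mismatch the reviewer flagged, and the fix, can be sketched with toy matrices: project the expert output from latent to hidden first, then add the shared-expert path, so both operands of the addition share the hidden dimension. Sizes and random weights are illustrative assumptions, not the PR's code.

```python
import numpy as np

rng = np.random.default_rng(2)
tokens, latent, hidden = 2, 4, 8
expert_out_latent = rng.standard_normal((tokens, latent))
shared_expert_output = rng.standard_normal((tokens, hidden))
w = rng.standard_normal((latent, hidden))  # stand-in for fc2_latent_proj

# Project latent -> hidden BEFORE the addition; adding in the latent
# dimension would fail (or silently broadcast wrongly) since the two
# tensors would have different trailing dimensions.
projected = expert_out_latent @ w
combined = projected + shared_expert_output

assert combined.shape == (tokens, hidden)
```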
Signed-off-by: Deepak Narayanan <[email protected]>
…tent-size Signed-off-by: Deepak Narayanan <[email protected]>
… right shape Signed-off-by: Deepak Narayanan <[email protected]>
Force-pushed from b0a2d8c to 4463081
No description provided.