
Conversation

@deepakn94 (Contributor)

No description provided.

@deepakn94 requested review from a team as code owners on November 19, 2025 at 02:16
@deepakn94 added this to the Core 0.16 milestone on November 19, 2025
@deepakn94 self-assigned this on November 19, 2025
@deepakn94 force-pushed the dnarayanan/latent_moe branch from d088236 to b0a2d8c on November 19, 2025 at 02:32
# Project the output back from latent dimension to hidden dimension after combine
# in latent dimension.
if self.config.moe_latent_size:
    output, _ = self.fc2_latent_proj(output)
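For context, a minimal sketch of how such a latent projection could be wired up, assuming a layer that returns its bias separately instead of adding it. The class name, sizes, and tuple-return convention here are illustrative assumptions, not the PR's actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentProjLinear(nn.Module):
    # Illustrative layer: projects from the latent dimension back to the hidden
    # dimension and returns (output, bias), leaving the bias add to the caller.
    def __init__(self, latent_size, hidden_size, add_bias_linear=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(hidden_size, latent_size))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if add_bias_linear else None

    def forward(self, x):
        # The caller unpacks (output, bias); in the snippet above the bias is
        # discarded with `_`.
        return F.linear(x, self.weight), self.bias


# Assumed sizes, for illustration only.
hidden_size, moe_latent_size = 1024, 256
fc2_latent_proj = LatentProjLinear(moe_latent_size, hidden_size)
tokens = torch.randn(8, moe_latent_size)
output, output_bias = fc2_latent_proj(tokens)  # output: [8, 1024]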

Review comment:

fc2_latent_proj might return a bias if self.config.add_bias_linear is set.

@deepakn94 (Contributor, Author):

Good point. But what should be the right behavior here if this layer does have a bias?

if self.config.moe_latent_size and mlp_bias is not None:
    output = output + mlp_bias
    mlp_bias = None
output = self.combine(output, shared_expert_output)

Review comment:

Here, output will be in the latent dimension, while shared_expert_output will be in the hidden dimension. We may have to move self.fc2_latent_proj inside self.combine, before the addition of output and shared_expert_output.
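For illustration, a minimal sketch of that ordering, with the latent-to-hidden projection applied before the shared-expert output is added. The helper name and arguments are assumptions, not the PR's actual combine API:

def combine_with_latent_proj(expert_output, fc2_latent_proj=None, shared_expert_output=None):
    # Sketch only: if the experts ran in a latent dimension, first project their
    # combined output back to the hidden dimension (adding the projection bias
    # if one is returned), and only then add the shared-expert output, which is
    # already in the hidden dimension.
    output = expert_output
    if fc2_latent_proj is not None:
        output, proj_bias = fc2_latent_proj(output)  # [tokens, latent] -> [tokens, hidden]
        if proj_bias is not None:
            output = output + proj_bias
    if shared_expert_output is not None:
        # Both tensors now have the hidden dimension, so the addition is well-defined.
        output = output + shared_expert_output
    return output

With a projection like the one sketched earlier, both operands of the final addition are in the hidden dimension at the point where they are summed.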

@deepakn94 (Contributor, Author):

Good point, moved.

@deepakn94 force-pushed the dnarayanan/latent_moe branch from b0a2d8c to 4463081 on November 22, 2025 at 00:34