Conversation

wwwjn
Contributor

@wwwjn wwwjn commented Aug 25, 2025

Algorithm summary:

[Screenshot: algorithm summary]

Numerical comparison using 16B model:

  • Note: the final results are not bit-identical; the numerical differences are small
[Screenshots: numerical comparison results]
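For context, the conversion this PR optimizes merges per-expert weight tensors (as stored under individual expert fqns in the checkpoint) into one stacked GroupedExperts tensor per layer and weight name. A minimal stdlib-only sketch of the grouping step; the fqn pattern and plain lists standing in for tensors are illustrative assumptions, not the PR's actual code:

```python
import re

# Hypothetical per-expert checkpoint entries: fqn -> "tensor" (a list here).
# Real fqns and shapes depend on the checkpoint format.
state_dict = {
    "layers.0.moe.experts.0.w1": [[1.0]],
    "layers.0.moe.experts.1.w1": [[2.0]],
    "layers.0.moe.experts.2.w1": [[3.0]],
}

EXPERT_FQN = re.compile(r"layers\.(\d+)\.moe\.experts\.(\d+)\.(\w+)")

def group_expert_weights(state_dict, n_experts):
    """Stack per-expert weights into one grouped entry per (layer, name)."""
    buckets = {}  # (layer, name) -> {expert_id: weight}
    for fqn, weight in state_dict.items():
        m = EXPERT_FQN.fullmatch(fqn)
        if m is None:
            continue  # non-expert weights pass through unchanged in practice
        layer, expert_id, name = int(m.group(1)), int(m.group(2)), m.group(3)
        buckets.setdefault((layer, name), {})[expert_id] = weight
    grouped = {}
    for (layer, name), experts in buckets.items():
        assert len(experts) == n_experts, "missing expert weights"
        # Stack in expert-id order; with torch this would be torch.stack(...).
        grouped[f"layers.{layer}.moe.experts.{name}"] = [
            experts[i] for i in range(n_experts)
        ]
    return grouped
```

The optimization discussed below is about doing this incrementally, per fqn, instead of re-scanning all layers.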

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 25, 2025
@wwwjn wwwjn changed the title [WIP][DSV3 [WIP][DSV3] GroupedExperts weights conversion optimization Aug 25, 2025
@wwwjn wwwjn changed the title [WIP][DSV3] GroupedExperts weights conversion optimization [DSV3] GroupedExperts weights conversion optimization Aug 28, 2025
@wwwjn wwwjn requested a review from fegin August 28, 2025 22:28
@wwwjn wwwjn changed the title [DSV3] GroupedExperts weights conversion optimization [WIP][DSV3] GroupedExperts weights conversion optimization Aug 28, 2025
@wwwjn
Contributor Author

wwwjn commented Aug 28, 2025

Need to clean up the test part a little bit before it's ready for review :)

Updated: Ready for review now

@wwwjn wwwjn changed the title [WIP][DSV3] GroupedExperts weights conversion optimization [DSV3] GroupedExperts weights conversion optimization Aug 29, 2025
self.local_experts_indices[abstract_key][1]
- self.local_experts_indices[abstract_key][0]
)
if len(experts) == expected_n_experts:
Contributor
I want to return instead of continue, but I'm not sure if that is correct. Also see the comment below. I want to understand whether we can avoid looping over layers.

Suggested change
if len(experts) == expected_n_experts:
if len(experts) < expected_n_experts:
continue

Contributor Author

When a new fqn is processed, the _concatenate_local_expert_weights function is called to check whether the individual expert weights can be concatenated into GroupedExperts weights. If the fqn contains a layer_num, the only layer that could be merged is the layer with id=layer_num, so we can remove the loop over layers as you suggested.

With the loop removed, we can use return here instead of continue.
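The incremental check discussed above can be sketched as follows. This is a stdlib-only mock, not the PR's actual `_concatenate_local_expert_weights` (class and method names here are hypothetical): only the (layer, key) of the fqn just received is checked, so the caller can return early when that group is not yet complete, instead of continuing a loop over all layers.

```python
class ExpertBuffer:
    """Buffers per-expert weights; emits the grouped list once complete.

    Simplified stand-in for incremental concatenation: add() checks only
    the (layer, key) it just received, returning None until all local
    experts for that group have arrived.
    """

    def __init__(self, start_idx, end_idx):
        # Indices of the locally owned experts, i.e. [start_idx, end_idx).
        self.local_experts_indices = (start_idx, end_idx)
        self.buffer = {}  # (layer, key) -> {expert_id: weight}

    def add(self, layer, key, expert_id, weight):
        self.buffer.setdefault((layer, key), {})[expert_id] = weight
        return self._try_concatenate(layer, key)

    def _try_concatenate(self, layer, key):
        start, end = self.local_experts_indices
        expected_n_experts = end - start
        experts = self.buffer.get((layer, key), {})
        if len(experts) < expected_n_experts:
            return None  # not all local experts seen yet; caller returns early
        # All present: stack in order (torch.stack in a real implementation).
        return [experts[i] for i in range(start, end)]
```

With this shape, each checkpoint fqn triggers at most one completeness check, rather than a scan over every layer.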

Contributor

@tianyu-l tianyu-l left a comment

Sounds good to me. Please address any remaining concerns @fegin has.
