[DSV3] GroupedExperts weights conversion optimization #1639
base: main
Conversation
Code under review:

        self.local_experts_indices[abstract_key][1]
        - self.local_experts_indices[abstract_key][0]
    )
    if len(experts) == expected_n_experts:
I want to `return` here instead of `continue`, but I'm not sure that is correct. Also see the comment below; I want to understand whether we can avoid looping over the layers.
Suggested change:

    - if len(experts) == expected_n_experts:
    + if len(experts) < expected_n_experts:
    +     continue
When a new fqn is processed, the `_concatenate_local_expert_weights` function is called to check whether we can concatenate the individual expert weights into GroupedExperts weights. If the fqn has a `layer_num`, the only layer that can possibly be merged is the layer with `id=layer_num`. This way, we could remove the loop over layers as you suggested, and then use `return` here instead of `continue` once the loop is removed.
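The merge condition discussed above can be sketched as follows. This is a hypothetical simplification, not the torchtitan implementation: `experts` is assumed to be a dict mapping local expert id to that expert's weight, and `start`/`end` stand in for the `local_experts_indices` range; plain Python objects stand in for tensors (the real code would concatenate with torch).

```python
def concatenate_local_experts(experts, start, end):
    """Merge per-expert weights with ids in [start, end) into one grouped list.

    Returns None until all expected local experts have been collected,
    mirroring the "skip until complete" logic from the review discussion.
    """
    expected_n_experts = end - start
    if len(experts) < expected_n_experts:
        return None  # not every expert weight for this layer has arrived yet
    # Assemble in expert-id order so position matches expert index.
    return [experts[i] for i in range(start, end)]
```

With the layer loop removed, the caller can simply `return` when this yields `None`, since no other layer could be completed by the current fqn.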
Sounds good to me. Please address any remaining concerns @fegin has.
Algorithm summary:
Numerical comparison using the 16B model: