[DSV3] GroupedExperts weights conversion optimization #1639
base: main
Conversation
Code under review:

        self.local_experts_indices[abstract_key][1]
        - self.local_experts_indices[abstract_key][0]
    )
    if len(experts) == expected_n_experts:
I want to `return` here instead of `continue`, but I'm not sure that is correct. Also see the comment below; I want to understand whether we can avoid looping over the layers.
Suggested change:

    - if len(experts) == expected_n_experts:
    + if len(experts) < expected_n_experts:
    +     continue
When a new fqn is processed, the `_concatenate_local_expert_weights` function is called to check whether we can concatenate the individual expert weights into GroupedExperts weights. If the fqn has a `layer_num`, the only layer that can possibly be merged is the layer with `id=layer_num`. This way, we could remove the loop over layers as you suggested, and then use `return` here instead of `continue` once the loop is removed.
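The merge condition discussed above can be sketched as follows. This is a hypothetical simplification, not the torchtitan implementation: `experts` is assumed to be a dict mapping local expert id to that expert's weight, and `start`/`end` stand in for the `local_experts_indices` range; plain Python objects stand in for tensors (the real code would concatenate with torch).

```python
def concatenate_local_experts(experts, start, end):
    """Merge per-expert weights with ids in [start, end) into one grouped list.

    Returns None until all expected local experts have been collected,
    mirroring the "skip until complete" logic from the review discussion.
    """
    expected_n_experts = end - start
    if len(experts) < expected_n_experts:
        return None  # not every expert weight for this layer has arrived yet
    # Assemble in expert-id order so position matches expert index.
    return [experts[i] for i in range(start, end)]
```

With the layer loop removed, the caller can simply `return` when this yields `None`, since no other layer could be completed by the current fqn.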
Sounds good to me. Please address any remaining concerns @fegin has.
Algorithm summary:
Numerical comparison using the 16B model: