SSM debugging and TP prerequisites #335
Conversation
```diff
-        self.in_proj = nn.Linear(
-            self.d_model, 2 * self.d_xb + 2 * self.d_inner + self.dt_rank, bias=bias, **factory_kwargs
-        )
+        self.in_proj = nn.Linear(self.d_model, 2 * self.d_xb + 2 * self.d_inner, bias=bias, **factory_kwargs)
```
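A minimal sketch of what the separated projections might look like after this change: `dt` gets its own linear layer instead of being fused into `in_proj`, so each projection can be sharded independently under tensor parallelism. The class and attribute names (`Mamba2MixerSketch`, `dt_in_proj`) and the dimension values are assumptions for illustration, not the PR's exact code.

```python
import torch
import torch.nn as nn


class Mamba2MixerSketch(nn.Module):
    """Illustrative only: dt projection split out of the fused in_proj."""

    def __init__(self, d_model=1024, d_inner=2048, d_xb=512, dt_rank=64, bias=False):
        super().__init__()
        # Before: one fused projection produced z, x, B, C and dt together.
        # After: in_proj only produces z, x, B, C; dt has its own linear layer.
        self.in_proj = nn.Linear(d_model, 2 * d_xb + 2 * d_inner, bias=bias)
        self.dt_in_proj = nn.Linear(d_model, dt_rank, bias=bias)  # hypothetical name

    def forward(self, hidden_states):
        zxbc = self.in_proj(hidden_states)   # (batch, seq, 2*d_xb + 2*d_inner)
        dt = self.dt_in_proj(hidden_states)  # (batch, seq, dt_rank)
        return zxbc, dt
```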
Seems like this separation is necessary to enable TP. However, it is not backward compatible with existing M2 checkpoints and requires manually editing their state dictionaries to work with this.
That's correct, but my understanding is that there isn't any such checkpoint yet that we want to keep?
I had some checkpoints I trained with the previous code. I manually altered the state dicts, so I think it should be ok.
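For reference, a sketch of how such a manual migration might look: split the old fused `in_proj` weight into the new `in_proj` and a separate dt projection. The key names and the assumption that the dt rows come last in the fused weight are guesses about the checkpoint layout, not taken from the PR.

```python
import torch


def migrate_in_proj(state_dict, prefix, d_inner, d_xb, dt_rank):
    """Hypothetical helper: split a fused in_proj weight for the new layout."""
    # Old fused weight: (2*d_xb + 2*d_inner + dt_rank, d_model)
    w = state_dict.pop(f"{prefix}.in_proj.weight")
    main, dt = torch.split(w, [2 * d_xb + 2 * d_inner, dt_rank], dim=0)
    state_dict[f"{prefix}.in_proj.weight"] = main
    state_dict[f"{prefix}.dt_in_proj.weight"] = dt  # hypothetical key name
    return state_dict
```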
✨ Description
Make the transformer debugging tools available to SSMs, mainly through a base Mixer class. Bring in the breaking changes for Mamba2 to enable direct model comparison (separate the first dt layer, fix initialization).
Also bring in some extra changes from #333; it was simpler to do here and will help reduce the size of that PR.
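To illustrate the idea of sharing debugging tools through a base Mixer class, here is a rough sketch: a common base class exposes a debug hook that both attention and SSM mixers can call on intermediate activations. The class names, hook name, and logging behavior are assumptions for illustration, not this repository's actual API.

```python
import torch
import torch.nn as nn


class Mixer(nn.Module):
    """Hypothetical shared base class giving every mixer the same debug hooks."""

    def __init__(self, debug_level: int = 0):
        super().__init__()
        self.debug_level = debug_level

    def _debug_log(self, name: str, tensor: torch.Tensor) -> None:
        # Shared debugging utility available to attention and SSM subclasses alike.
        if self.debug_level > 0:
            print(f"[{type(self).__name__}] {name}: shape={tuple(tensor.shape)}, "
                  f"mean={tensor.float().mean().item():.4e}")


class SSMMixerSketch(Mixer):
    def __init__(self, d_model: int, d_inner: int, debug_level: int = 0):
        super().__init__(debug_level)
        self.in_proj = nn.Linear(d_model, d_inner)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.in_proj(x)
        self._debug_log("in_proj_out", y)  # same hook a transformer mixer would use
        return y
```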