
SSM debugging and TP prerequisites #335


Merged
merged 8 commits from debug_mamba into main on Aug 12, 2025
Conversation

@jlamypoirier (Collaborator) commented Jul 24, 2025

✨ Description

Make the transformer debugging tools available to SSMs, mainly through a base Mixer class. Bring in the breaking changes for Mamba2 here to enable direct model comparison (separate out the first dt layer, fix the initialization).

Also brought in some extra changes from #333; doing so here was simpler and will help reduce the size of that PR.
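
For context, here is a minimal sketch of what such a shared base class could look like; the class, attribute, and method names below are assumptions for illustration, not the actual Fast-LLM API:

import torch
import torch.nn as nn

class Mixer(nn.Module):
    # Hypothetical sketch of a base class shared by attention and SSM mixers.
    # Centralizing the debug hook here is what lets one set of debugging
    # tools apply to both transformer and SSM layers.

    def __init__(self, name: str, debug_level: int = 0):
        super().__init__()
        self.name = name
        self.debug_level = debug_level

    def _debug_log(self, tensor: torch.Tensor, label: str) -> None:
        # Log basic statistics of an intermediate activation when debugging is enabled.
        if self.debug_level > 0:
            print(
                f"{self.name}/{label}: shape={tuple(tensor.shape)}, "
                f"mean={tensor.float().mean().item():.4e}, "
                f"std={tensor.float().std().item():.4e}"
            )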

@jlamypoirier marked this pull request as ready for review July 24, 2025 22:30
@jlamypoirier changed the title from SSM debugging to SSM debugging and TP prerequisites Jul 25, 2025
@jlamypoirier requested review from RaymondLi0 and oleksost and removed the request for RaymondLi0 July 25, 2025 21:18
- self.in_proj = nn.Linear(
-     self.d_model, 2 * self.d_xb + 2 * self.d_inner + self.dt_rank, bias=bias, **factory_kwargs
- )
+ self.in_proj = nn.Linear(self.d_model, 2 * self.d_xb + 2 * self.d_inner, bias=bias, **factory_kwargs)
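
In the new layout the first dt projection presumably lives in its own layer. A minimal sketch of the resulting structure, with assumed names (dt_in_proj is an illustration, not necessarily the name used in the PR):

import torch.nn as nn

class Mamba2Sketch(nn.Module):
    # Hypothetical sketch of the split projections; not the actual implementation.

    def __init__(self, d_model: int, d_xb: int, d_inner: int, dt_rank: int, bias: bool = False):
        super().__init__()
        # x, B, C, and z projections stay fused; dt is no longer part of this layer.
        self.in_proj = nn.Linear(d_model, 2 * d_xb + 2 * d_inner, bias=bias)
        # The first dt projection gets its own layer, so it can be sharded or
        # replicated independently under tensor parallelism.
        self.dt_in_proj = nn.Linear(d_model, dt_rank, bias=bias)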
@oleksost (Contributor) commented Jul 28, 2025

This separation seems necessary to enable TP. However, it is not backward compatible with existing m2 checkpoints and requires manually changing the state dictionaries of existing checkpoints to work with this.

@jlamypoirier (Collaborator, Author) commented Jul 28, 2025

That's correct, but my understanding is that there isn't any such checkpoint yet that we want to keep?

@oleksost (Contributor)

I had some checkpoints I trained with the previous code. I manually altered the state dict, so I think it should be ok.
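
For anyone else holding older m2 checkpoints, a rough sketch of the kind of manual state-dict surgery described above; the key names, the dt_in_proj target layer, and the assumption that the dt rows sit at the end of the fused weight are all guesses to be verified against the actual code:

def split_dt_from_in_proj(state_dict, prefix, d_xb, d_inner, dt_rank):
    # Old checkpoints store one fused in_proj weight of output width
    # 2 * d_xb + 2 * d_inner + dt_rank; the new code expects the dt rows
    # in a separate dt_in_proj layer.
    fused = state_dict.pop(f"{prefix}.in_proj.weight")
    main_width = 2 * d_xb + 2 * d_inner
    # Assumes the dt rows come last in the fused weight; check the row
    # ordering in the old code before relying on this.
    state_dict[f"{prefix}.in_proj.weight"] = fused[:main_width].clone()
    state_dict[f"{prefix}.dt_in_proj.weight"] = fused[main_width : main_width + dt_rank].clone()
    return state_dict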

@jlamypoirier merged commit 1c5164f into main Aug 12, 2025
2 checks passed
@jlamypoirier deleted the debug_mamba branch August 12, 2025 16:18