
Tensor-parallel SSM #333



Open: wants to merge 33 commits into base branch concatenated_dim


jlamypoirier
Collaborator

✨ Description

Please provide a brief summary of the changes, relevant motivation, and context.
Include any related issue numbers or links to discussions, and explain why this change is necessary.

Closes #

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier jlamypoirier changed the base branch from main to debug_mamba July 24, 2025 19:21
@jlamypoirier jlamypoirier changed the base branch from debug_mamba to concatenated_dim July 28, 2025 22:11
@jlamypoirier jlamypoirier marked this pull request as ready for review July 29, 2025 22:04
@@ -284,10 +284,15 @@ def test_load_pretrained(
@pytest.mark.model_testing_group(ModelTestingGroup.convert)
def test_huggingface_model(model_testing_config, get_convert_path):
Collaborator Author

This test isn't working for SSM because the HF wrapper can't find the external model. I would like to make it work so we have at least one correctness test. Any idea how to make it work? (@bigximik @oleksost @tscholak ?)

Contributor

In my Vision+Hybrid-ssm PR, I updated the SSM conversion to copy the modeling files to the export directory: https://github.com/ServiceNow/Fast-LLM/pull/332/files#diff-58be369d99e6722a68e734002686ae4afcfd423261e4d3d3b9d6aa552a6f2a14R729-R784
But that PR is far from being merged...
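For context, Hugging Face loads custom architectures through `trust_remote_code`: the modeling and configuration `.py` files have to sit in the checkpoint directory, and `config.json` needs an `auto_map` entry pointing the `Auto*` classes at them. The sketch below illustrates that copy-and-register step under those assumptions; `export_remote_code`, the file names and the class names are hypothetical and do not reflect the actual code in #332.

```python
import json
import shutil
from pathlib import Path


def export_remote_code(export_dir, modeling_files, auto_map):
    """Copy custom modeling files next to an exported checkpoint and register them.

    Hypothetical helper for illustration only; not Fast-LLM's conversion API.
    """
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    for src in modeling_files:
        # Place each .py file at the top level of the checkpoint directory,
        # which is where `trust_remote_code=True` looks for it.
        shutil.copy(src, export_dir / Path(src).name)
    config_path = export_dir / "config.json"
    # Merge into an existing config.json if the checkpoint export already wrote one.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # `auto_map` maps Auto* classes to "module.ClassName" inside the checkpoint dir.
    config["auto_map"] = auto_map
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

The exported directory could then be loaded with `AutoModelForCausalLM.from_pretrained(export_dir, trust_remote_code=True)`. This only addresses locating the external model; it does not by itself make the external modeling code compatible with the installed `transformers` version.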

Collaborator Author

I managed to add them here too, but the external models don't seem to work:

FAILED tests/models/test_checkpoint.py::test_huggingface_model[hybrid_mamba2]@dependency_group_2 - AttributeError: 'DynamicCache' object has no attribute 'has_previous_state'
FAILED tests/models/test_checkpoint.py::test_huggingface_model[hybrid_discrete_mamba2]@dependency_group_3 - AttributeError: 'NoneType' object has no attribute 'ssm_states'

@jlamypoirier jlamypoirier requested review from RaymondLi0, oleksost, nitsanluke, tscholak and bigximik and removed request for RaymondLi0 July 31, 2025 19:10