[Cute] Block sparse support Sm100 #1985

drisspg · 2025-11-05T00:23:13Z

Summary

- Implement block-sparse attention in flash_fwd_sm100.py
- Update interface.py to handle SM100 block size calculations (2x multiplier for m_block_size, since 1 CTA handles 2*tile_m rows)
- Add mask_mod parameter support in mask.py for block-sparse masking
- Add SM100 test fixtures and tile size handling in test_mask_mod.py
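Here is a minimal sketch of that block-size accounting. It assumes tile_m = 128 and that the block-sparse metadata is indexed per CTA-sized M block. `sparse_m_block_size` and `num_m_blocks` are hypothetical helpers for illustration, not functions from interface.py:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return (a + b - 1) // b


def sparse_m_block_size(tile_m: int, is_sm100: bool) -> int:
    """Hypothetical helper: query rows covered by one block-sparse M block.

    On SM100 one CTA handles 2 * tile_m query rows, so the sparse metadata is
    assumed to be indexed at that granularity; otherwise it is tile_m.
    """
    return 2 * tile_m if is_sm100 else tile_m


def num_m_blocks(seqlen_q: int, tile_m: int, is_sm100: bool) -> int:
    """Hypothetical helper: number of M blocks the sparse metadata must describe."""
    return ceil_div(seqlen_q, sparse_m_block_size(tile_m, is_sm100))


if __name__ == "__main__":
    # 1000 query rows with tile_m = 128
    print(sparse_m_block_size(128, is_sm100=True))  # 256
    print(num_m_blocks(1000, 128, is_sm100=True))   # 4
    print(num_m_blocks(1000, 128, is_sm100=False))  # 8
```

For the same sequence length, SM100 therefore sees half as many M blocks as a kernel where one CTA covers tile_m rows.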

Fast follow

Wrap aux_tensor indices with fastdivmod to avoid out-of-bounds (OOB) reads.
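Here is a minimal sketch of the wrapping idea. It assumes the fix reduces an aux_tensor index modulo the tensor's length. It also assumes the divisor is handled with a multiply-and-shift scheme in the style of CUTLASS's FastDivmod, where the constants are precomputed on the host so the device avoids an integer divide. `FastDivmod` and `wrap_index` below are illustrative Python, not the CuTe DSL code:

```python
class FastDivmod:
    """Divide by a fixed divisor using one multiply and one shift.

    Illustrative model of a CUTLASS-style fast divmod (not the CuTe DSL code).
    Exact for dividends 0 <= n < 2**31 and divisors 1 <= d < 2**31.
    """

    def __init__(self, divisor: int):
        if not 1 <= divisor < 2**31:
            raise ValueError("divisor must be in [1, 2**31)")
        self.divisor = divisor
        # ceil(log2(divisor)); (d - 1).bit_length() computes exactly this.
        self.shift = 31 + (divisor - 1).bit_length()
        # multiplier = ceil(2**shift / divisor), computed once on the host.
        self.multiplier = -(-(1 << self.shift) // divisor)

    def divmod(self, n: int) -> tuple[int, int]:
        if not 0 <= n < 2**31:
            raise ValueError("dividend must be in [0, 2**31)")
        quotient = (n * self.multiplier) >> self.shift
        return quotient, n - quotient * self.divisor


def wrap_index(idx: int, fdm: FastDivmod) -> int:
    """Hypothetical helper: fold an index into [0, fdm.divisor) so reads stay in bounds."""
    return fdm.divmod(idx)[1]


if __name__ == "__main__":
    # Hypothetical aux tensor: per-token document ids, 10 documents of 100 tokens.
    doc_ids = [i // 100 for i in range(1000)]
    fdm = FastDivmod(len(doc_ids))
    # kv_idx 1010 falls in the padded tail of the last tile; unwrapped it would read OOB.
    print(doc_ids[wrap_index(1010, fdm)])  # 0 (index 1010 wraps to 10)
```

Wrapping only keeps the read in bounds. It assumes the values read for padded positions are masked out afterwards, so it does not matter which in-bounds element they hit.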

We should also land #1984 first and then rebase this PR on top of it, so it's easier to review.

Perf

A lot of perf wins (though not universal for the document mask), but the gap from speed-of-light (SOL) is much larger than what we found with the Hopper implementation.

Without autotuning the Flex block-sparse (Triton) implementation:

And with the Triton implementation autotuned:

Possible problems

Looking at the PM samples, we can see a long tail:

This is for causal_mask with the default StaticPersistentSchedule. We already have a better schedule for causal, although we still need to build a generic version of it. If we hard-code the LPT schedule, we go from:

to:
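Here is a toy model of that long tail; it is not the kernel's scheduler. Assuming square blocks, the causal tile for M block i visits i + 1 KV blocks, so tile costs are very uneven. The sketch compares two assignments and reports the load of the busiest CTA:

- Round-robin: tile t runs on CTA t % num_ctas. This is our assumption about how a static persistent schedule strides through tiles.
- Greedy longest-processing-time-first (LPT).

All function names are hypothetical:

```python
import heapq


def causal_tile_costs(num_m_blocks: int, num_heads: int) -> list[int]:
    """Cost of each (head, m_block) tile under a causal mask, in KV blocks.

    Assumes square blocks (tile_m == tile_n), so m_block i visits i + 1 KV blocks.
    """
    return [m + 1 for _ in range(num_heads) for m in range(num_m_blocks)]


def static_makespan(costs: list[int], num_ctas: int) -> int:
    """Round-robin: tile t runs on CTA t % num_ctas, in launch order."""
    loads = [0] * num_ctas
    for t, c in enumerate(costs):
        loads[t % num_ctas] += c
    return max(loads)


def lpt_makespan(costs: list[int], num_ctas: int) -> int:
    """Greedy LPT: hand the most expensive remaining tile to the least-loaded CTA."""
    heap = [(0, cta) for cta in range(num_ctas)]  # (current load, cta id)
    for c in sorted(costs, reverse=True):
        load, cta = heapq.heappop(heap)
        heapq.heappush(heap, (load + c, cta))
    return max(load for load, _ in heap)


if __name__ == "__main__":
    costs = causal_tile_costs(num_m_blocks=8, num_heads=1)
    print(static_makespan(costs, num_ctas=3))  # 15
    print(lpt_makespan(costs, num_ctas=3))     # 13
```

With 8 M blocks on 3 CTAs, the busiest CTA drops from 15 to 13 block-units of work (the lower bound is 12). The model ignores L2 locality and per-tile overheads, which a real schedule also has to balance.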

Tests

flash_attn/cute/flash_fwd_sm100.py

flash_attn/cute/block_sparse_utils.py

drisspg · 2025-11-12T01:12:29Z

flash_attn/cute/block_sparse_utils.py

```diff
+
+
+@cute.jit
+def handle_block_sparse_empty_tile_correction_sm100(
```


I had to duplicate a lot here, but I still think it's better: very large if/else indents make it harder to rebase and iterate on the whole constexpr tree.
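The thread only shows the signature of handle_block_sparse_empty_tile_correction_sm100. What follows is therefore an assumption about the general problem an empty-tile correction has to solve, modeled in plain Python for a single query row. If the sparse metadata gives a query tile zero KV blocks, or every score is masked, the online-softmax row sum stays 0. The epilogue must then write O = 0 and LSE = -inf instead of dividing by zero. `attend_row` is a hypothetical name:

```python
import math


def attend_row(kv_blocks, head_dim):
    """Online-softmax attention for one query row (hypothetical helper).

    kv_blocks: the KV blocks the sparse metadata says to visit, as a list of
    (scores, values) pairs; scores are floats (-inf = masked), values are
    head_dim-length lists. Returns (output, lse).
    """
    row_max, row_sum = -math.inf, 0.0
    acc = [0.0] * head_dim
    for scores, values in kv_blocks:
        new_max = max(row_max, max(scores))
        if new_max == -math.inf:
            continue  # everything seen so far is masked; nothing to accumulate
        scale = math.exp(row_max - new_max)  # rescale earlier partial results
        row_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values):
            p = math.exp(s - new_max)
            row_sum += p
            acc = [a + p * vi for a, vi in zip(acc, v)]
        row_max = new_max
    if row_sum == 0.0:
        # Empty tile: no visited KV block contributed. Normalizing would divide
        # by zero, so write O = 0 and LSE = -inf instead.
        return [0.0] * head_dim, -math.inf
    return [a / row_sum for a in acc], row_max + math.log(row_sum)


if __name__ == "__main__":
    print(attend_row([], head_dim=2))  # ([0.0, 0.0], -inf)
    print(attend_row([([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])], head_dim=2))
```

Splitting a row's keys across several visited blocks gives the same result as putting them in one block; rescaling by `scale` is what preserves that invariant.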

This enables block sparsity on the SM 10.0 architecture, including mask_mod support and proper block-size accounting.
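Some context on mask_mod. In FlexAttention, a mask_mod is a function `(b, h, q_idx, kv_idx) -> bool`. Block sparsity comes from sorting each (M block, N block) pair into one of three kinds:

- empty: skip the block.
- full: no per-element mask is needed.
- partial: apply mask_mod element-wise.

Below is a minimal host-side sketch in plain Python. It assumes a causal mask_mod and the SM100 M block of 2 * tile_m = 256 rows with tile_n = 128. `classify_blocks` and the string labels are hypothetical; FlexAttention's real BlockMask stores counts and indices rather than a table:

```python
def causal_mask_mod(b, h, q_idx, kv_idx):
    """FlexAttention-style mask_mod: True where attention is allowed."""
    return q_idx >= kv_idx


def classify_blocks(mask_mod, seqlen_q, seqlen_k, m_block_size, n_block_size, b=0, h=0):
    """Hypothetical helper: label each (m_block, n_block) as empty / partial / full."""
    num_m = (seqlen_q + m_block_size - 1) // m_block_size
    num_n = (seqlen_k + n_block_size - 1) // n_block_size
    table = []
    for m in range(num_m):
        q_range = range(m * m_block_size, min((m + 1) * m_block_size, seqlen_q))
        row = []
        for n in range(num_n):
            k_range = range(n * n_block_size, min((n + 1) * n_block_size, seqlen_k))
            allowed = [mask_mod(b, h, q, k) for q in q_range for k in k_range]
            if not any(allowed):
                row.append("empty")    # kernel can skip this block entirely
            elif all(allowed):
                row.append("full")     # no per-element masking needed
            else:
                row.append("partial")  # apply mask_mod element-wise
        table.append(row)
    return table


if __name__ == "__main__":
    # SM100-style M block of 256 rows (2 * tile_m, tile_m = 128), N block of 128.
    for row in classify_blocks(causal_mask_mod, 512, 512, 256, 128):
        print(row)
```

Each 256-row M block spans two 128-wide N blocks on the diagonal, so both come out partial. Metadata built with 128-row M blocks would therefore describe a different layout.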

drisspg · 2025-11-14T03:26:38Z

@tridao Okay, finally rebased; perf looks good and tests are green.

drisspg commented Nov 5, 2025

flash_attn/cute/flash_fwd_sm100.py (outdated)

drisspg commented Nov 5, 2025

flash_attn/cute/flash_fwd_sm100.py (outdated)

drisspg changed the title from "[Cute] Extract block-sparse utilities from SM80/90" to "[Cute] Block sparse support Sm100" on Nov 5, 2025

drisspg mentioned this pull request Nov 5, 2025

[FlexFlash] Blackwell fwd support pytorch/pytorch#167040 (open)

drisspg force-pushed the sm100-block-sparsity branch 6 times, most recently from 06588af to 547cf51 on November 11, 2025 04:44

reubenconducts reviewed Nov 11, 2025

flash_attn/cute/block_sparse_utils.py (outdated)

drisspg force-pushed the sm100-block-sparsity branch from 547cf51 to 66cbbdb on November 11, 2025 05:29

drisspg commented Nov 12, 2025

drisspg force-pushed the sm100-block-sparsity branch from 66cbbdb to d1ece5e on November 12, 2025 01:22

drisspg mentioned this pull request Nov 12, 2025

[Cute] Splitkv PR cut perf by 2/3s #2005 (closed)

tridao approved these changes Nov 12, 2025

drisspg force-pushed the sm100-block-sparsity branch from d1ece5e to 1db7911 on November 13, 2025 19:28

drisspg force-pushed the sm100-block-sparsity branch from 1db7911 to b01ba0c on November 14, 2025 02:02

3 participants


