
Conversation

@johnnynunez (Contributor) commented Oct 9, 2025

What does this PR do?

Fixes #1320, #1308, #1323, and #1335. Includes fixes for flash-attention with CUDA >= 12.9 and adds CUTLASS v4.2.1, which fixes some kernels for Blackwell.
Also adds support for Spark and Thor.
Added Blackwell family support: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/

Thanks to #1285 and #1262, which are included here.

Fixes in flash-attention to support CUDA 13:

  1. CUTLASS v4.2.1 upgrade (Dao-AILab/flash-attention#1905)
  2. C++11 warning fixes (Dao-AILab/flash-attention#1904)
  3. [NVIDIA] Enable Blackwell family-specific features (Dao-AILab/flash-attention#1882)
  4. [BUILD] SBSA wheels + CUDA 13 support (Dao-AILab/flash-attention#1865)
  5. [BUG] CUDA 13: make FA3 compatible with CUDA 13 builds (Dao-AILab/flash-attention#1860)
  6. CUTLASS v4.3.0

PyTorch 2.9.0: https://dev-discuss.pytorch.org/t/pytorch-2-9-rc1-produced-for-pytorch-audio-vision/3234
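The family-specific architecture feature mentioned above can be illustrated with a small, hypothetical version gate. This is a hedged sketch, not xformers' actual build logic; the function name and arch set are illustrative:

```python
# Hedged sketch: choose nvcc -gencode flags based on the CUDA toolkit version.
# CUDA 12.9 introduced "family-specific" Blackwell targets such as sm_100f;
# older toolkits reject them, so they must be gated on the toolkit version.
def gencode_flags(cuda_version: int) -> list:
    """cuda_version is MAJOR*100 + MINOR, e.g. 1209 for CUDA 12.9."""
    archs = ["80", "90a"]        # long-supported targets
    if cuda_version >= 1209:
        archs.append("100f")     # Blackwell family-specific target
    return ["-gencode=arch=compute_%s,code=sm_%s" % (a, a) for a in archs]

print(gencode_flags(1300))
```

With CUDA 13 (1300) this emits the family-specific `sm_100f` flag; with 12.8 it would not.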

cc @sgrigory

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 9, 2025
@johnnynunez (Contributor, Author) commented Oct 15, 2025

@jiawenliu64 @bottler @sgrigory could you run and merge?

@snakeeater4526 commented

Just a little message to say that I believe this PR is badly needed by people on CUDA 13: PyTorch 2.9 is now the stable release, but the latest available xformers doesn't support CUDA 13.

So some software (like ComfyUI) using tensor-related features can't work properly.

PS: I'm not a dev at all, but I spent an entire day trying to use ComfyUI with TensorRT acceleration, and it's basically impossible with CUDA 13 drivers (sadly, I did not manage to compile this PR successfully).

@johnnynunez (Contributor, Author) commented Oct 21, 2025

> Just a little message to say that I believe this PR is badly needed by people on CUDA 13: PyTorch 2.9 is now the stable release, but the latest available xformers doesn't support CUDA 13.
>
> So some software (like ComfyUI) using tensor-related features can't work properly.
>
> PS: I'm not a dev at all, but I spent an entire day trying to use ComfyUI with TensorRT acceleration, and it's basically impossible with CUDA 13 drivers (sadly, I did not manage to compile this PR successfully).

You have to point the build at the newer CCCL headers (export the include path). It is working for me.
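One hedged way to do that from a build script: prepend a newer CCCL checkout to the compiler's include search path before invoking the build. The path below is illustrative (wherever you cloned nvidia/cccl), not a path the project prescribes:

```python
import os

# Hedged sketch: make nvcc/g++ pick up a newer CCCL checkout instead of the
# CUDA toolkit's bundled (older) CCCL headers, before running `pip install .`.
cccl_include = "/opt/cccl/include"  # illustrative location of a CCCL clone
existing = os.environ.get("CPLUS_INCLUDE_PATH", "")
os.environ["CPLUS_INCLUDE_PATH"] = (
    cccl_include if not existing else cccl_include + os.pathsep + existing
)
print(os.environ["CPLUS_INCLUDE_PATH"].split(os.pathsep)[0])
```

The same effect can be had with a plain `export CPLUS_INCLUDE_PATH=...` in the shell before building.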

@johnnynunez (Contributor, Author) commented

> Just a little message to say that I believe this PR is badly needed by people on CUDA 13: PyTorch 2.9 is now the stable release, but the latest available xformers doesn't support CUDA 13.
>
> So some software (like ComfyUI) using tensor-related features can't work properly.
>
> PS: I'm not a dev at all, but I spent an entire day trying to use ComfyUI with TensorRT acceleration, and it's basically impossible with CUDA 13 drivers (sadly, I did not manage to compile this PR successfully).

Could you try again? It should be fixed now.

@johnnynunez (Contributor, Author) commented

Pinging again.
cc @jiawenliu64 @bottler @sgrigory

@riomus commented Oct 28, 2025

Note: when this branch is built inside nvcr.io/nvidia/pytorch:25.09-py3 on DGX Spark, it does not work: TORCH_CUDA_ARCH_LIST in that image is set to "8.0 8.6 9.0 10.0 11.0 12.0+PTX", and the "+PTX" suffix apparently breaks compute-capability recognition, so the build is executed only for sm_80 and sm_90.

To install it from source on DGX Spark inside the recommended image (NVIDIA recommends it instead of installing PyTorch manually), you need to either unset TORCH_CUDA_ARCH_LIST or export TORCH_CUDA_ARCH_LIST=12.0.
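The workaround above can be sketched as a small helper that strips the "+PTX" suffix and keeps only the capabilities you want. This is illustrative (the helper is not part of xformers or PyTorch):

```python
# Hedged sketch: sanitize a TORCH_CUDA_ARCH_LIST value whose "+PTX" suffix
# confuses capability detection, keeping only the archs you actually want.
def sanitize_arch_list(arch_list, keep):
    archs = [a[:-4] if a.endswith("+PTX") else a for a in arch_list.split()]
    return " ".join(a for a in archs if a in keep)

# The DGX Spark case from the comment above: keep only 12.0.
print(sanitize_arch_list("8.0 8.6 9.0 10.0 11.0 12.0+PTX", {"12.0"}))  # → 12.0
```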

@danthe3rd (Contributor) commented

Thanks for your PR! Let's check whether the wheels build before merging - tests are running now :)

@danthe3rd (Contributor) commented

Also, we should probably update Python to 3.10 in the CI so that the linter is able to run:

    python-version: '3.9'

@johnnynunez (Contributor, Author) commented

> Also, we should probably update Python to 3.10 in the CI so that the linter is able to run:
>
>     python-version: '3.9'

True, I missed it.

@johnnynunez (Contributor, Author) commented

@danthe3rd I upgraded the minimum version to 3.10, which is the minimum right now with torch 2.9.0.

@danthe3rd (Contributor) commented

Thanks! Can you also look at the Windows build?

@johnnynunez (Contributor, Author) commented

> Thanks! Can you also look at the Windows build?

Seems to be a bug in the Jimver action: Jimver/cuda-toolkit#395

@danthe3rd (Contributor) commented Oct 29, 2025

Thanks! Looks like we're getting these errors in the CI now (related to PyTorch's CUDAExtension?):

ValueError: Unknown CUDA arch (10.0f) or GPU not supported

Are the following the only archs supported in PyTorch at the moment?

https://github.com/pytorch/pytorch/blob/8b188647cfdc1355070ccd5aaa18a8060d4f67bf/torch/utils/cpp_extension.py#L2435-L2438

@johnnynunez (Contributor, Author) commented

> Thanks! Looks like we're getting these errors in the CI now (related to PyTorch's CUDAExtension?):
>
> ValueError: Unknown CUDA arch (10.0f) or GPU not supported
>
> Are the following the only archs supported in PyTorch at the moment?
>
> https://github.com/pytorch/pytorch/blob/8b188647cfdc1355070ccd5aaa18a8060d4f67bf/torch/utils/cpp_extension.py#L2435-L2438

Yes, I've seen that. The Blackwell family archs are not supported there yet. I'm going to change it when I get home (I'm coming back from the PyTorch conference).
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
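One hedged way to handle this while waiting for upstream support: fall back from a family-specific arch such as "10.0f" to the plain capability when the installed PyTorch does not list the family variant. The helper name and arch set are illustrative, not a real PyTorch API:

```python
# Hedged sketch: degrade "10.0f"-style family archs to the plain capability
# when torch.utils.cpp_extension's supported-arch list doesn't know them.
def compat_arch(arch, supported):
    if arch in supported:
        return arch
    if arch.endswith("f") and arch[:-1] in supported:
        return arch[:-1]  # e.g. "10.0f" -> "10.0"
    raise ValueError("Unknown CUDA arch (%s) or GPU not supported" % arch)

print(compat_arch("10.0f", {"8.0", "9.0", "10.0"}))  # → 10.0
```

This trades the family-specific kernels for a build that at least completes on older PyTorch releases.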

@danthe3rd (Contributor) commented

Thanks! Let's see how it goes. I'm a bit worried we might hit a timeout on the CI with that many architectures (especially for the build of FA3)

@johnnynunez (Contributor, Author) commented

> Thanks! Let's see how it goes. I'm a bit worried we might hit a timeout on the CI with that many architectures (especially for the build of FA3)

Maybe we can filter for FA3? FA3 is only compatible with sm_80 and sm_90; FA4 is still only compatible with sm_100/sm_103.
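That per-kernel filtering could look like the following sketch, using the capabilities stated in the comment above (FA3: sm_80/sm_90; FA4: sm_100/sm_103). The constants and helper are illustrative, not xformers' actual build configuration:

```python
# Hedged sketch: intersect the user's requested arch list with what each
# flash-attention generation actually supports, to keep CI build time down.
FA3_ARCHS = {"8.0", "9.0"}    # per the comment: FA3 targets sm_80/sm_90
FA4_ARCHS = {"10.0", "10.3"}  # per the comment: FA4 targets sm_100/sm_103

def filter_archs(requested, supported):
    return sorted(set(requested) & supported)

print(filter_archs(["8.0", "9.0", "10.0", "12.0"], FA3_ARCHS))  # → ['8.0', '9.0']
```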

@danthe3rd (Contributor) commented

Hmm, the FAv3 Windows build for CUDA 13 seems to be broken. Maybe we could disable FAv3 on Windows/CUDA 13 for now?
I see this sort of error:

2025-10-29T16:50:00.8970137Z C:\Users\runneradmin\AppData\Local\Temp\tmpxft_00002264_00000000-7_flash_bwd_hdim128_bf16_sm90.compute_90a.cudafe1.stub.c(417): error C2719: 'unnamed-parameter': formal parameter with requested alignment of 128 won't be aligned

https://github.com/facebookresearch/xformers/actions/runs/18913502491/job/53996546199?pr=1344

@johnnynunez (Contributor, Author) commented

> Hmm, the FAv3 Windows build for CUDA 13 seems to be broken. Maybe we could disable FAv3 on Windows/CUDA 13 for now? I see this sort of error:
>
> 2025-10-29T16:50:00.8970137Z C:\Users\runneradmin\AppData\Local\Temp\tmpxft_00002264_00000000-7_flash_bwd_hdim128_bf16_sm90.compute_90a.cudafe1.stub.c(417): error C2719: 'unnamed-parameter': formal parameter with requested alignment of 128 won't be aligned
>
> https://github.com/facebookresearch/xformers/actions/runs/18913502491/job/53996546199?pr=1344

Yes, maybe we can disable it. I don't have a Windows machine right now to test it.

@danthe3rd (Contributor) commented

I would say we just need to set XFORMERS_DISABLE_FLASH_ATTN in the Windows wheel-build CI for now, or add a condition there to skip the build on Windows + cu13:

xformers/setup.py, lines 282-283 (at 51aa071):

    if cuda_version < 1203:
        return []
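The suggested extra condition could be sketched like this. It mirrors the guard quoted above and adds a hypothetical Windows + CUDA 13 check (not the code that was actually merged), to be kept until the MSVC alignment error (C2719) is resolved:

```python
import platform

# Hedged sketch: extend setup.py's existing FA3 guard (cuda_version < 1203
# -> skip) with a hypothetical condition that also skips on Windows + cu13.
def should_build_fa3(cuda_version, system=None):
    system = system or platform.system()
    if cuda_version < 1203:   # existing guard from setup.py
        return False
    if system == "Windows" and cuda_version >= 1300:
        return False          # broken by MSVC error C2719 for now
    return True

print(should_build_fa3(1300, system="Windows"))  # → False
```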

@johnnynunez (Contributor, Author) commented

> I would say we just need to set XFORMERS_DISABLE_FLASH_ATTN in the Windows wheel-build CI for now, or add a condition there to skip the build on Windows + cu13:
>
>     if cuda_version < 1203:
>         return []

I tried to fix it; run the CI to see if it works. If not, we can skip it.

@johnnynunez (Contributor, Author) commented

@danthe3rd it seems that it fails again. I reported it internally. We can merge, skipping cu130, at this point.

@danthe3rd (Contributor) commented Oct 30, 2025

Sure - let me merge if the CI is green :)
(will probably take ~6 hours tho)

@johnnynunez (Contributor, Author) commented

You can cancel the previous one. Now it should work @danthe3rd

Co-authored-by: dan_the_3rd <[email protected]>
@danthe3rd (Contributor) commented

Everything seems alright, we can fix the linters later on our side.
Thanks a lot for your contribution!

@danthe3rd danthe3rd merged commit a64b139 into facebookresearch:main Oct 30, 2025
17 of 20 checks passed

Linked issue that merging may close: cu129 ERROR

4 participants