
Releases: facebookresearch/xformers

v0.0.33.post1

13 Nov 14:25


Fixed wheel upload to PyPI

v0.0.33: Support PyTorch 2.9

12 Nov 13:49


Added

  • CUTLASS fMHA op for Blackwell GPUs
  • Support for the flash-attention package up to 2.8.3
  • Exposed the FA3 deterministic mode
  • FW+BW pass overlap for DeepSeek-like comms/compute overlap

Improved

  • merge_attentions support for irregular head dimension
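
A minimal, hedged sketch of the fMHA entry point these changes feed into (not from the release notes; it assumes a CUDA build of xformers and deliberately does not name the new Blackwell op class, which the dispatcher selects automatically when applicable):

```python
# Sketch only: call xformers' fMHA entry point and let the dispatcher pick a
# backend. On Blackwell GPUs with this release, selection may resolve to the
# new CUTLASS op; the op class name is intentionally not spelled out here.
import torch
import xformers.ops as xops

# Inputs are (batch, seq_len, num_heads, head_dim).
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)

# op=None means "use any implementation that supports these inputs and GPU".
out = xops.memory_efficient_attention(q, k, v, op=None)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```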

v0.0.32.post2

15 Aug 05:58

Add ROCm 6.4 build

v0.0.32.post1

14 Aug 12:13

Windows wheel build timeout (#1309)

* Try building with `MAX_JOBS=3`
* Update `wheels_build.yml`
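
A hedged sketch of what the `MAX_JOBS=3` workaround amounts to when building from source; the assumption is that `MAX_JOBS` caps the parallel compile jobs of PyTorch's C++ extension build, trading build time for lower peak resource use, and the exact install command here is illustrative rather than the CI script:

```python
# Sketch only: build xformers from a source checkout with a capped number of
# parallel compile jobs, mirroring the MAX_JOBS=3 workaround used for the
# Windows wheel builds.
import os
import subprocess
import sys

env = dict(os.environ, MAX_JOBS="3")  # limit compile parallelism
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-v", "--no-build-isolation", "."],
    env=env,
    check=True,
)
```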

v0.0.32: Wheels for PyTorch 2.8.0

13 Aug 19:29


Pre-built binary wheels are available for PyTorch 2.8.0.

Added

  • Support flash-attention package up to 2.8.2
  • Speed improvements to `python -m xformers.profiler.find_slowest`

Removed

  • Removed autograd backward pass for merge_attentions as it is easy to use incorrectly.
  • Attention biases are no longer torch.Tensor subclasses: subclassing is no longer needed for torch.compile to work, and it added complexity (see the sketch below)
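
Since attention biases are now plain Python objects, a minimal sketch of the intended usage pattern (hedged: it assumes a CUDA build and that LowerTriangularMask remains the causal-bias class in xformers.ops.fmha.attn_bias):

```python
# Sketch only: attention-bias objects are ordinary Python objects (no
# torch.Tensor subclassing) passed via attn_bias; the wrapped call can still
# be traced by torch.compile.
import torch
import xformers.ops as xops
from xformers.ops.fmha.attn_bias import LowerTriangularMask

def causal_attention(q, k, v):
    return xops.memory_efficient_attention(q, k, v, attn_bias=LowerTriangularMask())

compiled = torch.compile(causal_attention)

q = torch.randn(1, 512, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 512, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 512, 8, 64, device="cuda", dtype=torch.float16)
out = compiled(q, k, v)
```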

`v0.0.31.post1` Fixing wheels for Windows

08 Jul 09:36


remove merge_attentions backward (fairinternal/xformers#1402)


v0.0.31 - PyTorch 2.7.1, Flash3 on Windows, and dropping V100 support

25 Jun 09:11


[0.0.31] - 2025-06-25

Pre-built binary wheels are available for PyTorch 2.7.1.

Added

  • xFormers wheels are now Python-version agnostic: the same wheel can be used for Python 3.9 through 3.13
  • Added support for Flash-Attention 3 on Ampere GPUs

Removed

  • We will no longer support V100 or older GPUs, following PyTorch (pytorch/pytorch#147607)
  • Deprecated support for building Flash-Attention 2 as part of xFormers. For Ampere GPUs, we now use Flash-Attention 3 on Windows, and Flash-Attention 2 can still be used through PyTorch on Linux.
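
A hedged sketch of a pre-flight check matching the hardware note above (V100 reports compute capability 7.0; only the standard torch API is used, nothing xformers-specific is assumed):

```python
# Sketch only: refuse to run on GPUs covered by the "V100 or older" drop.
# V100 is sm_70; newer architectures report higher compute capabilities.
import torch

major, minor = torch.cuda.get_device_capability()
if (major, minor) <= (7, 0):
    raise RuntimeError(
        f"GPU reports sm_{major}{minor}; xformers >= 0.0.31 no longer supports "
        "V100-class or older GPUs."
    )
print(f"sm_{major}{minor}: supported by this xformers release line")
```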

`v0.0.30` - build for PyTorch 2.7.0

28 Apr 14:51

Pre-built binary wheels are available for PyTorch 2.7.0. Following PyTorch, we build wheels for CUDA 11.8, 12.6, and 12.8 only (we no longer build for CUDA 12.4).
xFormers now requires PyTorch >= 2.7

Added

  • [fMHA] Added support for local attention on the Flash3 backend (H100)
  • [fMHA] Added a new paged gappy attention bias

Improved

  • [fMHA] The FlashAttention3 backend now ships with more head dimensions to support MLA, and with a FLOPs formula in order to be compatible with PyTorch's partitioner-based automatic activation checkpointing
  • The fused operators for sequence parallelism were migrated to PyTorch's SymmetricMemory
  • The profiler now prefixes trace filenames with the process rank during distributed training

Removed

  • Removed documentation for legacy unmaintained components
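
The local-attention and paged/gappy bias additions above are about masking patterns; below is a hedged sketch of the sliding-window semantics using a plain dense bias tensor. The dedicated Flash3 local-attention path and the new paged gappy bias class are not named here to avoid guessing identifiers; this only illustrates what "local attention" masks out.

```python
# Sketch only: sliding-window ("local") attention expressed as a dense additive
# bias. Entries inside the window are 0, everything else is -inf, so each query
# only attends to keys within +/- `window` positions.
import torch
import xformers.ops as xops

B, M, H, K = 2, 256, 8, 64
window = 64

q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

idx = torch.arange(M, device="cuda")
outside = (idx[None, :] - idx[:, None]).abs() > window       # (M, M) bool
bias = torch.zeros(M, M, device="cuda", dtype=torch.float16)
bias.masked_fill_(outside, float("-inf"))
bias = bias.expand(B, H, M, M).contiguous()                  # (B, H, Mq, Mk)

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
```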

`v0.0.29.post3` Fix CUDA 12.6 builds on Windows

10 Feb 12:43

Fix missing builds for CUDA 12.6 on Windows

`v0.0.29.post2` - build for PyTorch 2.6.0

31 Jan 20:42

Pre-built binary wheels are available for PyTorch 2.6.0. Following PyTorch, we build wheels for CUDA 11.8, 12.4, and 12.6 only (we no longer build for CUDA 12.1).
xFormers now requires PyTorch >= 2.6