Releases: ROCm/rocFFT

rocFFT 1.0.17 for ROCm 5.2.1

21 Jul 20:24

rocFFT code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.

rocFFT 1.0.17 for ROCm 5.2.0

28 Jun 18:45

Added

  • Packages for test and benchmark executables on all supported OSes using CPack.
  • File/folder reorganization with backward-compatibility support using rocm-cmake wrapper functions.

Changed

  • Improved reuse of twiddle memory between plans.
  • When only one callback type (load or store) is set via the API, a default
    callback is now set for the other type to improve performance.

Optimizations

  • Introduced a new non-linear LDS access pattern and applied it to length-64
    SBCC kernels to improve performance.

Fixed

  • Fixed plan creation failure in cases where SBCC kernels would need to write to non-unit-stride buffers.

rocFFT 1.0.16 for ROCm 5.1.3

20 May 17:05
15ac7c4

rocFFT code for ROCm 5.1.3 did not change. The library was rebuilt for the updated ROCm 5.1.3 stack.

rocFFT 1.0.16 for ROCm 5.1.1

08 Apr 20:53
15ac7c4

rocFFT code for ROCm 5.1.1 did not change. The library was rebuilt for the updated ROCm 5.1.1 stack.

rocFFT 1.0.16 for ROCm 5.1.0

30 Mar 17:30
15ac7c4

Changed

  • Added support for unaligned tile dimensions in SBRC_2D kernels.
  • Improved test and benchmark infrastructure (more use of RAII).
  • Enabled runtime compilation of length-2304 FFT kernel during plan creation.

Optimizations

  • Optimized more large 1D cases by using the L1D_CC plan.
  • Optimized 3D 200^3 C2R case.
  • Optimized 1D 2^30 double precision on MI200.

Fixed

  • Fixed correctness of some R2C transforms with unusual strides.

Removed

  • The hipFFT API (header) has been removed after a long deprecation period. Please use the hipFFT package/repository to obtain the hipFFT API.

rocFFT 1.0.15 for ROCm 5.0.2

04 Mar 17:54
fb0d3f8

rocFFT code for ROCm 5.0.2 is unchanged from rocFFT for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.

rocFFT 1.0.15 for ROCm 5.0.1

16 Feb 22:20
fb0d3f8

rocFFT code for ROCm 5.0.1 is unchanged from rocFFT for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

rocFFT 1.0.15 for ROCm 5.0.0

09 Feb 21:45
fb0d3f8

Changed

  • Re-aligned the split device library into four roughly equal libraries.
  • Implemented the FuseShim framework to replace the original OptimizePlan.
  • Implemented the generic buffer-assignment framework. Buffer assignment is
    no longer performed by each node; instead, a generic algorithm tests and
    picks the best assignment path. With the help of FuseShim, as many kernel
    fusions as possible can be achieved.
  • Do not read the imaginary part of the DC and Nyquist modes for even-length
    complex-to-real transforms.
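Why those imaginary parts can be skipped: when the output of an even-length-N complex-to-real transform is real, the Hermitian input bins X[0] (DC) and X[N/2] (Nyquist) are mathematically real, so only their real parts contribute. The following naive O(N²) Python sketch reads only `.real` from those two bins. The function names `c2r_naive` and `r2c_naive` are hypothetical and are not rocFFT APIs, and the sketch applies no 1/N scaling.

```python
import cmath
import math


def c2r_naive(half, n):
    """Naive unscaled complex-to-real inverse DFT for even n.

    `half` holds the n//2 + 1 non-redundant Hermitian bins X[0..n/2].
    Only the real parts of the DC (X[0]) and Nyquist (X[n/2]) bins are
    read, because they must be real for a real-valued output.
    """
    assert n % 2 == 0 and len(half) == n // 2 + 1
    out = []
    for t in range(n):
        acc = half[0].real + half[n // 2].real * (-1) ** t
        for k in range(1, n // 2):
            # Bin k and its mirror n-k together contribute 2*Re(X[k] e^{+2*pi*i*k*t/n}).
            acc += 2.0 * (half[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(acc)
    return out


def r2c_naive(x):
    """Naive forward real-to-complex DFT returning n//2 + 1 bins."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.5, 0.25, 4.0, -1.0]
    spec = r2c_naive(x)
    # Inject arbitrary imaginary parts into the DC and Nyquist bins.
    noisy = list(spec)
    noisy[0] += 7j
    noisy[-1] -= 3j
    y = [v / len(x) for v in c2r_naive(noisy, len(x))]
    print(all(abs(a - b) < 1e-9 for a, b in zip(x, y)))
```

Perturbing the imaginary parts of X[0] and X[N/2] leaves the reconstructed signal unchanged, so a kernel has no reason to load them.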

Optimizations

  • Optimized twiddle conjugation; complex-to-complex inverse transforms should now have performance similar to forward transforms.
  • Improved performance of single-kernel small 2D transforms.
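The twiddle-conjugation idea can be illustrated in a few lines. A forward transform uses twiddle factors W^k = e^{-2πik/N}, and an unnormalized inverse uses their complex conjugates, so one precomputed table can serve both directions. The following minimal recursive radix-2 sketch in Python is illustrative only. The names `make_twiddles` and `fft_radix2` are hypothetical, and the sketch does not reflect how rocFFT's generated kernels are organized.

```python
import cmath
import math


def make_twiddles(n):
    """Forward twiddle table W_n^k = exp(-2*pi*i*k/n) for k < n/2."""
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n // 2)]


def fft_radix2(x, twiddles, inverse=False, stride=1):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two).

    `twiddles` is the forward table for the top-level length; sub-problems
    index it with a growing stride. For the inverse, each twiddle is
    conjugated on the fly instead of storing a second table. Unnormalized.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2], twiddles, inverse, stride * 2)
    odd = fft_radix2(x[1::2], twiddles, inverse, stride * 2)
    out = [0j] * n
    for k in range(n // 2):
        w = twiddles[k * stride]
        if inverse:
            w = w.conjugate()  # inverse direction: conjugated forward twiddle
        t = w * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 0, 0, 0, 0]
    tw = make_twiddles(len(x))
    spectrum = fft_radix2(x, tw)
    back = [v / len(x) for v in fft_radix2(spectrum, tw, inverse=True)]
    print([round(v.real, 9) for v in back])
```

The only difference between the two directions is a conjugation of the twiddle factor, which is why an inverse transform can be made to cost about the same as a forward one.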

rocFFT 1.0.14 for ROCm 4.5.2

10 Dec 19:28
b021fc3

rocFFT code for ROCm 4.5.2 is unchanged from rocFFT for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.

rocFFT 1.0.14 for ROCm 4.5.0

27 Oct 21:52
b021fc3

Changed

  • Packaging is split into a runtime package called rocfft and a development package called rocfft-devel. The development package depends on the runtime package. To aid in the transition, the runtime package suggests the development package on all supported OSes except CentOS 7. The suggests feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.

Optimizations

  • Optimized SBCC kernels of length 52, 60, 72, 80, 84, 96, 104, 108, 112, 160,
    168, 208, 216, 224, 240 with new kernel generator.
  • Improved many plans by removing unnecessary transpose steps.
  • Optimized scheme selection for 3D problems.
    • Imposed fewer restrictions on 3D_BLOCK_RC selection, so more problems can use 3D_BLOCK_RC and
      gain some performance.
    • Enabled 3D_RC. Some 3D problems with an SBCC-supported z dimension can use fewer kernels and benefit.
    • Forced --length 336 336 56 (double precision) to use the faster 3D_RC scheme so that it is not
      skipped by the conservative threshold test.
  • Optimized some even-length R2C/C2R cases by doing more operations
    in-place and combining pre/post processing into Stockham kernels.
  • Added radix-17.
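Some background on the even-length R2C/C2R item above: a common way to compute a real FFT of even length N is to pack the real input into a complex sequence of length N/2, run one complex FFT, and apply a post-processing step that separates the even and odd parts. The following Python sketch shows the textbook form of that pre/post-processing, with a naive DFT standing in for a Stockham kernel. Two points are assumptions: that this matches the exact scheme rocFFT fuses, and the function names `dft` and `r2c_even`, which are hypothetical.

```python
import cmath
import math


def dft(z):
    """Naive forward complex DFT, standing in for a Stockham FFT kernel."""
    m = len(z)
    return [sum(z[t] * cmath.exp(-2j * math.pi * k * t / m) for t in range(m))
            for k in range(m)]


def r2c_even(x):
    """Real-to-complex FFT of even length n via one complex FFT of length n/2.

    Returns the n//2 + 1 non-redundant bins X[0..n/2].
    """
    n = len(x)
    assert n % 2 == 0
    m = n // 2
    # Pre-processing: pack even samples as real parts, odd samples as imaginary parts.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    zf = dft(z)
    out = []
    for k in range(m + 1):
        a = zf[k % m]
        b = zf[(m - k) % m].conjugate()
        even = (a + b) / 2   # DFT of x[0::2] at bin k
        odd = (a - b) / 2j   # DFT of x[1::2] at bin k
        # Post-processing: combine the halves with twiddle W_n^k = exp(-2*pi*i*k/n).
        out.append(even + cmath.exp(-2j * math.pi * k / n) * odd)
    return out


if __name__ == "__main__":
    print(len(r2c_even([0.5, -1.0, 2.0, 3.0, -0.75, 1.25, 4.0, -2.5])))
```

Because the pre- and post-processing are simple elementwise passes, they are natural candidates for fusing into the neighbouring FFT kernel rather than running as separate kernels.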

Fixed

  • Fixed a few validation failures of even-length in-place R2C transforms for 2D and 3D cubic sizes
    such as 100^2 (or ^3), 200^2 (or ^3), and 256^2 (or ^3). For these sizes, only two kernels
    (r2c-transpose) are combined instead of all three (stockham-r2c-transpose).
  • Improved large 1D transform decompositions.

Added

  • Added support for Windows 10 as a build target.
  • Added new kernel generator for select fused-2D transforms.