Releases: ROCm/rocBLAS

rocBLAS-2.24.0 for ROCm 3.6.0

11 Jul 00:38

New Features

  • Improvements to User Guide and Design Document
  • L1 dot function optimized to use shuffle instructions (improvements for bf16, f16, and f32 data types)
  • Added an optimized kernel to the L1 dot function for the x dot x case
  • Standardized L1 rocblas-bench on device pointer mode so that benchmarks focus on GPU memory bandwidth
  • Adjustments for hipcc (hip-clang) as the standard build compiler and for CentOS 8 support
  • Added Fortran interface for all rocBLAS functions
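
To illustrate the idea behind the shuffle-based dot optimization listed above, here is a minimal host-side sketch in plain C++. It only emulates a shuffle-down tree reduction across the lanes of one wavefront; it is not the rocBLAS kernel. The names `kLanes`, `wave_reduce_sum`, and `dot_emulated` are hypothetical, and the wavefront width of 64 is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed wavefront width for this emulation (hypothetical constant).
constexpr std::size_t kLanes = 64;

// Emulates a shuffle-down tree reduction: at each step, lane i adds the
// value held by lane i + offset. After log2(kLanes) steps, lane 0 holds
// the sum of all lanes.
inline float wave_reduce_sum(std::array<float, kLanes> lanes)
{
    for(std::size_t offset = kLanes / 2; offset > 0; offset /= 2)
    {
        for(std::size_t i = 0; i < offset; ++i)
            lanes[i] += lanes[i + offset];
    }
    return lanes[0];
}

// Dot product of two length-n vectors: each emulated lane accumulates a
// strided slice of the products, then the lanes are combined by the
// tree reduction above.
inline float dot_emulated(const float* x, const float* y, std::size_t n)
{
    std::array<float, kLanes> partial{};
    for(std::size_t i = 0; i < n; ++i)
        partial[i % kLanes] += x[i] * y[i];
    return wave_reduce_sum(partial);
}
```

Calling `dot_emulated(x, x, n)` corresponds to the x dot x case. On a GPU, the inner loop of `wave_reduce_sum` would be replaced by a single shuffle instruction per step, which avoids a round trip through shared memory.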

Known Issues

  • None

rocBLAS-2.2.0

28 Feb 22:11

Changelist:

  • Fix compilation of TRSV, IAMAX, IAMIN
  • Add TRSM test sizes
  • Fix false negative precision failures for f16_r gemm_ex tests
  • Improvements to documentation and addition of sample for i8_r/i32_r gemm_ex
  • Tuning for i8_r/i32_r gemm_ex for MIOpen
  • Add gtest ConfigurableEventListener to reduce Jenkins log file size
  • Initial refactoring of rocblas-bench
  • rocblas_dgemm NT tuning

rocBLAS-2.1.0

01 Feb 02:27

Changelist:

  • Refactor rocBLAS test framework
  • Improved performance of i8_r/i32_r rocblas_gemm_ex on gfx906
  • Addition of simple trsv implementation using trsm
  • Improved performance of trsm
  • Tuning improvements for resnet50 problems
  • Update tuning to use new Tensile solution selection logic
  • rocblas_gemm_ex performance improvement when ldd == ldc and strideD == strideC
  • Bug fixes for IAMIN and TRSV
  • Add Sphinx-based Read the Docs documentation

rocBLAS-2.0.0 for ROCm 2.0

19 Dec 19:46

Changelist:

  • improved performance of fp16/fp32 rocblas_gemm_ex on gfx906
  • support for i8/i32 rocblas_gemm_ex
  • update vega-10 resnet50 tuning
  • refactor testing to be data driven
  • change gemm-ex API solution index from uint32_t to int32_t
  • disable gemm and gemm_ex chunking
  • fix gemv argument checking
  • add performance script for p1b1 benchmark sizes
  • refactor gemm code to reduce use of macros
  • trsm performance regression fix

rocBLAS-14.3.0 for ROCm1.9

12 Oct 03:00

Changelist:

  • add rocblas_gemm_strided_batched_ex for mixed precision support
  • tested on ROCm 1.9
  • fix chunking of A and B matrices
  • expand testing of rocblas_gemm
  • sgemm and hgemm tuning on gfx906 for Resnet50 from Tensile V4.6.0

Known failures:

  • known dgemm failures for m,n < 16

enable gfx906 support

21 Sep 17:44
8490ca9

A small incremental release to enable gfx906 support. To get gfx906 support, ROCm 1.9 or later must be used to build rocBLAS.

rocBLAS-14.1.2 for ROCm1.8.2

12 Sep 15:56

Changelist:

  • Add initial rocblas_gemm_ex for mixed precision support and foundation for future capabilities
  • use Tensile 4.5.0 for bug fixes and performance improvements
  • separate tests into quick, pre_checkin, and nightly
  • add sweep tests for gemm

rocBLAS 14.1.1 for ROCm 1.8.2

10 Aug 04:02

Changelist:

  • update hgemm asm_full YAML file for performance; re-train hgemm hip_lite YAML file
  • new YAML files with PreciseBoundsCheck disabled
  • update hgemm asm_full YAML file, source and VW=2 for m,n,k <= 32
  • update hgemm asm_full YAML file, source and VW=1 for m,n,k == 1
  • add strided_batched tests for hgemm
  • correct gemm test matrix initialization
  • change cmake and source files to support hip-clang
  • change from __fp16 to _Float16

rocBLAS 14.1.0 for ROCm1.8.2

29 Jun 15:33

Changelist:

  • partition gemm m and n dimensions to avoid offsets exceeding 32 bits
  • fix set_get_matrix memory leak
  • Improve TRSM performance and make it asynchronous
  • Use hip_device target for ROCm 1.8.2
  • Improve gemm-strided-batched testing
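
The gemm partitioning item above can be pictured with a short C++ sketch. It is an illustration under stated assumptions, not the rocBLAS implementation: the function name `partition_columns` is hypothetical, the matrix is assumed to be column-major with leading dimension `ld >= 1`, and only one dimension is split here, whereas the changelog mentions both m and n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Splits n columns of a column-major matrix with leading dimension ld
// into (start, count) chunks so that count * ld never exceeds INT32_MAX.
// Element offsets computed inside a chunk then fit in a signed 32-bit
// integer. Hypothetical helper; assumes ld >= 1.
inline std::vector<std::pair<std::int64_t, std::int64_t>>
partition_columns(std::int64_t n, std::int64_t ld)
{
    const std::int64_t max_offset = std::numeric_limits<std::int32_t>::max();

    // Largest column count per chunk; at least one column per chunk.
    const std::int64_t max_cols = std::max<std::int64_t>(1, max_offset / ld);

    std::vector<std::pair<std::int64_t, std::int64_t>> chunks;
    for(std::int64_t start = 0; start < n; start += max_cols)
        chunks.emplace_back(start, std::min(max_cols, n - start));
    return chunks;
}
```

A caller would then issue one gemm per chunk, advancing the base pointer by `start * ld` elements using 64-bit arithmetic on the host side.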

rocBLAS-14.0.0 for ROCm1.7.1

15 May 23:05

Changelist:

  • fix Xtrsm for large size ldb
  • fix set_get_matrix for large size
  • fix Xgemm test for large size
  • additional training for ResNet sizes
  • fix dot, asum, nrm2