Release SuperBench v0.12.0

@polarG released this 11 Aug 21:58

SuperBench 0.12.0 Release Notes

SuperBench Improvements

  • Optimize CUTLASS build process for faster builds and smaller binaries.
  • Improve image build pipeline.
  • Add support for arm64 builds.
  • Upgrade pipeline dependencies.
  • Fix SuperBench installation and code lint issues.
  • Update Flake8 repository.
  • Add support for the latest Python versions.
  • Enhance error handling for pkg_resources imports.
  • Update ROCm image build labels.
  • Add CUDA 12.8 and CUDA 12.9 support.
  • Consolidate multi-architecture Docker images.
  • Upgrade runner OS to latest version.
  • Fix typos in documentation and code.

Micro-benchmark Improvements

  • Add general CPU bandwidth and latency benchmarks.
  • Add nvbandwidth build process and benchmarks.
  • Add support for architecture 10.0 in gemm-flops.
  • Add GPU Stream micro-benchmark.
  • Add FP4 GEMM FLOPS support in cublaslt_gemm benchmark.
  • Add Grace CPU support for CPU Stream benchmark.
  • Revise CPU Stream benchmark.
  • Fix NUMA error on Grace CPU in gpu-copy benchmark.
  • Bump onnxruntime-gpu dependency from 1.10.0 to 1.12.0.
  • Fix stderr message in gpu-copy benchmark.
  • Fix TensorRT inference parsing.
  • Handle N/A values in nvbandwidth benchmark.
  • Avoid unintended nvbandwidth function calls in all benchmarks.
  • Support CUDA arch flag and autotuning in cublaslt GEMM.

Model-benchmark Improvements

  • Add LLaMA-2 model benchmarks.
  • Add Mixture of Experts model benchmarks.
  • Add DeepSeek inference benchmark (AMD GPU).

Result Analysis

  • Enhance logging for diagnosis rule baseline errors.

Documentation Updates

  • Update CODEOWNERS file.