Release SuperBench v0.12.0

Latest

polarG released this 11 Aug 21:58

fdb800d

SuperBench 0.12.0 Release Notes

SuperBench Improvements

Optimize cutlass build process for faster builds and smaller binaries.
Improve image build pipeline.
Add support for arm64 builds.
Upgrade pipeline dependencies.
Fix SuperBench installation and code lint issues.
Update Flake8 repository.
Add support for the latest Python versions.
Enhance error handling for pkg_resources imports.
Update ROCm image build labels.
Add CUDA 12.8 and CUDA 12.9 support.
Consolidate multi-architecture Docker images.
Upgrade runner OS to latest version.
Fix typos in documentation and code.
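
As an illustration of the pkg_resources item above, here is a minimal sketch of a guarded import with a fallback. It is not SuperBench's actual code: the helper name `get_package_version` and the `'unknown'` default are hypothetical, and it assumes `importlib.metadata` (Python 3.8+) is an acceptable substitute when `pkg_resources` is missing.

```python
import importlib.metadata

try:
    # pkg_resources ships with setuptools and may be absent in newer environments.
    import pkg_resources
except ImportError:
    pkg_resources = None


def get_package_version(name, default='unknown'):
    """Return the installed version of a distribution, or `default` if not found.

    Hypothetical helper for illustration only.
    """
    if pkg_resources is not None:
        try:
            return pkg_resources.get_distribution(name).version
        except Exception:
            # Fall through to importlib.metadata on any lookup failure.
            pass
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return default
```

With this shape, a missing or broken `pkg_resources` degrades to a default value instead of failing at import time.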

Micro-benchmark Improvements

Add general CPU bandwidth and latency benchmarks.
Add nvbandwidth build process and benchmarks.
Add support for architecture 10.0 in gemm-flops.
Add GPU Stream micro benchmark.
Add FP4 GEMM FLOPS support in cublaslt_gemm benchmark.
Add Grace CPU support for CPU Stream benchmark.
Revise CPU Stream benchmark.
Fix NUMA error on Grace CPU in gpu-copy benchmark.
Bump onnxruntime-gpu dependency from 1.10.0 to 1.12.0.
Fix stderr message in gpu-copy benchmark.
Fix TensorRT inference parsing.
Handle N/A values in nvbandwidth benchmark.
Avoid unintended nvbandwidth function calls in all benchmarks.
Support CUDA arch flag and autotuning in cublaslt GEMM.
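
To illustrate the N/A handling item above, here is a minimal sketch of parsing one row of a bandwidth matrix in which unavailable entries are printed as `N/A`. The row layout (a row index followed by whitespace-separated values) and the function name `parse_matrix_row` are assumptions for illustration, not the benchmark's actual parser.

```python
def parse_matrix_row(line):
    """Parse one matrix row into (row_index, values).

    Assumed layout: first token is the row index, remaining tokens are
    floats or the literal 'N/A'. N/A entries are returned as None.
    """
    tokens = line.split()
    row_index = int(tokens[0])
    values = [None if tok.upper() == 'N/A' else float(tok) for tok in tokens[1:]]
    return row_index, values
```

Returning `None` for missing entries lets downstream code skip them explicitly rather than fail on a float conversion.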

Model-benchmark Improvements

Add LLaMA-2 model benchmarks.
Add Mixture of Experts model benchmarks.
Add DeepSeek inference benchmark (AMD GPU).

Result Analysis

Enhance logging for diagnosis rule baseline errors.

Documentation Updates

Update CODEOWNERS file.
