v0.12.0
Release SuperBench v0.12.0
Released by polarG on 11 Aug, 21:58, to release/0.12.
SuperBench 0.12.0 Release Notes
SuperBench Improvements
Optimize CUTLASS build process for faster builds and smaller binaries.
Improve image build pipeline.
Add support for arm64 builds.
Upgrade pipeline dependencies.
Fix SuperBench installation and code lint issues.
Update Flake8 repository.
Add support for the latest Python versions.
Enhance error handling for pkg_resources imports.
Update ROCm image build labels.
Add CUDA 12.8 and CUDA 12.9 support.
Consolidate multi-architecture Docker images.
Upgrade runner OS to the latest version.
Fix typos in documentation and code.
Micro-benchmark Improvements
Add general CPU bandwidth and latency benchmarks.
Add nvbandwidth build process and benchmarks.
Add support for CUDA compute capability 10.0 in the gemm-flops benchmark.
Add GPU Stream micro-benchmark.
Add FP4 GEMM FLOPS support to the cublaslt_gemm benchmark.
Add Grace CPU support for CPU Stream benchmark.
Revise CPU Stream benchmark.
Fix NUMA error on Grace CPU in gpu-copy benchmark.
Bump onnxruntime-gpu dependency from 1.10.0 to 1.12.0.
Fix stderr message in gpu-copy benchmark.
Fix TensorRT inference parsing.
Handle N/A values in nvbandwidth benchmark.
Avoid unintended nvbandwidth function calls in all benchmarks.
Support CUDA arch flag and autotuning in cuBLASLt GEMM.
Model-benchmark Improvements
Add LLaMA-2 model benchmarks.
Add Mixture of Experts model benchmarks.
Add DeepSeek inference benchmark (AMD GPU).
Result Analysis
Enhance logging for diagnosis rule baseline errors.
Documentation Updates