Releases: amd/ZenDNN-pytorch-plugin
zentorch Release v5.1
zentorch 5.1 is the PyTorch plugin which comes with the ZenDNN 5.1 release. ZenDNN 5.1 continues to deliver strong inference performance for deep learning models on AMD EPYC™ CPUs, with a focus on optimizing large-scale Recommender Systems. We've introduced several key optimizations to boost the performance of recommender models such as DLRMv2.
The zentorch 5.1 plugin is optimized for compatibility with PyTorch 2.7. zentorch is also integrated with vLLM: the new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX.
zentorch Release v5.0.2
zentorch 5.0.2 is the PyTorch plugin which comes with ZenDNN 5.0.2.
ZenDNN 5.0.2 is a minor release building upon the major ZenDNN 5.0 release. This upgrade continues the focus on optimizing inference with Recommender Systems and Large Language Models on AMD EPYC™ CPUs. It includes AMD EPYC™ enhancements for bfloat16 performance, expanded support for cutting-edge models such as Llama 3.1 and 3.2 and Microsoft Phi, and support for the INT4 quantized datatype. This includes the advanced Activation-Aware Weight Quantization (AWQ) algorithm for LLMs and quantized support for the DLRM-v2 model with INT8 weights.
Under the hood, ZenDNN’s enhanced AMD-specific optimizations operate at every level. In addition to highly optimized operator microkernels, these include comprehensive graph optimizations including pattern identification, graph reordering, and fusions. They also incorporate optimized embedding bag kernels and enhanced zenMatMul matrix splitting strategies which leverage the AMD EPYC™ microarchitecture to deliver enhanced throughput and latency.
Combined with PyTorch's torch.compile, zentorch transforms deep learning pipelines into finely tuned, AMD-specific engines, delivering unparalleled efficiency and speed for large-scale inference workloads.
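As a minimal sketch, using zentorch with torch.compile typically looks like the following. This assumes zentorch is installed alongside a supported PyTorch build; the model and input shapes here are illustrative, not taken from the release notes.

```python
import torch
import zentorch  # importing the plugin registers the "zentorch" backend

# Illustrative model; any eager-mode PyTorch module is handled the same way
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()

# Compile with the zentorch backend so ZenDNN graph optimizations,
# fusions, and optimized kernels are applied at inference time
compiled_model = torch.compile(model, backend="zentorch")

with torch.no_grad():
    out = compiled_model(torch.randn(8, 64))
```

The first call triggers compilation; subsequent calls with the same shapes reuse the optimized graph.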
The zentorch 5.0.2 release plugs in seamlessly with PyTorch versions 2.2 through 2.6, offering a high-performance experience for deep learning on AMD EPYC™ platforms.
This release adds optimizations for INT8/INT4-quantized DLRM models, unlocking faster inference with lower memory usage compared to BF16 precision.
zentorch Release v5.0.1
zentorch 5.0.1 is the PyTorch plugin which comes with ZenDNN 5.0.1.
ZenDNN 5.0.1 is a minor release building upon the major ZenDNN 5.0 release. This upgrade continues the focus on optimizing inference with Recommender Systems and Large Language Models on AMD EPYC™ CPUs. It includes AMD EPYC™ enhancements for bfloat16 performance, expanded support for cutting-edge models such as Llama 3.1 and 3.2 and Microsoft Phi, and support for the INT4 quantized datatype. This includes the advanced Activation-Aware Weight Quantization (AWQ) algorithm for LLMs and quantized support for the DLRM-v2 model with INT8 weights.
Under the hood, ZenDNN’s enhanced AMD-specific optimizations operate at every level. In addition to highly optimized operator microkernels, these include comprehensive graph optimizations including pattern identification, graph reordering, and fusions. They also incorporate optimized embedding bag kernels and enhanced zenMatMul matrix splitting strategies which leverage the AMD EPYC™ microarchitecture to deliver enhanced throughput and latency.
Combined with PyTorch's torch.compile, zentorch transforms deep learning pipelines into finely tuned, AMD-specific engines, delivering unparalleled efficiency and speed for large-scale inference workloads.
The zentorch 5.0.1 release plugs in seamlessly with PyTorch versions 2.2 through 2.5, offering a high-performance experience for deep learning on AMD EPYC™ platforms.
zentorch Release v5.0
zentorch is compatible with base versions of PyTorch v2.0 or later. This release provides zentorch for PyTorch v2.4.0.
This release of the plug-in supports:
- Datatypes FP32, BF16, and INT4 (WOQ)
- Introduction of a new zentorch.llm.optimize() method for Hugging Face Generative LLM models
- New zentorch.load_woq_model() method to support loading of Weight Only Quantized models generated through the AMD Quark tool. This method only supports models quantized and exported with per-channel quantization using the AWQ algorithm.
- Improved graph optimizations, an enhanced SDPA (Scaled Dot Product Attention) operator, and more.
- Automatic Mixed Precision (AMP) between FP32 and BF16 providing a performance improvement with minimal changes in accuracy
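A sketch of the zentorch.llm.optimize() flow described above, assuming zentorch and Hugging Face Transformers are installed. The model checkpoint name is illustrative, and the exact signature of zentorch.llm.optimize() (in particular the dtype argument) is an assumption based on common usage and may differ in your zentorch version.

```python
import torch
import zentorch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()

# Apply zentorch's LLM-specific optimizations, then compile the forward
# pass with the zentorch backend; dtype keyword is an assumed parameter
model = zentorch.llm.optimize(model, dtype=torch.bfloat16)
model.forward = torch.compile(model.forward, backend="zentorch")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For Weight Only Quantized checkpoints exported from AMD Quark, the release notes indicate the model would instead be loaded through zentorch.load_woq_model() before compilation.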
zentorch Release v4.2
This zentorch release:
- Is compatible with PyTorch v2.0 and later
- Extends PyTorch by providing a custom backend for the torch.compile flow, delivering a performant AI inference solution for AMD EPYC™ servers leveraging the ZenDNN 4.2 library
- Includes graph optimizations and fusions tailored for AMD EPYC™ architectures
- Supports BF16 execution through auto-mixed precision to provide performance improvements with minimal changes in accuracy