v2.5.0
Post-training Quantization:
Features:
- Official release of OpenVINO framework support.
- Ported NNCF OpenVINO backend to use the nGraph representation of OpenVINO models.
- Changed dependencies of the NNCF OpenVINO backend. It now depends on the `openvino` package instead of the `openvino-dev` package.
- Added GRU/LSTM quantization support.
- Added quantizer scales unification.
- Added support for models with 3D and 5D Depthwise convolution.
- Added support for FP16 OpenVINO models.
- Added support for the `overflow_fix` parameter (for the `quantize(...)` & `quantize_with_accuracy_control(...)` methods). It improves the accuracy of optimized models on affected devices. More details in the Quantization section.
- (OpenVINO) Added support for in-place statistics collection (reduces the memory footprint during optimization).
- (OpenVINO) Added Quantization with accuracy control algorithm.
- (OpenVINO) Added YOLOv8 examples for the `quantize(...)` & `quantize_with_accuracy_control(...)` methods (a minimal usage sketch follows this list).
- (PyTorch) Added the min-max quantization algorithm as experimental.
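A minimal post-training quantization sketch for an OpenVINO model, showing the `nncf.Dataset` and `nncf.quantize(...)` entry points referenced above. The model path, the calibration data source, and the transform function are hypothetical placeholders, not part of this release.

```python
import openvino.runtime as ov

import nncf

# Hypothetical calibration data source: any iterable of preprocessed inputs
# (e.g. a torch DataLoader or a list of numpy batches).
calibration_loader = [...]

def transform_fn(data_item):
    # Map a data item to the model's input format; identity here, adjust as needed.
    return data_item

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical model path

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

ov.serialize(quantized_model, "model_int8.xml")
```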
Fixes:
- Fixed the `ignored_scope` attribute behaviour for weights. Weighted layers are now excluded from the optimization scope correctly (see the sketch after this list).
- (ONNX) The ONNX opset version is now checked in `nncf.quantize(...)`. Models with opset < 13 are now optimized correctly with per-tensor quantization.
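A hedged sketch of excluding layers from quantization via `ignored_scope`, which the weight-handling fix above affects. It reuses `model` and `calibration_dataset` from the previous sketch; the node name and op type are hypothetical.

```python
import nncf

# Exclude a specific node (hypothetical name) and all Sigmoid ops from
# quantization; with the fix above, their weights are skipped as well.
ignored = nncf.IgnoredScope(
    names=["head/final_conv"],
    types=["Sigmoid"],
)

quantized_model = nncf.quantize(model, calibration_dataset, ignored_scope=ignored)
```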
Improvements:
- Improved the statistics collection process (weight statistics are collected only once).
- (PyTorch, OpenVINO, ONNX) Introduced unified quantizer parameters calculation.
Known issues:
- The `quantize(...)` method can generate inaccurate INT8 results for models with a DenseNet-like architecture. Use `quantize_with_accuracy_control(...)` in such cases (see the sketch below this list).
- The `quantize(...)` method can hang on models with a transformer architecture when the optional `fast_bias_correction` parameter is set to False. Do not set it to False, or use `quantize_with_accuracy_control(...)` in such cases.
- The `quantize(...)` method can generate inaccurate INT8 results for models with a MobileNet-like architecture on non-VNNI machines.
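Where the known issues above recommend falling back to `quantize_with_accuracy_control(...)`, a call along the following lines can be used. The validation callback, data loaders, and the `max_drop` value are hypothetical, and the exact callback signature should be checked against the NNCF documentation; `model`, `calibration_dataset`, and `transform_fn` are reused from the first sketch.

```python
import nncf

def validate(compiled_model, validation_loader) -> float:
    # Hypothetical validation routine: run inference on the validation data and
    # return the metric that must be preserved (e.g. top-1 accuracy or mAP).
    ...

quantized_model = nncf.quantize_with_accuracy_control(
    model,
    calibration_dataset=calibration_dataset,
    validation_dataset=nncf.Dataset(validation_loader, transform_fn),
    validation_fn=validate,
    max_drop=0.01,  # tolerate at most a 0.01 absolute drop in the metric
)
```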
Compression-aware training:
New Features:
- Introduced an automated structured pruning algorithm for JPQD with support for BERT, Wav2Vec2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.
- Added `nncf.common.utils.patcher.Patcher` - this class can be used to patch methods on live PyTorch model objects with wrappers such as `nncf.torch.dynamic_graph.context.no_nncf_trace` when doing so in the model code is not possible (e.g. if the model comes from an external library package).
- Compression controllers of the `nncf.api.compression.CompressionAlgorithmController` class now have a `.strip()` method that returns the compressed model object with as many custom NNCF additions removed as possible while preserving the functioning of the model object as a compressed model (see the sketch after this list).
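A minimal sketch of the new `.strip()` call on a compression controller. The model, config values, and output path are illustrative assumptions, not prescribed by this release; only the `create_compressed_model` entry point and the `.strip()` method are taken from the notes above.

```python
import torch
import torchvision

from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# Illustrative model and a minimal quantization config (values are assumptions).
model = torchvision.models.resnet18()
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})

compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... compression-aware training loop goes here ...

# New in this release: remove as many NNCF-specific additions as possible while
# keeping the object functioning as a compressed model.
stripped_model = compression_ctrl.strip()
torch.save(stripped_model.state_dict(), "resnet18_int8_stripped.pth")
```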
Fixes:
- Fixed statistics computation for pruned layers.
- (PyTorch) Fixed traced tensor handling to support YOLOv8 from Ultralytics.
Improvements:
- Extended the attributes (`transpose`/`permute`/`getitem`) handled by the pruning node selector.
- NNCFNetwork was refactored from a wrapper approach to a mixin-like approach.
- Added average pool 3D-like ops to the pruning mask.
- Added Conv3d support to the overflow fix.
- `nncf.set_log_file(...)` can now be used to set the location of the NNCF log file (see the sketch after this list).
- (PyTorch) Added support for pruning of the `torch.nn.functional.pad` operation.
- (PyTorch) Added `torch.baddbmm` as an alias for the matmul metatype for quantization purposes.
- (PyTorch) Added a config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.
- (PyTorch) Fixed JIT tracing of PyTorch models with internal patching.
- (PyTorch) Added the `__matmul__` magic function to the list of patched ops (for SwinTransformer by Microsoft).
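A one-line sketch of the logging control mentioned above; the file name is a hypothetical placeholder.

```python
import nncf

# Redirect NNCF log output to a file of your choice (hypothetical path).
nncf.set_log_file("nncf_debug.log")
```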
Requirements:
- Updated ONNX version (1.13)
- Updated TensorFlow version (2.11)
General changes:
- Added Windows support for NNCF.