v1.7.0

@vshampor vshampor released this 19 Apr 12:29
· 23 commits to master since this release

New features:

  • Added the Adjust Padding feature to support accurate execution of U4 on VPU: when "target_device" is set to "VPU", the training-time padding values for quantized convolutions are adjusted to better reflect the VPU inference process.
  • Weighted layers that are "frozen" (i.e. have requires_grad set to False at compressed model creation time) are no longer considered for compression, to better handle transfer learning cases.
  • Quantization algorithm now sets up quantizers without giving an option for requantization, which guarantees best performance, although at some cost to quantizer configuration flexibility.
  • Pruning of models with FCOS detection heads and instance normalization operations is now supported.
  • Added a mean percentile initializer for the quantization algorithm.
  • It is now possible to additionally quantize model outputs (separate control over the quantization of each output is supported).
  • Models quantized for CPU now use effective 7-bit quantization for weights - the ONNX-exported model is still configured to use 8 bits for quantization, but only the middle 128 quanta of the total possible 256 are actually used, which allows for better OpenVINO inference accuracy alignment with PyTorch on non-VNNI CPUs
  • Bumped target PyTorch version to 1.8.1 and relaxed package requirements constraints to allow installation into environments with PyTorch >=1.5.0
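
The effective 7-bit weight quantization item above can be checked with a little arithmetic. The sketch below assumes a signed 8-bit weight grid of [-128, 127] and assumes that non-VNNI kernels sum pairs of unsigned-8-bit by signed-8-bit products into a saturating 16-bit accumulator (as the x86 pmaddubsw instruction does). The release notes do not state this mechanism, so treat it as a plausible illustration rather than a description of the OpenVINO implementation. The helper names are made up for this example.

```python
# Illustration only: which integer levels "the middle 128 of 256" covers,
# and why that range is safe for a saturating int16 pairwise accumulator.
INT8_MIN, INT8_MAX = -128, 127        # assumed signed 8-bit weight grid
INT16_MIN, INT16_MAX = -32768, 32767  # range of the assumed accumulator
U8_MAX = 255                          # largest unsigned 8-bit activation


def middle_levels(low, high, keep):
    """Return (min, max) of the `keep` central integer levels of [low, high]."""
    total = high - low + 1
    start = low + (total - keep) // 2
    return start, start + keep - 1


def pair_sum_extremes(w_min, w_max):
    """Extremes of a0*w0 + a1*w1 for activations in [0, 255] and weights in [w_min, w_max]."""
    return 2 * U8_MAX * w_min, 2 * U8_MAX * w_max


def fits_int16(lo, hi):
    """True if every value in [lo, hi] is representable as int16."""
    return INT16_MIN <= lo and hi <= INT16_MAX


w7_min, w7_max = middle_levels(INT8_MIN, INT8_MAX, 128)
full_range = pair_sum_extremes(INT8_MIN, INT8_MAX)
narrow_range = pair_sum_extremes(w7_min, w7_max)

print(w7_min, w7_max)                          # -64 63
print(full_range, fits_int16(*full_range))     # (-65280, 64770) False
print(narrow_range, fits_int16(*narrow_range)) # (-32640, 32130) True
```

Under these assumptions, full 8-bit weights can overflow the 16-bit intermediate sum, while weights confined to [-64, 63] cannot, which is consistent with the accuracy alignment the item describes.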

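As a rough sketch of the Adjust Padding item, a quantization config that targets VPU might look like the following. Only the "target_device": "VPU" setting comes from these notes. The input shape and the rest of the dictionary are placeholder assumptions, and the dictionary is serialized to JSON here only to show the shape of a config file.

```python
import json

# Hypothetical minimal config: only "target_device": "VPU" is taken from the
# release notes; the sample size and algorithm section are placeholders.
nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "target_device": "VPU",
    "compression": {"algorithm": "quantization"},
}

# Serialize to the JSON form in which such a config would be stored on disk.
config_json = json.dumps(nncf_config_dict, indent=4)
print(config_json)
```
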
Notable bugfixes:

  • Fixed bias pruning in depthwise convolution
  • Made per-tensor quantization available for all operations that support per-channel quantization
  • Fixed progressive training performance degradation when an output tensor of an NNCF-compressed model is reused as its input.
  • Installing NNCF from a checked-out repository with "pip install ." is now supported.
  • Nested "with no_nncf_trace()" blocks now function as expected.
  • The NNCF compression API is now formally abstract to guard against virtual function calls.
  • It is now possible to load AutoQ and HAWQ-produced checkpoints to evaluate them or export them to ONNX.
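
To illustrate what "function as expected" means for nested "with no_nncf_trace()" blocks, here is a generic sketch of a nestable context manager. It is not NNCF's implementation: no_trace, tracing_enabled and the module-level flag are hypothetical stand-ins. The point is that leaving an inner block restores the state it found on entry instead of unconditionally re-enabling tracing.

```python
from contextlib import contextmanager

_tracing_enabled = True  # hypothetical global tracing flag


def tracing_enabled():
    """Report whether the (hypothetical) tracing is currently active."""
    return _tracing_enabled


@contextmanager
def no_trace():
    """Disable tracing inside the block, restoring the previous state on exit."""
    global _tracing_enabled
    previous = _tracing_enabled
    _tracing_enabled = False
    try:
        yield
    finally:
        # Restore what was active before entering, so that an inner block
        # does not switch tracing back on for the rest of an outer block.
        _tracing_enabled = previous


states = []
with no_trace():
    states.append(tracing_enabled())      # inside outer block
    with no_trace():
        states.append(tracing_enabled())  # inside inner block
    states.append(tracing_enabled())      # back in outer block, still disabled
states.append(tracing_enabled())          # outside all blocks

print(states)  # [False, False, False, True]
```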

Removed features:

  • Pattern-based quantizer setup mode for the quantization algorithm: due to its logic, it did not guarantee that all required operation inputs would ultimately be quantized.