Releases · openvinotoolkit/nncf
v2.10.0
Post-training Quantization:
Features:
- Introduced subgraph-defining functionality for the `nncf.IgnoredScope()` option (see the sketch after this list).
- Introduced limited support for batch sizes greater than 1. The MobilenetV2 PyTorch example was updated with batch support.
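A minimal sketch of excluding a whole subgraph from quantization, assuming a prepared `model` and `calibration_dataset`; the node names are hypothetical and must come from your actual model graph:

```python
import nncf

# Hypothetical node names; use the real names from the model graph.
ignored = nncf.IgnoredScope(
    subgraphs=[nncf.Subgraph(inputs=["MatMul_1"], outputs=["Add_7"])]
)
quantized_model = nncf.quantize(model, calibration_dataset, ignored_scope=ignored)
```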
Fixes:
- Fixed issue with the `nncf.OverflowFix` parameter absence in some scenarios.
- Aligned the list of correctable layers for the FastBiasCorrection algorithm between the PyTorch, OpenVINO and ONNX backends.
- Fixed issue with the `nncf.QuantizationMode` parameters combination.
- Fixed MobilenetV2 (PyTorch, ONNX, OpenVINO) examples for the Windows platform.
- (OpenVINO) Fixed Anomaly Classification example for the Windows platform.
- (PyTorch) Fixed bias shift magnitude calculation for fused layers.
- (OpenVINO) Fixed removal of the ShapeOf subgraph, which led to an error in the `nncf.quantize_with_accuracy_control()` method.
Improvements:
- `OverflowFix`, `AdvancedSmoothQuantParameters` and `AdvancedBiasCorrectionParameters` were exposed in the `nncf.*` namespace (see the sketch after this list).
- (OpenVINO, PyTorch) Introduced scale compression to FP16 for weights in the `nncf.compress_weights()` method, regardless of model weights precision.
- (PyTorch) Modules inserted by NNCF were excluded from parameter tracing.
- (OpenVINO) Extended the list of correctable layers for the BiasCorrection algorithm.
- (ONNX) Aligned BiasCorrection algorithm behaviour with OpenVINO in specific cases.
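A hedged sketch of the newly exposed top-level names, assuming a prepared `model` and `calibration_dataset`; the parameter values and the `smooth_quant_alphas` wiring are illustrative, not prescriptive:

```python
import nncf
from nncf import OverflowFix, AdvancedSmoothQuantParameters  # now importable from nncf.*

# Illustrative: pass the exposed parameters through AdvancedQuantizationParameters.
advanced = nncf.AdvancedQuantizationParameters(
    overflow_fix=OverflowFix.FIRST_LAYER,
    smooth_quant_alphas=AdvancedSmoothQuantParameters(matmul=0.95),
)
quantized_model = nncf.quantize(model, calibration_dataset, advanced_parameters=advanced)
```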
Tutorials:
- Post-Training Optimization of PhotoMaker Model
- Post-Training Optimization of Stable Diffusion XL Model
- Post-Training Optimization of KerasCV Stable Diffusion Model
- Post-Training Optimization of Paint By Example Model
- Post-Training Optimization of aMUSEd Model
- Post-Training Optimization of InstantID Model
- Post-Training Optimization of LLaVA Next Model
- Post-Training Optimization of AnimateAnyone Model
- Post-Training Optimization of YOLOv8-OBB Model
- Post-Training Optimization of LLM Agent
Compression-aware training:
Features:
- (PyTorch) The `nncf.quantize` method may now be used as quantization initialization for Quantization-Aware Training. Added a Resnet18-based example with the transition from Post-Training Quantization to a Quantization-Aware Training algorithm (see the sketch after this list).
- (PyTorch) Introduced extractors for the fused Convolution, Batch-/GroupNorm, and Linear functions.
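A minimal PTQ-to-QAT sketch under assumed names (`model`, `val_loader`, `train_loader` are placeholders for a PyTorch model and DataLoaders):

```python
import nncf
import torch

# PTQ result serves as the initialization point for QAT fine-tuning.
calibration_dataset = nncf.Dataset(val_loader, transform_func=lambda item: item[0])
quantized_model = nncf.quantize(model, calibration_dataset)

# Fine-tune the quantized model with an ordinary PyTorch training loop.
optimizer = torch.optim.SGD(quantized_model.parameters(), lr=1e-4)
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(quantized_model(images), labels)
    loss.backward()
    optimizer.step()
```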
Fixes:
- (PyTorch) Fixed `apply_args_defaults` function issue.
- (PyTorch) Fixed dtype handling for the compressed `torch.nn.Parameter`.
- (PyTorch) Fixed `is_shared` parameter propagation.
Improvements:
- (PyTorch) Updated command creation behaviour to reduce the number of adapters.
- (PyTorch) Added an option to insert points for models wrapped with `replace_modules=False`.
Deprecations/Removals:
- (PyTorch) Removed the binarization algorithm.
- NNCF installation via the `pip install nncf[<framework>]` option is now deprecated.
Requirements:
- Updated PyTorch (2.2.1) and CUDA (12.1) versions.
- Updated ONNX (1.16.0) and ONNXRuntime (1.17.1) versions.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@Candyzorua
@clinty
@UsingtcNower
@DaniAffCH
v2.9.0
Post-training Quantization:
Features:
- (OpenVINO) Added a modified AWQ algorithm for 4-bit data-aware weights compression. This algorithm is applied only to the `MatMul->Multiply->MatMul` pattern. For that, the `awq` optional parameter has been added to `nncf.compress_weights()` and can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time; see the sketch after this list).
- (ONNX) Introduced support for the ONNX backend in the `nncf.quantize_with_accuracy_control()` method. Users can now perform quantization with accuracy control for `onnx.ModelProto`. By leveraging this feature, users can enhance the accuracy of quantized models while minimizing performance impact.
- (ONNX) Added an example based on the YOLOv8n-seg model demonstrating the usage of quantization with accuracy control for the ONNX backend.
- (PyTorch) Added SmoothQuant algorithm for the PyTorch backend in `nncf.quantize()`.
- (OpenVINO) Added an example with hyperparameter tuning for the TinyLLama model.
- Introduced `nncf.AdvancedAccuracyRestorerParameters`.
- Introduced the `subset_size` option for `nncf.compress_weights()`.
- Introduced `TargetDevice.NPU` as the replacement for `TargetDevice.VPU`.
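A hedged sketch of data-aware 4-bit compression with AWQ enabled; `model` and `calibration_dataset` are assumed placeholders and the values are illustrative:

```python
import nncf

compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit weight compression
    dataset=calibration_dataset,             # makes the compression data-aware
    awq=True,                                # apply the modified AWQ algorithm
    subset_size=64,                          # the newly introduced option
)
```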
Fixes:
- Fixed API Enums serialization/deserialization issue.
- Fixed issue with required arguments for the `revert_operations_to_floating_point_precision` method.
Improvements:
- (ONNX) Aligned statistics collection with OpenVINO and PyTorch backends.
- Extended `nncf.compress_weights()` with Convolution & Embeddings compression in order to reduce memory footprint.
Deprecations/Removals:
- (OpenVINO) Removed outdated examples with `nncf.quantize()` for BERT and YOLOv5 models.
- (OpenVINO) Removed outdated example with `nncf.quantize_with_accuracy_control()` for the SSD MobileNetV1 FPN model.
- (PyTorch) Deprecated the `binarization` algorithm.
- Removed Post-training Optimization Tool as OpenVINO backend.
- Removed Dockerfiles.
- `TargetDevice.VPU` was replaced by `TargetDevice.NPU`.
Tutorials:
- Post-Training Optimization of Stable Diffusion v2 Model
- Post-Training Optimization of DeciDiffusion Model
- Post-Training Optimization of DepthAnything Model
- Post-Training Optimization of Stable Diffusion ControlNet Model
Compression-aware training:
Fixes
- (PyTorch) Fixed issue with `NNCFNetworkInterface.get_clean_shallow_copy` missing arguments.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@AishwaryaDekhane
@UsingtcNower
@Om-Doiphode
v2.8.1
Post-training Quantization:
Bugfixes:
- (Common) Fixed issue with `nncf.compress_weights()` to avoid overflows on 32-bit Windows systems.
- (Common) Fixed performance issue with `nncf.compress_weights()` on LLama models.
- (Common) Fixed the `nncf.quantize_with_accuracy_control` pipeline with the `tune_hyperparams=True` option enabled.
- (OpenVINO) Fixed issue for stateful LLM models and added state restoring after inference for them.
- (PyTorch) Fixed issue with `nncf.compress_weights()` for LLM models when executing `is_floating_point` with tracing.
v2.8.0
Post-training Quantization:
Breaking changes:
- The `nncf.quantize` signature has been changed to add `mode: Optional[nncf.QuantizationMode] = None` as its 3rd argument, between the original `calibration_dataset` and `preset` arguments.
- (Common) `nncf.common.quantization.structs.QuantizationMode` has been renamed to `nncf.common.quantization.structs.QuantizationScheme`.
General:
- (OpenVINO) Changed default OpenVINO opset from 9 to 13.
Features:
- (OpenVINO) Added 4-bit data-aware weights compression. For that, the `dataset` optional parameter has been added to `nncf.compress_weights()` and can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time).
- (PyTorch) Added support for PyTorch models with shared weights and custom PyTorch modules in `nncf.compress_weights()`. The weights compression algorithm for PyTorch models is now based on tracing the model graph. The `dataset` parameter is now required in `nncf.compress_weights()` for the compression of PyTorch models.
- (Common) Renamed `nncf.CompressWeightsMode.INT8` to `nncf.CompressWeightsMode.INT8_ASYM` and introduced `nncf.CompressWeightsMode.INT8_SYM`, which can be efficiently used with dynamic 8-bit quantization of activations. The original `nncf.CompressWeightsMode.INT8` enum value is now deprecated.
- (OpenVINO) Added support for quantizing the ScaledDotProductAttention operation from OpenVINO opset 13.
- (OpenVINO) Added FP8 quantization support via the `nncf.QuantizationMode.FP8_E4M3` and `nncf.QuantizationMode.FP8_E5M2` enum values, invoked by passing one of these values as the optional `mode` argument to `nncf.quantize` (see the sketch after this list). Currently, OpenVINO supports inference of FP8-quantized models in reference mode with no performance benefits, so this can be used for accuracy projections.
- (Common) Post-training Quantization with Accuracy Control: `nncf.quantize_with_accuracy_control()` has been extended with the `restore_mode` optional parameter to revert weights to int8 instead of the original precision. This parameter helps to reduce the size of the quantized model and improves its performance. By default, it is disabled and model weights are reverted to the original precision in `nncf.quantize_with_accuracy_control()`.
- (Common) Added an `all_layers: Optional[bool] = None` argument to `nncf.compress_weights` to indicate whether embeddings and last layers of the model should be compressed to a primary precision. This is relevant to 4-bit quantization only.
- (Common) Added a `sensitivity_metric: Optional[nncf.parameters.SensitivityMetric] = None` argument to `nncf.compress_weights` for finer control over the sensitivity metric for assigning quantization precision to layers. Defaults to weight quantization error if a dataset is not provided for weight compression, and to the maximum variance of the layers' inputs multiplied by the inverted 8-bit quantization noise if a dataset is provided. By default, the backup precision is assigned for the embeddings and last layers.
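Hedged sketches of the new `mode` argument and the renamed weight-compression modes, assuming a prepared `model` and `calibration_dataset`:

```python
import nncf

# FP8 quantization via the new optional mode argument
# (reference-mode inference in OpenVINO; useful for accuracy projections).
fp8_model = nncf.quantize(model, calibration_dataset, mode=nncf.QuantizationMode.FP8_E4M3)

# INT8_SYM replaces the deprecated INT8 enum value for symmetric weight compression.
compressed_model = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT8_SYM)
```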
Fixes:
- (OpenVINO) Models with embeddings (e.g. `gpt-2`, `stable-diffusion-v1-5`, `stable-diffusion-v2-1`, `opt-6.7b`, `falcon-7b`, `bloomz-7b1`) are now more accurately quantized.
- (PyTorch) `nncf.strip(..., do_copy=True)` now actually returns a deepcopy (stripped) of the model object.
- (PyTorch) Post-hooks can now be set up on operations that return `torch.return_type` (such as `torch.max`).
- (PyTorch) Improved dynamic graph tracing for various tensor operations from the `torch` namespace.
- (PyTorch) More robust handling of models with disjoint traced graphs when applying PTQ.
Improvements:
- Reformatted the tutorials section in the top-level `README.md` for better readability.
Deprecations/Removals:
- (Common) The original `nncf.CompressWeightsMode.INT8` enum value is now deprecated.
- (PyTorch) The Git patch for integration with the HuggingFace `transformers` repository is marked as deprecated and will be removed in a future release. Developers are advised to use optimum-intel instead.
- Dockerfiles in the NNCF Git repository are deprecated and will be removed in a future release.
v2.7.0
Post-training Quantization:
Features:
- (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (`compress_weights(…)` pipeline).
- (OpenVINO) Added support for IF operation quantization.
- (OpenVINO) Added `dump_intermediate_model` parameter support for AccuracyAwareAlgorithm (`quantize_with_accuracy_control(…)` pipeline).
- (OpenVINO) Added support for the SmoothQuant and ChannelAlignment algorithms for the HyperparameterTuner algorithm (`quantize_with_tune_hyperparams(…)` pipeline).
- (PyTorch) Post-training Quantization is now supported with the `quantize(…)` pipeline and the common implementation of quantization algorithms; the `create_compressed_model()` method is deprecated for Post-training Quantization (see the sketch after this list).
- Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for the `ModelType.Transformer` scheme. `QuantizationPreset.Mixed` was set as the default for the `ModelType.Transformer` scheme.
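A minimal sketch of the now-supported PyTorch PTQ flow; `torch_model` and `data_loader` are assumed placeholders:

```python
import nncf

# Wrap any iterable of samples; transform_func maps each item to model input.
calibration_dataset = nncf.Dataset(data_loader, transform_func=lambda item: item[0])

# Replaces the deprecated create_compressed_model() route for Post-training Quantization.
quantized_model = nncf.quantize(torch_model, calibration_dataset)
```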
Fixes:
- (OpenVINO, ONNX, PyTorch) Aligned/added patterns between backends (SE block, MVN layer, multiple activations, etc.) to restore performance/metrics.
- Fixed patterns for `ModelType.Transformer` to align with the quantization scheme.
Improvements:
- Improved UX with the new progress bar for pipeline, new exceptions, and .dot graph visualization updates.
- (OpenVINO) Optimized WeightsCompression algorithm (`compress_weights(…)` pipeline) execution time for LLM quantization; added ignored scope support.
- (OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time with a multi-threaded approach while calculating the ranking score (`quantize_with_accuracy_control(…)` pipeline).
- (OpenVINO) Added the extract_ov_subgraph tool for large IR subgraph extraction.
- (ONNX) Optimized quantization pipeline (up to 1.15x speed up).
Tutorials:
- Post-Training Optimization of BLIP Model
- Post-Training Optimization of DeepFloyd IF Model
- Post-Training Optimization of Grammatical Error Correction Model
- Post-Training Optimization of Dolly 2.0 Model
- Post-Training Optimization of Massively Multilingual Speech Model
- Post-Training Optimization of OneFormer Model
- Post-Training Optimization of InstructPix2Pix Model
- Post-Training Optimization of LLaVA Model
- Post-Training Optimization of Latent Consistency Model
- Post-Training Optimization of Distil-Whisper Model
- Post-Training Optimization of FastSAM Model
Known issues:
- (ONNX) The `quantize(...)` method can generate inaccurate int8 results for models with the BatchNormalization layer that contains biases. To get the best accuracy, use the `do_constant_folding=True` option during export from PyTorch to ONNX.
Compression-aware training:
Fixes:
- (PyTorch) Fixed Hessian trace calculation to solve #2155 issue.
Requirements:
- Updated PyTorch version (2.1.0).
- Updated numpy version (<1.27).
Deprecations/Removals:
- (PyTorch) Removed legacy external quantizer storage names.
- (PyTorch) Removed torch < 2.0 version support.
v2.6.0
Post-training Quantization:
Features:
- Added `CPU_SPR` device type support.
- Added quantizer scales unification.
- Added quantization scheme for the ReduceSum operation.
- Added new types (ReduceL2, ReduceSum, Maximum) to the ignored scope for `ModelType.Transformer`.
- (OpenVINO) Added SmoothQuant algorithm.
- (OpenVINO) Added ChannelAlignment algorithm.
- (OpenVINO) Added HyperparameterTuner algorithm.
- (PyTorch) Added FastBiasCorrection algorithm support.
- (OpenVINO, ONNX) Added embedding weights quantization.
- (OpenVINO, PyTorch) Added a new `compress_weights` method that provides data-free INT8 weights compression (see the sketch after this list).
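A minimal sketch of the new data-free entry point; `model` is an assumed placeholder for an OpenVINO or PyTorch model:

```python
import nncf

# Data-free: no calibration dataset is required; weights are compressed to INT8.
compressed_model = nncf.compress_weights(model)
```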
Fixes:
- Fixed detection of decomposed post-processing in models.
- Multiple fixes (new patterns, bugfixes, etc.) to solve #1936 issue.
- Fixed model reshaping while quantization to keep original model shape.
- (OpenVINO) Added support for sequential models quantization.
- (OpenVINO) Fixed in-place statistics cast to support empty dimensions.
- (OpenVINO, ONNX) Fixed quantization of the MatMul operation with weights rank > 2.
- (OpenVINO, ONNX) Fixed BiasCorrection algorithm to enable CLIP model quantization.
Improvements:
- Optimized the `quantize(…)` pipeline (up to 4.3x speed up in total).
- Optimized the `quantize_with_accuracy_control(…)` pipeline (up to 8x speed up for the 122-quantizing-model-with-accuracy-control notebook).
- Optimized general statistics collection (up to 1.2x speed up for the ONNX backend).
- Separated ignored patterns from the fused patterns scheme (with multiple patterns added).
Tutorials:
- Post-Training Optimization of Segment Anything Model.
- Post-Training Optimization of CLIP Model.
- Post-Training Optimization of ImageBind Model.
- Post-Training Optimization of Whisper Model.
- Post-Training Optimization with accuracy control.
Compression-aware training:
Features:
- Added shape pruning processor for BootstrapNAS algorithm.
- Added KD loss for BootstrapNAS algorithm.
- Added `validate_scopes` parameter for NNCF configuration.
- (PyTorch) Added PyTorch 2.0 support.
- (PyTorch) Added `.strip()` option to API.
- (PyTorch) Enabled bfloat data type for quantization kernels.
- (PyTorch) Quantized models can now be `torch.jit.trace`d without calling `.strip()`.
- (PyTorch) Added support for overridden `forward` instance attribute on model objects passed into `create_compressed_model`.
- (Tensorflow) Added Tensorflow 2.12 support.
Fixes:
- (PyTorch) Fixed padding adjustment issue in the elastic kernel to work with the different active kernel sizes.
- (PyTorch) Fixed the torch graph tracing in the case the tensors belonging to parallel edges are interleaved in the order of the tensor argument.
- (PyTorch) Fixed recurrent nodes matching (LSTM, GRU cells) condition with the strict rule to avoid adding not necessary nodes to the ignored scope.
- (PyTorch) Fixed the `torch.jit.script` wrapper so that user-side exception handling during `torch.jit.script` invocation does not cause NNCF to be permanently disabled.
- (PyTorch, Tensorflow) Adjusted the quantizer propagation algorithm to check whether quantizer propagation will result in output quantization.
- (PyTorch) Added a redefined `__class__` method for ProxyModule that avoids causing an error while calling `super()` in the forward method.
Deprecations/Removals:
- (PyTorch) Removed the deprecated `NNCFNetwork.__getattr__` and `NNCFNetwork.get_nncf_wrapped_model` methods.
Requirements:
- Updated PyTorch version (2.0.1).
- Updated Tensorflow version (2.12.0).
v2.5.0
Post-training Quantization:
Features:
- Official release of OpenVINO framework support.
- Ported NNCF OpenVINO backend to use the nGraph representation of OpenVINO models.
- Changed dependencies of the NNCF OpenVINO backend. It now depends on the `openvino` package and not on the `openvino-dev` package.
- Added GRU/LSTM quantization support.
- Added quantizer scales unification.
- Added support for models with 3D and 5D Depthwise convolution.
- Added FP16 OpenVINO models support.
- Added `"overflow_fix"` parameter support & functionality (for the `quantize(...)` & `quantize_with_accuracy_control(...)` methods). It improves accuracy of the optimized model for affected devices. More details in the Quantization section.
- (OpenVINO) Added support for in-place statistics collection (reduces memory footprint during optimization).
- (OpenVINO) Added Quantization with accuracy control algorithm (see the sketch after this list).
- (OpenVINO) Added YOLOv8 examples for the `quantize(...)` & `quantize_with_accuracy_control(...)` methods.
- (PyTorch) Added min-max quantization algorithm as experimental.
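A minimal sketch of the accuracy-control flow for an OpenVINO model; `ov_model`, both datasets, and the user-defined `validate` function (which must return a single metric value) are assumed placeholders:

```python
import nncf

def validate(model, validation_data) -> float:
    ...  # run inference over validation_data and compute the target metric

quantized_model = nncf.quantize_with_accuracy_control(
    ov_model,
    calibration_dataset=calibration_dataset,
    validation_dataset=validation_dataset,
    validation_fn=validate,
    max_drop=0.01,  # maximum allowed accuracy drop
)
```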
Fixes:
- Fixed `ignored_scope` attribute behaviour for weights. Now, weighted layers are correctly excluded from the optimization scope.
- (ONNX) Added a check for the correct ONNX opset version in `nncf.quantize(...)`. Now, models with opset < 13 are optimized correctly in per-tensor quantization.
Improvements:
- Improved the statistics collection process (weight statistics are collected only once).
- (PyTorch, OpenVINO, ONNX) Introduced unified quantizer parameters calculation.
Known issues:
- The `quantize(...)` method can generate inaccurate int8 results for models with a DenseNet-like architecture. Use `quantize_with_accuracy_control(...)` in such cases.
- The `quantize(...)` method can hang on models with a transformer architecture when the `fast_bias_correction` optional parameter is set to False. Don't set it to False, or use `quantize_with_accuracy_control(...)` in such cases.
- The `quantize(...)` method can generate inaccurate int8 results for models with a MobileNet-like architecture on non-VNNI machines.
Compression-aware training:
New Features:
- Introduced automated structured pruning algorithm for JPQD with support for BERT, Wave2VecV2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.
- Added `nncf.common.utils.patcher.Patcher` - this class can be used to patch methods on live PyTorch model objects with wrappers such as `nncf.torch.dynamic_graph.context.no_nncf_trace` when doing so in the model code is not possible (e.g. if the model comes from an external library package).
- Compression controllers of the `nncf.api.compression.CompressionAlgorithmController` class now have a `.strip()` method that will return the compressed model object with as many custom NNCF additions removed as possible while preserving the functioning of the model object as a compressed model (see the sketch after this list).
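A minimal sketch of the new `.strip()` call, assuming `compression_ctrl` is a controller returned by `create_compressed_model(...)`:

```python
# Returns the compressed model with as much NNCF machinery removed as possible,
# while keeping the behaviour of a compressed model.
stripped_model = compression_ctrl.strip()
```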
Fixes:
- Fixed statistics computation for pruned layers.
- (PyTorch) Fixed traced tensors to implement the YOLOv8 from Ultralytics.
Improvements:
- Extended attributes (`transpose`/`permute`/`getitem`) for the pruning node selector.
- NNCFNetwork was refactored from a wrapper approach to a mixin-like approach.
- Added average pool 3d-like ops to pruning mask.
- Added Conv3d for overflow fix.
- `nncf.set_log_file(...)` can now be used to set the location of the NNCF log file.
- (PyTorch) Added support for pruning of the `torch.nn.functional.pad` operation.
- (PyTorch) Added `torch.baddbmm` as an alias for the matmul metatype for quantization purposes.
- (PyTorch) Added config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.
- (PyTorch) Fixed JIT-traceable PyTorch models with internal patching.
- (PyTorch) Added `__matmul__` magic functions to the list of patched ops (for SwinTransformer by Microsoft).
Requirements:
- Updated ONNX version (1.13)
- Updated Tensorflow version (2.11)
General changes:
- Added Windows support for NNCF.
v2.4.0
Target version updates:
- Bump target framework versions to PyTorch 1.13.1, TensorFlow 2.8.x, ONNX 1.12, ONNXRuntime 1.13.1
- Increased target HuggingFace transformers version for the integration patch to 4.23.1
Features:
- Official release of the ONNX framework support. NNCF may now be used for post-training quantization (PTQ) on ONNX models. Added an example script demonstrating the ONNX post-training quantization on MobileNetV2.
- Preview release of OpenVINO framework support. NNCF may now be used for post-training quantization on OpenVINO models. Added an example script demonstrating the OpenVINO post-training quantization on MobileNetV2. `pip install nncf[openvino]` will install NNCF with the required OV framework dependencies.
- Common post-training quantization API across the supported framework model formats (PyTorch, TensorFlow, ONNX, OpenVINO IR) via the `nncf.quantize(...)` function. The parameter set of the function is the same for all frameworks - actual framework-specific implementations are dispatched based on the type of the model object argument (see the sketch after this list).
- (PyTorch, TensorFlow) Improved the adaptive compression training functionality to reduce effective training time.
- (ONNX) Post-processing nodes are now automatically excluded from quantization.
- (PyTorch - Experimental) Joint Pruning, Quantization and Distillation for Transformers enabled for certain models from the HuggingFace `transformers` repo. See the description of the movement pruning involved in JPQD for details.
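A sketch of the unified entry point described above; `model`, `data_items`, and `preprocess` are assumed placeholders:

```python
import nncf

# The same call works for PyTorch, TensorFlow, ONNX, and OpenVINO IR models;
# the backend-specific implementation is chosen from the model object's type.
calibration_dataset = nncf.Dataset(data_items, transform_func=preprocess)
quantized_model = nncf.quantize(model, calibration_dataset)
```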
Bugfixes:
- Fixed a division by zero if every operation is added to the ignored scope.
- Improved logging output, cutting down on the number of messages being output to the standard `logging.INFO` log level.
- Fixed FLOPS calculation for linear filters - this impacts existing models that were pruned with a FLOPS target.
- "chunk" and "split" ops are correctly handled during pruning.
- Linear layers may now be pruned by input and output independently.
- Matmul-like operations and subsequent arithmetic operations are now treated as a fused pattern.
- (PyTorch) Fixed a rare condition with accumulator overflow in CUDA quantization kernels, which led to CUDA runtime errors and NaN values appearing in quantized tensors.
- (PyTorch) The `transformers` integration patch now allows exporting to ONNX during training, and not only at the end of it.
- (PyTorch) `torch.nn.utils.weight_norm` weights are now detected correctly.
- (PyTorch) Exporting a model with sparsity or pruning no longer leads to weights in the original in-memory model object being hard-set to 0.
- (PyTorch - Experimental) Improved automatic search of blocks to skip within the NAS algorithm - overlapping blocks are correctly filtered.
- (PyTorch, TensorFlow) Various bugs and issues with compression training were fixed.
- (TensorFlow) Fixed an error with `"num_bn_adaptation_samples": 0` in the config leading to a `TypeError` during quantization algorithm initialization.
- (ONNX) Temporary model file is no longer saved on disk.
- (ONNX) Depthwise convolutions are now quantizable in per-channel mode.
- (ONNX) Improved PTQ running time by optimizing the calls to ONNX shape inference.
Breaking changes:
- Fused patterns will be excluded from quantization via `ignored_scopes` only if the top-most node in data flow order matches against `ignored_scopes`.
- NNCF config's `"ignored_scopes"` and `"target_scopes"` are now strictly checked to match at least one node in the model graph instead of silently ignoring unmatched entries.
- Calling `setup.py` directly to install NNCF is deprecated and no longer guaranteed to work.
- Importing the NNCF logger as `from nncf.common.utils.logger import logger as nncf_logger` is deprecated - use `from nncf import nncf_logger` instead (see the sketch after this list).
- `pruning_rate` is renamed to `pruning_level` in pruning compression controllers.
- (ONNX) Removed CompressionBuilder. Excluded examples of NNCF for ONNX with the CompressionBuilder API.
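For the logger change specifically, the migration is a one-line import swap, as stated above:

```python
# Deprecated:
# from nncf.common.utils.logger import logger as nncf_logger

# Use instead:
from nncf import nncf_logger

nncf_logger.info("NNCF logging via the supported import path")
```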
v2.3.0
New features
- (ONNX) PTQ API support for ONNX.
- (ONNX) Added PTQ examples for ONNX in image classification, object detection, and semantic segmentation.
- (PyTorch) Added `BootstrapNAS` to find high-performing sub-networks from the super-network optimization.
Bugfixes
- (PyTorch) The initial quantized model is now returned when retraining fails to find the best checkpoint.
- (Experimental) Fixed weight initialization for `ONNXGraph` and `MinMaxQuantization`.
v2.2.0
New features
- Pre-production quality
- (TensorFlow) Added TensorFlow 2.5.x support.
- (TensorFlow) The `SubclassedConverter` class was added to create `NNCFGraph` for the `tf.Graph` Keras model.
- (TensorFlow) Added `TFOpLambda` layer support with `TFModelConverter`, `TFModelTransformer`, and `TFOpLambdaMetatype`.
- (TensorFlow) Patterns from `MatMul` and `Conv2D` to `BiasAdd`, and `Metatypes` of TensorFlow operations with weights (`TFOpWithWeightsMetatype`), are added.
- (PyTorch, TensorFlow) Added pruning for `Reshape` and `Linear` as `ReshapePruningOp` and `LinearPruningOp`.
- (PyTorch) Added mixed precision quantization config with HAWQ for `Resnet50` and `Mobilenet_v2` for the latest VPU.
- (PyTorch) Split `NNCFBatchNorm` into `NNCFBatchNorm1d`, `NNCFBatchNorm2d`, `NNCFBatchNorm3d`.
- (PyTorch - Experimental) Added the `BNASTrainingController` and `BNASTrainingAlgorithm` for BootstrapNAS to search the model's architecture.
- (Experimental) ONNX `ModelProto` is now converted to `NNCFGraph` through `GraphConverter`.
- (Experimental) `ONNXOpMetatype` and extended patterns for fusing HW config are now available.
- (Experimental) Added `ONNXPostTrainingQuantization` and `MinMaxQuantization` support for ONNX.
Bugfixes
- (PyTorch, TensorFlow) Added exception handling of BN adaptation for zero sample values.
- (PyTorch, TensorFlow) Fixed learning rate after validation step for `EarlyExitCompressionTrainingLoop`.
- (PyTorch) Fixed `FakeQuantizer` to make exact zeros.
- (PyTorch) Fixed Quantizer misplacements during ONNX export.
- (PyTorch) Restored device information during ONNX export.
- (PyTorch) Fixed the statistics collection from the pruned model.