Commit 34f00bd

tsavina, MaximProshin, and alexsu52 authored
DOCS Update optimization docs with NNCF PTQ changes and deprecation of POT (openvinotoolkit#17398) (openvinotoolkit#17633)
* Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update home.rst * Update ptq_introduction.md * Update Introduction.md * Update Introduction.md * Update Introduction.md * Update ptq_introduction.md * Update ptq_introduction.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update quantization_w_accuracy_control.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update model_optimization_guide.md * Update ptq_introduction.md * Update quantization_w_accuracy_control.md * Update model_optimization_guide.md * Update quantization_w_accuracy_control.md * Update model_optimization_guide.md * Update quantization_w_accuracy_control.md * Update model_optimization_guide.md * Update Introduction.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update quantization_w_accuracy_control.md * Update ptq_introduction.md * Update Introduction.md * Update model_optimization_guide.md * Update basic_quantization_flow.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update quantization_w_accuracy_control.md * Update Introduction.md * Update FrequentlyAskedQuestions.md * Update model_optimization_guide.md * Update Introduction.md * Update 
model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update model_optimization_guide.md * Update ptq_introduction.md * Update ptq_introduction.md * added code snippet (#1) * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update quantization_w_accuracy_control.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update ptq_introduction.md * Update model_optimization_guide.md * Update basic_quantization_flow.md * Update ptq_introduction.md * Update quantization_w_accuracy_control.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update basic_quantization_flow.md * Update ptq_introduction.md * Update ptq_introduction.md * Delete ptq_introduction.md * Update FrequentlyAskedQuestions.md * Update Introduction.md * Update quantization_w_accuracy_control.md * Update introduction.md * Update basic_quantization_flow.md code blocks * Update quantization_w_accuracy_control.md code snippets * Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py * Update model_optimization_guide.md * Optimization docs proofreading (#2) * images updated * delete reminder * review * text review * change images to original ones * Update filter_pruning.md code blocks * Update basic_quantization_flow.md * Update quantization_w_accuracy_control.md * Update images (#3) * images updated * delete reminder * review * text review * change images to original ones * Update filter_pruning.md code blocks * update images * resolve conflicts * resolve conflicts * change images to original ones * resolve conflicts * update images * fix conflicts * Update model_optimization_guide.md * Update docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py * Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py * Update docs/optimization_guide/nncf/ptq/code/ptq_onnx.py * Update docs/optimization_guide/nncf/ptq/code/ptq_aa_openvino.py * Update 
docs/optimization_guide/nncf/ptq/code/ptq_openvino.py * table format fix * Update headers * Update qat.md code blocks --------- Co-authored-by: Maksim Proshin <[email protected]> Co-authored-by: Alexander Suslov <[email protected]>
1 parent 17326ab commit 34f00bd

36 files changed, +664 -467 lines changed
Lines changed: 2 additions & 2 deletions (in each of three files whose diffs were not loaded)

docs/home.rst

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ You can integrate and offload to accelerators additional operations for pre- and
 Model Quantization and Compression
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Post-Training Optimization Tool and Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.
+Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.

 .. panels::
 :card: homepage-panels

docs/optimization_guide/model_optimization_guide.md

Lines changed: 9 additions & 19 deletions
@@ -8,40 +8,30 @@

 ptq_introduction
 tmo_introduction
-(Experimental) Protecting Model <pot_ranger_README>


-Model optimization is an optional offline step of improving final model performance by applying special optimization methods, such as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
+Model optimization is an optional offline step of improving the final model performance and reducing the model size by applying special optimization methods, such as 8-bit quantization, pruning, etc. OpenVINO offers two optimization paths implemented in `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__:

-- :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` implements most of the optimization parameters to a model by default. Yet, you are free to configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocess of a model (:doc:`Embedding Preprocessing Computation <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`).
+- :doc:`Post-training Quantization <ptq_introduction>` is designed to optimize the inference of deep learning models by applying the post-training 8-bit integer quantization that does not require model retraining or fine-tuning.

-- :doc:`Post-training Quantization <pot_introduction>` is designed to optimize inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, for example, post-training 8-bit integer quantization.
+- :doc:`Training-time Optimization <tmo_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods like Quantization-aware Training, Structured and Unstructured Pruning, etc.

-- :doc:`Training-time Optimization <nncf_ptq_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods, like Quantization-aware Training and Filter Pruning. NNCF-optimized models can be inferred with OpenVINO using all the available workflows.
+.. note:: OpenVINO also supports optimized models (for example, quantized) from source frameworks such as PyTorch, TensorFlow, and ONNX (in Q/DQ format). No special steps are required in this case and optimized models can be converted to the OpenVINO Intermediate Representation format (IR) right away.

+Post-training Quantization is the fastest way to optimize a model and should be applied first, but it is limited in terms of achievable accuracy-performance trade-off. In case of poor accuracy or performance after Post-training Quantization, Training-time Optimization can be used as an option.

-Detailed workflow:
-##################
-
-To understand which development optimization tool you need, refer to the diagram:
+Once the model is optimized using the aforementioned methods, it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

 .. image:: _static/images/DEVELOPMENT_FLOW_V3_crunch.svg

-Post-training methods are limited in terms of achievable accuracy-performance trade-off for optimizing models. In this case, training-time optimization with NNCF is an option.
-
-Once the model is optimized using the aforementioned tools it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.
-
 .. image:: _static/images/WHAT_TO_USE.svg

-Post-training methods are limited in terms of achievable accuracy, which may degrade for certain scenarios. In such cases, training-time optimization with NNCF may give better results.
-
-Once the model has been optimized using the aforementioned tools, it can be used for inference using the regular OpenVINO inference workflow. No changes to the code are required.
-
-If you are not familiar with model optimization methods, refer to :doc:`post-training methods <pot_introduction>`.
-
 Additional Resources
 ####################

+- :doc:`Post-training Quantization <ptq_introduction>`
+- :doc:`Training-time Optimization <tmo_introduction>`
 - :doc:`Deployment optimization <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>`
+- `HuggingFace Optimum Intel <https://huggingface.co/docs/optimum/intel/optimization_ov>`__

 @endsphinxdirective

docs/optimization_guide/nncf/filter_pruning.md

Lines changed: 122 additions & 98 deletions
@@ -5,15 +5,15 @@
 Introduction
 ####################

-Filter pruning is an advanced optimization method which allows reducing computational complexity of the model by removing
-redundant or unimportant filters from convolutional operations of the model. This removal is done in two steps:
+Filter pruning is an advanced optimization method that allows reducing the computational complexity of the model by removing
+redundant or unimportant filters from the convolutional operations of the model. This removal is done in two steps:

 1. Unimportant filters are zeroed out by the NNCF optimization with fine-tuning.

 2. Zero filters are removed from the model during the export to OpenVINO Intermediate Representation (IR).


-Filter Pruning method from the NNCF can be used stand-alone but we usually recommend to stack it with 8-bit quantization for
+Filter Pruning method from the NNCF can be used stand-alone but we usually recommend stacking it with 8-bit quantization for
 two reasons. First, 8-bit quantization is the best method in terms of achieving the highest accuracy-performance trade-offs so
 stacking it with filter pruning can give even better performance results. Second, applying quantization along with filter
 pruning does not hurt accuracy a lot since filter pruning removes noisy filters from the model which narrows down values
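
Note on the hunk above: the two-step removal it describes (zero out unimportant filters, then drop the zero filters) can be illustrated with a small standalone sketch. This is not NNCF code. It assumes, purely for illustration, that filter importance is the L2 norm of the weights and that each filter is a flat list of floats; `zero_out_filters` and `remove_zero_filters` are hypothetical helper names, and the fine-tuning that NNCF performs between the steps is omitted.

```python
import math


def l2_norm(filt):
    """L2 norm of one filter's weights (assumed importance criterion)."""
    return math.sqrt(sum(w * w for w in filt))


def zero_out_filters(filters, pruning_rate):
    """Step 1 (sketch): zero the lowest-norm fraction of the filters."""
    num_to_prune = int(len(filters) * pruning_rate)
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    to_zero = set(order[:num_to_prune])
    return [
        [0.0] * len(f) if i in to_zero else list(f)
        for i, f in enumerate(filters)
    ]


def remove_zero_filters(filters):
    """Step 2 (sketch): drop filters whose weights are all zero."""
    return [f for f in filters if any(w != 0.0 for w in f)]


# Four toy filters; the second and fourth have the smallest norms.
filters = [
    [0.9, -1.2, 0.4],
    [0.01, 0.02, -0.01],
    [0.5, 0.5, 0.5],
    [0.03, 0.0, 0.02],
]
zeroed = zero_out_filters(filters, pruning_rate=0.5)
pruned = remove_zero_filters(zeroed)
print(len(filters), "filters before,", len(pruned), "after")
```

Only the second step changes the layer shape, which is consistent with the documentation's statement that the performance gain appears once the zero filters are removed during export to IR.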
@@ -37,44 +37,52 @@ Here, we show the basic steps to modify the training script for the model and us

 In this step, NNCF-related imports are added in the beginning of the training script:

-.. tab:: PyTorch
+.. tab-set::

-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [imports]
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [imports]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow

-.. tab:: TensorFlow 2
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [imports]
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [imports]

 2. Create NNCF configuration
 ++++++++++++++++++++++++++++

 Here, you should define NNCF configuration which consists of model-related parameters (`"input_info"` section) and parameters
 of optimization methods (`"compression"` section).

-.. tab:: PyTorch
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [nncf_congig]
-
-.. tab:: TensorFlow 2
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [nncf_congig]
-
-Here is a brief description of the required parameters of the Filter Pruning method. For full description refer to the
+.. tab-set::
+
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [nncf_congig]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [nncf_congig]
+
+Here is a brief description of the required parameters of the Filter Pruning method. For a full description refer to the
 `GitHub <https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Pruning.md>`__ page.

 * ``pruning_init`` - initial pruning rate target. For example, value ``0.1`` means that at the begging of training, convolutions that can be pruned will have 10% of their filters set to zero.

 * ``pruning_target`` - pruning rate target at the end of the schedule. For example, the value ``0.5`` means that at the epoch with the number of ``num_init_steps + pruning_steps``, convolutions that can be pruned will have 50% of their filters set to zero.

-* ``pruning_steps` - the number of epochs during which the pruning rate target is increased from ``pruning_init` to ``pruning_target`` value. We recommend to keep the highest learning rate during this period.
+* ``pruning_steps` - the number of epochs during which the pruning rate target is increased from ``pruning_init` to ``pruning_target`` value. We recommend keeping the highest learning rate during this period.


 3. Apply optimization methods
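
Note on the hunk above: the interplay of ``pruning_init``, ``pruning_target``, ``pruning_steps``, and ``num_init_steps`` may be easier to follow with numbers. The sketch below is standalone Python and does not import NNCF. The dictionary layout (which keys sit under ``"params"``, the ``sample_size`` value) is an assumption that should be checked against the linked NNCF page, and the linear ramp in the hypothetical helper `pruning_rate_at` is chosen only for illustration; the schedule NNCF actually applies may follow a different curve.

```python
# Assumed layout of a filter-pruning configuration; verify the key names
# and nesting against the NNCF documentation before relying on them.
nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, 224, 224]},  # model-related part
    "compression": {                                   # optimization part
        "algorithm": "filter_pruning",
        "pruning_init": 0.1,
        "params": {
            "pruning_target": 0.5,
            "pruning_steps": 15,
            "num_init_steps": 0,
        },
    },
}


def pruning_rate_at(epoch, pruning_init, pruning_target,
                    num_init_steps, pruning_steps):
    """Target pruning rate at a given epoch under an assumed linear ramp."""
    if epoch < num_init_steps:
        return pruning_init
    progress = min(1.0, (epoch - num_init_steps) / pruning_steps)
    return pruning_init + (pruning_target - pruning_init) * progress


compression = nncf_config_dict["compression"]
params = compression["params"]
epochs = list(range(0, 21, 5))
rates = [
    pruning_rate_at(
        epoch,
        compression["pruning_init"],
        params["pruning_target"],
        params["num_init_steps"],
        params["pruning_steps"],
    )
    for epoch in epochs
]
for epoch, rate in zip(epochs, rates):
    print(f"epoch {epoch:2d}: target pruning rate {rate:.3f}")
```

Under these assumptions the rate starts at ``pruning_init``, reaches ``pruning_target`` at epoch ``num_init_steps + pruning_steps``, and stays there afterwards, which matches the parameter descriptions in the hunk.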
@@ -86,39 +94,44 @@ that can be used the same way as the original model. It is worth noting that opt
 so that the model undergoes a set of corresponding transformations and can contain additional operations required for the
 optimization.

+.. tab-set::

-.. tab:: PyTorch
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [wrap_model]
-
-.. tab:: TensorFlow 2
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [wrap_model]
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [wrap_model]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow

+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [wrap_model]

 4. Fine-tune the model
 ++++++++++++++++++++++

 This step assumes that you will apply fine-tuning to the model the same way as it is done for the baseline model. In the case
 of Filter Pruning method we recommend using the training schedule and learning rate similar to what was used for the training
-of original model.
-
+of the original model.

-.. tab:: PyTorch
+.. tab-set::

-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [tune_model]
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [tune_model]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow

-.. tab:: TensorFlow 2
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [tune_model]
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [tune_model]


 5. Multi-GPU distributed training
@@ -127,38 +140,43 @@ of original model.
 In the case of distributed multi-GPU training (not DataParallel), you should call ``compression_ctrl.distributed()`` before the
 fine-tuning that will inform optimization methods to do some adjustments to function in the distributed mode.

-
-.. tab:: PyTorch
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [distributed]
-
-.. tab:: TensorFlow 2
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [distributed]
-
-
+.. tab-set::
+
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [distributed]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [distributed]
+
 6. Export quantized model
 +++++++++++++++++++++++++

 When fine-tuning finishes, the quantized model can be exported to the corresponding format for further inference: ONNX in
 the case of PyTorch and frozen graph - for TensorFlow 2.

+.. tab-set::

-.. tab:: PyTorch
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [export]
-
-.. tab:: TensorFlow 2
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [export]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow

-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [export]
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [export]


 These were the basic steps to applying the QAT method from the NNCF. However, it is required in some cases to save/load model
@@ -170,57 +188,63 @@ checkpoints during the training. Since NNCF wraps the original model with its ow

 To save model checkpoint use the following API:

+.. tab-set::

-.. tab:: PyTorch
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [save_checkpoint]
-
-.. tab:: TensorFlow 2
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [save_checkpoint]
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [save_checkpoint]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow

+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [save_checkpoint]
+

 8. (Optional) Restore from checkpoint
 +++++++++++++++++++++++++++++++++++++

 To restore the model from checkpoint you should use the following API:

-.. tab:: PyTorch
-
-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
-:language: python
-:fragment: [load_checkpoint]
-
-.. tab:: TensorFlow 2
+.. tab-set::

-.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
-:language: python
-:fragment: [load_checkpoint]
+.. tab-item:: PyTorch
+:sync: pytorch
+
+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_torch.py
+:language: python
+:fragment: [load_checkpoint]
+
+.. tab-item:: TensorFlow 2
+:sync: tensorflow

+.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
+:language: python
+:fragment: [load_checkpoint]

 For more details on saving/loading checkpoints in the NNCF, see the following
 `documentation <https://github.com/openvinotoolkit/nncf/blob/develop/docs/Usage.md#saving-and-loading-compressed-models>`__.

 Deploying pruned model
 ######################

-The pruned model requres an extra step that should be done to get performance improvement. This step involves removal of the
-zero filters from the model. This is done at the model conversion step using :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` tool when model is converted from the framework representation (ONNX, TensorFlow, etc.) to OpenVINO Intermediate Representation.
+The pruned model requires an extra step that should be done to get a performance improvement. This step involves the removal of the
+zero filters from the model. This is done at the model conversion step using :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` tool when the model is converted from the framework representation (ONNX, TensorFlow, etc.) to OpenVINO Intermediate Representation.

-* To remove zero filters from the pruned model add the following parameter to the model convertion command: ``--transform=Pruning``
+* To remove zero filters from the pruned model add the following parameter to the model conversion command: ``--transform=Pruning``

-After that the model can be deployed with OpenVINO in the same way as the baseline model.
+After that, the model can be deployed with OpenVINO in the same way as the baseline model.
 For more details about model deployment with OpenVINO, see the corresponding :doc:`documentation <openvino_docs_OV_UG_OV_Runtime_User_Guide>`.


 Examples
 ####################

-* `PyTorch Image Classiication example <https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/classification>`__
+* `PyTorch Image Classification example <https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/classification>`__

 * `TensorFlow Image Classification example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/tensorflow/classification>`__

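
Note on the "Deploying pruned model" section in the diff above: the documentation asks for ``--transform=Pruning`` to be added to the model conversion command. A minimal sketch of assembling such a command from a script is shown below. It assumes the Model Optimizer is invoked as ``mo`` with ``--input_model``; the file name ``pruned_model.onnx`` and the helper `build_conversion_command` are hypothetical. The command is only built and printed, not executed.

```python
import shlex


def build_conversion_command(model_path, transform="Pruning"):
    """Build (but do not run) a Model Optimizer call with the pruning transform."""
    return ["mo", "--input_model", model_path, f"--transform={transform}"]


# Hypothetical path to a model exported after fine-tuning.
command = build_conversion_command("pruned_model.onnx")
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run` would perform the conversion, provided the Model Optimizer is installed in the environment.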