docs/sentence_transformer/usage/efficiency.rst (10 additions, 10 deletions)
@@ -132,9 +132,9 @@ Optimizing ONNX Models
 
 .. include:: backend_export_sidebar.rst
 
-ONNX models can be optimized using Optimum, allowing for speedups on CPUs and GPUs alike. To do this, you can use the :func:`~sentence_transformers.backend.export_optimized_onnx_model` function, which saves the optimized model in a directory or model repository that you specify. It expects:
+ONNX models can be optimized using `Optimum <https://huggingface.co/docs/optimum/index>`_, allowing for speedups on CPUs and GPUs alike. To do this, you can use the :func:`~sentence_transformers.backend.export_optimized_onnx_model` function, which saves the optimized model in a directory or model repository that you specify. It expects:
 
-- ``model``: a Sentence Transformer model loaded with the ONNX backend.
+- ``model``: a Sentence Transformer or Cross Encoder model loaded with the ONNX backend.
 - ``optimization_config``: ``"O1"``, ``"O2"``, ``"O3"``, or ``"O4"`` representing optimization levels from :class:`~optimum.onnxruntime.AutoOptimizationConfig`, or an :class:`~optimum.onnxruntime.OptimizationConfig` instance.
 - ``model_name_or_path``: a path to save the optimized model file, or the repository name if you want to push it to the Hugging Face Hub.
 - ``push_to_hub``: (Optional) a boolean to push the optimized model to the Hugging Face Hub.
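
For orientation, a minimal sketch of the call this hunk documents; the model name ``all-MiniLM-L6-v2`` and the output directory are illustrative placeholders, not part of the diff:

.. code-block:: python

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.backend import export_optimized_onnx_model

    # Load a model with the ONNX backend (placeholder model name)
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

    # Apply "O3" graph optimizations and save the optimized model locally
    export_optimized_onnx_model(
        model,
        optimization_config="O3",
        model_name_or_path="all-MiniLM-L6-v2-onnx-O3",
    )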
@@ -204,9 +204,9 @@ Quantizing ONNX Models
 
 .. include:: backend_export_sidebar.rst
 
-ONNX models can be quantized to int8 precision using Optimum, allowing for faster inference on CPUs. To do this, you can use the :func:`~sentence_transformers.backend.export_dynamic_quantized_onnx_model` function, which saves the quantized model in a directory or model repository that you specify. Dynamic quantization, unlike static quantization, does not require a calibration dataset. It expects:
+ONNX models can be quantized to int8 precision using `Optimum <https://huggingface.co/docs/optimum/index>`_, allowing for faster inference on CPUs. To do this, you can use the :func:`~sentence_transformers.backend.export_dynamic_quantized_onnx_model` function, which saves the quantized model in a directory or model repository that you specify. Dynamic quantization, unlike static quantization, does not require a calibration dataset. It expects:
 
-- ``model``: a Sentence Transformer model loaded with the ONNX backend.
+- ``model``: a Sentence Transformer or Cross Encoder model loaded with the ONNX backend.
 - ``quantization_config``: ``"arm64"``, ``"avx2"``, ``"avx512"``, or ``"avx512_vnni"`` representing quantization configurations from :class:`~optimum.onnxruntime.AutoQuantizationConfig`, or a :class:`~optimum.onnxruntime.QuantizationConfig` instance.
 - ``model_name_or_path``: a path to save the quantized model file, or the repository name if you want to push it to the Hugging Face Hub.
 - ``push_to_hub``: (Optional) a boolean to push the quantized model to the Hugging Face Hub.
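
A matching sketch for dynamic int8 quantization, again with placeholder model and output names; ``"avx512_vnni"`` is just one of the configurations listed above, chosen here as an example:

.. code-block:: python

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.backend import export_dynamic_quantized_onnx_model

    # Load the model with the ONNX backend (placeholder model name)
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

    # Dynamic quantization: no calibration dataset is required
    export_dynamic_quantized_onnx_model(
        model,
        quantization_config="avx512_vnni",
        model_name_or_path="all-MiniLM-L6-v2-onnx-int8",
    )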
@@ -329,15 +329,15 @@ Quantizing OpenVINO Models
 
 .. include:: backend_export_sidebar.rst
 
-OpenVINO models can be quantized to int8 precision using Optimum Intel to speed up inference.
+OpenVINO models can be quantized to int8 precision using `Optimum Intel <https://huggingface.co/docs/optimum/main/en/intel/index>`_ to speed up inference.
 To do this, you can use the :func:`~sentence_transformers.backend.export_static_quantized_openvino_model` function,
 which saves the quantized model in a directory or model repository that you specify.
 Post-Training Static Quantization expects:
 
-- ``model``: a Sentence Transformer model loaded with the OpenVINO backend.
+- ``model``: a Sentence Transformer or Cross Encoder model loaded with the OpenVINO backend.
 - ``quantization_config``: (Optional) The quantization configuration. This parameter accepts either:
-  ``None`` for the default 8-bit quantization, a dictionary representing quantization configurations, or
-  an :class:`~optimum.intel.OVQuantizationConfig` instance.
+  ``None`` for the default 8-bit quantization, a dictionary representing quantization configurations, or
+  an :class:`~optimum.intel.OVQuantizationConfig` instance.
 - ``model_name_or_path``: a path to save the quantized model file, or the repository name if you want to push it to the Hugging Face Hub.
 - ``dataset_name``: (Optional) The name of the dataset to load for calibration. If not specified, defaults to the ``sst2`` subset of the ``glue`` dataset.
 - ``dataset_config_name``: (Optional) The specific configuration of the dataset to load.
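
And a sketch of the static quantization call under the same placeholder assumptions; passing ``quantization_config=None`` selects the default int8 configuration, with calibration falling back to ``glue``/``sst2`` as described above:

.. code-block:: python

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.backend import export_static_quantized_openvino_model

    # Load the model with the OpenVINO backend (placeholder model name)
    model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

    # Post-Training Static Quantization with the default int8 configuration
    export_static_quantized_openvino_model(
        model,
        quantization_config=None,
        model_name_or_path="all-MiniLM-L6-v2-openvino-int8",
    )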
@@ -541,8 +541,8 @@ Based on the benchmarks, this flowchart should help you decide which backend to
 }
 }}%%
 graph TD
-    A(What is your hardware?) -->|GPU| B(Is your text usually smallerthan 500 characters?)
-    A -->|CPU| C(Is a 0.4% accuracy lossacceptable?)
+    A(What is your hardware?) -->|GPU| B(Is your text usually smaller<br>than 500 characters?)
+    A -->|CPU| C(Is a 0.4% accuracy loss<br>acceptable?)