
Commit 6a96697

Update notes
1 parent bbd567d commit 6a96697

File tree

1 file changed: 6 additions, 2 deletions


docs/source/inference.rst

Lines changed: 6 additions & 2 deletions
@@ -103,7 +103,7 @@ Torchao's sparsity support can be combined with quantization for additional perf
     print(response)
 
 .. note::
-   For more information on supported quantization and sparsity configurations, see `HF-Torchao Docs <https://huggingface.co/docs/transformers/main/en/quantization/torchao>`_.
+   For more information on supported quantization and sparsity configurations, see `HF-Torchao Docs <https://huggingface.co/docs/transformers/main/en/quantization/torchao>`_.
 
 Inference with Transformers
 ---------------------------
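The note in this hunk points readers at the HF-Torchao docs for supported quantization and sparsity configurations. For orientation only, here is a minimal sketch (not part of the diffed file) of loading a Hugging Face model with a torchao config through transformers' TorchAoConfig; the model id is just an example, and the exact config arguments accepted depend on the installed transformers and torchao versions.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

    model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example only; any causal LM works
    # String-style config as in the HF-Torchao docs; newer versions also accept torchao config objects.
    quant_config = TorchAoConfig("int4_weight_only", group_size=128)

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        quantization_config=quant_config,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    inputs = tokenizer("What are we having for dinner?", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))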
@@ -356,6 +356,10 @@ When using vLLM with torchao:
 - **Sparsity Support**: Semi-structured (2:4) sparsity for faster inference (see `Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity <https://pytorch.org/blog/accelerating-neural-network-training/>`_ blog post)
 - **KV Cache Quantization**: Enables long context inference with lower memory (see `KV Cache Quantization <https://github.com/pytorch/ao/blob/main/torchao/_models/llama/README.md>`_)
 
+.. note::
+   For more information on vLLM Integration, please refer to the detailed guide :ref:`torchao_vllm_integration`.
+
+
 Mobile Deployment with ExecuTorch
 ---------------------------------
 
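The note added here defers to the :ref:`torchao_vllm_integration` guide for details. As a rough illustration, a minimal sketch of serving a pre-quantized checkpoint with vLLM's offline API follows; the model id is hypothetical, and whether the torchao quantization config is picked up automatically depends on the vLLM version in use.

    from vllm import LLM, SamplingParams

    # Hypothetical pre-quantized checkpoint; replace with a real torchao-quantized model id.
    llm = LLM(model="your-org/your-model-int4wo")
    params = SamplingParams(temperature=0.8, max_tokens=64)

    outputs = llm.generate(["What are we having for dinner?"], params)
    for out in outputs:
        print(out.outputs[0].text)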
@@ -518,7 +522,7 @@ The torchao-optimized 8da4w model provides:
 - **Accuracy**: Maintained within 5-10% of original model on most benchmarks
 
 .. note::
-   For detailed instructions on testing the executorch model and reproducing benchmarks please refer to the `HF Phi-4-mini-instruct-8da4w model <https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w>`_.
+   For detailed instructions on testing the executorch model and reproducing benchmarks please refer to the `HF Phi-4-mini-instruct-8da4w model <https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w>`_.
 
 **Conclusion**
 ==============
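For context, "8da4w" in the note above denotes 8-bit dynamic activations with 4-bit weights. A minimal sketch of applying that scheme with torchao's quantize_ API is shown below; the config class name assumes a recent torchao release, and the tiny module is only a stand-in for a real model.

    import torch
    from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig

    # Stand-in module; in practice this would be a Hugging Face causal LM in bfloat16.
    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16)

    # Apply 8-bit dynamic activation / 4-bit weight quantization in place.
    quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))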
