diff --git a/CHANGELOG.rst b/CHANGELOG.rst
index cc172bdcf..d18f7c370 100755
--- a/CHANGELOG.rst
+++ b/CHANGELOG.rst
@@ -255,7 +255,7 @@ NVIDIA Model Optimizer Changelog
 - NeMo and Megatron-LM distributed checkpoint (``torch-dist``) stored with legacy version can no longer be loaded. The remedy is to load the legacy distributed checkpoint with 0.29 and store a ``torch`` checkpoint and resume with 0.31 to convert to a new format. The following changes only apply to storing and resuming distributed checkpoint.
   - ``quantizer_state`` of :class:`TensorQuantizer ` is now stored in ``extra_state`` of :class:`QuantModule ` where it used to be stored in the sharded ``modelopt_state``.
   - The dtype and shape of ``amax`` and ``pre_quant_scale`` stored in the distributed checkpoint are now restored. Some dtype and shape are previously changed to make all decoder layers to have homogeneous structure in the checkpoint.
-  - Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogenous format.
+  - Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogeneous format.
 - auto_quantize API now accepts a list of quantization config dicts as the list of quantization choices.
   - This API previously accepts a list of strings of quantization format names. It was therefore limited to only pre-defined quantization formats unless through some hacks.
   - With this change, now user can easily use their own custom quantization formats for auto_quantize.
diff --git a/docs/source/guides/7_nas.rst b/docs/source/guides/7_nas.rst
index a22f76dd7..2f2de63bf 100644
--- a/docs/source/guides/7_nas.rst
+++ b/docs/source/guides/7_nas.rst
@@ -570,7 +570,7 @@ NAS-based training
-During training of an search space, we simultaneously train
+During training of a search space, we simultaneously train
 both the model's weights and architecture:
 
-* Using :mod:`modelopt.torch.nas` you can re-use your existing
+* Using :mod:`modelopt.torch.nas` you can reuse your existing
   training loop to train the search space.
 * During search space training the entire collection of subnets
   is automatically trained together
diff --git a/examples/diffusers/README.md b/examples/diffusers/README.md
index 6af226752..d4ef70169 100644
--- a/examples/diffusers/README.md
+++ b/examples/diffusers/README.md
@@ -12,7 +12,7 @@ Cache Diffusion is a technique that reuses cached outputs from previous diffusio
 | :------------: | :------------: | :------------: | :------------: |
 | Pre-Requisites | Required & optional packages to use this technique | \[[Link](#pre-requisites)\] | |
 | Getting Started | Learn how to optimize your models using quantization/cache diffusion to reduce precision and improve inference efficiency | \[[Link](#getting-started)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
-| Support Matrix | View the support matrix to see quantization/cahce diffusion compatibility and feature availability across different models | \[[Link](#support-matrix)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
+| Support Matrix | View the support matrix to see quantization/cache diffusion compatibility and feature availability across different models | \[[Link](#support-matrix)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
 | Cache Diffusion | Caching technique to accelerate inference without compromising quality | \[[Link](#cache-diffusion)\] | |
 | Post Training Quantization (PTQ) | Example scripts on how to run PTQ on diffusion models | \[[Link](#post-training-quantization-ptq)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
 | Quantization Aware Training (QAT) | Example scripts on how to run QAT on diffusion models | \[[Link](#quantization-aware-training-qat)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
@@ -389,7 +389,7 @@ This simple code demonstrates how to evaluate images generated by diffusion (or
 ### Install Requirements
 
 ```bash
-pip install -r eval/requirments.txt
+pip install -r eval/requirements.txt
 ```
 
 ### Data Format
diff --git a/examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb b/examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
index a9bb6589b..a59e2a298 100644
--- a/examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
+++ b/examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
@@ -23,7 +23,7 @@
    "id": "c3f7f931-ac38-494e-aea8-ca2cd6d05794",
    "metadata": {},
    "source": [
-    "## Installing Prerequisites and Dependancies"
+    "## Installing Prerequisites and Dependencies"
    ]
   },
   {
@@ -31,14 +31,14 @@
    "id": "d7d4f25f-e569-42cf-8022-bb7cc6f9ea6e",
    "metadata": {},
    "source": [
-    "If you haven't already, install the required dependencies for this notebook. Key dependancies include:\n",
+    "If you haven't already, install the required dependencies for this notebook. Key dependencies include:\n",
     "\n",
     "- nvidia-modelopt\n",
     "- torch\n",
     "- transformers\n",
     "- jupyterlab\n",
     "\n",
-    "This repo contains a `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependancies."
+    "This repo contains an `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependencies."
    ]
   },
   {
@@ -414,7 +414,7 @@
    "id": "e471ef6c-1346-4e5e-8782-5e9f2bc38f8a",
    "metadata": {},
    "source": [
-    "Once you have quantized the model you can now start the post-training process for QAT. The training process will calculate validation loss at 50 step intervals and save the model. These can be controled by adjusting the `eval_steps` and `output_dir` above along with other `training_args`."
+    "Once you have quantized the model you can now start the post-training process for QAT. The training process will calculate validation loss at 50 step intervals and save the model. These can be controlled by adjusting the `eval_steps` and `output_dir` above along with other `training_args`."
    ]
   },
   {
@@ -799,7 +799,7 @@
    "id": "eca645fb-d8d2-4c98-9cb2-afbae7d30d6c",
    "metadata": {},
    "source": [
-    "## Stop the TensorRT-LLM Docker contrainer"
+    "## Stop the TensorRT-LLM Docker container"
    ]
   },
   {
diff --git a/examples/specdec_bench/README.md b/examples/specdec_bench/README.md
index 1987167e7..5c4271225 100644
--- a/examples/specdec_bench/README.md
+++ b/examples/specdec_bench/README.md
@@ -2,7 +2,7 @@
 
 ## Installation
 
-This benchmark is meant to be a lightweight layer ontop of an existing vLLM/SGLang/TRTLLM installation. For example, no install
+This benchmark is meant to be a lightweight layer on top of an existing vLLM/SGLang/TRTLLM installation. For example, no install
 is required if one is running in the following dockers: `vllm/vllm-openai:v0.11.0` (vLLM), `lmsysorg/sglang:v0.5.4.post2` (SGLang),
 or `nvcr.io/nvidia/tensorrt-llm/release:1.2.0` (TRT-LLM).
 
diff --git a/examples/speculative_decoding/README.md b/examples/speculative_decoding/README.md
index 2a29f644e..4362eeec0 100644
--- a/examples/speculative_decoding/README.md
+++ b/examples/speculative_decoding/README.md
@@ -89,7 +89,7 @@ For large models, you can export intermediate hidden states to disk and train on
 
-### Dumpping Hidden States to Disk
+### Dumping Hidden States to Disk
 
-We support two backends for generating base model hidden states. For better effciency, it is recommended to use TRT-LLM:
+We support two backends for generating base model hidden states. For better efficiency, it is recommended to use TRT-LLM:
 
 ```bash
 python collect_hidden_states/compute_hidden_states_trtllm.py \