2 changes: 1 addition & 1 deletion CHANGELOG.rst
@@ -255,7 +255,7 @@ NVIDIA Model Optimizer Changelog
- NeMo and Megatron-LM distributed checkpoints (``torch-dist``) stored with the legacy version can no longer be loaded. The remedy is to load the legacy distributed checkpoint with 0.29, store a ``torch`` checkpoint, and resume with 0.31 to convert to the new format. The following changes only apply to storing and resuming distributed checkpoints.
- ``quantizer_state`` of :class:`TensorQuantizer <modelopt.torch.quantization.nn.modules.TensorQuantizer>` is now stored in ``extra_state`` of :class:`QuantModule <modelopt.torch.quantization.nn.module.QuantModule>` where it used to be stored in the sharded ``modelopt_state``.
- The dtype and shape of ``amax`` and ``pre_quant_scale`` stored in the distributed checkpoint are now restored. Some dtypes and shapes were previously changed to make all decoder layers have a homogeneous structure in the checkpoint.
- Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogenous format.
- Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogeneous format.
- auto_quantize API now accepts a list of quantization config dicts as the list of quantization choices.
- This API previously accepted a list of quantization format names as strings, so it was limited to pre-defined quantization formats unless worked around with hacks.
- With this change, users can easily use their own custom quantization formats with auto_quantize.
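To illustrate the shape of these choices, here is a hypothetical sketch. The config-dict keys below mirror the typical modelopt layout (a ``quant_cfg`` pattern map plus an ``algorithm`` name), but the exact keys and values are assumptions based on the changelog entry, not verified API:

```python
# Hypothetical custom quantization config dicts that could now be passed to
# auto_quantize as search choices, instead of pre-defined format-name strings.
custom_int8_cfg = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": 0},
        "*input_quantizer": {"num_bits": 8, "axis": None},
    },
    "algorithm": "max",
}
custom_int4_cfg = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}},
        "*input_quantizer": {"enable": False},
    },
    "algorithm": "awq_lite",
}

# Before this change: quantization choices were strings naming built-in formats.
# After: user-defined dicts like these can be passed directly as the choices.
quantization_choices = [custom_int8_cfg, custom_int4_cfg]
```

The benefit is that a custom format no longer needs to be registered under a known name before the auto-quantization search can consider it.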
2 changes: 1 addition & 1 deletion docs/source/guides/7_nas.rst
@@ -570,7 +570,7 @@ NAS-based training
During training of a search space, we simultaneously train both the model's weights and
architecture:

* Using :mod:`modelopt.torch.nas<modelopt.torch.nas>` you can re-use your existing
* Using :mod:`modelopt.torch.nas<modelopt.torch.nas>` you can reuse your existing
training loop to train the search space.

* During search space training the entire collection of subnets is automatically trained together
4 changes: 2 additions & 2 deletions examples/diffusers/README.md
@@ -12,7 +12,7 @@ Cache Diffusion is a technique that reuses cached outputs from previous diffusio
| :------------: | :------------: | :------------: | :------------: |
| Pre-Requisites | Required & optional packages to use this technique | \[[Link](#pre-requisites)\] | |
| Getting Started | Learn how to optimize your models using quantization/cache diffusion to reduce precision and improve inference efficiency | \[[Link](#getting-started)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Support Matrix | View the support matrix to see quantization/cahce diffusion compatibility and feature availability across different models | \[[Link](#support-matrix)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Support Matrix | View the support matrix to see quantization/cache diffusion compatibility and feature availability across different models | \[[Link](#support-matrix)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Cache Diffusion | Caching technique to accelerate inference without compromising quality | \[[Link](#cache-diffusion)\] | |
| Post Training Quantization (PTQ) | Example scripts on how to run PTQ on diffusion models | \[[Link](#post-training-quantization-ptq)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Quantization Aware Training (QAT) | Example scripts on how to run QAT on diffusion models | \[[Link](#quantization-aware-training-qat)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
@@ -389,7 +389,7 @@ This simple code demonstrates how to evaluate images generated by diffusion (or
### Install Requirements

```bash
pip install -r eval/requirments.txt
pip install -r eval/requirements.txt
```

### Data Format
10 changes: 5 additions & 5 deletions examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
@@ -23,22 +23,22 @@
"id": "c3f7f931-ac38-494e-aea8-ca2cd6d05794",
"metadata": {},
"source": [
"## Installing Prerequisites and Dependancies"
"## Installing Prerequisites and Dependencies"
]
},
{
"cell_type": "markdown",
"id": "d7d4f25f-e569-42cf-8022-bb7cc6f9ea6e",
"metadata": {},
"source": [
"If you haven't already, install the required dependencies for this notebook. Key dependancies include:\n",
"If you haven't already, install the required dependencies for this notebook. Key dependencies include:\n",
"\n",
"- nvidia-modelopt\n",
"- torch\n",
"- transformers\n",
"- jupyterlab\n",
"\n",
"This repo contains a `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependancies."
"This repo contains a `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependencies."
⚠️ Potential issue | 🟡 Minor

Minor grammar tweak in dependency sentence.

At Line 41, “a examples/... file” should be “an examples/... file” (or “the examples/... file”).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb` at line 41, Update the
sentence string "This repo contains a
`examples/llm_qat/notebooks/requirements.txt` file that can be used to install
all required dependencies." to use correct article: replace "a" with "an" (or
optionally "the") so it reads "This repo contains an
`examples/llm_qat/notebooks/requirements.txt` file..." ensuring the exact quoted
string in the notebook cell is changed accordingly.

]
},
{
@@ -414,7 +414,7 @@
"id": "e471ef6c-1346-4e5e-8782-5e9f2bc38f8a",
"metadata": {},
"source": [
"Once you have quantized the model you can now start the post-training process for QAT. The training process will calculate validation loss at 50 step intervals and save the model. These can be controled by adjusting the `eval_steps` and `output_dir` above along with other `training_args`."
"Once you have quantized the model you can now start the post-training process for QAT. The training process will calculate validation loss at 50 step intervals and save the model. These can be controlled by adjusting the `eval_steps` and `output_dir` above along with other `training_args`."
]
},
{
@@ -799,7 +799,7 @@
"id": "eca645fb-d8d2-4c98-9cb2-afbae7d30d6c",
"metadata": {},
"source": [
"## Stop the TensorRT-LLM Docker contrainer"
"## Stop the TensorRT-LLM Docker container"
]
},
{
2 changes: 1 addition & 1 deletion examples/specdec_bench/README.md
@@ -2,7 +2,7 @@

## Installation

This benchmark is meant to be a lightweight layer ontop of an existing vLLM/SGLang/TRTLLM installation. For example, no install
This benchmark is meant to be a lightweight layer on top of an existing vLLM/SGLang/TRTLLM installation. For example, no install
is required if one is running in the following dockers: `vllm/vllm-openai:v0.11.0` (vLLM), `lmsysorg/sglang:v0.5.4.post2` (SGLang), or
`nvcr.io/nvidia/tensorrt-llm/release:1.2.0` (TRT-LLM).

2 changes: 1 addition & 1 deletion examples/speculative_decoding/README.md
@@ -89,7 +89,7 @@ For large models, you can export intermediate hidden states to disk and train on

### Dumping Hidden States to Disk

We support two backends for generating base model hidden states. For better effciency, it is recommended to use TRT-LLM:
We support two backends for generating base model hidden states. For better efficiency, it is recommended to use TRT-LLM:

```bash
python collect_hidden_states/compute_hidden_states_trtllm.py \