2 changes: 1 addition & 1 deletion CHANGELOG.rst
@@ -255,7 +255,7 @@ NVIDIA Model Optimizer Changelog
- NeMo and Megatron-LM distributed checkpoints (``torch-dist``) stored with the legacy version can no longer be loaded. The remedy is to load the legacy distributed checkpoint with 0.29, store a ``torch`` checkpoint, and resume with 0.31 to convert to the new format. The following changes only apply to storing and resuming distributed checkpoints.
- ``quantizer_state`` of :class:`TensorQuantizer <modelopt.torch.quantization.nn.modules.TensorQuantizer>` is now stored in ``extra_state`` of :class:`QuantModule <modelopt.torch.quantization.nn.module.QuantModule>` where it used to be stored in the sharded ``modelopt_state``.
- The dtype and shape of ``amax`` and ``pre_quant_scale`` stored in the distributed checkpoint are now restored. Some dtypes and shapes were previously changed to make all decoder layers have a homogeneous structure in the checkpoint.
- Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogenous format.
- Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogeneous format.
- auto_quantize API now accepts a list of quantization config dicts as the list of quantization choices.
- This API previously accepted a list of quantization format names as strings, so it was limited to pre-defined quantization formats unless worked around with hacks.
- With this change, users can easily use their own custom quantization formats with auto_quantize.
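To illustrate the shape of these choices, here is a hypothetical sketch. The config-dict keys below mirror the typical modelopt layout (a ``quant_cfg`` pattern map plus an ``algorithm`` name), but the exact keys and values are assumptions based on the changelog entry, not verified API:

```python
# Hypothetical custom quantization config dicts that could now be passed to
# auto_quantize as search choices, instead of pre-defined format-name strings.
custom_int8_cfg = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": 0},
        "*input_quantizer": {"num_bits": 8, "axis": None},
    },
    "algorithm": "max",
}
custom_int4_cfg = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}},
        "*input_quantizer": {"enable": False},
    },
    "algorithm": "awq_lite",
}

# Before this change: quantization choices were strings naming built-in formats.
# After: user-defined dicts like these can be passed directly as the choices.
quantization_choices = [custom_int8_cfg, custom_int4_cfg]
```

The benefit is that a custom format no longer needs to be registered under a known name before the auto-quantization search can consider it.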
2 changes: 1 addition & 1 deletion docs/source/guides/7_nas.rst
@@ -570,7 +570,7 @@ NAS-based training
During training of a search space, we simultaneously train both the model's weights and
architecture:

* Using :mod:`modelopt.torch.nas<modelopt.torch.nas>` you can re-use your existing
* Using :mod:`modelopt.torch.nas<modelopt.torch.nas>` you can reuse your existing
training loop to train the search space.

* During search space training the entire collection of subnets is automatically trained together
4 changes: 2 additions & 2 deletions examples/diffusers/README.md
@@ -12,7 +12,7 @@ Cache Diffusion is a technique that reuses cached outputs from previous diffusio
| :------------: | :------------: | :------------: | :------------: |
| Pre-Requisites | Required & optional packages to use this technique | \[[Link](#pre-requisites)\] | |
| Getting Started | Learn how to optimize your models using quantization/cache diffusion to reduce precision and improve inference efficiency | \[[Link](#getting-started)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Support Matrix | View the support matrix to see quantization/cahce diffusion compatibility and feature availability across different models | \[[Link](#support-matrix)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Support Matrix | View the support matrix to see quantization/cache diffusion compatibility and feature availability across different models | \[[Link](#support-matrix)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Cache Diffusion | Caching technique to accelerate inference without compromising quality | \[[Link](#cache-diffusion)\] | |
| Post Training Quantization (PTQ) | Example scripts on how to run PTQ on diffusion models | \[[Link](#post-training-quantization-ptq)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
| Quantization Aware Training (QAT) | Example scripts on how to run QAT on diffusion models | \[[Link](#quantization-aware-training-qat)\] | \[[docs](https://nvidia.github.io/Model-Optimizer/guides/1_quantization.html)\] |
@@ -389,7 +389,7 @@ This simple code demonstrates how to evaluate images generated by diffusion (or
### Install Requirements

```bash
pip install -r eval/requirments.txt
pip install -r eval/requirements.txt
```

### Data Format
10 changes: 5 additions & 5 deletions examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
@@ -23,22 +23,22 @@
"id": "c3f7f931-ac38-494e-aea8-ca2cd6d05794",
"metadata": {},
"source": [
"## Installing Prerequisites and Dependancies"
"## Installing Prerequisites and Dependencies"
]
},
{
"cell_type": "markdown",
"id": "d7d4f25f-e569-42cf-8022-bb7cc6f9ea6e",
"metadata": {},
"source": [
"If you haven't already, install the required dependencies for this notebook. Key dependancies include:\n",
"If you haven't already, install the required dependencies for this notebook. Key dependencies include:\n",
"\n",
"- nvidia-modelopt\n",
"- torch\n",
"- transformers\n",
"- jupyterlab\n",
"\n",
"This repo contains a `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependancies."
"This repo contains a `examples/llm_qat/notebooks/requirements.txt` file that can be used to install all required dependencies."
⚠️ Potential issue | 🟡 Minor

Minor grammar tweak in dependency sentence.

At Line 41, “a examples/... file” should be “an examples/... file” (or “the examples/... file”).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb` at line 41, Update the
sentence string "This repo contains a
`examples/llm_qat/notebooks/requirements.txt` file that can be used to install
all required dependencies." to use correct article: replace "a" with "an" (or
optionally "the") so it reads "This repo contains an
`examples/llm_qat/notebooks/requirements.txt` file..." ensuring the exact quoted
string in the notebook cell is changed accordingly.

]
},
{
@@ -414,7 +414,7 @@
"id": "e471ef6c-1346-4e5e-8782-5e9f2bc38f8a",
"metadata": {},
"source": [
"Once you have quantized the model you can now start the post-training process for QAT. The training process will calculate validation loss at 50 step intervals and save the model. These can be controled by adjusting the `eval_steps` and `output_dir` above along with other `training_args`."
"Once you have quantized the model you can now start the post-training process for QAT. The training process will calculate validation loss at 50 step intervals and save the model. These can be controlled by adjusting the `eval_steps` and `output_dir` above along with other `training_args`."
]
},
{
@@ -799,7 +799,7 @@
"id": "eca645fb-d8d2-4c98-9cb2-afbae7d30d6c",
"metadata": {},
"source": [
"## Stop the TensorRT-LLM Docker contrainer"
"## Stop the TensorRT-LLM Docker container"
]
},
{
2 changes: 1 addition & 1 deletion examples/specdec_bench/README.md
@@ -2,7 +2,7 @@

## Installation

This benchmark is meant to be a lightweight layer ontop of an existing vLLM/SGLang/TRTLLM installation. For example, no install
This benchmark is meant to be a lightweight layer on top of an existing vLLM/SGLang/TRTLLM installation. For example, no install
is required if one is running in the following dockers: `vllm/vllm-openai:v0.11.0` (vLLM), `lmsysorg/sglang:v0.5.4.post2` (SGLang), or
`nvcr.io/nvidia/tensorrt-llm/release:1.2.0` (TRT-LLM).

2 changes: 1 addition & 1 deletion examples/speculative_decoding/README.md
@@ -89,7 +89,7 @@ For large models, you can export intermediate hidden states to disk and train on

### Dumping Hidden States to Disk

We support two backends for generating base model hidden states. For better effciency, it is recommended to use TRT-LLM:
We support two backends for generating base model hidden states. For better efficiency, it is recommended to use TRT-LLM:

```bash
python collect_hidden_states/compute_hidden_states_trtllm.py \