
[chore]: weekly bump of uv.lock on main (2026-03-30)#4

Open
github-actions[bot] wants to merge 1 commit into main from auto/bump-uv-lock-main-2026-03-30

Conversation

@github-actions

Summary

Automated weekly update of uv.lock file for nSpect Scanning:

  • uv.lock — upgraded all transitive dependencies to latest compatible versions

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
noeyy-mino pushed a commit that referenced this pull request Mar 31, 2026
…ns (NVIDIA#1117)

### What does this PR do?

Type of change: BugFix

- ModelOpt was not placing Q/DQ nodes between Conv and
LayerNormalization, causing TensorRT to select slower i8f16 kernels
instead of faster i8i8 kernels for Conv layers whose output feeds into
LayerNorm (e.g., ConvNext models).

### Changes:
- Register LayerNormalization in ORT's QDQ registry (ort_utils.py)
- Add find_conv_to_layernorm_nodes() to detect Conv→(copy ops)→LayerNorm patterns (graph_utils.py)
- Add detected LayerNorm nodes to the quantizable nodes list so Q/DQ pairs are inserted on the Conv output (int8.py)
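The pattern search described in the second bullet can be sketched as a forward walk from each Conv output through shape-only ops. This is a simplified, hypothetical illustration using plain dicts; the real helper in graph_utils.py operates on ONNX graph protos and its exact copy-op set may differ.

```python
# Simplified sketch of the Conv -> (copy ops) -> LayerNormalization
# pattern search. Nodes are plain dicts here for illustration only;
# the COPY_OPS set is an assumption, not ModelOpt's actual list.
COPY_OPS = {"Transpose", "Reshape", "Squeeze", "Unsqueeze"}

def find_conv_to_layernorm_nodes(nodes):
    """Return names of LayerNormalization nodes reachable from a Conv
    output through zero or more shape-only (copy) ops.
    Assumes the graph is a DAG, as ONNX graphs are."""
    # Map each tensor name to the nodes that consume it.
    consumers = {}
    for node in nodes:
        for inp in node["inputs"]:
            consumers.setdefault(inp, []).append(node)

    found = []
    for node in nodes:
        if node["op"] != "Conv":
            continue
        # Walk forward through copy ops until a non-copy node appears.
        frontier = list(node["outputs"])
        while frontier:
            tensor = frontier.pop()
            for nxt in consumers.get(tensor, []):
                if nxt["op"] in COPY_OPS:
                    frontier.extend(nxt["outputs"])
                elif nxt["op"] == "LayerNormalization":
                    found.append(nxt["name"])
    return found
```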

### Usage

```python
# No API change — existing quantize() call now automatically handles Conv->LayerNorm patterns
import modelopt.onnx.quantization as moq

moq.quantize("convnext.onnx", quantize_mode="int8")
# Output model will now have: Conv -> Transpose -> Q -> DQ -> LayerNorm
```
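One way to sanity-check the expected node chain in the output model is to scan the graph's op types for the quantized pattern. The helper below is hypothetical (not part of ModelOpt) and operates on a plain list of op-type strings for illustration; a real check would walk the ONNX graph proto.

```python
# Hypothetical sanity check: does the quantized graph contain the
# Conv -> Transpose -> QuantizeLinear -> DequantizeLinear ->
# LayerNormalization chain as a contiguous run of op types?
EXPECTED_CHAIN = ["Conv", "Transpose", "QuantizeLinear",
                  "DequantizeLinear", "LayerNormalization"]

def contains_chain(op_types, chain=EXPECTED_CHAIN):
    """True if `chain` appears in `op_types` as a contiguous run."""
    n = len(chain)
    return any(op_types[i:i + n] == chain
               for i in range(len(op_types) - n + 1))
```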

### Testing

- Added unit test `test_conv_layernorm_quantization` with a ConvNext-like test model (Conv→Transpose→LayerNorm→Transpose→Conv)
- All 237 unit tests pass (236 existing + 1 new)
- Validated on ConvNext-tiny (opset 17): 23/23 LayerNorm nodes get Q/DQ on activation input
- Built TRT engine in nvcr.io/nvidia/tensorrt:26.02-py3 — Conv layers after LayerNorm select i8i8 kernels

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md?: N/A
- Did you write any new necessary tests?: ✅
- Did you update Changelog?: N/A

### Additional Information
- NVBug: 5271237
- JIRA: OMNIML-2380
- Note: The residual Add output quantization discussed in comments #4-9 of the bug is not addressed here and will be handled in a separate PR.
- An accuracy regression was observed for the ConvNext model; see this [comment](NVIDIA#1117 (comment)).


## Summary by CodeRabbit

* **New Features**
  * Expanded quantization to include LayerNormalization in Q/DQ flows and detect Conv→LayerNormalization patterns for quantization.
* **Refactor**
  * Improved internal graph-analysis helpers to make detection logic reusable and more reliable.
* **Tests**
  * Added a model builder for Conv→LayerNorm graphs and a unit test validating end-to-end Conv→LayerNormalization quantization.

---------

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>