fix: auto-upgrade model opset to 21 for int16/uint16 QDQ quantization #28202

Merged: tianleiwu merged 7 commits into microsoft:main from Rishi-Dave:rishidave/feat/quant-utils-opset-bump-16bit on May 14, 2026

Conversation

@Rishi-Dave (Contributor)

Summary

  • Extends the existing update_opset_version helper to auto-bump the opset from < 21 to 21 when QUInt16/QInt16 weight quantization is requested
  • Mirrors the existing float8 quantization opset-upgrade pattern
  • Adds test coverage with parametric subtests for 16-bit integer quantization

Motivation

Fixes #25223.

Users exporting models from torch.export with uint16/int16 quantization hit a gap where models below opset 21 were not being upgraded. Mirroring the existing float8 branch gives users a consistent, predictable upgrade path for 16-bit QDQ.

Changes

  • onnxruntime/python/tools/quantization/quant_utils.py: new elif branch in update_opset_version that bumps the opset to 21 when weight_quant_type is INT16/UINT16 and the current opset is < 21, emitting a warning in the style of the existing float8 branch (see the sketch below).
  • onnxruntime/test/python/quantization/test_quant_util.py: new test_update_opset_version_16bit with parametric subtests covering QUInt16 / QInt16 bumping from opset 20 → 21 and a no-op regression check for models already at opset 21.
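
A minimal sketch of what this branch might look like, mirroring the float8 pattern; update_opset_version is the real helper name from the PR, but the standalone function below, its warning text, and the use of onnx's version_converter are assumptions rather than the merged code:

```python
import logging

import onnx
from onnx import version_converter
from onnxruntime.quantization import QuantType


def bump_opset_for_16bit(model: onnx.ModelProto, weight_quant_type: QuantType) -> onnx.ModelProto:
    """Illustrative stand-in for the new update_opset_version branch:
    upgrade the default-domain opset to 21 when a 16-bit weight type is
    requested, so native QuantizeLinear/DequantizeLinear can be emitted."""
    ai_onnx_opset = next(
        (opset.version for opset in model.opset_import if opset.domain in ("", "ai.onnx")),
        0,
    )
    if weight_quant_type in (QuantType.QInt16, QuantType.QUInt16) and ai_onnx_opset < 21:
        logging.warning(
            "Original model opset %d does not support int16/uint16 QuantizeLinear; upgrading to opset 21.",
            ai_onnx_opset,
        )
        model = version_converter.convert_version(model, 21)
    return model
```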

Test Plan

python -m pytest onnxruntime/test/python/quantization/test_quant_util.py -v

All tests pass. lintrunner -a produces no changes.

The update_opset_version helper already auto-bumps opset to 19 when
float8 quantization is requested on older models. Extend the same
pattern to int16/uint16: when the user requests QUInt16 or QInt16
weight quantization and the model's opset is below 21, bump to 21 so
that native ONNX QuantizeLinear/DequantizeLinear can be emitted
instead of silently falling back to the com.microsoft contrib domain.

Fixes microsoft#25223

Copilot AI (Contributor) left a comment

Pull request overview

Extends ONNX Runtime’s Python quantization utilities to automatically upgrade an input model’s ONNX opset to 21 when 16-bit integer (INT16/UINT16) QDQ quantization is requested, aligning behavior with the existing float8 opset auto-upgrade logic and adding regression tests.

Changes:

  • Add an update_opset_version branch to auto-bump opset < 21 to 21 for INT16/UINT16 quantization types (with a warning).
  • Add unit tests validating opset 20 → 21 upgrade and no-op behavior at opset 21 for QUInt16/QInt16.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files reviewed:
  • onnxruntime/python/tools/quantization/quant_utils.py: adds opset auto-upgrade logic for INT16/UINT16 quantization to ensure ONNX-native QDQ compatibility.
  • onnxruntime/test/python/quantization/test_quant_util.py: adds test coverage for the new opset upgrade behavior for 16-bit quantization types.


Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
update_opset_version previously only inspected weight_type, so a config
like activation_type=QInt16 with weight_type=QInt8 would not trigger the
opset>=21 bump and could produce a model with int16 Q/DQ on opset<21.
Extend the helper to accept activation_type and bump when either is
INT16/UINT16. Update the quantize_static call site and add subtests
covering 16-bit-activation-only, 16-bit-weight-only, both-8bit, and
backward-compat (single-arg call) cases.
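
A small sketch of the widened check, assuming the helper treats weight and activation types symmetrically (the standalone function and names here are illustrative, not the merged signature):

```python
from onnxruntime.quantization import QuantType

INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)


def requests_16bit(weight_type, activation_type=None) -> bool:
    """True when either the weight or the activation quant type is 16-bit,
    i.e. when update_opset_version should bump the model to opset 21."""
    return any(t in INT16_TYPES for t in (weight_type, activation_type) if t is not None)


# A QInt8-weight / QInt16-activation config now also triggers the bump:
assert requests_16bit(QuantType.QInt8, QuantType.QInt16)
```
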
@tianleiwu (Contributor) left a comment

Thanks for the follow-up on the activation-type gap. I found one remaining workflow issue: the config-based static quantization path can still carry an auto-derived UseQDQContribOps flag from the original opset, so it may keep emitting contrib-domain Q/DQ after the model is bumped to opset 21. Please align that path with the direct quantize_static behavior before merge.

Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.



Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Comment thread onnxruntime/test/python/quantization/test_quant_util.py Outdated
… get_qdq_config

get_qdq_config() was auto-setting extra_options["UseQDQContribOps"] = True
whenever activation_type or weight_type was INT16/UINT16 and the model opset
was < 21. This caused the config-based quantize(..., StaticQuantConfig) path
to emit com.microsoft Q/DQ ops even after quantize_static() bumped the model
to opset 21, where native ONNX QuantizeLinear/DequantizeLinear supports
INT16/UINT16 natively.

Narrow the condition so that UseQDQContribOps is only auto-set for 4-bit types
(which have no opset bump) and for tensor-override-based types; 16-bit top-level
weight/activation types are excluded because the opset-21 bump in quantize_static()
already handles them. An explicit user-supplied UseQDQContribOps in extra_options
still takes precedence via the existing override merge (see the sketch after this commit message).

Update test_get_qdq_config.py: rename and fix the int16-opset19 subtest to assert
the new correct behavior (no contrib-ops flag), and add an end-to-end test that
verifies the config path produces an opset-21 model with native-domain Q/DQ nodes.
Tighten the existing no-op subtest in test_quant_util.py from assertNotEqual to
assertEqual(result_opset, 20) for a stricter regression guard.
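
A rough sketch of the narrowed condition; the names will_bump_to_opset21 and needs_contrib_ops come up later in review, but the helper shape, exact predicate, and 4-bit handling below are assumptions, not the merged code:

```python
from onnxruntime.quantization import QuantType

INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)
INT4_TYPES = (QuantType.QInt4, QuantType.QUInt4)


def maybe_auto_set_contrib_ops(opset, activation_type, weight_type, extra_options):
    """Sketch: only pre-set UseQDQContribOps when the opset will stay < 21
    and a requested type actually needs the com.microsoft domain; 16-bit
    top-level types are excluded because quantize_static() bumps the opset."""
    will_bump_to_opset21 = opset < 21 and (
        activation_type in INT16_TYPES or weight_type in INT16_TYPES
    )
    needs_contrib_ops = (
        opset < 21
        and not will_bump_to_opset21
        and (activation_type in INT4_TYPES or weight_type in INT4_TYPES)
    )
    if needs_contrib_ops:
        # setdefault keeps an explicit user-supplied value authoritative.
        extra_options.setdefault("UseQDQContribOps", True)
```
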
tianleiwu previously approved these changes May 3, 2026
@tianleiwu (Contributor) left a comment

Re-review after commit ef62c23

The main concern from the prior round — get_qdq_config() pre-setting UseQDQContribOps=True for 16-bit types before the opset bump — is now fixed. The narrowed condition correctly limits the auto-set to 4-bit types only, and the new end-to-end test (test_quantize_via_config_int16_opset_lt21_uses_native_qdq) validates that the config-based path produces opset-21 models with native ONNX Q/DQ.

One minor suggestion on the overrides check below.

LGTM — nice work addressing the feedback.

Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/test/python/quantization/test_get_qdq_config.py
… UseQDQContribOps after opset bump

- Extend opset-21 bump helper to inspect TensorQuantOverrides (including
  per-tensor convert.quant_type) for QInt16/QUInt16, so models with default
  8-bit base types but 16-bit overrides also get the native opset-21 path
  (see the sketch after this list).
- Generalize the opset-bump warning text so it is accurate for both QDQ
  static and quantize_dynamic flows.
- Recompute UseQDQContribOps after the opset bump so 16-bit/4-bit overrides
  no longer latch the model to com.microsoft Q/DQ post-bump.
- Add regression tests for opset<21 + 16-bit overrides and mixed 16-bit/4-bit
  overrides via TensorQuantOverrides.
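
A sketch of the override scan under the standard TensorQuantOverrides shape (a tensor name mapped to a list of override dicts, optionally carrying a convert sub-dict); the function name and traversal below are illustrative, and the merged code additionally guards malformed structures, as discussed later:

```python
from onnxruntime.quantization import QuantType

INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)


def overrides_request_16bit(tensor_quant_overrides) -> bool:
    """True if any per-tensor override, or its convert sub-dict,
    requests a 16-bit quant type."""
    for override_list in (tensor_quant_overrides or {}).values():
        for override in override_list:
            convert = override.get("convert") or {}
            if override.get("quant_type") in INT16_TYPES or convert.get("quant_type") in INT16_TYPES:
                return True
    return False
```
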
@Rishi-Dave (Contributor, Author)

Thanks for the careful review. Pushed 1d4160a addressing all four points:

  • quant_utils.py opset-bump helper now also scans TensorQuantOverrides (including per-tensor convert.quant_type) for QInt16/QUInt16, so models with default 8-bit base types but 16-bit overrides take the native opset-21 path instead of falling back to contrib Q/DQ.
  • Generalized the opset-bump warning wording so it reads correctly for quantize_dynamic() callers, not just QDQ static.
  • quantize.py now recomputes UseQDQContribOps after the opset bump, so 16-bit/4-bit override-driven bumps no longer latch the model to com.microsoft Q/DQ.
  • Added two regression tests in test_get_qdq_config.py: one for opset<21 + QUInt16 TensorQuantOverrides (asserts opset==21 and Q/DQ in default domain), and one for the mixed 16-bit + 4-bit override case.

lintrunner -a is clean, and all 10 test_get_qdq_config.py tests pass locally.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Address review feedback on the int16/uint16 QDQ opset auto-bump:

- Wrap the TensorQuantOverrides scan loop in a try/except for
  (AttributeError, TypeError) so malformed input falls through to the
  existing TensorQuantOverridesHelper.is_valid() ValueError instead of
  raising an unrelated AttributeError on .get() calls.
- Rename test_16bit_overrides_set_ms_domain to
  test_16bit_overrides_bump_opset_to_21 and flip its assertions to
  match the new behavior (opset bumped to 21, native ai.onnx Q/DQ).
- Add test_16bit_convert_quant_type_bumps_opset_to_21 covering the
  convert.quant_type branch with an opset-20 model, ensuring the bump
  fires for the convert sub-dict path as well as top-level overrides.
@Rishi-Dave (Contributor, Author)

Addressed the three review comments in d1ae1ab:

  • quant_utils.py: wrapped the TensorQuantOverrides scan loop in try/except (AttributeError, TypeError): pass so malformed input (non-list values, non-dict convert entries) falls through to TensorQuantOverridesHelper.is_valid() and raises the original deterministic ValueError. KeyError cannot occur (all reads use .get()), and ValueError is intentionally not caught.
  • test_tensor_quant_overrides_option.py: renamed test_16bit_overrides_set_ms_domain to test_16bit_overrides_bump_opset_to_21 and flipped its assertions to expect ai.onnx opset >= 21 with native-domain Q/DQ.
  • Added test_16bit_convert_quant_type_bumps_opset_to_21 covering the convert.quant_type branch specifically: top-level quant_type=QUInt8, convert={'quant_type': QInt16}, opset=20 input → bumped to >=21 with native Q/DQ.

All 21 tests in the file pass; lintrunner is clean.

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Fixed
@tianleiwu (Contributor) left a comment

Re-review after commit d1ae1ab

The previous concern from my earlier round (splitting the 16-bit vs 4-bit override checks in get_qdq_config) has been fully addressed by the refactored logic that separates will_bump_to_opset21 from needs_contrib_ops. Resolved that thread.

The overall approach is sound — auto-bumping opset to 21 for 16-bit types and removing the unnecessary UseQDQContribOps fallback is the right design. Test coverage is thorough with good edge-case scenarios (mixed 16-bit + 4-bit, convert.quant_type, end-to-end model verification).

One remaining item below on the silent except clause (also flagged by CodeQL).

Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Replaces an empty except block in the 16-bit opset bump heuristic with
a logging.debug call so callers can observe when a structurally
malformed TensorQuantOverrides causes the heuristic to be skipped.
Addresses CodeQL 'Empty except' finding and review feedback on PR
microsoft#28202.
@Rishi-Dave (Contributor, Author)

Thanks for the catch. Pushed 140d631 which replaces the empty except (AttributeError, TypeError): pass block in quant_utils.py with your suggested snippet:

```python
except (AttributeError, TypeError):
    # Malformed overrides; structural validation is deferred to
    # TensorQuantOverridesHelper.is_valid(). Skip bump heuristic.
    logging.debug("Skipping 16-bit opset bump heuristic for TensorQuantOverrides: structure not as expected.")
```

import logging was already present in the module so no new imports were needed. lintrunner -a is clean. This should resolve the CodeQL "Empty except" finding as well.

@tianleiwu requested a review from Copilot on May 11, 2026, 17:01
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread onnxruntime/python/tools/quantization/quantize.py
Comment thread onnxruntime/test/python/quantization/test_tensor_quant_overrides_option.py Outdated
Comment thread onnxruntime/test/python/quantization/test_tensor_quant_overrides_option.py Outdated
Comment thread onnxruntime/test/python/quantization/test_get_qdq_config.py
Address review feedback on 16-bit QDQ opset bump:
- Guard extra_options against None in quantize() call path
- Use get_opset_version() helper for clearer test failures
- Assert default ai.onnx domain for Q/DQ nodes
- Extend get_qdq_config test to invoke quantize end-to-end
  and verify output opset==21 with default-domain Q/DQ ops
@Rishi-Dave (Contributor, Author)

Thanks for the review. Pushed da1271c8 addressing all four points:

  1. quantize_static() — guarded the extra_options.get("TensorQuantOverrides") lookup with (extra_options or {}).get(...) so the path is safe when extra_options=None (see the sketch below).
  2. test_tensor_quant_overrides_option.py — replaced the bare next(...) opset lookup with the production get_opset_version() helper, so a missing ai.onnx import raises a clear ValueError instead of StopIteration. Applied to all three 16-bit opset-bump tests.
  3. Same file — tightened the Q/DQ domain assertion from assertNotEqual(node.domain, ms_domain) to assertEqual(node.domain, "") with an informative failure message.
  4. test_get_qdq_config.py — extended test_overrides_mixed_16bit_4bit_opset_lt21_no_contrib_ops to actually run quantize(...), load the output model, and assert opset ≥ 21 with all Q/DQ nodes in the default ONNX domain.
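
For point 1, the guard is the usual or-pattern; a minimal illustration with assumed variable names:

```python
extra_options = None  # quantize_static() may be called with no extra options

# Safe lookup even when extra_options is None:
tensor_overrides = (extra_options or {}).get("TensorQuantOverrides", {})
assert tensor_overrides == {}
```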

@tianleiwu (Contributor) left a comment

Re-review after commits 140d631 and da1271c

All concerns from prior review rounds have been addressed:

  1. logging.debug for malformed overrides — The bare except (AttributeError, TypeError): pass is now replaced with a logging.debug() message. Resolves both my earlier comment and the CodeQL finding.

  2. extra_options None guard — (extra_options or {}).get("TensorQuantOverrides") correctly handles the case where extra_options might be None.

  3. Strengthened tests — End-to-end tests now run quantize() and verify the output model opset is 21 with native ONNX domain Q/DQ nodes. Coverage for convert.quant_type and mixed 16-bit+4-bit overrides is solid.

The logic is consistent across update_opset_version() (scans quant_type and convert.quant_type in overrides) and get_qdq_config() (uses overrides_helper.get_quant_types() which checks the same fields). The will_bump_to_opset21 / needs_contrib_ops separation in get_qdq_config() is clean and correct.

LGTM.

@tianleiwu enabled auto-merge (squash) on May 14, 2026, 17:21
@tianleiwu merged commit 0ebd8cc into microsoft:main on May 14, 2026
88 checks passed
Successfully merging this pull request may close these issues:

  • [Feature request] support U16 / S16 for QuantizeLinear and DequantizeLinear