fix: auto-upgrade model opset to 21 for int16/uint16 QDQ quantization #28202
Conversation
The update_opset_version helper already auto-bumps opset to 19 when float8 quantization is requested on older models. Extend the same pattern to int16/uint16: when the user requests QUInt16 or QInt16 weight quantization and the model's opset is below 21, bump to 21 so that native ONNX QuantizeLinear/DequantizeLinear can be emitted instead of silently falling back to the com.microsoft contrib domain. Fixes microsoft#25223
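For readers skimming the thread, here is a minimal sketch of the kind of branch the description refers to. The helper name `update_opset_version` comes from the PR; the standalone function below, its signature, and the use of the ONNX version converter are illustrative assumptions, not the actual ORT implementation:

```python
import logging

import onnx
from onnx import version_converter

from onnxruntime.quantization import QuantType


def _maybe_bump_opset_for_16bit(model: onnx.ModelProto, weight_type: QuantType) -> onnx.ModelProto:
    """Illustrative sketch: bump the default-domain opset to 21 when 16-bit QDQ is requested."""
    ai_onnx_opset = next(
        (entry.version for entry in model.opset_import if entry.domain in ("", "ai.onnx")),
        None,
    )
    if weight_type in (QuantType.QInt16, QuantType.QUInt16) and ai_onnx_opset is not None and ai_onnx_opset < 21:
        logging.warning(
            "Model opset %s does not support int16/uint16 QuantizeLinear/DequantizeLinear; "
            "upgrading the model to opset 21.",
            ai_onnx_opset,
        )
        model = version_converter.convert_version(model, 21)
    return model
```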
Pull request overview
Extends ONNX Runtime’s Python quantization utilities to automatically upgrade an input model’s ONNX opset to 21 when 16-bit integer (INT16/UINT16) QDQ quantization is requested, aligning behavior with the existing float8 opset auto-upgrade logic and adding regression tests.
Changes:
- Add an `update_opset_version` branch to auto-bump opset < 21 to 21 for INT16/UINT16 quantization types (with a warning).
- Add unit tests validating the opset 20 → 21 upgrade and the no-op behavior at opset 21 for QUInt16/QInt16.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `onnxruntime/python/tools/quantization/quant_utils.py` | Adds opset auto-upgrade logic for INT16/UINT16 quantization to ensure ONNX-native QDQ compatibility. |
| `onnxruntime/test/python/quantization/test_quant_util.py` | Adds test coverage for the new opset upgrade behavior for 16-bit quantization types. |
update_opset_version previously only inspected weight_type, so a config like activation_type=QInt16 with weight_type=QInt8 would not trigger the opset>=21 bump and could produce a model with int16 Q/DQ on opset<21. Extend the helper to accept activation_type and bump when either is INT16/UINT16. Update the quantize_static call site and add subtests covering 16-bit-activation-only, 16-bit-weight-only, both-8bit, and backward-compat (single-arg call) cases.
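A hedged sketch of what the widened check might look like once the activation type is taken into account; the `None` default keeps single-argument callers working as the commit describes, but the function and parameter names below are illustrative, not the actual helper signature:

```python
from typing import Optional

from onnxruntime.quantization import QuantType

_INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)


def _needs_opset21_bump(weight_type: QuantType, activation_type: Optional[QuantType] = None) -> bool:
    """Bump when either the weight type or the activation type is a 16-bit integer type."""
    return weight_type in _INT16_TYPES or activation_type in _INT16_TYPES
```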
tianleiwu
left a comment
Thanks for the follow-up on the activation-type gap. I found one remaining workflow issue: the config-based static quantization path can still carry an auto-derived UseQDQContribOps flag from the original opset, so it may keep emitting contrib-domain Q/DQ after the model is bumped to opset 21. Please align that path with the direct quantize_static behavior before merge.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
… get_qdq_config

get_qdq_config() was auto-setting extra_options["UseQDQContribOps"] = True whenever activation_type or weight_type was INT16/UINT16 and the model opset was < 21. This caused the config-based quantize(..., StaticQuantConfig) path to emit com.microsoft Q/DQ ops even after quantize_static() bumped the model to opset 21, where native ONNX QuantizeLinear/DequantizeLinear supports INT16/UINT16.

Narrow the condition so that UseQDQContribOps is only auto-set for 4-bit types (which have no opset bump) and for tensor-override-based types; 16-bit top-level weight/activation types are excluded because the opset-21 bump in quantize_static() already handles them. An explicit user-supplied UseQDQContribOps in extra_options still takes precedence via the existing override merge.

Update test_get_qdq_config.py: rename and fix the int16-opset19 subtest to assert the new correct behavior (no contrib-ops flag), and add an end-to-end test that verifies the config path produces an opset-21 model with native-domain Q/DQ nodes. Tighten the existing no-op subtest in test_quant_util.py from assertNotEqual to assertEqual(result_opset, 20) for a stricter regression guard.
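A rough sketch of the narrowed decision this commit describes. The function name and parameters are hypothetical, and the tensor-override-based branch mentioned in the commit is omitted for brevity; only the rule itself (auto-set contrib ops for 4-bit types, never for 16-bit top-level types, explicit user value wins) mirrors the description. The 4-bit members `QInt4`/`QUInt4` of `QuantType` are assumed here:

```python
from onnxruntime.quantization import QuantType

_INT4_TYPES = (QuantType.QInt4, QuantType.QUInt4)


def _auto_use_contrib_qdq(opset: int, activation_type, weight_type, user_extra_options: dict) -> bool:
    """Hypothetical helper: decide whether UseQDQContribOps should be auto-set."""
    if "UseQDQContribOps" in user_extra_options:
        # An explicit user choice always takes precedence.
        return bool(user_extra_options["UseQDQContribOps"])
    if opset >= 21:
        return False
    # 16-bit top-level types are deliberately excluded: quantize_static() bumps such
    # models to opset 21, where native QuantizeLinear/DequantizeLinear handle them.
    return activation_type in _INT4_TYPES or weight_type in _INT4_TYPES
```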
tianleiwu
left a comment
Re-review after commit ef62c23
The main concern from the prior round — get_qdq_config() pre-setting UseQDQContribOps=True for 16-bit types before the opset bump — is now fixed. The narrowed condition correctly limits the auto-set to 4-bit types only, and the new end-to-end test (test_quantize_via_config_int16_opset_lt21_uses_native_qdq) validates that the config-based path produces opset-21 models with native ONNX Q/DQ.
One minor suggestion on the overrides check below.
LGTM — nice work addressing the feedback.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
… UseQDQContribOps after opset bump

- Extend opset-21 bump helper to inspect TensorQuantOverrides (including per-tensor `convert.quant_type`) for QInt16/QUInt16, so models with default 8-bit base types but 16-bit overrides also get the native opset-21 path.
- Generalize the opset-bump warning text so it is accurate for both QDQ static and quantize_dynamic flows.
- Recompute UseQDQContribOps after the opset bump so 16-bit/4-bit overrides no longer latch the model to com.microsoft Q/DQ post-bump.
- Add regression tests for opset < 21 + 16-bit overrides and mixed 16-bit/4-bit overrides via TensorQuantOverrides.
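A sketch of the override scan the first bullet of that commit describes, assuming the documented TensorQuantOverrides shape (tensor name mapped to a list of override dicts, each optionally carrying a `convert` sub-dict); the function name is illustrative:

```python
from onnxruntime.quantization import QuantType

_INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)


def _overrides_request_16bit(tensor_quant_overrides: dict) -> bool:
    """Return True if any per-tensor override, or its convert sub-dict, asks for a 16-bit type."""
    for override_list in tensor_quant_overrides.values():
        for override in override_list:
            if override.get("quant_type") in _INT16_TYPES:
                return True
            convert = override.get("convert")
            if convert is not None and convert.get("quant_type") in _INT16_TYPES:
                return True
    return False
```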
Thanks for the careful review. Pushed 1d4160a addressing all four points.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Address review feedback on the int16/uint16 QDQ opset auto-bump:

- Wrap the TensorQuantOverrides scan loop in a try/except for (AttributeError, TypeError) so malformed input falls through to the existing TensorQuantOverridesHelper.is_valid() ValueError instead of raising an unrelated AttributeError on .get() calls.
- Rename test_16bit_overrides_set_ms_domain to test_16bit_overrides_bump_opset_to_21 and flip its assertions to match the new behavior (opset bumped to 21, native ai.onnx Q/DQ).
- Add test_16bit_convert_quant_type_bumps_opset_to_21 covering the convert.quant_type branch with an opset-20 model, ensuring the bump fires for the convert sub-dict path as well as top-level overrides.
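The first bullet of that commit, sketched against the illustrative scan above; the wrapper name is made up, and only the fall-through behavior mirrors the description:

```python
def _overrides_request_16bit_safe(tensor_quant_overrides) -> bool:
    """Wrap the override scan so malformed input skips the bump heuristic instead of raising here."""
    try:
        # _overrides_request_16bit is the illustrative scan sketched earlier in this thread.
        return _overrides_request_16bit(tensor_quant_overrides)
    except (AttributeError, TypeError):
        # Structural validation is deferred to TensorQuantOverridesHelper.is_valid(),
        # which raises a ValueError for malformed overrides later in the pipeline.
        return False
```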
Addressed the three review comments in d1ae1ab. The full test file passes 21/21.
tianleiwu
left a comment
Re-review after commit d1ae1ab
The previous concern from my earlier round (splitting the 16-bit vs 4-bit override checks in get_qdq_config) has been fully addressed by the refactored logic that separates will_bump_to_opset21 from needs_contrib_ops. Resolved that thread.
The overall approach is sound — auto-bumping opset to 21 for 16-bit types and removing the unnecessary UseQDQContribOps fallback is the right design. Test coverage is thorough with good edge-case scenarios (mixed 16-bit + 4-bit, convert.quant_type, end-to-end model verification).
One remaining item below on the silent except clause (also flagged by CodeQL).
Replaces an empty except block in the 16-bit opset bump heuristic with a logging.debug call so callers can observe when a structurally malformed TensorQuantOverrides causes the heuristic to be skipped. Addresses CodeQL 'Empty except' finding and review feedback on PR microsoft#28202.
Thanks for the catch. Pushed 140d631, which replaces the empty `except (AttributeError, TypeError): pass` with:

```python
except (AttributeError, TypeError):
    # Malformed overrides; structural validation is deferred to
    # TensorQuantOverridesHelper.is_valid(). Skip bump heuristic.
    logging.debug("Skipping 16-bit opset bump heuristic for TensorQuantOverrides: structure not as expected.")
```
Address review feedback on 16-bit QDQ opset bump:

- Guard extra_options against None in quantize() call path
- Use get_opset_version() helper for clearer test failures
- Assert default ai.onnx domain for Q/DQ nodes
- Extend get_qdq_config test to invoke quantize end-to-end and verify output opset == 21 with default-domain Q/DQ ops
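Two of those bullets, sketched as small hypothetical helpers: the guard expression is the one quoted in the review thread, while the helper names and the test-style domain check are made up for illustration:

```python
import onnx


def _resolve_tensor_overrides(extra_options):
    """extra_options may legitimately be None in the quantize() call path; normalize it first."""
    return (extra_options or {}).get("TensorQuantOverrides", {})


def _assert_native_qdq(model: onnx.ModelProto) -> None:
    """Test-style check: every Q/DQ node should sit in the default ai.onnx domain, not com.microsoft."""
    for node in model.graph.node:
        if node.op_type in ("QuantizeLinear", "DequantizeLinear"):
            assert node.domain in ("", "ai.onnx"), f"unexpected domain {node.domain!r} on node {node.name!r}"
```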
Thanks for the review. Pushed da1271c.
tianleiwu
left a comment
Re-review after commits 140d631 and da1271c
All concerns from prior review rounds have been addressed:
- `logging.debug` for malformed overrides — the bare `except (AttributeError, TypeError): pass` is now replaced with a `logging.debug()` message. Resolves both my earlier comment and the CodeQL finding.
- `extra_options` None guard — `(extra_options or {}).get("TensorQuantOverrides")` correctly handles the case where `extra_options` might be `None`.
- Strengthened tests — end-to-end tests now run `quantize()` and verify the output model opset is 21 with native ONNX domain Q/DQ nodes. Coverage for `convert.quant_type` and mixed 16-bit + 4-bit overrides is solid.
The logic is consistent across update_opset_version() (scans quant_type and convert.quant_type in overrides) and get_qdq_config() (uses overrides_helper.get_quant_types() which checks the same fields). The will_bump_to_opset21 / needs_contrib_ops separation in get_qdq_config() is clean and correct.
LGTM.
Summary
- Extend the `update_opset_version` helper to auto-bump opset from < 21 to 21 when QUInt16/QInt16 weight quantization is requested

Motivation
Fixes #25223.
Users exporting models from `torch.export` with uint16/int16 quantization hit a gap where models below opset 21 were not being upgraded. Mirroring the existing float8 branch gives users a consistent, predictable upgrade path for 16-bit QDQ.

Changes
- `onnxruntime/python/tools/quantization/quant_utils.py`: new `elif` branch in `update_opset_version` that bumps opset to 21 when `weight_quant_type` is INT16/UINT16 and the current opset is < 21. Emits a warning matching the existing float8 branch style.
- `onnxruntime/test/python/quantization/test_quant_util.py`: new `test_update_opset_version_16bit` with parametric subtests covering QUInt16/QInt16 bumping from opset 20 → 21 and a no-op regression check for models already at opset 21.

Test Plan
- All tests pass.
- `lintrunner -a` produces no changes.
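For reference, a compact sketch of the subtest structure the Changes section describes, reusing the illustrative bump helper from the top of this thread. The real test lives in test_quant_util.py and exercises the actual `update_opset_version` helper; the model builder, class, and case list below are assumptions for illustration:

```python
import unittest

import onnx
from onnx import TensorProto, helper

from onnxruntime.quantization import QuantType


def _make_identity_model(opset: int) -> onnx.ModelProto:
    """Minimal single-node model pinned to the requested default-domain opset."""
    inp = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])
    out = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])
    graph = helper.make_graph([helper.make_node("Identity", ["x"], ["y"])], "g", [inp], [out])
    return helper.make_model(graph, opset_imports=[helper.make_opsetid("", opset)])


def _get_opset_version(model: onnx.ModelProto) -> int:
    return next(entry.version for entry in model.opset_import if entry.domain in ("", "ai.onnx"))


class TestOpsetBumpFor16Bit(unittest.TestCase):
    def test_bump_and_noop(self):
        cases = [
            (QuantType.QUInt16, 20, 21),  # opset 20 -> bumped to 21
            (QuantType.QInt16, 20, 21),   # opset 20 -> bumped to 21
            (QuantType.QUInt16, 21, 21),  # already at 21 -> no-op
        ]
        for quant_type, start_opset, expected_opset in cases:
            with self.subTest(quant_type=quant_type, start_opset=start_opset):
                model = _maybe_bump_opset_for_16bit(_make_identity_model(start_opset), quant_type)
                self.assertEqual(_get_opset_version(model), expected_opset)


if __name__ == "__main__":
    unittest.main()
```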