fix: auto-upgrade model opset to 21 for int16/uint16 QDQ quantization #28202

Merged: tianleiwu merged 7 commits into microsoft:main from Rishi-Dave:rishidave/feat/quant-utils-opset-bump-16bit on May 14, 2026

Conversation

@Rishi-Dave (Contributor)

Summary

  • Extends the existing update_opset_version helper to auto-bump the opset from < 21 to 21 when QUInt16/QInt16 weight quantization is requested
  • Mirrors the existing float8 quantization opset-upgrade pattern
  • Adds test coverage with parametric subtests for 16-bit integer quantization

Motivation

Fixes #25223.

Users exporting models from torch.export with uint16/int16 quantization hit a gap where models below opset 21 were not being upgraded. Mirroring the existing float8 branch gives users a consistent, predictable upgrade path for 16-bit QDQ.

Changes

  • onnxruntime/python/tools/quantization/quant_utils.py: new elif branch in update_opset_version that bumps the opset to 21 when weight_quant_type is INT16/UINT16 and the current opset is < 21, emitting a warning in the style of the existing float8 branch (see the sketch below).
  • onnxruntime/test/python/quantization/test_quant_util.py: new test_update_opset_version_16bit with parametric subtests covering QUInt16 / QInt16 bumping from opset 20 → 21 and a no-op regression check for models already at opset 21.
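
A minimal sketch of what this branch might look like, mirroring the float8 pattern; update_opset_version is the real helper name from the PR, but the standalone function below, its warning text, and the use of onnx's version_converter are assumptions rather than the merged code:

```python
import logging

import onnx
from onnx import version_converter
from onnxruntime.quantization import QuantType


def bump_opset_for_16bit(model: onnx.ModelProto, weight_quant_type: QuantType) -> onnx.ModelProto:
    """Illustrative stand-in for the new update_opset_version branch:
    upgrade the default-domain opset to 21 when a 16-bit weight type is
    requested, so native QuantizeLinear/DequantizeLinear can be emitted."""
    ai_onnx_opset = next(
        (opset.version for opset in model.opset_import if opset.domain in ("", "ai.onnx")),
        0,
    )
    if weight_quant_type in (QuantType.QInt16, QuantType.QUInt16) and ai_onnx_opset < 21:
        logging.warning(
            "Original model opset %d does not support int16/uint16 QuantizeLinear; upgrading to opset 21.",
            ai_onnx_opset,
        )
        model = version_converter.convert_version(model, 21)
    return model
```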

Test Plan

python -m pytest onnxruntime/test/python/quantization/test_quant_util.py -v

All tests pass. lintrunner -a produces no changes.

The update_opset_version helper already auto-bumps opset to 19 when
float8 quantization is requested on older models. Extend the same
pattern to int16/uint16: when the user requests QUInt16 or QInt16
weight quantization and the model's opset is below 21, bump to 21 so
that native ONNX QuantizeLinear/DequantizeLinear can be emitted
instead of silently falling back to the com.microsoft contrib domain.

Fixes microsoft#25223

Copilot AI (Contributor) left a comment

Pull request overview

Extends ONNX Runtime’s Python quantization utilities to automatically upgrade an input model’s ONNX opset to 21 when 16-bit integer (INT16/UINT16) QDQ quantization is requested, aligning behavior with the existing float8 opset auto-upgrade logic and adding regression tests.

Changes:

  • Add an update_opset_version branch to auto-bump opset < 21 to 21 for INT16/UINT16 quantization types (with a warning).
  • Add unit tests validating opset 20 → 21 upgrade and no-op behavior at opset 21 for QUInt16/QInt16.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files reviewed:
  • onnxruntime/python/tools/quantization/quant_utils.py: adds opset auto-upgrade logic for INT16/UINT16 quantization to ensure ONNX-native QDQ compatibility.
  • onnxruntime/test/python/quantization/test_quant_util.py: adds test coverage for the new opset upgrade behavior for 16-bit quantization types.


Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
update_opset_version previously only inspected weight_type, so a config
like activation_type=QInt16 with weight_type=QInt8 would not trigger the
opset>=21 bump and could produce a model with int16 Q/DQ on opset<21.
Extend the helper to accept activation_type and bump when either is
INT16/UINT16. Update the quantize_static call site and add subtests
covering 16-bit-activation-only, 16-bit-weight-only, both-8bit, and
backward-compat (single-arg call) cases.
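
A small sketch of the widened check, assuming the helper treats weight and activation types symmetrically (the standalone function and names here are illustrative, not the merged signature):

```python
from onnxruntime.quantization import QuantType

INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)


def requests_16bit(weight_type, activation_type=None) -> bool:
    """True when either the weight or the activation quant type is 16-bit,
    i.e. when update_opset_version should bump the model to opset 21."""
    return any(t in INT16_TYPES for t in (weight_type, activation_type) if t is not None)


# A QInt8-weight / QInt16-activation config now also triggers the bump:
assert requests_16bit(QuantType.QInt8, QuantType.QInt16)
```
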
@tianleiwu (Contributor) left a comment

Thanks for the follow-up on the activation-type gap. I found one remaining workflow issue: the config-based static quantization path can still carry an auto-derived UseQDQContribOps flag from the original opset, so it may keep emitting contrib-domain Q/DQ after the model is bumped to opset 21. Please align that path with the direct quantize_static behavior before merge.

Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.



Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Comment thread onnxruntime/test/python/quantization/test_quant_util.py Outdated
… get_qdq_config

get_qdq_config() was auto-setting extra_options["UseQDQContribOps"] = True
whenever activation_type or weight_type was INT16/UINT16 and the model opset
was < 21. This caused the config-based quantize(..., StaticQuantConfig) path
to emit com.microsoft Q/DQ ops even after quantize_static() bumped the model
to opset 21, where native ONNX QuantizeLinear/DequantizeLinear supports
INT16/UINT16 natively.

Narrow the condition so that UseQDQContribOps is only auto-set for 4-bit types
(which have no opset bump) and for tensor-override-based types; 16-bit top-level
weight/activation types are excluded because the opset-21 bump in quantize_static()
already handles them. An explicit user-supplied UseQDQContribOps in extra_options
still takes precedence via the existing override merge (see the sketch after this commit message).

Update test_get_qdq_config.py: rename and fix the int16-opset19 subtest to assert
the new correct behavior (no contrib-ops flag), and add an end-to-end test that
verifies the config path produces an opset-21 model with native-domain Q/DQ nodes.
Tighten the existing no-op subtest in test_quant_util.py from assertNotEqual to
assertEqual(result_opset, 20) for a stricter regression guard.
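
A rough sketch of the narrowed condition; the names will_bump_to_opset21 and needs_contrib_ops come up later in review, but the helper shape, exact predicate, and 4-bit handling below are assumptions, not the merged code:

```python
from onnxruntime.quantization import QuantType

INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)
INT4_TYPES = (QuantType.QInt4, QuantType.QUInt4)


def maybe_auto_set_contrib_ops(opset, activation_type, weight_type, extra_options):
    """Sketch: only pre-set UseQDQContribOps when the opset will stay < 21
    and a requested type actually needs the com.microsoft domain; 16-bit
    top-level types are excluded because quantize_static() bumps the opset."""
    will_bump_to_opset21 = opset < 21 and (
        activation_type in INT16_TYPES or weight_type in INT16_TYPES
    )
    needs_contrib_ops = (
        opset < 21
        and not will_bump_to_opset21
        and (activation_type in INT4_TYPES or weight_type in INT4_TYPES)
    )
    if needs_contrib_ops:
        # setdefault keeps an explicit user-supplied value authoritative.
        extra_options.setdefault("UseQDQContribOps", True)
```
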
tianleiwu previously approved these changes May 3, 2026
@tianleiwu (Contributor) left a comment

Re-review after commit ef62c23

The main concern from the prior round — get_qdq_config() pre-setting UseQDQContribOps=True for 16-bit types before the opset bump — is now fixed. The narrowed condition correctly limits the auto-set to 4-bit types only, and the new end-to-end test (test_quantize_via_config_int16_opset_lt21_uses_native_qdq) validates that the config-based path produces opset-21 models with native ONNX Q/DQ.

One minor suggestion on the overrides check below.

LGTM — nice work addressing the feedback.

Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/python/tools/quantization/quantize.py Outdated
Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/test/python/quantization/test_get_qdq_config.py
… UseQDQContribOps after opset bump

- Extend opset-21 bump helper to inspect TensorQuantOverrides (including
  per-tensor convert.quant_type) for QInt16/QUInt16, so models with default
  8-bit base types but 16-bit overrides also get the native opset-21 path
  (see the sketch after this list).
- Generalize the opset-bump warning text so it is accurate for both QDQ
  static and quantize_dynamic flows.
- Recompute UseQDQContribOps after the opset bump so 16-bit/4-bit overrides
  no longer latch the model to com.microsoft Q/DQ post-bump.
- Add regression tests for opset<21 + 16-bit overrides and mixed 16-bit/4-bit
  overrides via TensorQuantOverrides.
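
A sketch of the override scan under the standard TensorQuantOverrides shape (a tensor name mapped to a list of override dicts, optionally carrying a convert sub-dict); the function name and traversal below are illustrative, and the merged code additionally guards malformed structures, as discussed later:

```python
from onnxruntime.quantization import QuantType

INT16_TYPES = (QuantType.QInt16, QuantType.QUInt16)


def overrides_request_16bit(tensor_quant_overrides) -> bool:
    """True if any per-tensor override, or its convert sub-dict,
    requests a 16-bit quant type."""
    for override_list in (tensor_quant_overrides or {}).values():
        for override in override_list:
            convert = override.get("convert") or {}
            if override.get("quant_type") in INT16_TYPES or convert.get("quant_type") in INT16_TYPES:
                return True
    return False
```
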
@Rishi-Dave (Contributor, Author)

Thanks for the careful review. Pushed 1d4160a addressing all four points:

  • quant_utils.py opset-bump helper now also scans TensorQuantOverrides (including per-tensor convert.quant_type) for QInt16/QUInt16, so models with default 8-bit base types but 16-bit overrides take the native opset-21 path instead of falling back to contrib Q/DQ.
  • Generalized the opset-bump warning wording so it reads correctly for quantize_dynamic() callers, not just QDQ static.
  • quantize.py now recomputes UseQDQContribOps after the opset bump, so 16-bit/4-bit override-driven bumps no longer latch the model to com.microsoft Q/DQ.
  • Added two regression tests in test_get_qdq_config.py: one for opset<21 + QUInt16 TensorQuantOverrides (asserts opset==21 and Q/DQ in default domain), and one for the mixed 16-bit + 4-bit override case.

lintrunner -a is clean, and all 10 test_get_qdq_config.py tests pass locally.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Comment thread onnxruntime/python/tools/quantization/quant_utils.py Outdated
Address review feedback on the int16/uint16 QDQ opset auto-bump:

- Wrap the TensorQuantOverrides scan loop in a try/except for
  (AttributeError, TypeError) so malformed input falls through to the
  existing TensorQuantOverridesHelper.is_valid() ValueError instead of
  raising an unrelated AttributeError on .get() calls.
- Rename test_16bit_overrides_set_ms_domain to
  test_16bit_overrides_bump_opset_to_21 and flip its assertions to
  match the new behavior (opset bumped to 21, native ai.onnx Q/DQ).
- Add test_16bit_convert_quant_type_bumps_opset_to_21 covering the
  convert.quant_type branch with an opset-20 model, ensuring the bump
  fires for the convert sub-dict path as well as top-level overrides.
@Rishi-Dave (Contributor, Author)

Addressed the three review comments in d1ae1ab:

  • quant_utils.py: wrapped the TensorQuantOverrides scan loop in try/except (AttributeError, TypeError): pass so malformed input (non-list values, non-dict convert entries) falls through to TensorQuantOverridesHelper.is_valid() and raises the original deterministic ValueError. KeyError cannot occur (all reads use .get()), and ValueError is intentionally not caught.
  • test_tensor_quant_overrides_option.py: renamed test_16bit_overrides_set_ms_domain to test_16bit_overrides_bump_opset_to_21 and flipped its assertions to expect ai.onnx opset >= 21 with native-domain Q/DQ.
  • Added test_16bit_convert_quant_type_bumps_opset_to_21 covering the convert.quant_type branch specifically: top-level quant_type=QUInt8, convert={'quant_type': QInt16}, opset=20 input → bumped to >=21 with native Q/DQ.

All 21 tests in the file pass; lintrunner is clean.

Comment thread onnxruntime/python/tools/quantization/quant_utils.py Fixed
@tianleiwu (Contributor) left a comment

Re-review after commit d1ae1ab

The previous concern from my earlier round (splitting the 16-bit vs 4-bit override checks in get_qdq_config) has been fully addressed by the refactored logic that separates will_bump_to_opset21 from needs_contrib_ops. Resolved that thread.

The overall approach is sound — auto-bumping opset to 21 for 16-bit types and removing the unnecessary UseQDQContribOps fallback is the right design. Test coverage is thorough with good edge-case scenarios (mixed 16-bit + 4-bit, convert.quant_type, end-to-end model verification).

One remaining item below on the silent except clause (also flagged by CodeQL).

Comment thread onnxruntime/python/tools/quantization/quant_utils.py
Replaces an empty except block in the 16-bit opset bump heuristic with
a logging.debug call so callers can observe when a structurally
malformed TensorQuantOverrides causes the heuristic to be skipped.
Addresses CodeQL 'Empty except' finding and review feedback on PR
microsoft#28202.
@Rishi-Dave (Contributor, Author)

Thanks for the catch. Pushed 140d631 which replaces the empty except (AttributeError, TypeError): pass block in quant_utils.py with your suggested snippet:

```python
except (AttributeError, TypeError):
    # Malformed overrides; structural validation is deferred to
    # TensorQuantOverridesHelper.is_valid(). Skip bump heuristic.
    logging.debug("Skipping 16-bit opset bump heuristic for TensorQuantOverrides: structure not as expected.")
```

import logging was already present in the module so no new imports were needed. lintrunner -a is clean. This should resolve the CodeQL "Empty except" finding as well.

@tianleiwu requested a review from Copilot on May 11, 2026, 17:01
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread onnxruntime/python/tools/quantization/quantize.py
Comment thread onnxruntime/test/python/quantization/test_tensor_quant_overrides_option.py Outdated
Comment thread onnxruntime/test/python/quantization/test_tensor_quant_overrides_option.py Outdated
Comment thread onnxruntime/test/python/quantization/test_get_qdq_config.py
Address review feedback on 16-bit QDQ opset bump:
- Guard extra_options against None in quantize() call path
- Use get_opset_version() helper for clearer test failures
- Assert default ai.onnx domain for Q/DQ nodes
- Extend get_qdq_config test to invoke quantize end-to-end
  and verify output opset==21 with default-domain Q/DQ ops
@Rishi-Dave (Contributor, Author)

Thanks for the review. Pushed da1271c8 addressing all four points:

  1. quantize_static() — guarded the extra_options.get("TensorQuantOverrides") lookup with (extra_options or {}).get(...) so the path is safe when extra_options=None (see the sketch below).
  2. test_tensor_quant_overrides_option.py — replaced the bare next(...) opset lookup with the production get_opset_version() helper, so a missing ai.onnx import raises a clear ValueError instead of StopIteration. Applied to all three 16-bit opset-bump tests.
  3. Same file — tightened the Q/DQ domain assertion from assertNotEqual(node.domain, ms_domain) to assertEqual(node.domain, "") with an informative failure message.
  4. test_get_qdq_config.py — extended test_overrides_mixed_16bit_4bit_opset_lt21_no_contrib_ops to actually run quantize(...), load the output model, and assert opset ≥ 21 with all Q/DQ nodes in the default ONNX domain.
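
For point 1, the guard is the usual or-pattern; a minimal illustration with assumed variable names:

```python
extra_options = None  # quantize_static() may be called with no extra options

# Safe lookup even when extra_options is None:
tensor_overrides = (extra_options or {}).get("TensorQuantOverrides", {})
assert tensor_overrides == {}
```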

@tianleiwu (Contributor) left a comment

Re-review after commits 140d631 and da1271c

All concerns from prior review rounds have been addressed:

  1. logging.debug for malformed overrides — The bare except (AttributeError, TypeError): pass is now replaced with a logging.debug() message. Resolves both my earlier comment and the CodeQL finding.

  2. extra_options None guard — (extra_options or {}).get("TensorQuantOverrides") correctly handles the case where extra_options might be None.

  3. Strengthened tests — End-to-end tests now run quantize() and verify the output model opset is 21 with native ONNX domain Q/DQ nodes. Coverage for convert.quant_type and mixed 16-bit+4-bit overrides is solid.

The logic is consistent across update_opset_version() (scans quant_type and convert.quant_type in overrides) and get_qdq_config() (uses overrides_helper.get_quant_types() which checks the same fields). The will_bump_to_opset21 / needs_contrib_ops separation in get_qdq_config() is clean and correct.

LGTM.

@tianleiwu enabled auto-merge (squash) on May 14, 2026, 17:21
@tianleiwu merged commit 0ebd8cc into microsoft:main on May 14, 2026
88 checks passed
Successfully merging this pull request may close these issues:

  • [Feature request] support U16 / S16 for QuantizeLinear and DequantizeLinear