fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion #28229
Rishi-Dave wants to merge 1 commit into microsoft:main from
Conversation
ONNX QLinearConv reuses the int32 bias values directly without re-scaling, which requires `bias_scale[i] == x_scale * w_scale[i]` for each output channel. Previously `ConvNodeGroupSelector::Check` did not verify this invariant, so a QDQ model whose bias DQ used an arbitrary scale (e.g. produced by a non-conformant quantizer) would be silently fused into QLinearConv with wrong outputs.

Add a static helper `CheckConvBiasScale()` that reads the three scale initializers, handles both per-tensor (scalar) and per-channel weight scales, and rejects the fusion when the tolerance check `|b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale|` (atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a constant initializer, the helper conservatively returns false.

Add two Python test cases in `TestConvBiasScaleValidation`:
- `test_mismatched_bias_scale_skips_fusion`: verifies that optimized and unoptimized sessions agree when bias_scale is 2x the correct value, proving fusion was safely skipped.
- `test_matching_bias_scale_allows_fusion`: verifies the same agreement when bias_scale is exact, ensuring no regression on valid models.

Fixes microsoft#24711
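A minimal numpy sketch of the tolerance check described above (the actual helper is C++ in `qdq_selectors.cc`; the function and argument names here are illustrative, not the real API):

```python
import numpy as np

# Illustrative constants matching the PR description.
ATOL, RTOL = 1e-6, 1e-2

def bias_scale_is_valid(x_scale: float, w_scales: np.ndarray, b_scales: np.ndarray) -> bool:
    """Mirror of the described CheckConvBiasScale logic: fusion is allowed only
    when every channel satisfies |b_scale - expected| <= atol + rtol*|expected|."""
    num_channels = w_scales.size  # 1 for per-tensor, C_out for per-channel
    if b_scales.size not in (1, num_channels):
        return False
    expected = x_scale * w_scales  # broadcast against a scalar b_scale if needed
    return bool(np.all(np.abs(b_scales - expected) <= ATOL + RTOL * np.abs(expected)))
```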
Pull request overview
This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.
Changes:
- Add `CheckConvBiasScale` validation to `ConvNodeGroupSelector::Check` to skip fusion when bias scale does not match `x_scale * w_scale[i]` within tolerance.
- Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc` | Adds bias scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv. |
| `onnxruntime/test/python/quantization/test_qdq.py` | Adds regression tests for matched vs mismatched bias scales under different optimization levels. |
```cpp
const float x_scale = x_scales[0];
const size_t num_channels = w_scales.size();  // 1 for per-tensor, C_out for per-channel
const size_t b_num = b_scales.size();

// b_scale must be scalar or match num_channels.
if (b_num != 1 && b_num != num_channels) {
  return false;
}
```
CheckConvBiasScale derives num_channels from w_scales.size(), which is 1 for per-tensor weight scales even when the Conv has multiple output channels. This makes the loop validate only one channel and also rejects valid cases where b_scale is a per-channel vector (length C_out) but w_scale is scalar (all channels share the same expected bias scale). Consider deriving the channel count from the per-channel scale vector when present (e.g., use b_scales.size() when >1, else w_scales.size()), and when w_scales.size()==1 compare each channel against w_scales[0] while allowing b_scale to be either scalar or length C_out.
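A sketch of the derivation this comment proposes, again in illustrative numpy terms rather than the actual C++ helper:

```python
import numpy as np

def check_bias_scale_per_channel(x_scale, w_scales, b_scales, atol=1e-6, rtol=1e-2):
    # Derive the channel count from whichever scale vector is per-channel, so a
    # scalar w_scale paired with a per-channel b_scale is still fully validated.
    num_channels = b_scales.size if b_scales.size > 1 else w_scales.size
    if w_scales.size not in (1, num_channels) or b_scales.size not in (1, num_channels):
        return False
    expected = x_scale * np.broadcast_to(w_scales, (num_channels,))
    b = np.broadcast_to(b_scales, (num_channels,))
    return bool(np.all(np.abs(b - expected) <= atol + rtol * np.abs(expected)))
```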
```python
def test_matching_bias_scale_allows_fusion(self):
    """When bias_scale == input_scale * weight_scale, fusion should be allowed.

    Both optimized and unoptimized sessions must produce identical outputs.
    """
    inp_shape = [1, 1, 4, 4]
    weight_shape = [1, 1, 1, 1]
```
These tests only compare optimized vs unoptimized outputs; they don’t assert whether fusion actually occurred (or was skipped). That can lead to false positives if fusion is disabled for an unrelated reason (both tests would still pass). Consider saving the optimized model (e.g., via SessionOptions.optimized_model_filepath) and asserting QLinearConv is present for the matching-scale case and absent for the mismatched-scale case (or use existing op-count helpers) so the tests directly validate the fusion decision.
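A sketch of the suggested assertion, using the real `SessionOptions.optimized_model_filepath` option; the helper name, paths, and expected counts are assumptions for illustration:

```python
import onnx
import onnxruntime as ort

def count_qlinearconv(model_path: str, optimized_path: str) -> int:
    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    so.optimized_model_filepath = optimized_path  # serialize the post-fusion graph
    ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
    optimized = onnx.load(optimized_path)
    return sum(1 for node in optimized.graph.node if node.op_type == "QLinearConv")

# Matching bias scale: fusion expected  -> count should be 1.
# Mismatched bias scale: fusion skipped -> count should be 0.
```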
Summary
- `CheckConvBiasScale` validator inside `ConvNodeGroupSelector::Check`

Motivation
Fixes #24711.
The ONNX QLinearConv spec requires the int32 bias to use scale `x_scale × w_scale[i]` so the fused kernel can reuse it directly. The current QDQ selector only verifies that the bias dtype is INT32; it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph and the QLinearConv kernel produces silently wrong outputs at `ORT_ENABLE_EXTENDED` and above on the CPU EP. CUDA and disabled-optimization paths produce correct results, making the bug particularly hard to diagnose.

Changes
- `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`: add `CheckConvBiasScale` static helper. Returns `false` (skip fusion) when:
  - `x_scale` is not a scalar / 1-element rank-1 tensor
  - `b_scale` length is neither 1 nor `num_channels`
  - any channel's bias scale deviates from the expected `x_scale × w_scale[i]` by more than `atol=1e-6 + rtol=1e-2 × |expected|`
- `onnxruntime/test/python/quantization/test_qdq.py`: new `TestConvBiasScaleValidation` class with two cases: mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed).

Test Plan
- `python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v`
- Existing quantization tests (the `verify_quantize_conv` family) should continue to pass; fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.
- On the affected models, `ORT_ENABLE_ALL` output now matches `ORT_DISABLE_ALL`.
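A hedged sketch of the optimized-vs-unoptimized comparison the test plan relies on; the function name, feed dict, and tolerance are illustrative:

```python
import numpy as np
import onnxruntime as ort

def outputs_match(model_path: str, feed: dict) -> bool:
    results = []
    for level in (ort.GraphOptimizationLevel.ORT_DISABLE_ALL,
                  ort.GraphOptimizationLevel.ORT_ENABLE_ALL):
        so = ort.SessionOptions()
        so.graph_optimization_level = level
        sess = ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
        results.append(sess.run(None, feed)[0])
    # With the bias-scale gate in place, both optimization levels should agree.
    return np.allclose(results[0], results[1], atol=1e-5)
```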