fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion #28229
Rishi-Dave wants to merge 1 commit into microsoft:main from
Conversation
ONNX QLinearConv reuses the int32 bias values directly without re-scaling, which requires `bias_scale[i] == x_scale * w_scale[i]` for each output channel. Previously `ConvNodeGroupSelector::Check` did not verify this invariant, so a QDQ model whose bias DQ used an arbitrary scale (e.g. produced by a non-conformant quantizer) would be silently fused into QLinearConv with wrong outputs.

Add a static helper `CheckConvBiasScale()` that reads the three scale initializers, handles both per-tensor (scalar) and per-channel weight scales, and rejects the fusion when the tolerance check `|b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale|` (atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a constant initializer, the helper conservatively returns false.

Add two Python test cases in `TestConvBiasScaleValidation`:
- `test_mismatched_bias_scale_skips_fusion`: verifies that optimized and unoptimized sessions agree when bias_scale is 2x the correct value, proving fusion was safely skipped.
- `test_matching_bias_scale_allows_fusion`: verifies the same agreement when bias_scale is exact, ensuring no regression on valid models.

Fixes microsoft#24711
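A minimal numpy sketch of the tolerance check described above (the actual helper is C++ in `qdq_selectors.cc`; the function and argument names here are illustrative, not the real API):

```python
import numpy as np

# Illustrative constants matching the PR description.
ATOL, RTOL = 1e-6, 1e-2

def bias_scale_is_valid(x_scale: float, w_scales: np.ndarray, b_scales: np.ndarray) -> bool:
    """Mirror of the described CheckConvBiasScale logic: fusion is allowed only
    when every channel satisfies |b_scale - expected| <= atol + rtol*|expected|."""
    num_channels = w_scales.size  # 1 for per-tensor, C_out for per-channel
    if b_scales.size not in (1, num_channels):
        return False
    expected = x_scale * w_scales  # broadcast against a scalar b_scale if needed
    return bool(np.all(np.abs(b_scales - expected) <= ATOL + RTOL * np.abs(expected)))
```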
Pull request overview
This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.
Changes:
- Add `CheckConvBiasScale` validation to `ConvNodeGroupSelector::Check` to skip fusion when bias scale does not match `x_scale * w_scale[i]` within tolerance.
- Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc` | Adds bias scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv. |
| `onnxruntime/test/python/quantization/test_qdq.py` | Adds regression tests for matched vs mismatched bias scales under different optimization levels. |
```cpp
const float x_scale = x_scales[0];
const size_t num_channels = w_scales.size();  // 1 for per-tensor, C_out for per-channel
const size_t b_num = b_scales.size();

// b_scale must be scalar or match num_channels.
if (b_num != 1 && b_num != num_channels) {
  return false;
}
```
CheckConvBiasScale derives num_channels from w_scales.size(), which is 1 for per-tensor weight scales even when the Conv has multiple output channels. This makes the loop validate only one channel and also rejects valid cases where b_scale is a per-channel vector (length C_out) but w_scale is scalar (all channels share the same expected bias scale). Consider deriving the channel count from the per-channel scale vector when present (e.g., use b_scales.size() when >1, else w_scales.size()), and when w_scales.size()==1 compare each channel against w_scales[0] while allowing b_scale to be either scalar or length C_out.
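A sketch of the derivation this comment proposes, again in illustrative numpy terms rather than the actual C++ helper:

```python
import numpy as np

def check_bias_scale_per_channel(x_scale, w_scales, b_scales, atol=1e-6, rtol=1e-2):
    # Derive the channel count from whichever scale vector is per-channel, so a
    # scalar w_scale paired with a per-channel b_scale is still fully validated.
    num_channels = b_scales.size if b_scales.size > 1 else w_scales.size
    if w_scales.size not in (1, num_channels) or b_scales.size not in (1, num_channels):
        return False
    expected = x_scale * np.broadcast_to(w_scales, (num_channels,))
    b = np.broadcast_to(b_scales, (num_channels,))
    return bool(np.all(np.abs(b - expected) <= atol + rtol * np.abs(expected)))
```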
```python
def test_matching_bias_scale_allows_fusion(self):
    """When bias_scale == input_scale * weight_scale, fusion should be allowed.

    Both optimized and unoptimized sessions must produce identical outputs.
    """
    inp_shape = [1, 1, 4, 4]
    weight_shape = [1, 1, 1, 1]
```
These tests only compare optimized vs unoptimized outputs; they don’t assert whether fusion actually occurred (or was skipped). That can lead to false positives if fusion is disabled for an unrelated reason (both tests would still pass). Consider saving the optimized model (e.g., via SessionOptions.optimized_model_filepath) and asserting QLinearConv is present for the matching-scale case and absent for the mismatched-scale case (or use existing op-count helpers) so the tests directly validate the fusion decision.
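A sketch of the suggested assertion, using the real `SessionOptions.optimized_model_filepath` option; the helper name, paths, and expected counts are assumptions for illustration:

```python
import onnx
import onnxruntime as ort

def count_qlinearconv(model_path: str, optimized_path: str) -> int:
    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    so.optimized_model_filepath = optimized_path  # serialize the post-fusion graph
    ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
    optimized = onnx.load(optimized_path)
    return sum(1 for node in optimized.graph.node if node.op_type == "QLinearConv")

# Matching bias scale: fusion expected  -> count should be 1.
# Mismatched bias scale: fusion skipped -> count should be 0.
```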
Summary
- `CheckConvBiasScale` validator inside `ConvNodeGroupSelector::Check`

Motivation
Fixes #24711.
The ONNX QLinearConv spec requires the int32 bias to use scale `x_scale × w_scale[i]` so the fused kernel can reuse it directly. The current QDQ selector only verifies that the bias dtype is INT32; it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph and the QLinearConv kernel produces silently wrong outputs at `ORT_ENABLE_EXTENDED` and above on the CPU EP. CUDA and disabled-optimization paths produce correct results, making the bug particularly hard to diagnose.

Changes
- `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`: add `CheckConvBiasScale` static helper. Returns `false` (skip fusion) when:
  - `x_scale` is not a scalar / 1-element rank-1 tensor
  - `b_scale` length is neither 1 nor `num_channels`
  - any channel's bias scale deviates from the expected `x_scale × w_scale[i]` by more than `atol=1e-6 + rtol=1e-2 × |expected|`
- `onnxruntime/test/python/quantization/test_qdq.py`: new `TestConvBiasScaleValidation` class with two cases: mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed).

Test Plan
- `python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v`
- Existing quantization tests (the `verify_quantize_conv` family) should continue to pass; fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.
- On the affected models, `ORT_ENABLE_ALL` output now matches `ORT_DISABLE_ALL`.
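A hedged sketch of the optimized-vs-unoptimized comparison the test plan relies on; the function name, feed dict, and tolerance are illustrative:

```python
import numpy as np
import onnxruntime as ort

def outputs_match(model_path: str, feed: dict) -> bool:
    results = []
    for level in (ort.GraphOptimizationLevel.ORT_DISABLE_ALL,
                  ort.GraphOptimizationLevel.ORT_ENABLE_ALL):
        so = ort.SessionOptions()
        so.graph_optimization_level = level
        sess = ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
        results.append(sess.run(None, feed)[0])
    # With the bias-scale gate in place, both optimization levels should agree.
    return np.allclose(results[0], results[1], atol=1e-5)
```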