Fix CPU QLinearConv: support per-channel weight zero points with distinct values#28456
Fix CPU QLinearConv: support per-channel weight zero points with distinct values#28456Copilot wants to merge 6 commits into
Conversation
…inct values Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/b3963943-a627-4094-b987-a75ad1b49f62 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
tianleiwu
left a comment
There was a problem hiding this comment.
Kernel-side routing looks correct overall; the remaining gap is regression coverage around the new depthwise fallback.
There was a problem hiding this comment.
Pull request overview
This PR fixes CPUExecutionProvider’s QLinearConv to correctly accept asymmetric per-channel weight zero points where each output channel can have a distinct zero point value (as allowed by the ONNX spec). It updates the CPU kernel to pass full per-channel zero-point arrays into the MLAS GEMM path when needed, and adds/extends unit tests to cover these scenarios.
Changes:
- CPU kernel: removes the incorrect enforcement that all per-channel
w_zero_pointvalues must be identical; plumbs per-channel zero points into the GEMM path and restricts the depthwise fast-path to uniform zero points. - Tests: extends the
QLinearConvOpTesterto optionally emit per-channel weight zero-point tensors and validates per-channel ZPs across u8/u8, s8/s8, and grouped convolutions. - Adds a depthwise-shaped test case intended to validate fallback behavior when per-channel zero points are non-uniform.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/quantization/qlinearconv.cc | Allows per-channel non-uniform weight zero points and routes depthwise vs GEMM based on whether weight ZPs are uniform. |
| onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc | Adds per-channel weight zero-point support to the test harness and new test cases covering the bug scenario. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…epthwise fallback test PerColumnZeroPoints so that uniform per-channel ZPs use the faster scalar MLAS path. - Rename Conv2D_S8S8_Depthwise_PerChannelZeroPoints to Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints and add a comment clarifying it validates the group-GEMM fallback path.
…nv-per-channel-zero-points
…ce, simplify casts
| const auto* W_zero_point_data = static_cast<const uint8_t*>(W_zero_point->DataRaw()); | ||
| // Per-channel zero points are uniform when size == 1 or all values match. | ||
| const bool W_zero_point_is_uniform = | ||
| (W_zero_point_size <= 1) || | ||
| std::all_of(W_zero_point_data + 1, W_zero_point_data + W_zero_point_size, | ||
| [W_zero_point_data](uint8_t v) { return v == W_zero_point_data[0]; }); | ||
| // When non-uniform, w_zero_point must be a full per-channel tensor of size M | ||
| // so that group_id * group_output_channels indexing is in bounds. | ||
| ORT_ENFORCE(W_zero_point_is_uniform || W_zero_point_size == M, | ||
| "QLinearConv : non-uniform weight zero point tensor size (", W_zero_point_size, | ||
| ") must equal number of output channels (", M, ")"); |
| gemm_params.B = reordered_W + group_id * group_output_channels, | ||
| gemm_params.ldb = static_cast<size_t>(M); | ||
| } | ||
| gemm_params.ZeroPointB = &W_zero_point_value; | ||
| gemm_params.ZeroPointB = !W_zero_point_is_uniform | ||
| ? W_zero_point_data + group_id * group_output_channels | ||
| : &W_zero_point_value; | ||
| gemm_params.PerColumnZeroPoints = !W_zero_point_is_uniform; |
| // Depthwise config (groups == channels) with non-uniform per-channel weight zero points. | ||
| // The kernel cannot use MlasConvDepthwise with distinct ZPs, so this validates the | ||
| // automatic fallback to the group-GEMM path. | ||
| TEST(QLinearConvTest, Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints) { |
Description
The CPU
QLinearConvkernel incorrectly rejected per-channel weight zero point tensors whose values were not all identical, even though the ONNX spec allows this for asymmetric per-channel quantization.Kernel (
qlinearconv.cc):ORT_ENFORCEinComputeOffsetthat required all per-channel W zero points to be equalComputeOffsetintoCompute()directly, exposing the full per-channel arrayW_zero_point_is_per_channel/W_zero_point_is_uniformflagsPerColumnZeroPoints = trueand passesW_zero_point_data + group_id * group_output_channelswhen ZPs differ — MLAS already supported thisMlasConvDepthwisetakes a scalarFilterZeroPoint); non-uniform per-channel ZPs automatically fall back to the group-GEMM path insteadTests (
qlinearconv_op_test.cc):zero_points_vector field toQuantizedTensorandSetWeightZeroPoints()method toQLinearConvOpTesterComputeExpectedOutputandRun()to emit a per-channel ZP tensor when setuint8activations,int8activations, and grouped convolution with per-channel W zero pointsMotivation and Context
CPUExecutionProviderthrewQLinearConv : zero point of per-channel filter must be sameat runtime for any model using asymmetric per-channel weight quantization (distinct zero points per output channel), despitew_scaleandw_zpboth being valid 1-D[Cout]tensors per the ONNX spec. This made a common quantization pattern completely unusable on CPU.