Fix CPU QLinearConv: support per-channel weight zero points with distinct values by Copilot · Pull Request #28456 · microsoft/onnxruntime

Copilot · 2026-05-11T17:24:08Z

Description

The CPU QLinearConv kernel incorrectly rejected per-channel weight zero point tensors whose values were not all identical, even though the ONNX spec allows this for asymmetric per-channel quantization.

Kernel (qlinearconv.cc):

Removed the ORT_ENFORCE in ComputeOffset that required all per-channel W zero points to be equal
Extracted W zero point reading from ComputeOffset into Compute() directly, exposing the full per-channel array
Added W_zero_point_is_per_channel / W_zero_point_is_uniform flags
GEMM path: sets PerColumnZeroPoints = true and passes W_zero_point_data + group_id * group_output_channels when ZPs differ — MLAS already supported this
Depthwise path: requires uniform W zero points (since MlasConvDepthwise takes a scalar FilterZeroPoint); non-uniform per-channel ZPs automatically fall back to the group-GEMM path instead
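The routing described in the kernel notes above can be sketched roughly as follows. This is a minimal illustration, not the code in qlinearconv.cc: `ConvPath`, `ZeroPointsAreUniform`, and `SelectPath` are hypothetical names, and the `is_depthwise_shape` flag stands in for whatever shape checks the kernel actually performs.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not identifiers from qlinearconv.cc.
enum class ConvPath { kDepthwise, kGemm };

// True when there is at most one zero point or every entry equals the first.
inline bool ZeroPointsAreUniform(const std::vector<uint8_t>& zps) {
  return zps.size() <= 1 ||
         std::all_of(zps.begin() + 1, zps.end(),
                     [&zps](uint8_t v) { return v == zps.front(); });
}

// A depthwise-shaped convolution may take the scalar-zero-point depthwise
// kernel only when the weight zero points are uniform. Everything else,
// including depthwise shapes with distinct zero points, goes to the GEMM path.
inline ConvPath SelectPath(bool is_depthwise_shape, const std::vector<uint8_t>& zps) {
  return (is_depthwise_shape && ZeroPointsAreUniform(zps)) ? ConvPath::kDepthwise
                                                           : ConvPath::kGemm;
}
```

Under these assumptions, `SelectPath(true, {5, 90})` selects the GEMM path, while `SelectPath(true, {3, 3})` keeps the depthwise fast path.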

Tests (qlinearconv_op_test.cc):

Added zero_points_ vector field to QuantizedTensor and SetWeightZeroPoints() method to QLinearConvOpTester
Updated ComputeExpectedOutput and Run() to emit a per-channel ZP tensor when set
Added three new test cases covering uint8 activations, int8 activations, and grouped convolution with per-channel W zero points
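For orientation, a reference value for one output element with a per-channel weight zero point might be computed along these lines. This is a sketch under stated assumptions, not the logic of ComputeExpectedOutput: the function name, the parameter layout, and the rounding mode (`std::nearbyint`) are all illustrative choices.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference for a single uint8 output element of one output channel.
// x and w hold the input patch and that channel's filter taps in the same order.
inline uint8_t ReferenceOutput(const std::vector<uint8_t>& x, float x_scale, uint8_t x_zp,
                               const std::vector<uint8_t>& w, float w_scale_oc, uint8_t w_zp_oc,
                               float y_scale, uint8_t y_zp) {
  // Integer accumulation with both zero points subtracted.
  int32_t acc = 0;
  for (size_t i = 0; i < x.size(); ++i) {
    acc += (static_cast<int32_t>(x[i]) - x_zp) * (static_cast<int32_t>(w[i]) - w_zp_oc);
  }
  // Dequantize with the channel's scale, then requantize to the output type.
  const float real = static_cast<float>(acc) * x_scale * w_scale_oc;
  const float q = std::nearbyint(real / y_scale) + static_cast<float>(y_zp);
  return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
}
```

Calling this once per output channel with that channel's `w_scale_oc` and `w_zp_oc` is what makes the reference sensitive to distinct zero points.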

Motivation and Context

CPUExecutionProvider threw "QLinearConv : zero point of per-channel filter must be same" at runtime for any model using asymmetric per-channel weight quantization (distinct zero points per output channel), even though w_scale and w_zp are both valid 1-D [Cout] tensors per the ONNX spec. This made a common quantization pattern unusable on CPU.

w_zp = np.array([5, 90], dtype=np.uint8)  # different per-channel ZPs → was rejected
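To show how distinct zero points such as [5, 90] can arise, here is a minimal sketch of asymmetric uint8 quantization parameters derived from a channel's value range. The formula is the common min/max scheme; `QuantParams` and `ComputeAsymmetricParams` are hypothetical names, and real quantization tools may differ in details.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: scale and zero point for one output channel.
struct QuantParams {
  double scale;
  uint8_t zero_point;
};

// Asymmetric uint8 quantization from a channel's [min, max] range.
// The range is widened to include 0 so that 0.0 is exactly representable.
inline QuantParams ComputeAsymmetricParams(double min_val, double max_val) {
  min_val = std::min(min_val, 0.0);
  max_val = std::max(max_val, 0.0);
  const double scale = (max_val - min_val) / 255.0;
  if (scale == 0.0) {
    return {1.0, 0};  // Degenerate all-zero channel.
  }
  const double zp = std::round(-min_val / scale);
  return {scale, static_cast<uint8_t>(std::min(255.0, std::max(0.0, zp)))};
}
```

Two channels whose weights span different ranges generally end up with different zero points, which is the case the old check rejected.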

github-actions

You can commit the suggested changes from lintrunner.

tianleiwu

Kernel-side routing looks correct overall; the remaining gap is regression coverage around the new depthwise fallback.

Copilot

Pull request overview

This PR fixes CPUExecutionProvider’s QLinearConv to correctly accept asymmetric per-channel weight zero points where each output channel can have a distinct zero point value (as allowed by the ONNX spec). It updates the CPU kernel to pass full per-channel zero-point arrays into the MLAS GEMM path when needed, and adds/extends unit tests to cover these scenarios.

Changes:

CPU kernel: removes the incorrect enforcement that all per-channel w_zero_point values must be identical; plumbs per-channel zero points into the GEMM path and restricts the depthwise fast-path to uniform zero points.
Tests: extends the QLinearConvOpTester to optionally emit per-channel weight zero-point tensors and validates per-channel ZPs across u8/u8, s8/s8, and grouped convolutions.
Adds a depthwise-shaped test case intended to validate fallback behavior when per-channel zero points are non-uniform.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
onnxruntime/core/providers/cpu/quantization/qlinearconv.cc	Allows per-channel non-uniform weight zero points and routes depthwise vs GEMM based on whether weight ZPs are uniform.
onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc	Adds per-channel weight zero-point support to the test harness and new test cases covering the bug scenario.

…epthwise fallback test PerColumnZeroPoints so that uniform per-channel ZPs use the faster scalar MLAS path. - Rename Conv2D_S8S8_Depthwise_PerChannelZeroPoints to Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints and add a comment clarifying it validates the group-GEMM fallback path.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

+  const auto* W_zero_point_data = static_cast<const uint8_t*>(W_zero_point->DataRaw());
+  // Per-channel zero points are uniform when size == 1 or all values match.
+  const bool W_zero_point_is_uniform =
+      (W_zero_point_size <= 1) ||
+      std::all_of(W_zero_point_data + 1, W_zero_point_data + W_zero_point_size,
+                  [W_zero_point_data](uint8_t v) { return v == W_zero_point_data[0]; });
+  // When non-uniform, w_zero_point must be a full per-channel tensor of size M
+  // so that group_id * group_output_channels indexing is in bounds.
+  ORT_ENFORCE(W_zero_point_is_uniform || W_zero_point_size == M,
+              "QLinearConv : non-uniform weight zero point tensor size (", W_zero_point_size,
+              ") must equal number of output channels (", M, ")");


              gemm_params.B = reordered_W + group_id * group_output_channels,
              gemm_params.ldb = static_cast<size_t>(M);
            }
-            gemm_params.ZeroPointB = &W_zero_point_value;
+            gemm_params.ZeroPointB = !W_zero_point_is_uniform
+                                         ? W_zero_point_data + group_id * group_output_channels
+                                         : &W_zero_point_value;
+            gemm_params.PerColumnZeroPoints = !W_zero_point_is_uniform;
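The effect of the per-column flag in the hunk above can be illustrated with an unoptimized reference GEMM. This is a sketch, not the MLAS implementation: `ReferenceQGemm` and `GroupZeroPoints` are hypothetical names, and B is assumed to be row-major with one column per output channel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: start of the zero points for one group's output channels.
inline const uint8_t* GroupZeroPoints(const uint8_t* all_zero_points, size_t group_id,
                                      size_t group_output_channels) {
  return all_zero_points + group_id * group_output_channels;
}

// Reference quantized GEMM: C[m][n] = sum_k (A[m][k] - za) * (B[k][n] - zb_n).
// zero_point_b points at one value (scalar mode) or at N values (per-column mode).
inline std::vector<int32_t> ReferenceQGemm(const std::vector<uint8_t>& a, uint8_t zero_point_a,
                                           const std::vector<uint8_t>& b,
                                           const uint8_t* zero_point_b,
                                           bool per_column_zero_points,
                                           size_t M, size_t N, size_t K) {
  std::vector<int32_t> c(M * N, 0);
  for (size_t m = 0; m < M; ++m) {
    for (size_t n = 0; n < N; ++n) {
      // Pick the column's own zero point, or the shared scalar.
      const int32_t zb = per_column_zero_points ? zero_point_b[n] : zero_point_b[0];
      int32_t acc = 0;
      for (size_t k = 0; k < K; ++k) {
        acc += (static_cast<int32_t>(a[m * K + k]) - zero_point_a) *
               (static_cast<int32_t>(b[k * N + n]) - zb);
      }
      c[m * N + n] = acc;
    }
  }
  return c;
}
```

With zero points {5, 90}, scalar mode would apply 5 to both columns and give a different result for the second output channel. That is why the non-uniform case needs both the per-channel pointer and the flag.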


+// Depthwise config (groups == channels) with non-uniform per-channel weight zero points.
+// The kernel cannot use MlasConvDepthwise with distinct ZPs, so this validates the
+// automatic fallback to the group-GEMM path.
+TEST(QLinearConvTest, Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints) {


Initial plan

c025244

Copilot AI assigned Copilot and tianleiwu May 11, 2026

Copilot started work on behalf of tianleiwu May 11, 2026 17:24 View session

Copilot AI linked an issue May 11, 2026 that may be closed by this pull request

CPU QLinearConv rejects per-channel weight zero points with different values #28447

Open

Fix CPU QLinearConv: support per-channel weight zero points with dist…

9d1021c

…inct values Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/b3963943-a627-4094-b987-a75ad1b49f62 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix CPU QLinearConv for per-channel weight zero points~~ Fix CPU QLinearConv: support per-channel weight zero points with distinct values May 11, 2026

Copilot finished work on behalf of tianleiwu May 11, 2026 18:19

Copilot AI requested a review from tianleiwu May 11, 2026 18:19

github-actions Bot reviewed May 12, 2026

Comment thread onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Outdated

tianleiwu reviewed May 12, 2026

Comment thread onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc

address feedback

748a937

tianleiwu marked this pull request as ready for review May 13, 2026 06:42

tianleiwu requested a review from Copilot May 13, 2026 06:44

Copilot started reviewing on behalf of tianleiwu May 13, 2026 06:45 View session

Copilot AI reviewed May 13, 2026

Comment thread onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Outdated

Comment thread onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc Outdated

tianleiwu requested a review from Copilot May 14, 2026 21:04

Merge remote-tracking branch 'origin/main' into copilot/fix-qlinearco…

1f1a094

…nv-per-channel-zero-points

Copilot AI reviewed May 14, 2026

Copilot started reviewing on behalf of tianleiwu May 14, 2026 21:13 View session

address review feedback: inline per-channel flag, add defensive enfor…

db6ea25

…ce, simplify casts

tianleiwu requested a review from Copilot May 14, 2026 22:18

Copilot AI reviewed May 14, 2026

Copilot started reviewing on behalf of tianleiwu May 14, 2026 22:32 View session

Copilot wants to merge 6 commits into main from copilot/fix-qlinearconv-per-channel-zero-points


3 participants
