
Fix CPU QLinearConv: support per-channel weight zero points with distinct values#28456

Open
Copilot wants to merge 6 commits into main from copilot/fix-qlinearconv-per-channel-zero-points

Conversation

Contributor

Copilot AI commented May 11, 2026

Description

The CPU QLinearConv kernel incorrectly rejected per-channel weight zero point tensors whose values were not all identical, even though the ONNX spec allows this for asymmetric per-channel quantization.

Kernel (qlinearconv.cc):

  • Removed the ORT_ENFORCE in ComputeOffset that required all per-channel W zero points to be equal
  • Extracted W zero point reading from ComputeOffset into Compute() directly, exposing the full per-channel array
  • Added W_zero_point_is_per_channel / W_zero_point_is_uniform flags
  • GEMM path: sets PerColumnZeroPoints = true and passes W_zero_point_data + group_id * group_output_channels when ZPs differ — MLAS already supported this
  • Depthwise path: requires uniform W zero points (since MlasConvDepthwise takes a scalar FilterZeroPoint); non-uniform per-channel ZPs automatically fall back to the group-GEMM path instead
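
The path selection described above can be sketched in Python. This is a model of the C++ routing logic only, not ONNX Runtime code; the function name and arguments are illustrative:

```python
# Sketch of the kernel's path routing for weight zero points (hypothetical
# helper; mirrors the W_zero_point_is_uniform flag described above).

def choose_conv_path(w_zero_points, num_output_channels, is_depthwise_shape):
    """Return which kernel path a QLinearConv-like op would take."""
    # Uniform when the tensor is scalar-like or every value matches the first.
    uniform = len(w_zero_points) <= 1 or all(z == w_zero_points[0] for z in w_zero_points)
    # Non-uniform ZPs must be a full per-channel tensor so group indexing is in bounds.
    if not uniform and len(w_zero_points) != num_output_channels:
        raise ValueError("non-uniform w_zero_point must have one entry per output channel")
    if is_depthwise_shape and uniform:
        return "depthwise"   # MlasConvDepthwise takes a scalar FilterZeroPoint
    return "group-gemm"      # MLAS GEMM supports per-column zero points

print(choose_conv_path([5, 5], 2, True))    # → depthwise
print(choose_conv_path([5, 90], 2, True))   # → group-gemm (automatic fallback)
```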

Tests (qlinearconv_op_test.cc):

  • Added zero_points_ vector field to QuantizedTensor and SetWeightZeroPoints() method to QLinearConvOpTester
  • Updated ComputeExpectedOutput and Run() to emit a per-channel ZP tensor when set
  • Added three new test cases covering uint8 activations, int8 activations, and grouped convolution with per-channel W zero points

Motivation and Context

CPUExecutionProvider threw "QLinearConv : zero point of per-channel filter must be same" at runtime for any model using asymmetric per-channel weight quantization (distinct zero points per output channel), even though w_scale and w_zp were both valid 1-D [Cout] tensors per the ONNX spec. This made a common quantization pattern unusable on CPU.

w_zp = np.array([5, 90], dtype=np.uint8)  # different per-channel ZPs → was rejected
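
For context, asymmetric per-channel quantization keeps a separate (scale, zero_point) pair for each output channel, and the real weight is recovered as (quantized - zero_point) * scale. A minimal numpy sketch with hypothetical values:

```python
import numpy as np

w_scale = np.array([0.02, 0.05], dtype=np.float32)   # per-channel scales
w_zp = np.array([5, 90], dtype=np.uint8)             # distinct per-channel zero points

w_q = np.array([[5, 10],                             # [Cout, Cin] quantized weights
                [90, 80]], dtype=np.uint8)

# Dequantize each output channel with its own (scale, zero_point) pair.
w_real = (w_q.astype(np.int32) - w_zp[:, None].astype(np.int32)) * w_scale[:, None]
print(w_real)  # channel 0: [0.0, 0.1]; channel 1: [0.0, -0.5]
```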

Copilot AI changed the title [WIP] Fix CPU QLinearConv for per-channel weight zero points Fix CPU QLinearConv: support per-channel weight zero points with distinct values May 11, 2026
Copilot AI requested a review from tianleiwu May 11, 2026 18:19
Contributor

@github-actions bot left a comment


You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Outdated
Contributor

@tianleiwu tianleiwu left a comment


Kernel-side routing looks correct overall; the remaining gap is regression coverage around the new depthwise fallback.

Comment thread onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc
@tianleiwu tianleiwu marked this pull request as ready for review May 13, 2026 06:42
@tianleiwu tianleiwu requested a review from Copilot May 13, 2026 06:44
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes CPUExecutionProvider’s QLinearConv to correctly accept asymmetric per-channel weight zero points where each output channel can have a distinct zero point value (as allowed by the ONNX spec). It updates the CPU kernel to pass full per-channel zero-point arrays into the MLAS GEMM path when needed, and adds/extends unit tests to cover these scenarios.

Changes:

  • CPU kernel: removes the incorrect enforcement that all per-channel w_zero_point values must be identical; plumbs per-channel zero points into the GEMM path and restricts the depthwise fast-path to uniform zero points.
  • Tests: extends the QLinearConvOpTester to optionally emit per-channel weight zero-point tensors and validates per-channel ZPs across u8/u8, s8/s8, and grouped convolutions.
  • Adds a depthwise-shaped test case intended to validate fallback behavior when per-channel zero points are non-uniform.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Allows per-channel non-uniform weight zero points and routes depthwise vs GEMM based on whether weight ZPs are uniform.
onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc Adds per-channel weight zero-point support to the test harness and new test cases covering the bug scenario.


Comment thread onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Outdated
Comment thread onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc Outdated
…epthwise fallback test

  PerColumnZeroPoints so that uniform per-channel ZPs use the faster scalar
  MLAS path.
- Rename Conv2D_S8S8_Depthwise_PerChannelZeroPoints to
  Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints and add a comment
  clarifying it validates the group-GEMM fallback path.
@tianleiwu tianleiwu requested a review from Copilot May 14, 2026 21:04
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Outdated
Comment thread onnxruntime/core/providers/cpu/quantization/qlinearconv.cc
Comment thread onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc Outdated
Comment thread onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc
@tianleiwu tianleiwu requested a review from Copilot May 14, 2026 22:18
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment on lines +529 to +539
const auto* W_zero_point_data = static_cast<const uint8_t*>(W_zero_point->DataRaw());
// Per-channel zero points are uniform when size == 1 or all values match.
const bool W_zero_point_is_uniform =
    (W_zero_point_size <= 1) ||
    std::all_of(W_zero_point_data + 1, W_zero_point_data + W_zero_point_size,
                [W_zero_point_data](uint8_t v) { return v == W_zero_point_data[0]; });
// When non-uniform, w_zero_point must be a full per-channel tensor of size M
// so that group_id * group_output_channels indexing is in bounds.
ORT_ENFORCE(W_zero_point_is_uniform || W_zero_point_size == M,
            "QLinearConv : non-uniform weight zero point tensor size (", W_zero_point_size,
            ") must equal number of output channels (", M, ")");
Comment on lines 983 to +989
    gemm_params.B = reordered_W + group_id * group_output_channels,
    gemm_params.ldb = static_cast<size_t>(M);
  }
- gemm_params.ZeroPointB = &W_zero_point_value;
+ gemm_params.ZeroPointB = !W_zero_point_is_uniform
+                              ? W_zero_point_data + group_id * group_output_channels
+                              : &W_zero_point_value;
+ gemm_params.PerColumnZeroPoints = !W_zero_point_is_uniform;
Comment on lines +1604 to +1607
// Depthwise config (groups == channels) with non-uniform per-channel weight zero points.
// The kernel cannot use MlasConvDepthwise with distinct ZPs, so this validates the
// automatic fallback to the group-GEMM path.
TEST(QLinearConvTest, Conv2D_S8S8_DepthwiseFallback_PerChannelZeroPoints) {
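
As a sanity check on the behavior these tests target, a per-channel QLinearConv output can be reproduced by hand. The following is a minimal Python reference for a 1x1 convolution with per-channel weight zero points; it is a sketch with hypothetical values, not the tester's actual ComputeExpectedOutput:

```python
import numpy as np

def qlinearconv_1x1_ref(x_q, x_scale, x_zp, w_q, w_scale, w_zp, y_scale, y_zp):
    """Reference 1x1 QLinearConv for a single pixel.
    x_q: [Cin] uint8 input; w_q: [Cout, Cin] uint8 weights with per-channel
    w_scale / w_zp. Accumulates in int32, then requantizes to uint8."""
    acc = ((w_q.astype(np.int32) - w_zp[:, None].astype(np.int32))
           @ (x_q.astype(np.int32) - np.int32(x_zp)))          # [Cout] int32
    y_real = acc.astype(np.float32) * (x_scale * w_scale)      # per-channel rescale
    return np.clip(np.rint(y_real / y_scale) + y_zp, 0, 255).astype(np.uint8)

x_q = np.array([10, 20], dtype=np.uint8)
w_q = np.array([[7, 9], [95, 85]], dtype=np.uint8)
y = qlinearconv_1x1_ref(x_q, 0.1, 8, w_q,
                        np.array([0.02, 0.05], dtype=np.float32),
                        np.array([5, 90], dtype=np.uint8), 0.05, 128)
print(y)  # → [130 123]
```

Note that the int32 accumulation subtracts each channel's own zero point before the dot product, which is exactly the indexing the GEMM path's per-column zero points provide.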


Development

Successfully merging this pull request may close these issues.

CPU QLinearConv rejects per-channel weight zero points with different values
