[CoreML EP] Support Gather with scalar indices on the MLProgram path#28478
Closed
john-rocky wants to merge 1 commit into
Closed
[CoreML EP] Support Gather with scalar indices on the MLProgram path#28478john-rocky wants to merge 1 commit into
john-rocky wants to merge 1 commit into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds CoreML MLProgram-path support for ONNX Gather with scalar (rank-0) indices by rewriting it inside the CoreML builder to avoid CPU fallback and partition splits (fixes #28180).
Changes:
- MLProgram path: detect scalar
indicesand emitreshape([1]) -> gather -> squeeze(axes=[axis]). - Support check: allow scalar
indicesonly on MLProgram and adjust effective output-rank calculation accordingly. - Minor refactor: use
input_defs/output_defslocals and pass logger through.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
edgchen1
reviewed
May 14, 2026
…ogram path Fixes microsoft#28180. GatherOpBuilder rejected ONNX Gather with rank-0 'indices', forcing a CPU partition split. PyTorch's exporter emits this routinely - e.g. GFPGAN 1024x1024 has 16 per-layer style-code Gathers with a scalar constant index, splitting the CoreML subgraph in two. Building a small MIL program (mb.gather with a rank-0 mb.const indices) end-to-end against iOS15 confirms MIL gather accepts rank-0 indices and drops the gathered axis - matching ONNX semantics: main[CoreML3](%x: (1, 16, 512, fp32)) { %gather_0: (1, 512, fp32) = gather(x=%x, indices=3, axis=1) } The original rejection was driven by ORT-side boundary handling, not by MIL: RegisterModelInputOutput rewrites any rank-0 boundary tensor to {1} (MLMultiArray has no rank-0), and on the NN path RegisterInitializers does the same for rank-0 initializers (LoadConstantND requires rank >= 1). Constant initializers on the MLProgram path go through OnnxTensorToCoreMLTensor which preserves rank, so MIL gather can consume them directly. Therefore: allow scalar 'indices' on the MLProgram path only when it is a constant initializer. NN path and non-initializer scalar 'indices' remain rejected. AddToModelBuilderImpl is unchanged. Existing CPU OpTester cases cover the shape and now stay on CoreML EP: - GatherOpTest.Gather_axis0_scalar_indices - GatherOpTest.Gather_axis1_scalar_indices
b5b7db9 to
3a4af38
Compare
Author
|
Apologies — I missed @maxwbuckley's earlier #28278 when looking at this issue. Their PR covers both the NN and MLProgram paths and is already in review with @yuslepukhin, so closing this in favor of theirs. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #28180.
PyTorch's ONNX exporter routinely emits
Gatherwith a rank-0 (scalar)indicesinput — every per-layer style-code lookup in StyleGAN / StyleGAN2 derivatives is one. In the GFPGAN 1024×1024 generator that's 16 such Gathers, each splitting the CoreML subgraph and forcing a partition boundary.GatherOpBuilder::IsOpSupportedImplhistorically rejected this shape outright.The actual blocker isn't MIL gather
MIL
gatherdoes accept rank-0indicesand produces the expected output (gathered axis dropped). Verified by building a small MIL program end-to-end against iOS 15:The reason the EP rejected the case was ORT-side boundary handling, not MIL:
ModelBuilder::RegisterModelInputOutputrewrites any rank-0 boundary tensor to{1}becauseMLMultiArrayhas no rank-0 representation.RegisterInitializersdoes the same for rank-0 initializers (LoadConstantNDrequires rank ≥ 1).But constant initializers on the MLProgram path flow through
OnnxTensorToCoreMLTensor, which preserves rank — so MILgathercan consume them directly.Change
IsOpSupportedImpl: allow rank-0indiceson the MLProgram path only when it is a constant initializer. NeuralNetwork path and non-initializer scalarindicesremain rejected.AddToModelBuilderImpl: unchanged frommain— the existing code already passesindicesstraight through to MILgather.Test
Existing CPU
OpTestercases already cover the shape and run against every registered EP:GatherOpTest.Gather_axis0_scalar_indices(data[2,2,2], scalar index, axis 0)GatherOpTest.Gather_axis1_scalar_indices(data[2,2,2], scalar index, axis 1)Both were skipped on CoreML before (
IsOpSupportedImplreturned false); with this change they stay on CoreML EP and the OpTester output matches the reference.