
[CoreML EP] Support Gather with scalar indices on the MLProgram path #28478

Closed
john-rocky wants to merge 1 commit into microsoft:main from john-rocky:fix/28180-gather-scalar-indices


@john-rocky john-rocky commented May 12, 2026

Fixes #28180.

PyTorch's ONNX exporter routinely emits Gather with a rank-0 (scalar) indices input — every per-layer style-code lookup in StyleGAN / StyleGAN2 derivatives is one. In the GFPGAN 1024×1024 generator that's 16 such Gathers, each splitting the CoreML subgraph and forcing a partition boundary. GatherOpBuilder::IsOpSupportedImpl historically rejected this shape outright.

The actual blocker isn't MIL gather

MIL gather does accept rank-0 indices and produces the expected output (gathered axis dropped). Verified by building a small MIL program end-to-end against iOS 15:

import numpy as np
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.mil import types

@mb.program(input_specs=[mb.TensorSpec(shape=(1, 16, 512), dtype=types.fp32)])
def prog(x):
    return mb.gather(x=x, indices=mb.const(val=np.int32(3)), axis=1)

main[CoreML3](%x: (1, 16, 512, fp32)) {
  %gather_0: (1, 512, fp32) = gather(x=%x, indices=3, axis=1)
}
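For reference, the ONNX Gather shape rule behind "gathered axis dropped" can be sketched in plain Python. The helper name `gather_output_shape` is illustrative only and is not part of ORT or coremltools; it just encodes that the output shape is the data shape with the gathered axis replaced by the indices shape.

```python
def gather_output_shape(data_shape, indices_shape, axis):
    """Output shape of Gather: data_shape[:axis] + indices_shape + data_shape[axis+1:]."""
    rank = len(data_shape)
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    axis %= rank  # normalize negative axes
    return tuple(data_shape[:axis]) + tuple(indices_shape) + tuple(data_shape[axis + 1:])


# Rank-0 indices: the gathered axis disappears, as in the MIL program above.
print(gather_output_shape((1, 16, 512), (), axis=1))    # (1, 512)
# Rank-1 indices of length 1: the axis is kept with size 1.
print(gather_output_shape((1, 16, 512), (1,), axis=1))  # (1, 1, 512)
```

The second call is the shape that results if a scalar index is silently promoted to `{1}`, which is why the rank of `indices` matters here.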

The reason the EP rejected the case was ORT-side boundary handling, not MIL:

  • ModelBuilder::RegisterModelInputOutput rewrites any rank-0 boundary tensor to {1} because MLMultiArray has no rank-0 representation.
  • On the NeuralNetwork path, RegisterInitializers does the same for rank-0 initializers (LoadConstantND requires rank ≥ 1).

But constant initializers on the MLProgram path flow through OnnxTensorToCoreMLTensor, which preserves rank — so MIL gather can consume them directly.
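A rough sketch of why the boundary rewrite matters, in plain Python. Both helpers are hypothetical stand-ins written for this note (they are not ORT functions): `boundary_shape` mimics the rank-0 to `{1}` rewrite described above, and `gather_rank` applies the Gather rank rule (indices rank + data rank - 1).

```python
def boundary_shape(shape):
    """Mimic the boundary rewrite: a rank-0 shape becomes (1,), others pass through."""
    return tuple(shape) if len(shape) > 0 else (1,)


def gather_rank(data_rank, indices_rank):
    """Rank of a Gather output for the given input ranks."""
    return indices_rank + data_rank - 1


scalar = ()  # rank-0 indices as exported by PyTorch

# Rank preserved (constant initializer on the MLProgram path): matches ONNX.
print(gather_rank(3, len(scalar)))                  # 2
# Rank rewritten to {1} at a model boundary: one dimension too many.
print(gather_rank(3, len(boundary_shape(scalar))))  # 3
```

Under these assumptions, only the rank-preserving route yields the output rank ONNX expects without an extra squeeze.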

Change

IsOpSupportedImpl: allow a rank-0 indices input on the MLProgram path only when that input is a constant initializer. The NeuralNetwork path and non-initializer scalar indices remain rejected.

AddToModelBuilderImpl: unchanged from main — the existing code already passes indices straight through to MIL gather.
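The support rule for rank-0 indices can be summarized as a small truth table. This is a Python paraphrase of the rule stated above, not the C++ in gather_op_builder.cc; the function and parameter names are made up for illustration.

```python
from itertools import product


def scalar_indices_allowed(is_mlprogram, indices_is_initializer):
    """Rule from this PR for rank-0 indices: MLProgram path AND constant initializer."""
    return is_mlprogram and indices_is_initializer


for is_mlprogram, is_init in product((True, False), repeat=2):
    path = "MLProgram" if is_mlprogram else "NeuralNetwork"
    source = "initializer" if is_init else "runtime input"
    verdict = "supported" if scalar_indices_allowed(is_mlprogram, is_init) else "rejected"
    print(f"{path:13} {source:13} -> {verdict}")
```

Only the MLProgram plus initializer combination is reported as supported; the other three stay rejected.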

Test

Existing CPU OpTester cases already cover the shape and run against every registered EP:

  • GatherOpTest.Gather_axis0_scalar_indices (data [2,2,2], scalar index, axis 0)
  • GatherOpTest.Gather_axis1_scalar_indices (data [2,2,2], scalar index, axis 1)

Both were skipped on CoreML before (IsOpSupportedImpl returned false); with this change they stay on CoreML EP and the OpTester output matches the reference.
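To make the two test shapes concrete, here is a pure-Python sketch of a scalar-index Gather on a [2,2,2] tensor. The data values (0 to 7) and the index (1) are assumptions chosen for illustration and are not copied from the OpTester cases; `gather_scalar` is a hypothetical helper.

```python
def gather_scalar(data, index, axis):
    """Gather along `axis` of a nested list with a scalar index; that axis is dropped."""
    if axis == 0:
        return data[index]
    return [gather_scalar(sub, index, axis - 1) for sub in data]


data = [[[0, 1], [2, 3]],
        [[4, 5], [6, 7]]]  # shape [2, 2, 2]

print(gather_scalar(data, 1, axis=0))  # [[4, 5], [6, 7]]
print(gather_scalar(data, 1, axis=1))  # [[2, 3], [6, 7]]
```

Both results have shape [2, 2], one rank lower than the input, which is the property the two test cases exercise.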

@edgchen1 edgchen1 requested a review from Copilot May 14, 2026 17:26
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds CoreML MLProgram-path support for ONNX Gather with scalar (rank-0) indices by rewriting it inside the CoreML builder to avoid CPU fallback and partition splits (fixes #28180).

Changes:

  • MLProgram path: detect scalar indices and emit reshape([1]) -> gather -> squeeze(axes=[axis]).
  • Support check: allow scalar indices only on MLProgram and adjust effective output-rank calculation accordingly.
  • Minor refactor: use input_defs / output_defs locals and pass logger through.
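The reshape([1]) -> gather -> squeeze(axes=[axis]) rewrite listed in this overview can be checked for equivalence with a direct scalar gather. The sketch below uses nested Python lists and hypothetical helper names; it illustrates the idea only and is not the builder code from the PR.

```python
def gather_1d(data, indices, axis):
    """Gather along `axis` with a rank-1 index list; the axis is kept."""
    if axis == 0:
        return [data[i] for i in indices]
    return [gather_1d(sub, indices, axis - 1) for sub in data]


def squeeze(data, axis):
    """Remove a size-1 dimension at `axis`."""
    if axis == 0:
        if len(data) != 1:
            raise ValueError("can only squeeze a dimension of size 1")
        return data[0]
    return [squeeze(sub, axis - 1) for sub in data]


def gather_scalar_via_rewrite(data, index, axis):
    indices = [index]                          # reshape([1])
    gathered = gather_1d(data, indices, axis)  # gather keeps a size-1 axis
    return squeeze(gathered, axis)             # squeeze(axes=[axis])


data = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
print(gather_scalar_via_rewrite(data, 1, axis=1))  # [[2, 3], [6, 7]]
```

The result matches indexing the axis directly with the scalar, at the cost of two extra ops around the gather.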

11 comment threads on onnxruntime/core/providers/coreml/builders/impl/gather_op_builder.cc (all outdated)
…ogram path

Fixes microsoft#28180. GatherOpBuilder rejected ONNX Gather with rank-0 'indices',
forcing a CPU partition split. PyTorch's exporter emits this routinely - e.g.
GFPGAN 1024x1024 has 16 per-layer style-code Gathers with a scalar constant
index, splitting the CoreML subgraph in two.

Building a small MIL program (mb.gather with a rank-0 mb.const indices)
end-to-end against iOS15 confirms MIL gather accepts rank-0 indices and drops
the gathered axis - matching ONNX semantics:

  main[CoreML3](%x: (1, 16, 512, fp32)) {
    %gather_0: (1, 512, fp32) = gather(x=%x, indices=3, axis=1)
  }

The original rejection was driven by ORT-side boundary handling, not by MIL:
RegisterModelInputOutput rewrites any rank-0 boundary tensor to {1}
(MLMultiArray has no rank-0), and on the NN path RegisterInitializers does
the same for rank-0 initializers (LoadConstantND requires rank >= 1).
Constant initializers on the MLProgram path go through
OnnxTensorToCoreMLTensor which preserves rank, so MIL gather can consume
them directly.

Therefore: allow scalar 'indices' on the MLProgram path only when it is a
constant initializer. NN path and non-initializer scalar 'indices' remain
rejected. AddToModelBuilderImpl is unchanged.

Existing CPU OpTester cases cover the shape and now stay on CoreML EP:
- GatherOpTest.Gather_axis0_scalar_indices
- GatherOpTest.Gather_axis1_scalar_indices
@john-rocky john-rocky force-pushed the fix/28180-gather-scalar-indices branch from b5b7db9 to 3a4af38 Compare May 15, 2026 04:27
@john-rocky
Author

Apologies — I missed @maxwbuckley's earlier #28278 when looking at this issue. Their PR covers both the NN and MLProgram paths and is already in review with @yuslepukhin, so closing this in favor of theirs.

@john-rocky john-rocky closed this May 15, 2026

Development

Successfully merging this pull request may close these issues.

[Performance][CoreML] GatherOpBuilder rejects rank-0 (scalar) indices, forcing CPU fallback for StyleGAN-family models

3 participants