[CoreML EP] Add Identity, Ceil, Tile builders + drop trivial-only partitions by maxwbuckley · Pull Request #28293 · microsoft/onnxruntime

maxwbuckley · 2026-04-30T11:45:11Z

Description

Adds three MLProgram op builders (Identity, Ceil, Tile) to the CoreML EP and a partition-quality heuristic that drops CoreML partitions consisting entirely of trivial shape / cheap-elementwise ops.

No tracking issue; discovered via YOLOv10 partitioning analysis on Apple Silicon.

Empirical impact

YOLOv10n, M3 Max, MLProgram, batch 1, 1500-iter pooled:

Configuration	Partitions	Mean (ms)	StdDev (ms)	P99 (ms)
Without this patch	4	3.798	0.867	6.608
With this patch	3	3.403	0.636	5.957

10.4% mean speedup, 26% stddev tightening.

Why both pieces are coupled

Adding the builders alone is net-negative on graphs where these ops sit in isolated chains. Per-op CoreML dispatch overhead on M3 Max (32-op chains on a 1×64×56×56 fp32 tensor, n=2997, MLProgram):

op	CPU EP per op	CoreML EP per op
Identity	<1 µs	~14 µs
Ceil	~6 µs	~12 µs
Tile	~10 µs	~10 µs

A trivial-only partition pays ~50-100 µs of round-trip marshalling plus ~10 µs of CoreML dispatch per op, versus roughly <1-10 µs per op on the CPU EP (see the table above). These ops are worth claiming only when they sit between compute-heavy ops, where the round-trip is already paid for. The heuristic enforces that.
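As a rough illustration of that trade-off, the sketch below adds up the quoted figures for an isolated chain. The struct and function names are hypothetical and not part of this PR, and the inputs are the approximate M3 Max measurements above rather than general constants.

```cpp
#include <cassert>

// Estimated cost of running an isolated chain of trivial ops, in microseconds.
struct CostEstimate {
  double coreml_us;  // round-trip marshalling + per-op CoreML dispatch
  double cpu_us;     // per-op CPU EP cost only, no round-trip
};

// Hypothetical helper (not in the PR): a linear cost model.
//   round_trip_us     CPU -> CoreML -> CPU marshalling (quoted as ~50-100 us)
//   coreml_per_op_us  per-op CoreML EP figure from the table
//   cpu_per_op_us     per-op CPU EP figure from the table
CostEstimate EstimateTrivialChain(int num_ops, double round_trip_us,
                                  double coreml_per_op_us, double cpu_per_op_us) {
  assert(num_ops >= 0);
  return CostEstimate{round_trip_us + num_ops * coreml_per_op_us,
                      num_ops * cpu_per_op_us};
}
```

With the low-end 50 µs round-trip and the Ceil figures from the table, `EstimateTrivialChain(3, 50.0, 12.0, 6.0)` gives about 86 µs on CoreML against about 18 µs on CPU, which is the gap the heuristic avoids.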

Implementation

New op builders. Identity emits MIL identity (NN path uses LINEAR(α=1, β=0)). Ceil joins the existing unary chain in UnaryOpBuilder. Tile emits MIL tile; it overrides HasSupportedInputsImpl to additionally accept INT32/INT64/BOOL (Tile is shape-only data movement, so the default float-only filter rejected it on common YOLO grid-index post-processing) and accepts a runtime reps tensor in addition to a constant initializer.

Heuristic. CoreMLExecutionProvider::GetCapability now uses the callback-taking overload of CreateSupportedPartitions (same as NNAPI EP). A partition is kept iff at least one node is outside the trivial set:

{Identity, Cast, Reshape, Squeeze, Unsqueeze, Flatten, Transpose, Tile, Ceil}

This lets the new builders absorb mid-chain trivial ops into existing CoreML partitions (the win) without claiming isolated trivial chains that would force a needless CPU→CoreML→CPU detour (the regression).
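The keep/drop rule can be sketched in isolation as follows. This is a simplified stand-in that works on op-type strings; the actual callback passed to CreateSupportedPartitions operates on ORT node objects, and the function names here are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// The op types the description lists as trivial.
const std::unordered_set<std::string>& TrivialOpTypes() {
  static const std::unordered_set<std::string> kTrivial = {
      "Identity", "Cast", "Reshape", "Squeeze", "Unsqueeze",
      "Flatten",  "Transpose", "Tile", "Ceil"};
  return kTrivial;
}

// Hypothetical stand-in for the partition callback: keep the partition iff at
// least one node's op type is outside the trivial set. An empty partition is
// dropped, since it contains no non-trivial node.
bool ShouldKeepPartition(const std::vector<std::string>& op_types) {
  return std::any_of(op_types.begin(), op_types.end(),
                     [](const std::string& op) {
                       return TrivialOpTypes().count(op) == 0;
                     });
}
```

Under this rule a Conv → Identity → Cast → Reshape → Conv group is kept because of the Conv nodes, while an Identity → Cast → Reshape group on its own is dropped and falls back to CPU.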

Tests

coreml_basic_test.cc covers both halves of the heuristic.

Builder coverage (compute anchor present → claimed):

IdentityWithConvAnchor, CeilWithConvAnchor, TileWithConvAnchor

Heuristic coverage:

ConvTrivialChainConvKeepsAllOnCoreML — Conv → Identity → Cast → Reshape → Conv stays in a single CoreML partition
TrivialOnlyChainIsNotClaimedByCoreML, ReshapeOnlyChainIsNotClaimedByCoreML, TransposeOnlyChainIsNotClaimedByCoreML, TileOnlyIsNotClaimedByCoreML, CeilOnlyIsNotClaimedByCoreML, MixedTrivialChainIsNotClaimedByCoreML — each falls back to CPU

Trivial-only tests pin graph_optimization_level = Default so passes like IdentityElimination / CastElimination cannot pre-empt the heuristic - what's exercised is GetCapability itself.

All 28 CoreML EP tests pass locally on macOS 26.3 / M3 Max.

🤖 Generated with Claude Code

…titions

10.4% mean speedup and 26% stddev tightening on YOLOv10n (M3 Max, MLProgram, batch 1, 1500 iterations pooled), with no regression on ResNet50 (which contains no Identity/Ceil/Tile).

|                            | Partitions |    Mean | StdDev |    P99 |
|----------------------------|-----------:|--------:|-------:|-------:|
| Without builders           |          4 | 3.798ms |  0.867 |  6.608 |
| With builders + heuristic  |          3 | 3.403ms |  0.636 |  5.957 |

No tracking issue; discovered via YOLOv10 partitioning analysis.

== What changes ==

Three new MLProgram op builders: Identity, Ceil, Tile. Tile additionally accepts INT32/INT64/BOOL inputs (it is shape-only data movement; the default float-only filter rejected it on common YOLO grid-index post-processing) and accepts a runtime 'reps' tensor, not only a constant initializer.

A partition-quality heuristic in CoreMLExecutionProvider::GetCapability that drops partitions whose nodes are all in {Identity, Cast, Reshape, Squeeze, Unsqueeze, Flatten, Transpose, Tile, Ceil}. The heuristic uses the callback-taking overload of CreateSupportedPartitions (same as NNAPI EP); a partition is kept iff at least one node is outside the trivial set.

== Why both ==

Op coverage and the heuristic are coupled: adding the builders alone is net-negative on graphs where these ops sit in isolated chains. Per-op CoreML dispatch overhead on M3 Max (32-op chains on a 1x64x56x56 fp32 tensor, n=2997, MLProgram):

| op       | CPU EP per op | CoreML EP per op |
|----------|--------------:|-----------------:|
| Identity |         <1 us |           ~14 us |
| Ceil     |         ~6 us |           ~12 us |
| Tile     |        ~10 us |           ~10 us |

A trivial-only partition pays ~50-100us round-trip marshalling plus ~10us per op of CoreML dispatch, vs <1us each on CPU. Worth claiming only when sandwiched between compute-heavy ops where the round-trip is already paid for. The heuristic enforces that.

== Tests ==

coreml_basic_test.cc covers both halves of the heuristic: builder coverage with Conv anchors (claimed), and heuristic coverage with one sandwich case (claimed) plus six trivial-only chains across different op types (dropped). Trivial-only tests pin graph_optimization_level = Default so passes like IdentityElimination cannot pre-empt the heuristic - what's exercised is GetCapability itself.

All pass locally on macOS 26.3 / M3 Max.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves conflict in coreml_basic_test.cc where this branch's tile/ceil/identity helpers + tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clang-format flagged a double blank line between the QuickGeluTestFp16 test and the tile/ceil/identity helper namespace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…IsTrivial

Replaces the hand-maintained kTrivialOpTypes set in CoreMLExecutionProvider::GetCapability with a virtual IOpBuilder::IsTrivial(node) method overridden by each trivial builder (Identity, Cast, Reshape, Squeeze, Flatten, Transpose, Tile, plus Ceil inside UnaryOpBuilder). The marker now lives next to each builder's other classification methods, so adding a new trivial builder won't drift from the partition heuristic, and a UnaryOpBuilder-style multi-op-type builder can answer per node.

The trivial-partition heuristic itself is unchanged; the eight existing *Chain*/*Anchor* tests pass without modification. The Cast override carries a one-line note explaining why it counts as trivial (marshalling-cost-dominated for small tensors) since it's the least obviously cheap op in the set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
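The shape of that refactor can be sketched roughly as below. This is a pared-down illustration, not the PR's code: `Node`, the builder classes, `BuilderRegistry`, `MakeSketchRegistry`, and `PartitionIsAllTrivial` are simplified or hypothetical stand-ins, and only the idea of a virtual IsTrivial(node) with a non-trivial default is taken from the commit message.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a graph node; the real code works on ORT's node type.
struct Node {
  std::string op_type;
};

// Pared-down builder interface: builders are non-trivial unless they say otherwise.
class IOpBuilder {
 public:
  virtual ~IOpBuilder() = default;
  virtual bool IsTrivial(const Node& /*node*/) const { return false; }
};

class IdentityOpBuilder : public IOpBuilder {
 public:
  bool IsTrivial(const Node&) const override { return true; }
};

// A multi-op-type builder answers per node: in this sketch only Ceil is trivial.
class UnaryOpBuilder : public IOpBuilder {
 public:
  bool IsTrivial(const Node& node) const override { return node.op_type == "Ceil"; }
};

class ConvOpBuilder : public IOpBuilder {};  // inherits the non-trivial default

using BuilderRegistry = std::unordered_map<std::string, std::shared_ptr<IOpBuilder>>;

// Illustrative registry; which op types share a builder is an assumption here.
BuilderRegistry MakeSketchRegistry() {
  BuilderRegistry reg;
  auto unary = std::make_shared<UnaryOpBuilder>();
  reg["Identity"] = std::make_shared<IdentityOpBuilder>();
  reg["Ceil"] = unary;
  reg["Sqrt"] = unary;
  reg["Conv"] = std::make_shared<ConvOpBuilder>();
  return reg;
}

// True iff every node's builder reports it as trivial; unknown ops count as non-trivial.
bool PartitionIsAllTrivial(const std::vector<Node>& nodes, const BuilderRegistry& builders) {
  for (const Node& n : nodes) {
    auto it = builders.find(n.op_type);
    if (it == builders.end() || !it->second->IsTrivial(n)) return false;
  }
  return true;
}
```

Because the answer comes from the builder rather than a central set, the Ceil and Sqrt nodes in this sketch share one builder yet classify differently, which is the per-node behaviour the commit message describes.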

…tity

# Conflicts:
#	onnxruntime/test/providers/coreml/coreml_basic_test.cc

maxwbuckley · 2026-05-12T04:12:49Z

@yuslepukhin last one 🥳

yuslepukhin

Some minor comments.

Coverage gaps:

No test for Tile with non-unit repeats (e.g., reps=[1,1,2,2]) — would be good to verify actual tiling works on macOS.
No test for Identity/Tile on non-float types (int32, bool) — though the builders' type restrictions make this less relevant.
No standalone test for the NeuralNetwork path of the Identity or Tile builders (only MLProgram is tested). The NeuralNetwork path exists in both builders, but RunConvChainTest uses MakeCoreMLExecutionProvider("MLProgram").
The tile_with_repeats parameter in MakeConvWithTrivialChainModel is accepted but unused ((void)tile_with_repeats;). This is dead code — it was likely meant to support a variant test but was never wired up.

yuslepukhin · 2026-05-14T21:08:54Z

+    node->add_input("conv_out");
+    node->add_input("reps");
+    node->add_output("Y");
+    (void)tile_with_repeats;


No C casts. Use ORT_UNUSED_* macros

maxwbuckley and others added 5 commits April 30, 2026 13:43

Drop extra blank line introduced by merge splice

0c58883

Merge remote-tracking branch 'origin/main' into coreml-tile-ceil-iden…

8a7a2bf

yuslepukhin requested a review from Copilot May 13, 2026 21:10

Copilot started reviewing on behalf of yuslepukhin May 13, 2026 21:40

yuslepukhin reviewed May 14, 2026

maxwbuckley wants to merge 5 commits into microsoft:main from maxwbuckley:coreml-tile-ceil-identity
