[CoreML EP] Add Identity, Ceil, Tile builders + drop trivial-only partitions #28293
Open
maxwbuckley wants to merge 5 commits into
…titions
10.4% mean speedup and 26% stddev tightening on YOLOv10n (M3 Max,
MLProgram, batch 1, 1500 iterations pooled), with no regression on
ResNet50 (which contains no Identity/Ceil/Tile).
| Configuration              | Partitions | Mean (ms) | StdDev (ms) | P99 (ms) |
|----------------------------|-----------:|----------:|------------:|---------:|
| Without builders           |          4 |     3.798 |       0.867 |    6.608 |
| With builders + heuristic  |          3 |     3.403 |       0.636 |    5.957 |
No tracking issue; discovered via YOLOv10 partitioning analysis.
== What changes ==
Three new MLProgram op builders: Identity, Ceil, Tile. Tile additionally
accepts INT32/INT64/BOOL inputs (it is shape-only data movement; the
default float-only filter rejected it on common YOLO grid-index
post-processing) and accepts a runtime 'reps' tensor, not only a
constant initializer.
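
To make the "shape-only data movement" point concrete, here is a minimal reference sketch of ONNX Tile semantics (output dim i = input dim i * reps[i]). It is not the builder code from this PR; `TileReference` is a hypothetical helper, and it is templated only to show that no arithmetic on the element type is involved, so int32, int64, bool, and float all behave the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference Tile over a row-major tensor.
// Each output element is a copy of the input element at index (out_idx % shape) per axis.
template <typename T>
std::vector<T> TileReference(const std::vector<T>& data,
                             const std::vector<int64_t>& shape,
                             const std::vector<int64_t>& reps) {
  assert(shape.size() == reps.size());
  const std::size_t rank = shape.size();
  std::vector<int64_t> out_shape(rank);
  int64_t out_size = 1;
  for (std::size_t i = 0; i < rank; ++i) {
    out_shape[i] = shape[i] * reps[i];
    out_size *= out_shape[i];
  }
  std::vector<T> out(static_cast<std::size_t>(out_size));
  for (int64_t flat = 0; flat < out_size; ++flat) {
    // Decompose the flat output index from the innermost axis outwards and
    // accumulate the matching row-major input offset.
    int64_t rem = flat, in_offset = 0, in_stride = 1;
    for (std::size_t k = rank; k-- > 0;) {
      const int64_t out_idx = rem % out_shape[k];
      rem /= out_shape[k];
      in_offset += (out_idx % shape[k]) * in_stride;
      in_stride *= shape[k];
    }
    out[static_cast<std::size_t>(flat)] = data[static_cast<std::size_t>(in_offset)];
  }
  return out;
}
```

For example, tiling the 2x2 input [[1,2],[3,4]] with reps [1,2] yields [[1,2,1,2],[3,4,3,4]]. Because `reps` is an ordinary argument here, the same function covers both the constant-initializer and the runtime-tensor case.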
A partition-quality heuristic in CoreMLExecutionProvider::GetCapability
that drops partitions whose nodes are all in {Identity, Cast, Reshape,
Squeeze, Unsqueeze, Flatten, Transpose, Tile, Ceil}. The heuristic uses
the callback-taking overload of CreateSupportedPartitions (same as
NNAPI EP); a partition is kept iff at least one node is outside the
trivial set.
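
The keep/drop rule above can be sketched in isolation. This is a simplified illustration rather than the code in GetCapability: `IsTrivialOp` and `ShouldKeepPartition` are hypothetical names, and a partition is modelled as a plain list of op-type strings instead of ORT node objects.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the trivial set listed in the commit message.
inline bool IsTrivialOp(const std::string& op_type) {
  static const std::unordered_set<std::string> kTrivial = {
      "Identity", "Cast", "Reshape", "Squeeze", "Unsqueeze",
      "Flatten", "Transpose", "Tile", "Ceil"};
  return kTrivial.count(op_type) != 0;
}

// Keep a candidate partition iff at least one node is outside the trivial set.
// Assumption: an empty partition is dropped.
inline bool ShouldKeepPartition(const std::vector<std::string>& op_types) {
  return std::any_of(op_types.begin(), op_types.end(),
                     [](const std::string& op) { return !IsTrivialOp(op); });
}
```

Under this sketch, Conv -> Identity -> Cast -> Reshape -> Conv is kept, while Identity -> Cast -> Reshape is dropped and falls back to the CPU EP.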
== Why both ==
Op coverage and the heuristic are coupled: adding the builders alone is
net-negative on graphs where these ops sit in isolated chains.
Per-op CoreML dispatch overhead on M3 Max (32-op chains on a
1x64x56x56 fp32 tensor, n=2997, MLProgram):
| op | CPU EP per op | CoreML EP per op |
|----------|--------------:|-----------------:|
| Identity | <1 us | ~14 us |
| Ceil | ~6 us | ~12 us |
| Tile | ~10 us | ~10 us |
A trivial-only partition pays ~50-100us round-trip marshalling plus
~10us per op of CoreML dispatch, vs <1us each on CPU. Worth claiming
only when sandwiched between compute-heavy ops where the round-trip is
already paid for. The heuristic enforces that.
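
A back-of-envelope version of this cost argument, as a sketch. The struct and function names are hypothetical, and the linear model (one round trip per partition plus a fixed per-op cost) is a simplifying assumption, not something measured beyond the figures quoted above.

```cpp
// Illustrative cost figures in microseconds; values are supplied by the caller.
struct DispatchCostUs {
  double round_trip;     // CPU -> CoreML -> CPU marshalling per partition
  double coreml_per_op;  // CoreML dispatch per trivial op
  double cpu_per_op;     // CPU EP cost per trivial op
};

// Cost of running an isolated chain of n trivial ops as its own CoreML partition.
inline double IsolatedCoreMLCostUs(const DispatchCostUs& c, int n_ops) {
  return c.round_trip + n_ops * c.coreml_per_op;
}

// Cost of leaving the same chain on the CPU EP.
inline double CpuCostUs(const DispatchCostUs& c, int n_ops) {
  return n_ops * c.cpu_per_op;
}

// Incremental cost of absorbing the chain into a CoreML partition that already
// exists, where the round trip is paid by the surrounding compute-heavy ops.
inline double AbsorbedCoreMLCostUs(const DispatchCostUs& c, int n_ops) {
  return n_ops * c.coreml_per_op;
}
```

With the low end of the quoted figures (50 us round trip, 10 us per op on CoreML, 1 us per op on CPU as an upper bound), an isolated 3-op chain costs about 80 us on CoreML versus at most 3 us on CPU. In the sandwiched case the comparison changes, presumably because leaving the chain on CPU would split the surrounding partition and add another round trip.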
== Tests ==
coreml_basic_test.cc covers both halves of the heuristic: builder
coverage with Conv anchors (claimed), and heuristic coverage with one
sandwich case (claimed) plus six trivial-only chains across different
op types (dropped). Trivial-only tests pin graph_optimization_level=
Default so passes like IdentityElimination cannot pre-empt the
heuristic - what's exercised is GetCapability itself.
All pass locally on macOS 26.3 / M3 Max.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in coreml_basic_test.cc where this branch's tile/ceil/identity helpers + tests landed in the same file region as the Split11/13/7 tests merged via microsoft#28270. Both sets are preserved sequentially.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clang-format flagged a double blank line between the QuickGeluTestFp16 test and the tile/ceil/identity helper namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…IsTrivial

Replaces the hand-maintained kTrivialOpTypes set in CoreMLExecutionProvider::GetCapability with a virtual IOpBuilder::IsTrivial(node) method overridden by each trivial builder (Identity, Cast, Reshape, Squeeze, Flatten, Transpose, Tile, plus Ceil inside UnaryOpBuilder). The marker now lives next to each builder's other classification methods, so adding a new trivial builder won't drift from the partition heuristic, and a UnaryOpBuilder-style multi-op-type builder can answer per node.

The trivial-partition heuristic itself is unchanged; the eight existing *Chain*/*Anchor* tests pass without modification.

The Cast override carries a one-line note explaining why it counts as trivial (marshalling-cost-dominated for small tensors) since it's the least obviously cheap op in the set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
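
The shape of that refactor can be sketched as follows. This is a simplified illustration, not the ORT sources: `Node` and the class bodies are hypothetical stand-ins, and the default return value of `false` is an assumption about how non-trivial builders behave.

```cpp
#include <string>

// Hypothetical, simplified stand-in for ORT's Node.
struct Node {
  std::string op_type;
};

class IOpBuilder {
 public:
  virtual ~IOpBuilder() = default;
  // Assumed default: an op is not trivial, so it can anchor a partition.
  virtual bool IsTrivial(const Node& /*node*/) const { return false; }
};

// A single-op builder marks itself trivial unconditionally.
class IdentityOpBuilder : public IOpBuilder {
 public:
  bool IsTrivial(const Node& /*node*/) const override { return true; }
};

// A builder serving several op types answers per node: only Ceil is trivial here.
class UnaryOpBuilder : public IOpBuilder {
 public:
  bool IsTrivial(const Node& node) const override {
    return node.op_type == "Ceil";
  }
};
```

Keeping the answer on the builder means the heuristic only has to ask each node's builder, rather than consult a separate list that can fall out of date.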
…tity # Conflicts: # onnxruntime/test/providers/coreml/coreml_basic_test.cc
Contributor
Author
@yuslepukhin last one 🥳
yuslepukhin
reviewed
May 14, 2026
Member
Some minor comments.
Coverage gaps:
- No test for Tile with non-unit repeats (e.g., reps=[1,1,2,2]) — would be good to verify actual tiling works on macOS.
- No test for Identity/Tile on non-float types (int32, bool) — though the builders' type restrictions make this less relevant.
- No standalone test for the NeuralNetwork path of Identity or Tile builders (only MLProgram is tested). The NeuralNetwork path exists in both builders but RunConvChainTest uses MakeCoreMLExecutionProvider("MLProgram").
- The tile_with_repeats parameter in MakeConvWithTrivialChainModel is accepted but unused ((void)tile_with_repeats;). This is dead code — it was likely meant to support a variant test but was never wired up.
    node->add_input("conv_out");
    node->add_input("reps");
    node->add_output("Y");
    (void)tile_with_repeats;
Member
No C casts. Use ORT_UNUSED_* macros
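
A minimal sketch of what that change could look like. `TileNodeInputs` is a hypothetical helper, not the function in the test file, and the `#ifndef` block is only a stand-in so the snippet compiles outside the ORT tree; in-tree code would pick up `ORT_UNUSED_PARAMETER` from ORT's own headers.

```cpp
#include <string>
#include <vector>

// Stand-in definition for building outside the ORT tree (assumption).
#ifndef ORT_UNUSED_PARAMETER
#define ORT_UNUSED_PARAMETER(x) (void)(x)
#endif

// Hypothetical helper mirroring the shape of the code under review.
std::vector<std::string> TileNodeInputs(bool tile_with_repeats) {
  // Named macro at the call site instead of a bare C-style cast.
  ORT_UNUSED_PARAMETER(tile_with_repeats);
  return {"conv_out", "reps"};
}
```

Since the review also flags the parameter as dead code, removing it from the signature altogether would be the other option.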
Description
Adds three MLProgram op builders (Identity, Ceil, Tile) to the CoreML EP and a partition-quality heuristic that drops CoreML partitions consisting entirely of trivial shape / cheap-elementwise ops.
No tracking issue; discovered via YOLOv10 partitioning analysis on Apple Silicon.
Empirical impact
YOLOv10n, M3 Max, MLProgram, batch 1, 1500-iter pooled:
10.4% mean speedup, 26% stddev tightening.
Why both pieces are coupled
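Adding the builders alone is net-negative on graphs where these ops sit in isolated chains. Per-op CoreML dispatch overhead on M3 Max (32-op chains on a 1×64×56×56 fp32 tensor, n=2997, MLProgram):

| op       | CPU EP per op | CoreML EP per op |
|----------|--------------:|-----------------:|
| Identity |         <1 µs |           ~14 µs |
| Ceil     |         ~6 µs |           ~12 µs |
| Tile     |        ~10 µs |           ~10 µs |
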
A trivial-only partition pays ~50-100 µs round-trip marshalling plus ~10 µs per op of CoreML dispatch, vs <1 µs each on CPU. Worth claiming only when sandwiched between compute-heavy ops, where the round-trip is already paid for. The heuristic enforces that.
Implementation
New op builders.
- `Identity` emits MIL `identity` (NN path uses `LINEAR(α=1, β=0)`).
- `Ceil` joins the existing unary chain in `UnaryOpBuilder`.
- `Tile` emits MIL `tile`; it overrides `HasSupportedInputsImpl` to additionally accept INT32/INT64/BOOL (Tile is shape-only data movement, so the default float-only filter rejected it on common YOLO grid-index post-processing) and accepts a runtime `reps` tensor in addition to a constant initializer.

Heuristic.
`CoreMLExecutionProvider::GetCapability` now uses the callback-taking overload of `CreateSupportedPartitions` (same as NNAPI EP). A partition is kept iff at least one node is outside the trivial set. This lets the new builders absorb mid-chain trivial ops into existing CoreML partitions (the win) without claiming isolated trivial chains that would force a needless CPU→CoreML→CPU detour (the regression).
Tests
`coreml_basic_test.cc` covers both halves of the heuristic.

Builder coverage (compute anchor present → claimed):
- `IdentityWithConvAnchor`, `CeilWithConvAnchor`, `TileWithConvAnchor`

Heuristic coverage:
- `ConvTrivialChainConvKeepsAllOnCoreML`: Conv → Identity → Cast → Reshape → Conv stays in a single CoreML partition
- `TrivialOnlyChainIsNotClaimedByCoreML`, `ReshapeOnlyChainIsNotClaimedByCoreML`, `TransposeOnlyChainIsNotClaimedByCoreML`, `TileOnlyIsNotClaimedByCoreML`, `CeilOnlyIsNotClaimedByCoreML`, `MixedTrivialChainIsNotClaimedByCoreML`: each falls back to CPU

Trivial-only tests pin `graph_optimization_level = Default` so passes like `IdentityElimination` / `CastElimination` cannot pre-empt the heuristic; what's exercised is `GetCapability` itself.

All 28 CoreML EP tests pass locally on macOS 26.3 / M3 Max.
🤖 Generated with Claude Code