[CoreML EP] Add Sin and Cos unary ops #28596
Merged
Lower ONNX Sin and Cos to the CoreML ML Program `sin` / `cos` elementwise ops via the existing UnaryOpBuilder, and register them in the op builder factory. Like Erf/Round/Exp, these have no NeuralNetwork lowering (UnaryFunctionLayerParams has no sin/cos), so IsOpSupportedImpl rejects them on the NeuralNetwork format.

These appear in the timestep (sinusoidal position) embedding of diffusion UNets; supporting them lets that prologue stay on CoreML instead of splitting the graph into separate partitions.

Tests (coreml_basic_test.cc):
- SinCos_MLProgram: a Sin+Cos graph runs fully on CoreML and matches the CPU reference.
- SinCosNeuralNetworkNotSupported: the same graph falls back to CPU on the NeuralNetwork format.

Doc: coreml_supported_mlprogram_ops.md lists Sin and Cos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 20, 2026
# Conflicts:
#	onnxruntime/core/providers/coreml/builders/impl/unary_op_builder.cc
#	onnxruntime/core/providers/coreml/builders/op_builder_factory.cc
Contributor
Author
@yuslepukhin Continuing the great work on making Mac ML on Onnxruntime amazing! Thank you :)
Contributor
Pull request overview
Adds CoreML Execution Provider support for ONNX Sin and Cos by lowering them to CoreML ML Program elementwise sin/cos operations via the existing UnaryOpBuilder, with explicit fallback on the NeuralNetwork format where no equivalent unary layer exists.
Changes:
- Register `Sin` and `Cos` in the CoreML op builder factory as unary ops.
- Extend `UnaryOpBuilder` ML Program lowering to emit `sin` and `cos`, and explicitly reject `Sin`/`Cos` for the NeuralNetwork format.
- Add CoreML EP tests validating ML Program execution and NeuralNetwork fallback, and document the newly supported ops.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents ML Program support for Sin and Cos. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds ML Program correctness test for Sin/Cos and a NeuralNetwork-format fallback test. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers Sin and Cos with the unary op builder. |
| onnxruntime/core/providers/coreml/builders/impl/unary_op_builder.cc | Implements ML Program lowering for Sin/Cos and gates them off for NeuralNetwork format support checks. |
Resolves conflict in coreml_basic_test.cc by keeping both the new Sin/Cos tests (SinCos_MLProgram and SinCosNeuralNetworkNotSupported) and the upstream Gather test additions.
yuslepukhin pushed a commit that referenced this pull request Jun 2, 2026
### Summary

Two changes to the ML Program `Cast` builder:

1. **Accept `BOOL` as a source and target dtype** in `HasSupportedInputsImpl`. The ML Program `cast` op already handles bool, and `AddToModelBuilderImpl` already maps `to == BOOL`; only the input/output type gate omitted it.
2. **Move the "no preceding node" check after the ML Program early-return.** That check is legacy gating for the NeuralNetwork ArgMax-only path (which dereferences `InputEdgesBegin()`); on the ML Program path a `Cast` fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.

### Why

This is the first of a **4-PR series** giving the CoreML EP the op coverage to run transformer and diffusion graphs as a *single CoreML partition* instead of fragmenting across CPU.

Transformer attention-mask graphs are a `Cast → GatherND → And → Where` chain over **bool** tensors. A CoreML partition cannot have a bool input/output (CoreML `MLMultiArray` has no bool type), so bool must stay *internal*, which makes `Cast` (the int↔bool boundary) the prerequisite for the rest of the series.

### Combined impact of the series

With all four PRs plus #28278 (scalar-`Gather`), every model below goes from 2 CoreML partitions to **1, with zero graph breaks**: the whole graph runs on CoreML. Measured on an Apple M3 Max, ML Program format:

| Model | partitions (before → after) | CoreML vs CPU |
|-------|:---------------------------:|--------------:|
| BERT-large (340M) | 2 → 1 | 7.3× (fp32) / 11.0× (fp16) |
| ViT-large (304M) | 2 → 1 | 8.5× (fp32) / 10.3× (fp16) |
| GPT-2-large (774M) | 2 → 1 | 11.4× (fp16) |
| SD-1.5 UNet (860M) | 2 → 1 | 9.7× (fp16) |

The op builders eliminate the graph breaks (deterministic); the speedups are what CoreML already delivers once a model is no longer fragmented.

### Tests (`coreml_basic_test.cc`)

- `CastNonArgMaxNeuralNetworkNotSupported`: an `int64 → bool → float` cast chain falls back to CPU on the NeuralNetwork format, guarding the `IsOpSupportedImpl` reordering.

Positive `bool`-Cast coverage is in the dependent PRs: `Cast → GatherND → Cast` (#28598's `GatherNDBoolData_MLProgram`) and `Cast → And → Cast` (#28597's `And_MLProgram`). Both place a non-`Cast` op between the int↔bool casts and check the result against the CPU EP. A *standalone* `int64 → Cast(bool) → Cast(float)` round-trip can't be verified here, because CoreML's compiler fuses back-to-back `cast` ops and drops the bool clamp, so the pattern needs that intervening op, which only the dependent PRs provide.

### Series: CoreML EP coverage for transformer / diffusion graphs

- **#28595: Support bool Cast in ML Program** *(this PR, prerequisite)*
- #28596: Add Sin and Cos unary ops *(independent)*
- #28597: Add Where and And builders *(depends on #28595)*
- #28598: Add GatherND builder *(depends on #28595)*

Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs, tiny and full-size, from 2 CoreML partitions to 1, with zero graph breaks.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
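The "bool clamp" mentioned in the Tests note of the commit message above can be illustrated with a small sketch in plain C++. This is not CoreML or ONNX Runtime code: `CastViaBool` and `CastDirect` are hypothetical names, and the sketch only assumes the ONNX `Cast` convention that any nonzero integer becomes `true`. It shows why dropping the intermediate bool step changes the result for inputs other than 0 and 1.

```cpp
#include <cstdint>

// Hypothetical illustration: int64 -> bool -> float.
// The bool step clamps every nonzero value to 1.
float CastViaBool(int64_t x) {
  const bool as_bool = static_cast<bool>(x);  // nonzero -> true
  return static_cast<float>(as_bool);         // true -> 1.0f, false -> 0.0f
}

// Hypothetical illustration: what a fused int64 -> float cast computes
// when the intermediate bool step is dropped.
float CastDirect(int64_t x) {
  return static_cast<float>(x);
}
```

For an input of 5, the two-step chain yields 1.0 while the direct cast yields 5.0, which is the kind of mismatch a standalone round-trip test would run into.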
yuslepukhin pushed a commit that referenced this pull request Jun 2, 2026
### Summary

New ML Program op builder: ONNX `GatherND` → CoreML `gather_nd`.

- `batch_dims` must be 0: the iOS15 `gather_nd` op has no `batch_dims` parameter, and `IsOpSupportedImpl` rejects other values.
- CoreML's `gather_nd` rejects a **bool `x`**, but transformer attention-mask graphs gather from bool tensors. For bool data the builder lowers the op as `cast(bool→int32) → gather_nd → cast(int32→bool)`; int32 represents 0/1 exactly, so the round-trip is lossless.
- `validate_indices` is passed explicitly: the ML Program parser rejects `gather_nd` without it (the same quirk the `gather` builder works around).
- ML-Program-only; `IsOpSupportedImpl` rejects the NeuralNetwork format.

### Indices handling (CoreML `gather_nd` quirks)

Two CoreML behaviours that differ from ONNX are handled in the builder:

- **`indices` must be a constant initializer.** CoreML's `gather_nd` miscomputes the result for some data/indices shape combinations when `indices` is a runtime (non-constant) input: it returns slice 0 regardless of the actual index value. With a constant `indices` it is correct, so non-constant cases fall back to CPU. Constant indices is also the common case (e.g. transformer attention masks).
- **Negative indices are normalized at build time.** ONNX `GatherND` wraps a negative index by the corresponding data dim; CoreML's `gather_nd` does not and silently returns wrong values. Since `indices` is constant, the builder wraps any negatives into positive int32 indices while building the model (and requires the indexed data dims to be static, otherwise the node falls back to CPU).

This was surfaced by fuzzing over randomized shapes/indices and verified on-device (negative indices, scalar outputs, ranks 2–4) against the CPU reference.

### Depends on the bool-Cast PR

The bool-data `GatherND` test needs `Cast` as the `int ↔ bool` producer/consumer so the bool tensors stay internal to the CoreML partition (a partition cannot have bool I/O). This branch is **stacked on `coreml-cast-bool`**: the `cb43b7c75f` commit in this PR is the bool-Cast PR and drops from this diff once that one merges.

### Tests (`coreml_basic_test.cc`)

- `GatherND_MLProgram`: a float `GatherND` runs on CoreML, matches CPU.
- `GatherNDBoolData_MLProgram`: a `Cast → GatherND → Cast` bool chain runs fully on CoreML, exercising the cast round-trip lowering.
- `GatherNDNeuralNetworkNotSupported`: `GatherND` falls back on the NeuralNetwork format.
- `GatherNDBatchDimsNotSupported`: `GatherND` with `batch_dims=1` falls back to CPU.

Doc: `coreml_supported_mlprogram_ops.md` lists `GatherND`.

### Series: CoreML EP coverage for transformer / diffusion graphs

- #28595: Support bool Cast in ML Program *(prerequisite)*
- #28596: Add Sin and Cos unary ops *(independent)*
- #28597: Add Where and And builders *(depends on #28595)*
- **#28598: Add GatherND builder** *(this PR, depends on #28595)*

Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs, tiny and full-size, from 2 CoreML partitions to 1, with zero graph breaks.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
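The build-time normalization of negative indices described in the commit message above can be sketched as follows. This is a minimal standalone illustration, not the builder's actual code: `NormalizeGatherNDIndices` is a hypothetical helper, and it assumes `batch_dims == 0`, a flattened constant `indices` tensor whose last dimension is `tuple_len`, and fully static data dims.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: wrap negative GatherND indices into [0, dim).
// `indices` is the flattened constant indices tensor; each consecutive group
// of `tuple_len` values addresses data dims 0..tuple_len-1 (batch_dims == 0).
std::vector<int32_t> NormalizeGatherNDIndices(const std::vector<int64_t>& indices,
                                              const std::vector<int64_t>& data_shape,
                                              std::size_t tuple_len) {
  assert(tuple_len > 0 && tuple_len <= data_shape.size());
  assert(indices.size() % tuple_len == 0);

  std::vector<int32_t> normalized;
  normalized.reserve(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const int64_t dim = data_shape[i % tuple_len];  // data dim this component indexes
    int64_t idx = indices[i];
    if (idx < 0) idx += dim;                        // ONNX-style wrap for negatives
    assert(idx >= 0 && idx < dim);                  // out-of-range input is a caller error here
    normalized.push_back(static_cast<int32_t>(idx));
  }
  return normalized;
}
```

With a data shape of `{3, 4}` and index tuples `(-1, 0)` and `(1, -2)`, the helper produces `(2, 0)` and `(1, 2)`. The wrap needs concrete dim sizes, which matches the stated requirement that the indexed data dims be static.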
Summary
Lower ONNX `Sin` and `Cos` to the CoreML ML Program `sin` / `cos` elementwise ops via the existing `UnaryOpBuilder`, registered in the op builder factory. Like `Erf`/`Round`/`Exp`, these have no NeuralNetwork lowering (`UnaryFunctionLayerParams` has no sin/cos), so `IsOpSupportedImpl` rejects them on the NeuralNetwork format.
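The shape of that lowering and format gate can be sketched in isolation. The snippet below is a simplified stand-in and does not use the real ONNX Runtime builder interfaces: `ModelFormat` and `LowerSinCos` are hypothetical names, and only the `Sin`/`Cos` case from this PR is modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the two CoreML model formats.
enum class ModelFormat { kNeuralNetwork, kMLProgram };

// Hypothetical sketch: return the ML Program op name for an ONNX Sin/Cos node,
// or std::nullopt when the node should be left to another EP (e.g. CPU).
std::optional<std::string> LowerSinCos(std::string_view onnx_op_type, ModelFormat format) {
  assert(!onnx_op_type.empty());

  // No NeuralNetwork lowering exists for these ops, so reject that format.
  if (format != ModelFormat::kMLProgram) return std::nullopt;

  if (onnx_op_type == "Sin") return std::string("sin");
  if (onnx_op_type == "Cos") return std::string("cos");
  return std::nullopt;  // other op types are outside this sketch
}
```

Returning an empty optional models the "not supported" answer that makes the node fall back, which is the behaviour the `SinCosNeuralNetworkNotSupported` test checks.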
Why
`Sin`/`Cos` form the sinusoidal timestep embedding of diffusion UNets. Supporting them keeps that prologue on CoreML instead of splitting the graph: a tiny Stable-Diffusion UNet goes from 2 CoreML partitions → 1, zero graph breaks with this change alone.

This PR is independent of the rest of the series (it touches only the unary builder) and can be reviewed/merged in any order.
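For readers unfamiliar with the embedding mentioned above, here is a minimal sketch of a sinusoidal timestep embedding in plain C++. The exact layout is an assumption for illustration: sine terms first, cosine terms second, with geometrically spaced frequencies and a `max_period` of 10000. Real UNet exports may order or scale the terms differently, and `TimestepEmbedding` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a sinusoidal timestep embedding.
// Assumed layout: [sin(t*f_0..f_{h-1}), cos(t*f_0..f_{h-1})] with h = dim / 2
// and f_i = exp(-ln(max_period) * i / h).
std::vector<float> TimestepEmbedding(float timestep, std::size_t dim,
                                     float max_period = 10000.0f) {
  assert(dim >= 2 && dim % 2 == 0);
  const std::size_t half = dim / 2;
  std::vector<float> embedding(dim);
  for (std::size_t i = 0; i < half; ++i) {
    const float freq = std::exp(-std::log(max_period) * static_cast<float>(i) /
                                static_cast<float>(half));
    embedding[i] = std::sin(timestep * freq);         // the ONNX Sin node in the graph
    embedding[half + i] = std::cos(timestep * freq);  // the ONNX Cos node in the graph
  }
  return embedding;
}
```

In an exported graph the two elementwise calls appear as `Sin` and `Cos` nodes near the model input, so an EP without them has to hand that prologue to another provider.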
Tests (`coreml_basic_test.cc`)
- `SinCos_MLProgram`: a Sin + Cos graph runs fully on CoreML and matches the CPU reference.
- `SinCosNeuralNetworkNotSupported`: the same graph falls back to CPU on the NeuralNetwork format.

Doc: `coreml_supported_mlprogram_ops.md` lists `Sin` and `Cos`.

Series: CoreML EP coverage for transformer / diffusion graphs
Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs, tiny and full-size, from 2 CoreML partitions to 1, with zero graph breaks.