[CoreML EP] Add Sin and Cos unary ops by maxwbuckley · Pull Request #28596 · microsoft/onnxruntime

maxwbuckley · 2026-05-20T17:56:16Z

Summary

Lower ONNX Sin and Cos to the CoreML ML Program sin / cos elementwise ops
via the existing UnaryOpBuilder, registered in the op builder factory. Like
Erf / Round / Exp, these have no NeuralNetwork lowering
(UnaryFunctionLayerParams has no sin/cos), so IsOpSupportedImpl rejects them on
the NeuralNetwork format.
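
The format gating described above can be pictured with a small Python model. This is only an illustrative sketch, not the C++ in `unary_op_builder.cc`: `ML_PROGRAM_ONLY_OPS`, `coreml_op_name`, and `is_op_supported` are hypothetical names, and only the Sin/Cos to `sin`/`cos` mapping is taken from this description.

```python
# Hypothetical model of the gating; the real check is implemented in C++
# (IsOpSupportedImpl on the unary op builder) and is not reproduced here.
ML_PROGRAM_ONLY_OPS = {
    "Sin": "sin",  # ONNX op type -> CoreML ML Program op name
    "Cos": "cos",
}


def coreml_op_name(onnx_op_type: str) -> str:
    """Return the ML Program op an ONNX unary op is lowered to."""
    return ML_PROGRAM_ONLY_OPS[onnx_op_type]


def is_op_supported(onnx_op_type: str, use_ml_program: bool) -> bool:
    """Sin/Cos are accepted only when the model format is ML Program."""
    if onnx_op_type not in ML_PROGRAM_ONLY_OPS:
        raise NotImplementedError("op is outside the scope of this sketch")
    # No NeuralNetwork lowering exists, so that format is rejected
    # and the node falls back to CPU.
    return use_ml_program
```

With this model, `is_op_supported("Sin", use_ml_program=False)` is `False`, which corresponds to the CPU fallback exercised by the NeuralNetwork-format test.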

Why

Sin / Cos form the sinusoidal timestep embedding of diffusion UNets. Supporting
them keeps that prologue on CoreML instead of splitting the graph: with this
change alone, a tiny Stable-Diffusion UNet goes from 2 CoreML partitions to 1,
with zero graph breaks.
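
For readers unfamiliar with the embedding, here is a minimal pure-Python sketch of one common formulation. The function name, the `max_period` default of 10000, and the cos-then-sin ordering are assumptions for illustration; actual UNet exports may order or scale the terms differently.

```python
import math
from typing import List


def timestep_embedding(t: float, dim: int, max_period: float = 10000.0) -> List[float]:
    """Sinusoidal embedding of a scalar timestep (illustrative formulation)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even in this sketch")
    half = dim // 2
    # Geometrically spaced frequencies from 1 down towards 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    # The Cos and Sin nodes in the exported graph correspond to these two halves.
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]
```

Both halves are plain elementwise functions of the same input, which is why a graph containing them maps onto unary `sin` / `cos` ops.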

This PR is independent of the rest of the series (it touches only the unary
builder) and can be reviewed/merged in any order.

Tests (`coreml_basic_test.cc`)

SinCos_MLProgram — a Sin + Cos graph runs fully on CoreML and matches the CPU
reference.
SinCosNeuralNetworkNotSupported — the same graph falls back to CPU on the
NeuralNetwork format.

Doc: coreml_supported_mlprogram_ops.md lists Sin and Cos.

Series — CoreML EP coverage for transformer / diffusion graphs

#28595: Support bool Cast in ML Program (prerequisite)
#28596: Add Sin and Cos unary ops (this PR, independent)
#28597: Add Where and And builders (depends on #28595)
#28598: Add GatherND builder (depends on #28595)

Together with #28278 (scalar-Gather), the series takes BERT / GPT-2 / ViT /
diffusion-UNet graphs — tiny and full-size — from 2 CoreML partitions to 1, with
zero graph breaks.

Lower ONNX Sin and Cos to the CoreML ML Program `sin` / `cos` elementwise ops via the existing UnaryOpBuilder, and register them in the op builder factory. Like Erf/Round/Exp, these have no NeuralNetwork lowering (UnaryFunctionLayerParams has no sin/cos), so IsOpSupportedImpl rejects them on the NeuralNetwork format.

These appear in the timestep (sinusoidal position) embedding of diffusion UNets; supporting them lets that prologue stay on CoreML instead of splitting the graph into separate partitions.

Tests (coreml_basic_test.cc):
- SinCos_MLProgram: a Sin+Cos graph runs fully on CoreML and matches the CPU reference.
- SinCosNeuralNetworkNotSupported: the same graph falls back to CPU on the NeuralNetwork format.

Doc: coreml_supported_mlprogram_ops.md lists Sin and Cos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maxwbuckley · 2026-05-22T06:30:09Z

@yuslepukhin Continuing the great work on making Mac ML on Onnxruntime amazing! Thank you :)

Copilot

Pull request overview

Adds CoreML Execution Provider support for ONNX Sin and Cos by lowering them to CoreML ML Program elementwise sin/cos operations via the existing UnaryOpBuilder, with explicit fallback on the NeuralNetwork format where no equivalent unary layer exists.

Changes:

Register Sin and Cos in the CoreML op builder factory as unary ops.
Extend UnaryOpBuilder ML Program lowering to emit sin and cos, and explicitly reject Sin/Cos for the NeuralNetwork format.
Add CoreML EP tests validating ML Program execution and NeuralNetwork fallback, and document the newly supported ops.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| tools/ci_build/github/apple/coreml_supported_mlprogram_ops.md | Documents ML Program support for `Sin` and `Cos`. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds ML Program correctness test for `Sin`/`Cos` and a NeuralNetwork-format fallback test. |
| onnxruntime/core/providers/coreml/builders/op_builder_factory.cc | Registers `Sin` and `Cos` with the unary op builder. |
| onnxruntime/core/providers/coreml/builders/impl/unary_op_builder.cc | Implements ML Program lowering for `Sin`/`Cos` and gates them off for NeuralNetwork format support checks. |

yuslepukhin

LGTM

yuslepukhin

### Summary

Two changes to the ML Program `Cast` builder:

1. **Accept `BOOL` as a source and target dtype** in `HasSupportedInputsImpl`. The ML Program `cast` op already handles bool, and `AddToModelBuilderImpl` already maps `to == BOOL`; only the input/output type gate omitted it.
2. **Move the "no preceding node" check after the ML Program early-return.** That check is legacy gating for the NeuralNetwork ArgMax-only path (which dereferences `InputEdgesBegin()`); on the ML Program path a `Cast` fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.

### Why

This is the first of a **4-PR series** giving the CoreML EP the op coverage to run transformer and diffusion graphs as a *single CoreML partition* instead of fragmenting across CPU.

Transformer attention-mask graphs are a `Cast → GatherND → And → Where` chain over **bool** tensors. A CoreML partition cannot have a bool input/output (CoreML `MLMultiArray` has no bool type), so bool must stay *internal*, which makes `Cast` (the int↔bool boundary) the prerequisite for the rest of the series.

### Combined impact of the series

With all four PRs plus #28278 (scalar-`Gather`), every model below goes from 2 CoreML partitions to **1, with zero graph breaks**: the whole graph runs on CoreML. Measured on an Apple M3 Max, ML Program format:

| Model | partitions (before → after) | CoreML vs CPU |
|-------|:---------------------------:|--------------:|
| BERT-large (340M) | 2 → 1 | 7.3× (fp32) / 11.0× (fp16) |
| ViT-large (304M) | 2 → 1 | 8.5× (fp32) / 10.3× (fp16) |
| GPT-2-large (774M) | 2 → 1 | 11.4× (fp16) |
| SD-1.5 UNet (860M) | 2 → 1 | 9.7× (fp16) |

The op builders eliminate the graph breaks (deterministic); the speedups are what CoreML already delivers once a model is no longer fragmented.

### Tests (`coreml_basic_test.cc`)

- `CastNonArgMaxNeuralNetworkNotSupported`: an `int64 → bool → float` cast chain falls back to CPU on the NeuralNetwork format, guarding the `IsOpSupportedImpl` reordering.

Positive `bool`-Cast coverage is in the dependent PRs: `Cast → GatherND → Cast` (#28598's `GatherNDBoolData_MLProgram`) and `Cast → And → Cast` (#28597's `And_MLProgram`). Both place a non-`Cast` op between the int↔bool casts and check the result against the CPU EP. A *standalone* `int64 → Cast(bool) → Cast(float)` round-trip can't be verified here (CoreML's compiler fuses back-to-back `cast` ops and drops the bool clamp), so the pattern needs that intervening op, which only the dependent PRs provide.

### Series: CoreML EP coverage for transformer / diffusion graphs

- **#28595: Support bool Cast in ML Program** *(this PR; prerequisite)*
- #28596: Add Sin and Cos unary ops *(independent)*
- #28597: Add Where and And builders *(depends on #28595)*
- #28598: Add GatherND builder *(depends on #28595)*

Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs (tiny and full-size) from 2 CoreML partitions to 1, with zero graph breaks.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
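
The reordering of the `Cast` support check described in that summary can be pictured with a small Python model. This is a hedged sketch of the control flow only: `is_cast_supported` and its boolean parameters are hypothetical, the dtype gate in `HasSupportedInputsImpl` is left out, and the real implementation is C++.

```python
def is_cast_supported(use_ml_program: bool,
                      has_preceding_node: bool,
                      preceding_is_argmax: bool) -> bool:
    """Illustrative ordering of the Cast checks after the change."""
    # ML Program path: return early, so a Cast fed directly by a graph
    # input (no preceding node) is no longer rejected.
    if use_ml_program:
        return True
    # Legacy NeuralNetwork path: Cast is only handled when it follows
    # ArgMax, so a preceding node is required before inspecting it.
    if not has_preceding_node:
        return False
    return preceding_is_argmax
```

Before the change, the "no preceding node" rejection would have run first for both formats, which is what forced the needless CPU fallback on the ML Program path.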

### Summary

New ML Program op builder: ONNX `GatherND` → CoreML `gather_nd`.

- `batch_dims` must be 0: the iOS15 `gather_nd` op has no `batch_dims` parameter; `IsOpSupportedImpl` rejects other values.
- CoreML's `gather_nd` rejects a **bool `x`**, but transformer attention-mask graphs gather from bool tensors. For bool data the builder lowers the op as `cast(bool→int32) → gather_nd → cast(int32→bool)`; int32 represents 0/1 exactly, so the round-trip is lossless.
- `validate_indices` is passed explicitly: the ML Program parser rejects `gather_nd` without it (the same quirk the `gather` builder works around).
- ML-Program-only; `IsOpSupportedImpl` rejects the NeuralNetwork format.

### Indices handling (CoreML `gather_nd` quirks)

Two CoreML behaviours that differ from ONNX are handled in the builder:

- **`indices` must be a constant initializer.** CoreML's `gather_nd` miscomputes the result for some data/indices shape combinations when `indices` is a runtime (non-constant) input: it returns slice 0 regardless of the actual index value. With a constant `indices` it is correct, so non-constant cases fall back to CPU. Constant indices is also the common case (e.g. transformer attention masks).
- **Negative indices are normalized at build time.** ONNX `GatherND` wraps a negative index by the corresponding data dim; CoreML's `gather_nd` does not and silently returns wrong values. Since `indices` is constant, the builder wraps any negatives into positive int32 indices while building the model (and requires the indexed data dims to be static, otherwise the node falls back to CPU).

This was surfaced by fuzzing over randomized shapes/indices and verified on-device (negative indices, scalar outputs, ranks 2–4) against the CPU reference.

### Depends on the bool-Cast PR

The bool-data `GatherND` test needs `Cast` as the `int ↔ bool` producer/consumer so the bool tensors stay internal to the CoreML partition (a partition cannot have bool I/O). This branch is **stacked on `coreml-cast-bool`**: the `cb43b7c75f` commit in this PR is the bool-Cast PR and drops from this diff once that one merges.

### Tests (`coreml_basic_test.cc`)

- `GatherND_MLProgram`: a float `GatherND` runs on CoreML, matches CPU.
- `GatherNDBoolData_MLProgram`: a `Cast → GatherND → Cast` bool chain runs fully on CoreML, exercising the cast round-trip lowering.
- `GatherNDNeuralNetworkNotSupported`: `GatherND` falls back on the NeuralNetwork format.
- `GatherNDBatchDimsNotSupported`: `GatherND` with `batch_dims=1` falls back to CPU.

Doc: `coreml_supported_mlprogram_ops.md` lists `GatherND`.

### Series: CoreML EP coverage for transformer / diffusion graphs

- #28595: Support bool Cast in ML Program *(prerequisite)*
- #28596: Add Sin and Cos unary ops *(independent)*
- #28597: Add Where and And builders *(depends on #28595)*
- **#28598: Add GatherND builder** *(this PR; depends on #28595)*

Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs (tiny and full-size) from 2 CoreML partitions to 1, with zero graph breaks.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
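
The build-time normalization of negative `GatherND` indices mentioned in that description can be illustrated with a short Python sketch. It assumes `batch_dims=0`, static data dims, and indices supplied as a flat list of index tuples; `normalize_indices` and `gather_nd` are hypothetical helpers written for this example, not functions from the builder.

```python
from typing import List, Sequence


def normalize_indices(indices: Sequence[Sequence[int]],
                      data_shape: Sequence[int]) -> List[List[int]]:
    """Wrap negative index components into the range [0, dim)."""
    normalized = []
    for index_tuple in indices:
        if len(index_tuple) > len(data_shape):
            raise ValueError("index tuple is longer than the data rank")
        wrapped = []
        for axis, idx in enumerate(index_tuple):
            dim = data_shape[axis]
            if not -dim <= idx < dim:
                raise IndexError("index out of range for axis %d" % axis)
            # ONNX semantics: a negative index counts back from the end.
            wrapped.append(idx + dim if idx < 0 else idx)
        normalized.append(wrapped)
    return normalized


def gather_nd(data, indices):
    """Reference gather over nested lists (batch_dims=0)."""
    result = []
    for index_tuple in indices:
        item = data
        for idx in index_tuple:
            item = item[idx]  # Python lists also wrap negatives
        result.append(item)
    return result
```

Because Python list indexing wraps negatives the same way ONNX does, `gather_nd(data, indices)` and `gather_nd(data, normalize_indices(indices, shape))` should agree; a backend that does not wrap would need the normalized form.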

This was referenced May 20, 2026

[CoreML EP] Support bool Cast in ML Program #28595

Merged

[CoreML EP] Add Where and And builders #28597

Open

[CoreML EP] Add GatherND builder #28598

Merged

Merge remote-tracking branch 'origin/main' into coreml-sin-cos

46ec8e0

# Conflicts:
#   onnxruntime/core/providers/coreml/builders/impl/unary_op_builder.cc
#   onnxruntime/core/providers/coreml/builders/op_builder_factory.cc

maxwbuckley marked this pull request as ready for review May 22, 2026 06:24

yuslepukhin requested a review from Copilot May 25, 2026 20:42

Copilot started reviewing on behalf of yuslepukhin May 25, 2026 20:43

Copilot AI reviewed May 25, 2026

yuslepukhin previously approved these changes May 26, 2026

Merge branch 'main' into coreml-sin-cos

c8554e4

Resolves conflict in coreml_basic_test.cc by keeping both the new Sin/Cos tests (SinCos_MLProgram and SinCosNeuralNetworkNotSupported) and the upstream Gather test additions.

maxwbuckley dismissed yuslepukhin’s stale review via c8554e4 May 26, 2026 21:19

yuslepukhin approved these changes May 26, 2026

yuslepukhin enabled auto-merge (squash) May 27, 2026 17:29

yuslepukhin merged commit 71cfbb0 into microsoft:main May 27, 2026
92 of 93 checks passed

yuslepukhin merged 3 commits into microsoft:main from maxwbuckley:coreml-sin-cos
