[CoreML EP] Support bool Cast in ML Program by maxwbuckley · Pull Request #28595 · microsoft/onnxruntime

maxwbuckley · 2026-05-20T17:56:13Z

Summary

Two changes to the ML Program Cast builder:

1. Accept BOOL as a source and target dtype in `HasSupportedInputsImpl`. The ML Program `cast` op already handles bool, and `AddToModelBuilderImpl` already maps `to == BOOL`; only the input/output type gate omitted it.
2. Move the "no preceding node" check after the ML Program early-return. That check is legacy gating for the NeuralNetwork ArgMax-only path (which dereferences `InputEdgesBegin()`); on the ML Program path a Cast fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.
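
The two changes above can be illustrated with a small standalone model of the support check. This is a sketch only, not the code in `cast_op_builder.cc`: `ElemType`, `CastNodeInfo`, `IsSupportedMLProgramCastType`, and `IsCastSupported` are hypothetical names, and the set of non-bool types accepted by the gate is an assumption.

```cpp
// Hypothetical stand-ins for the tensor element types involved.
enum class ElemType { kFloat, kFloat16, kInt32, kInt64, kBool, kString };

// Hypothetical summary of a Cast node, for illustration only.
struct CastNodeInfo {
  ElemType input_type;
  ElemType output_type;
  bool has_preceding_node;   // false when the Cast is fed directly by a graph input
  bool preceding_is_argmax;  // only meaningful when has_preceding_node is true
};

// Dtype gate in the spirit of HasSupportedInputsImpl: bool is now accepted.
// The non-bool entries are assumed, not taken from the PR.
bool IsSupportedMLProgramCastType(ElemType t) {
  switch (t) {
    case ElemType::kFloat:
    case ElemType::kFloat16:
    case ElemType::kInt32:
    case ElemType::kInt64:
    case ElemType::kBool:
      return true;
    default:
      return false;
  }
}

bool IsCastSupported(const CastNodeInfo& node, bool create_mlprogram) {
  if (create_mlprogram) {
    // ML Program early return: only the dtype gate applies, so a Cast fed
    // by a graph input is no longer rejected.
    return IsSupportedMLProgramCastType(node.input_type) &&
           IsSupportedMLProgramCastType(node.output_type);
  }
  // NeuralNetwork path: the legacy "no preceding node" check now lives here,
  // followed by the ArgMax-only restriction.
  if (!node.has_preceding_node) {
    return false;
  }
  return node.preceding_is_argmax;
}
```

In this model an int64 → bool Cast fed by a graph input is accepted when `create_mlprogram` is true and rejected otherwise, which is the behavior the reordering is meant to produce.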

Why

This is the first of a 4-PR series giving the CoreML EP the op coverage to run
transformer and diffusion graphs as a single CoreML partition instead of
fragmenting across CPU.

Transformer attention-mask graphs are a Cast → GatherND → And → Where chain over
bool tensors. A CoreML partition cannot have a bool input/output (CoreML
MLMultiArray has no bool type), so bool must stay internal — which makes Cast
(the int↔bool boundary) the prerequisite for the rest of the series.
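
To make the role of the bool tensors concrete, here is a minimal sketch of the mask chain on flat vectors. It omits GatherND for brevity, and `CastToBool` and `ApplyMask` are hypothetical helper names; only the Cast → And → Where ordering comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ONNX Cast to bool: any non-zero value becomes true.
std::vector<bool> CastToBool(const std::vector<int64_t>& x) {
  std::vector<bool> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = (x[i] != 0);
  }
  return out;
}

// Elementwise And of two masks, then Where(cond, scores, fill).
// The bool tensors exist only inside this function, mirroring how they
// must stay internal to a CoreML partition.
std::vector<float> ApplyMask(const std::vector<int64_t>& padding_mask,
                             const std::vector<int64_t>& causal_mask,
                             const std::vector<float>& scores,
                             float fill) {
  assert(padding_mask.size() == scores.size());
  assert(causal_mask.size() == scores.size());
  const std::vector<bool> a = CastToBool(padding_mask);
  const std::vector<bool> b = CastToBool(causal_mask);
  std::vector<float> out(scores.size());
  for (std::size_t i = 0; i < scores.size(); ++i) {
    const bool keep = a[i] && b[i];
    out[i] = keep ? scores[i] : fill;
  }
  return out;
}
```

The inputs and outputs are int64 and float; the bool values appear only between the first Cast and the Where, which is why Cast support is needed before And and Where can be claimed.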

Combined impact of the series

With all four PRs plus #28278 (scalar-Gather), every model below goes from 2
CoreML partitions to 1, with zero graph breaks — the whole graph runs on
CoreML. Measured on an Apple M3 Max, ML Program format:

| Model | Partitions (before → after) | CoreML vs CPU speedup |
|---|---|---|
| BERT-large (340M) | 2 → 1 | 7.3× (fp32) / 11.0× (fp16) |
| ViT-large (304M) | 2 → 1 | 8.5× (fp32) / 10.3× (fp16) |
| GPT-2-large (774M) | 2 → 1 | 11.4× (fp16) |
| SD-1.5 UNet (860M) | 2 → 1 | 9.7× (fp16) |

The op builders eliminate the graph breaks (deterministic); the speedups are what
CoreML already delivers once a model is no longer fragmented.

Tests (`coreml_basic_test.cc`)

CastNonArgMaxNeuralNetworkNotSupported — an int64 → bool → float cast chain
falls back to CPU on the NeuralNetwork format, guarding the IsOpSupportedImpl
reordering.

Positive bool-Cast coverage is in the dependent PRs: Cast → GatherND → Cast
(#28598's GatherNDBoolData_MLProgram) and Cast → And → Cast (#28597's
And_MLProgram). Both place a non-Cast op between the int↔bool casts and check
the result against the CPU EP. A standalone int64 → Cast(bool) → Cast(float)
round-trip can't be verified here — CoreML's compiler fuses back-to-back cast
ops and drops the bool clamp — so the pattern needs that intervening op, which
only the dependent PRs provide.
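
The effect of the fusion can be shown with scalar arithmetic. This sketch only models the described behavior; `CastThroughBool` and `FusedCast` are hypothetical names and nothing here calls CoreML.

```cpp
#include <cstdint>

// Intended semantics: int64 -> bool -> float clamps every non-zero value to 1.
float CastThroughBool(int64_t x) {
  const bool b = (x != 0);
  return b ? 1.0f : 0.0f;
}

// Behavior described for the fused chain: the bool step is dropped,
// so the raw value is converted straight to float.
float FusedCast(int64_t x) {
  return static_cast<float>(x);
}
```

For an input of 5 the two functions return 1.0 and 5.0 respectively, while for inputs of 0 and 1 they agree. A standalone round-trip test would therefore need inputs outside {0, 1} to notice the difference, and with such inputs it would fail because of the fusion rather than because of the Cast builder.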

Series — CoreML EP coverage for transformer / diffusion graphs

- #28595: Support bool Cast in ML Program (this PR, prerequisite)
- #28596: Add Sin and Cos unary ops (independent)
- #28597: Add Where and And builders (depends on #28595)
- #28598: Add GatherND builder (depends on #28595)

Together with #28278 (scalar-Gather), the series takes BERT / GPT-2 / ViT /
diffusion-UNet graphs — tiny and full-size — from 2 CoreML partitions to 1, with
zero graph breaks.

Two changes to the ML Program Cast builder:

1. Accept BOOL as a source and target dtype in HasSupportedInputsImpl. The ML Program `cast` op already handles bool, and AddToModelBuilderImpl already maps `to == BOOL`; only the input/output type gate omitted it. This lets int64<->bool<->float casts (transformer attention-mask graphs) stay on CoreML.
2. Move the "no preceding node" check after the ML Program early-return. It was legacy gating for the NeuralNetwork ArgMax-only path (which dereferences InputEdgesBegin()); on the ML Program path a Cast fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.

Tests (coreml_basic_test.cc):

- CastBoolRoundTrip_MLProgram: an int64->bool->float cast chain runs fully on CoreML and matches the CPU reference. The bool tensor is internal (a CoreML partition cannot have bool I/O) and the first Cast is graph-input fed.
- CastNonArgMaxNeuralNetworkNotSupported: the same chain falls back to CPU on the NeuralNetwork format, guarding the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CastBoolRoundTrip_MLProgram exercised int64 -> Cast(bool) -> Cast(float). CoreML's compiler fuses the two back-to-back `cast` ops and drops the bool clamp (cast(cast(x,bool),fp32) collapses to cast(x,fp32)), so the round-trip produces the raw input value instead of 0/1 -- the test can't be numerically verified standalone.

The bool-Cast support itself is correct: it is exercised end to end by the dependent PRs, where a non-Cast op sits between the int<->bool casts so no fusion occurs -- Cast->And->Cast (Where/And PR) and Cast->GatherND->Cast (GatherND PR), both numerically verified against the CPU EP.

CastNonArgMaxNeuralNetworkNotSupported (the NeuralNetwork-format negative test) is kept; it guards the IsOpSupportedImpl reordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

maxwbuckley · 2026-05-22T06:30:06Z

@yuslepukhin Continuing the great work on making Mac ML on Onnxruntime amazing! Thank you :)

Copilot

Pull request overview

This PR extends the CoreML EP’s ML Program Cast support to enable bool casts and avoid unnecessary CPU fallbacks when a Cast is fed directly by a graph input (no preceding node). This is positioned as a prerequisite step toward keeping transformer/diffusion attention-mask subgraphs fully within a single CoreML partition.

Changes:

Allow BOOL as a supported input/output dtype for ML Program Cast in HasSupportedInputsImpl.
Reorder IsOpSupportedImpl so the “no preceding node” rejection applies only to the NeuralNetwork (ArgMax-only) path, not ML Program.
Add a regression test ensuring non-ArgMax Cast chains fall back on the NeuralNetwork format.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| onnxruntime/core/providers/coreml/builders/impl/cast_op_builder.cc | Enables bool dtype gating for ML Program casts and relaxes the “must have preceding node” constraint for ML Program. |
| onnxruntime/test/providers/coreml/coreml_basic_test.cc | Adds a NeuralNetwork-format negative test covering the reordered support checks. |

wejoncy · 2026-05-26T03:46:16Z

LGTM. Does this have any constraints on the CoreML version?

maxwbuckley · 2026-05-26T12:20:52Z

Thanks for the review! No additional version constraint beyond what the EP already requires for MLProgram. The MIL cast op (iOS 15 / Core ML 5) already accepts bool as both an input type and a dtype string — see coremltools/.../iOS15/elementwise_unary.py where T = (fp16, fp32, int32, bool) and the dtype docstring lists bool. Since this PR only touches the MLProgram path (gated to Core ML 5+ by model_builder.h), the bool-cast support inherits that same minimum: iOS 15 / macOS 12 / Core ML 5.
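
The version reasoning in the reply above amounts to a single inherited gate. A minimal sketch, assuming a hypothetical `CoreMLTarget` descriptor and constant name (neither is taken from `model_builder.h`):

```cpp
// Hypothetical description of the deployment target, for illustration only.
struct CoreMLTarget {
  int coreml_version;   // e.g. 5 for iOS 15 / macOS 12
  bool use_ml_program;  // false means the NeuralNetwork format
};

// Minimum stated in the reply: ML Program requires Core ML 5.
constexpr int kMinCoreMLVersionForMLProgram = 5;

// Bool Cast is only reachable on the ML Program path, so it inherits the
// ML Program minimum and adds no constraint of its own.
constexpr bool BoolCastAvailable(const CoreMLTarget& target) {
  return target.use_ml_program &&
         target.coreml_version >= kMinCoreMLVersionForMLProgram;
}
```

A NeuralNetwork-format target never gets bool Cast regardless of version, which matches the negative test described in the PR.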

wejoncy · 2026-05-26T14:07:00Z

Could you resolve the conflicts?

Resolves conflict in coreml_basic_test.cc by keeping both the new bool-Cast NeuralNetwork-negative test and the upstream Gather test additions.

yuslepukhin asked on PR microsoft#28595 for a positive ML-Program-side test that confirms the partitioner claims the bool Cast nodes, even though we can't numerically verify the round-trip (CoreML fuses back-to-back cast ops and drops the bool clamp, so a value-checking test would silently pass even when the bool dtype is ignored).

Adds CastBoolMLProgramPartition, a sibling to CastNonArgMaxNeuralNetworkNotSupported: same MakeCastBoolModelData() graph, but with TestModelLoad + MakeCoreMLExecutionProvider("MLProgram") and ExpectedEPNodeAssignment::All.

Together the two tests guard:

- HasSupportedInputsImpl now accepts bool (positive),
- the "no preceding node" rejection now only applies to NeuralNetwork (negative).

Positive numerical coverage continues to live in the dependent microsoft#28597 (Where/And) and microsoft#28598 (GatherND) PRs, where a non-Cast op sits between the int<->bool casts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

### Summary

Lower ONNX `Sin` and `Cos` to the CoreML ML Program `sin` / `cos` elementwise ops via the existing `UnaryOpBuilder`, registered in the op builder factory. Like `Erf` / `Round` / `Exp`, these have no NeuralNetwork lowering (`UnaryFunctionLayerParams` has no sin/cos), so `IsOpSupportedImpl` rejects them on the NeuralNetwork format.

### Why

`Sin` / `Cos` form the sinusoidal timestep embedding of diffusion UNets. Supporting them keeps that prologue on CoreML instead of splitting the graph: a tiny Stable-Diffusion UNet goes from **2 CoreML partitions → 1, zero graph breaks** with this change alone.

This PR is **independent** of the rest of the series (it touches only the unary builder) and can be reviewed/merged in any order.

### Tests (`coreml_basic_test.cc`)

- `SinCos_MLProgram`: a Sin + Cos graph runs fully on CoreML and matches the CPU reference.
- `SinCosNeuralNetworkNotSupported`: the same graph falls back to CPU on the NeuralNetwork format.

Doc: `coreml_supported_mlprogram_ops.md` lists `Sin` and `Cos`.

### Series: CoreML EP coverage for transformer / diffusion graphs

- #28595: Support bool Cast in ML Program *(prerequisite)*
- **#28596: Add Sin and Cos unary ops** *(this PR, independent)*
- #28597: Add Where and And builders *(depends on #28595)*
- #28598: Add GatherND builder *(depends on #28595)*

Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs, tiny and full-size, from 2 CoreML partitions to 1, with zero graph breaks.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
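
For readers unfamiliar with the sinusoidal timestep embedding mentioned in the Sin/Cos message above, here is a minimal sketch of the usual formulation. The function name `TimestepEmbedding`, the `max_period` default of 10000, and the sin-then-cos layout are assumptions based on common convention, not details taken from this PR or from any particular UNet export.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a sinusoidal embedding of a scalar timestep.
// The first half of the output holds sin terms, the second half cos terms
// (assumed layout; some implementations use the opposite order).
std::vector<float> TimestepEmbedding(float timestep, std::size_t dim,
                                     float max_period = 10000.0f) {
  const std::size_t half = dim / 2;
  std::vector<float> emb(2 * half);
  for (std::size_t i = 0; i < half; ++i) {
    // Frequencies decay geometrically from 1 toward 1 / max_period.
    const float freq = std::exp(-std::log(max_period) *
                                static_cast<float>(i) / static_cast<float>(half));
    const float angle = timestep * freq;
    emb[i] = std::sin(angle);         // the part that lowers to ONNX Sin
    emb[half + i] = std::cos(angle);  // the part that lowers to ONNX Cos
  }
  return emb;
}
```

In an exported graph the `std::sin` and `std::cos` calls correspond to the ONNX `Sin` and `Cos` nodes, which is why an EP without those ops has to split the graph at this prologue.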

yuslepukhin · 2026-05-27T18:52:35Z

There are conflicts and macOS test failures: CoreMLExecutionProviderTest.CastBoolMLProgramPartition

```
[ RUN      ] CoreMLExecutionProviderTest.CastBoolMLProgramPartition
3: /Users/runner/work/onnxruntime/onnxruntime/onnxruntime/test/util/test_utils.cc:108: Failure
3: Expected equality of these values:
3:   provider_node_count
3:     Which is: 0
3:   graph.NumberOfNodes()
3:     Which is: 2
3: Not all nodes were assigned to CoreMLExecutionProvider
3:
3: [  FAILED  ] CoreMLExecutionProviderTest.CastBoolMLProgramPartition
```

Convert this test into a partition-only inspection test that verifies the ML Program path claims the Cast nodes, without requiring execution semantics from a back-to-back cast chain.

A good pattern is to inspect the assigned EP types in the resolved graph, similar to other graph-verifier-based tests in the same file.

```cpp
TEST(CoreMLExecutionProviderTest, CastBoolMLProgramPartition) {
  const std::string model_data = MakeCastBoolModelData();
  gsl::span<const std::byte> model_span{
      reinterpret_cast<const std::byte*>(model_data.data()), model_data.size()};

  const std::function<void(const Graph&)> graph_verifier = [](const Graph& graph) {
    GraphViewer graph_viewer{graph};
    const auto& node_indices_in_order = graph_viewer.GetNodesInTopologicalOrder();
    ASSERT_EQ(node_indices_in_order.size(), size_t{2});

    const auto* cast_to_bool = graph.GetNode(node_indices_in_order[0]);
    ASSERT_NE(cast_to_bool, nullptr);
    ASSERT_EQ(cast_to_bool->OpType(), "Cast");
    ASSERT_EQ(cast_to_bool->GetExecutionProviderType(), kCoreMLExecutionProvider);

    const auto* cast_to_float = graph.GetNode(node_indices_in_order[1]);
    ASSERT_NE(cast_to_float, nullptr);
    ASSERT_EQ(cast_to_float->OpType(), "Cast");
    ASSERT_EQ(cast_to_float->GetExecutionProviderType(), kCoreMLExecutionProvider);
  };

  EPVerificationParams verification_params{};
  verification_params.ep_node_assignment = ExpectedEPNodeAssignment::All;
  verification_params.graph_verifier = &graph_verifier;

  TestModelLoad(model_span,
                MakeCoreMLExecutionProvider("MLProgram"),
                ExpectedEPNodeAssignment::All,
                &verification_params);
}
```

…rvives

The test built an int64->Cast(bool)->Cast(float) graph and asserted ExpectedEPNodeAssignment::All on the ML Program path, but Cast is marked IsTrivial and GetCapability drops any partition made up entirely of trivial ops. The all-Cast partition was therefore dropped (0 nodes on CoreML), failing the assertion on the arm64 CoreML runner.

Append a non-trivial Sqrt to the graph (via a new append_nontrivial flag on MakeCastBoolModelData) so the partition is retained, letting the test assert that both bool Casts are actually claimed by the EP. The NeuralNetwork negative test keeps using the pure all-Cast graph (default flag) and is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
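
The failure mode described in the message above can be modeled in a few lines. This is an illustrative sketch, not the EP's `GetCapability` logic: `IsTrivialOp` and `NodesClaimed` are hypothetical names, and treating Cast as the only trivial op is an assumption made to keep the example small.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumed for illustration: only Cast is treated as trivial here.
bool IsTrivialOp(const std::string& op_type) {
  return op_type == "Cast";
}

// Returns how many nodes of a candidate partition stay on the EP.
// A partition made up entirely of trivial ops is dropped as a whole.
std::size_t NodesClaimed(const std::vector<std::string>& partition_ops) {
  const bool all_trivial =
      std::all_of(partition_ops.begin(), partition_ops.end(), IsTrivialOp);
  return all_trivial ? 0 : partition_ops.size();
}
```

With `{"Cast", "Cast"}` the model claims 0 nodes, matching the `provider_node_count` of 0 in the failing run; appending `"Sqrt"` keeps the partition, so all three nodes are claimed.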

# Conflicts: # onnxruntime/test/providers/coreml/coreml_basic_test.cc

Drop the cross-PR references and internal-implementation context from the MakeCastBoolModelData / CastBoolMLProgramPartition comments; keep just the self-contained explanation of the append_nontrivial flag and the partition assertion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

This was referenced May 20, 2026:

- [CoreML EP] Add Sin and Cos unary ops #28596 (Merged)
- [CoreML EP] Add Where and And builders #28597 (Open)
- [CoreML EP] Add GatherND builder #28598 (Open)

maxwbuckley and others added 2 commits May 21, 2026 09:34

Merge remote-tracking branch 'origin/main' into coreml-cast-bool

56ce3ca

maxwbuckley marked this pull request as ready for review May 21, 2026 13:24

yuslepukhin requested review from Copilot and wejoncy May 25, 2026 20:40

Copilot started reviewing on behalf of yuslepukhin May 25, 2026 20:41

Copilot AI reviewed May 25, 2026

Comment thread onnxruntime/core/providers/coreml/builders/impl/cast_op_builder.cc

Merge branch 'main' into coreml-cast-bool

6406567

yuslepukhin reviewed May 26, 2026

Comment thread onnxruntime/test/providers/coreml/coreml_basic_test.cc Outdated

maxwbuckley and others added 3 commits May 29, 2026 21:25

Merge remote-tracking branch 'origin/main' into coreml-cast-bool

2a3526a

maxwbuckley wants to merge 8 commits into microsoft:main from maxwbuckley:coreml-cast-bool
