
metal : fuse add, mul #14596


Merged: 1 commit merged into master on Jul 18, 2025
Conversation

ggerganov (Member) commented Jul 9, 2025

target #14629

Fuse GGML_OP_ADD and GGML_OP_MUL

LLAMA_SET_ROWS=1 ./scripts/compare-commits.sh master gg/metal-fuse-add -m ./models/qwen3-30b-a3b/ggml-model-q8_0.gguf -m models/gemma-3-4b/ggml-model-q8_0.gguf -fa 1 -t 1
Model                   Test    t/s master   t/s gg/metal-fuse-add   Speedup
gemma3 4B Q8_0          pp512   2444.84      2494.63                 1.02
gemma3 4B Q8_0          tg128   90.39        96.76                   1.07
qwen3moe 30B.A3B Q8_0   pp512   1362.92      1420.74                 1.04
qwen3moe 30B.A3B Q8_0   tg128   70.12        76.68                   1.09

Testing

make -j && GGML_METAL_FUSION_DEBUG=2 ./bin/test-backend-ops -o RMS_NORM_MUL_ADD -b Metal
Backend 1/3: Metal
  Device description: Apple M4 Max
  Device memory: 28753 MB (28747 MB free)

ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000001): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000100): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.100000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=1.000000): OK
  6543/6543 tests passed
  Backend Metal: OK
ggml_backend_metal_device_rel: fused ADD: 5
ggml_backend_metal_device_rel: fused MUL: 5

TODO:
  • Disable with env variable
  • Print fuse stats
  • Fuse with norms, cpys, etc.
  • Cleaner kernel impl?

@github-actions github-actions bot added the ggml and Apple Metal labels Jul 9, 2025
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 23bc8a3 to b61796c on July 11, 2025 11:05
@ggerganov ggerganov changed the base branch from master to gg/graph-context-refactor July 11, 2025 11:05
@ggerganov ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 5a220cc to bc0a20c on July 12, 2025 19:51
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 3 times, most recently from 6e07c3e to 067d04a on July 13, 2025 19:11
@github-actions github-actions bot added the testing label Jul 13, 2025
@ggerganov ggerganov marked this pull request as ready for review July 14, 2025 10:28
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch from fc3a162 to 474041f on July 14, 2025 10:35
@ggerganov ggerganov changed the title from "metal : fuse add" to "metal : fuse add, mul" Jul 14, 2025
@ggerganov ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 20010c4 to ae2fb57 on July 18, 2025 05:00
Base automatically changed from gg/graph-context-refactor to master July 18, 2025 05:29
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 012fb71 to 04d0349 on July 18, 2025 11:39
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch from 04d0349 to effa72e on July 18, 2025 11:46
@ggerganov ggerganov merged commit bf9087f into master Jul 18, 2025
53 of 55 checks passed
@ggerganov ggerganov deleted the gg/metal-fuse-add branch July 18, 2025 17:37