
metal : fuse add, mul #14596


Merged: 1 commit merged into master on Jul 18, 2025
Conversation

ggerganov (Member) commented Jul 9, 2025

target #14629

Fuse GGML_OP_ADD and GGML_OP_MUL

LLAMA_SET_ROWS=1 ./scripts/compare-commits.sh master gg/metal-fuse-add -m ./models/qwen3-30b-a3b/ggml-model-q8_0.gguf -m models/gemma-3-4b/ggml-model-q8_0.gguf -fa 1 -t 1
Model                   Test    t/s master   t/s gg/metal-fuse-add   Speedup
gemma3 4B Q8_0          pp512   2444.84      2494.63                 1.02
gemma3 4B Q8_0          tg128   90.39        96.76                   1.07
qwen3moe 30B.A3B Q8_0   pp512   1362.92      1420.74                 1.04
qwen3moe 30B.A3B Q8_0   tg128   70.12        76.68                   1.09

Testing

make -j && GGML_METAL_FUSION_DEBUG=2 ./bin/test-backend-ops -o RMS_NORM_MUL_ADD -b Metal
Backend 1/3: Metal
  Device description: Apple M4 Max
  Device memory: 28753 MB (28747 MB free)

ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000001): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000100): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.100000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=1.000000): OK
  6543/6543 tests passed
  Backend Metal: OK
ggml_backend_metal_device_rel: fused ADD: 5
ggml_backend_metal_device_rel: fused MUL: 5

TODO:
  • Disable with env variable
  • Print fuse stats
  • Fuse with norms, cpys, etc.
  • Cleaner kernel impl?

@github-actions github-actions bot added the ggml and Apple Metal labels Jul 9, 2025
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 23bc8a3 to b61796c on July 11, 2025 11:05
@ggerganov ggerganov changed the base branch from master to gg/graph-context-refactor July 11, 2025 11:05
@ggerganov ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 5a220cc to bc0a20c on July 12, 2025 19:51
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 3 times, most recently from 6e07c3e to 067d04a on July 13, 2025 19:11
@github-actions github-actions bot added the testing label Jul 13, 2025
@ggerganov ggerganov marked this pull request as ready for review July 14, 2025 10:28
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch from fc3a162 to 474041f on July 14, 2025 10:35
@ggerganov ggerganov changed the title from "metal : fuse add" to "metal : fuse add, mul" Jul 14, 2025
@ggerganov ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 20010c4 to ae2fb57 on July 18, 2025 05:00
Base automatically changed from gg/graph-context-refactor to master July 18, 2025 05:29
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 012fb71 to 04d0349 on July 18, 2025 11:39
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch from 04d0349 to effa72e on July 18, 2025 11:46
@ggerganov ggerganov merged commit bf9087f into master Jul 18, 2025
53 of 55 checks passed
@ggerganov ggerganov deleted the gg/metal-fuse-add branch July 18, 2025 17:37