
Fix missing MMA flash attention instances in build #16

Merged
volkermauel merged 1 commit into main from
codex/fix-undefined-reference-errors-in-cuda-build
Aug 3, 2025

Conversation

@volkermauel
Owner

Summary

  • Ensure the CUDA flash attention MMA template instances are compiled, fixing the undefined-reference link errors (see the sketch below)
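
For context, a minimal CUDA sketch of the failure mode this class of fix addresses (hypothetical names, not the actual llama.cpp sources): a kernel template emits no device code on its own, so every concrete instance that host code launches must be explicitly instantiated in some .cu file the build actually compiles, or the link step fails with an undefined reference.

    #include <cuda_fp16.h>
    #include <cuda_runtime.h>

    // Hypothetical templated flash-attention kernel, parameterized on head size.
    template <int head_size>
    __global__ void flash_attn_mma_f16(const half * Q, const half * K,
                                       const half * V, half * O, int n_tokens) {
        // tensor-core (MMA) attention body elided in this sketch
    }

    // Explicit instantiations: without lines like these in a compiled .cu
    // file, host code launching flash_attn_mma_f16<64> or <128> links
    // against nothing -> "undefined reference" at build time.
    template __global__ void flash_attn_mma_f16<64>(const half *, const half *,
                                                    const half *, half *, int);
    template __global__ void flash_attn_mma_f16<128>(const half *, const half *,
                                                     const half *, half *, int);

The PR's change follows this pattern on the llama.cpp side: it makes sure the MMA flash attention instances actually end up in a compiled translation unit.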

Testing

  • make llama-server GGML_CUDA=1 (fails in this environment: Could not find compiler "nvcc"; CUDA_DOCKER_ARCH must be set for CUDA versions < 11.7)

https://chatgpt.com/codex/tasks/task_b_688f2ec7e34883258a7adcd4174e3a6a

volkermauel merged commit 8721a3b into main on Aug 3, 2025
0 of 2 checks passed
volkermauel deleted the codex/fix-undefined-reference-errors-in-cuda-build branch on August 3, 2025 10:41