Add Metal paged attention minimal support (flagged) and SDPA binding fix (OPTIONAL fast path for Continuous Batching PR) #2814

Sohailm25 · 2025-11-21T20:37:55Z

Proposed changes

Add minimal Metal paged attention support with a Python fallback and keep SDPA as the default. This enables downstream consumers (e.g., mlx-lm) to opt into paged attention without changing behavior for existing users.
Align SDPA bindings with current mlx main (add is_training / output_logsumexp, restore VJP / is_equivalent symbols) to fix build/link drift.
Tests cover paged KV allocation/copy, fallback parity, and SDPA fast-path (python/tests/test_paged_kv.py, python/tests/test_fast.py).
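
For readers unfamiliar with the idea behind the fallback-parity tests, here is a minimal NumPy sketch of single-query paged attention checked against a dense reference. It is not the code in this PR and does not use MLX. The function names, cache layout `(num_blocks, block_size, heads, head_dim)`, and block table format are assumptions made for illustration only.

```python
import numpy as np


def dense_attention(q, k, v, scale):
    """Single-query attention over contiguous K/V.

    q: (heads, head_dim); k, v: (seq_len, heads, head_dim).
    """
    scores = np.einsum("hd,shd->hs", q, k) * scale
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return np.einsum("hs,shd->hd", weights, v)


def paged_attention_reference(q, k_cache, v_cache, block_table, seq_len, scale):
    """Hypothetical reference: gather K/V pages, then run dense attention.

    k_cache, v_cache: (num_blocks, block_size, heads, head_dim) (assumed layout).
    block_table: 1-D int array of physical block ids in logical order.
    """
    block_size = k_cache.shape[1]
    num_used = -(-seq_len // block_size)  # ceiling division
    blocks = block_table[:num_used]
    # Gather the pages, flatten them to one sequence, drop padding in the last page.
    k = k_cache[blocks].reshape(-1, *k_cache.shape[2:])[:seq_len]
    v = v_cache[blocks].reshape(-1, *v_cache.shape[2:])[:seq_len]
    return dense_attention(q, k, v, scale)


def check_parity(seed=0):
    """Scatter a contiguous K/V into non-contiguous pages and compare outputs."""
    rng = np.random.default_rng(seed)
    heads, head_dim, block_size, num_blocks, seq_len = 2, 8, 4, 6, 10
    q = rng.standard_normal((heads, head_dim))
    k = rng.standard_normal((seq_len, heads, head_dim))
    v = rng.standard_normal((seq_len, heads, head_dim))

    block_table = np.array([5, 1, 3])  # deliberately out of order
    k_cache = np.zeros((num_blocks, block_size, heads, head_dim))
    v_cache = np.zeros_like(k_cache)
    for pos in range(seq_len):
        block = block_table[pos // block_size]
        k_cache[block, pos % block_size] = k[pos]
        v_cache[block, pos % block_size] = v[pos]

    scale = head_dim ** -0.5
    paged = paged_attention_reference(q, k_cache, v_cache, block_table, seq_len, scale)
    dense = dense_attention(q, k, v, scale)
    return paged.shape == (heads, head_dim) and np.allclose(paged, dense)
```

Calling `check_parity()` scatters a 10-token K/V into three out-of-order pages and compares the paged result with the dense one. A fused kernel would avoid materializing the gathered K/V; the gather here only keeps the reference easy to verify.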

I have read the CONTRIBUTING (https://github.com/ml-explore/mlx/blob/main/CONTRIBUTING.md) document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

Sohailm25 · 2025-11-21T20:38:37Z

Sohailm25 added 2 commits November 21, 2025 10:23

Add Metal paged attention kernel and Python fallback

54e3020

Align SDPA call with updated signature and add SDPA VJP symbol

cf4b6da

Sohailm25 mentioned this pull request Nov 21, 2025

Metal Kernels for Paged Attention & Batched KV Writes #2760

Closed
