What's your hardware / OS? To reproduce those numbers you need the latest MLX (0.30.0) on macOS 26.2 (beta release) on the M5.
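A quick way to answer the hardware / OS question is to print the relevant versions in one go. This is only a minimal sketch: `environment_report` is a hypothetical helper name, and it assumes the packages are installed under the PyPI names `mlx` and `mlx-lm`. It reports versions only and does not identify the chip generation.

```python
import platform
from importlib import metadata


def environment_report(packages=("mlx", "mlx-lm")):
    """Collect the OS and package versions relevant to the benchmark.

    Hypothetical helper; the package names are assumed PyPI names.
    """
    report = {
        # mac_ver() returns an empty release string on non-macOS systems.
        "macos": platform.mac_ver()[0] or "not macOS",
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the thread makes it easy to compare against the MLX 0.30.0 / macOS 26.2 combination mentioned above.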
Hey all,
I came across this article: https://machinelearning.apple.com/research/exploring-llms-mlx-m5, where Apple claims to have achieved 2.87 sec TTFT on the MacBook Pro M5-24GB for the GPT-OSS-20B-MXFP4-Q4 model using MLX. However, I can’t seem to replicate those numbers; I’m getting a TTFT of ~8 sec.

Note: None of the models listed in the article are performing as claimed.
Here’s my benchmarking setup:
mlx_lm/generate.py script. Here’s the PR containing those changes: https://github.com/ml-explore/mlx-lm/pull/633/files

It would be great if anyone has observed similar or different results and could share their setup here. Thanks in advance.
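For anyone comparing numbers, here is a minimal sketch of one way to time TTFT around any lazy token stream. It is not the code from the PR above and not necessarily how the article measured it. `measure_ttft` and `fake_stream` are hypothetical names; the fake stream only stands in for a real generator so the snippet runs anywhere.

```python
import time


def measure_ttft(token_stream):
    """Return (ttft_seconds, total_seconds, n_tokens) for a token iterator.

    The clock starts when this function is called, so pass in a lazy
    generator that has not been consumed yet. ttft is None if the stream
    yields nothing.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            # First item received: prompt processing is done.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count


def fake_stream(prefill_s=0.05, n_tokens=5, per_token_s=0.01):
    """Stand-in generator: sleeps for a 'prefill', then yields tokens."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        yield f"tok{i}"
        time.sleep(per_token_s)


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream())
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, tokens: {n}")
    # Assumption: with mlx-lm, the stream could come from something like
    #   model, tokenizer = load(...)   # load before timing starts
    #   measure_ttft(stream_generate(model, tokenizer, prompt, max_tokens=128))
```

Measured this way, TTFT excludes model loading but includes any first-run warm-up, and it depends heavily on prompt length, so those details are worth stating alongside any number shared here.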