Commit a4c6470

benchmarks (#552)
1 parent 1d11449 commit a4c6470

2 files changed: +71 -3 lines changed
mlx_lm/BENCHMARKS.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# Benchmarks
+
+## Commands
+
+The command for evaluating on MMLU Pro:
+
+```
+mlx_lm.evaluate --model model/repo --task mmlu_pro
+```
+
+The command for efficiency benchmarks:
+
+```
+mlx_lm.benchmark --model model/repo -p 2048 -g 128
+```
+
+To get the package versions run:
+
+```
+python -m mlx --version && python -m mlx_lm --version
+```
+
+## Models
+
+<details>
+
+<summary> Qwen/Qwen3-4B-Instruct-2507 </summary>
+
+Precision | MMLU Pro | Prompt (2048) tok/sec | Generation (128) tok/sec | Memory GB | Repo
+--------- | -------- | ------------------- | ------------------------ | --------- | ----
+bf16 | 64.05 | 1780.63 | 52.47 | 9.02 | Qwen/Qwen3-4B-Instruct-2507
+q8 | 63.85 | 1606.573| 86.907 | 5.254 | mlx-community/Qwen3-4B-Instruct-2507-8bit
+q6 | 63.53 | 1576.73 | 104.68 | 4.25 | mlx-community/Qwen3-4B-Instruct-2507-6bit
+q5 g32 | 63.16 | 1570.80 | 110.29 | 4.00 | mlx-community/Qwen3-4B-Instruct-2507-5bit-g32
+q5 | 62.38 | 1584.33 | 116.39 | 3.86 | mlx-community/Qwen3-4B-Instruct-2507-5bit
+q4 g32 | 61.46 | 1610.03 | 126.00 | 3.603 | mlx-community/Qwen3-4B-Instruct-2507-4bit-g32
+q4 | 60.72 | 1622.27 | 134.52 | 3.35 | mlx-community/Qwen3-4B-Instruct-2507-4bit
+
+- Performance benchmark on 64GB M4 Max
+- mlx 0.29.2.dev20251008+85a8824a8
+- mlx-lm 0.28.2
+- macOS 26.1
+
+</details>
+
+<details>
+<summary> Qwen/Qwen3-30B-A3B-Instruct-2507 </summary>
+
+Precision | MMLU Pro | Prompt (2048) tok/sec | Generation (128) tok/sec | Memory GB | Repo
+--------- | -------- | ------------------- | ------------------------ | --------- | ----
+bf16 | 72.62 | :skull: | :skull: | :skull: | Qwen/Qwen3-30B-A3B-Instruct-2507
+q8 | 72.46 | 1719.47 | 83.16 | 33.46 | mlx-community/Qwen3-30B-A3B-Instruct-2507-8bit
+q6 | 72.41 | 1667.45 | 94.14 | 25.82 | mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit
+q5 | 71.97 | 1664.24 | 101.00 |22.01 | mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit
+q4 | 70.71 | 1753.90 | 113.33 |18.20 | mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit
+
+
+- Performance benchmarks on 64GB M4 Max
+- mlx 0.29.2.dev20251008+85a8824a8
+- mlx-lm 0.28.2
+- macOS 26.1
+
+</details>
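
Review note: the Qwen3-4B trade-off is easier to judge as ratios against the bf16 row. The sketch below copies four rows from that table and computes generation speedup, memory fraction, and MMLU Pro delta. The `Row` type and the `relative_to_baseline` helper are illustrative names invented for this note, not part of mlx-lm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row; field names are illustrative, not an mlx-lm type."""
    precision: str
    mmlu_pro: float
    prompt_tps: float
    gen_tps: float
    memory_gb: float


# Values copied from the Qwen/Qwen3-4B-Instruct-2507 table (64GB M4 Max).
ROWS = [
    Row("bf16", 64.05, 1780.63, 52.47, 9.02),
    Row("q8", 63.85, 1606.573, 86.907, 5.254),
    Row("q6", 63.53, 1576.73, 104.68, 4.25),
    Row("q4", 60.72, 1622.27, 134.52, 3.35),
]


def relative_to_baseline(rows, baseline="bf16"):
    """Express each row's generation speed, memory, and score against the baseline row."""
    base = next(r for r in rows if r.precision == baseline)
    return {
        r.precision: {
            "gen_speedup": r.gen_tps / base.gen_tps,
            "memory_fraction": r.memory_gb / base.memory_gb,
            "mmlu_delta": r.mmlu_pro - base.mmlu_pro,
        }
        for r in rows
    }


if __name__ == "__main__":
    for precision, stats in relative_to_baseline(ROWS).items():
        print(
            f"{precision:>5}: "
            f"gen x{stats['gen_speedup']:.2f}, "
            f"mem {stats['memory_fraction']:.0%}, "
            f"MMLU Pro {stats['mmlu_delta']:+.2f}"
        )
```

On these numbers, q4 generates roughly 2.6x faster than bf16 in about 37% of the memory, at a cost of 3.33 MMLU Pro points.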

mlx_lm/__main__.py

Lines changed: 8 additions & 3 deletions
@@ -25,7 +25,12 @@
     if len(sys.argv) < 2:
         raise ValueError(f"CLI requires a subcommand in {subcommands}")
     subcommand = sys.argv.pop(1)
-    if subcommand not in subcommands:
+    if subcommand in subcommands:
+        submodule = importlib.import_module(f"mlx_lm.{subcommand}")
+        submodule.main()
+    elif subcommand == "--version":
+        from mlx_lm import __version__
+
+        print(__version__)
+    else:
         raise ValueError(f"CLI requires a subcommand in {subcommands}")
-    submodule = importlib.import_module(f"mlx_lm.{subcommand}")
-    submodule.main()
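
Review note: the control flow after this change can be checked in isolation with a standalone sketch. The `dispatch` function and its `run` and `out` parameters are hypothetical stand-ins introduced here so the logic runs without mlx-lm installed. In the real module the subcommand branch imports `mlx_lm.<subcommand>` and calls its `main()`, and the version comes from `mlx_lm.__version__`.

```python
from typing import Callable, Sequence, Set


def dispatch(
    argv: Sequence[str],
    subcommands: Set[str],
    version: str,
    run: Callable[[str], None],
    out: Callable[[str], None] = print,
) -> None:
    """Mirror the post-change branching: known subcommand, --version, or error."""
    if len(argv) < 2:
        raise ValueError(f"CLI requires a subcommand in {subcommands}")
    subcommand = argv[1]
    if subcommand in subcommands:
        # Stand-in for importlib.import_module(...).main() in the real module.
        run(subcommand)
    elif subcommand == "--version":
        out(version)
    else:
        raise ValueError(f"CLI requires a subcommand in {subcommands}")


if __name__ == "__main__":
    known = {"generate", "evaluate"}  # placeholder subset, not the real list
    dispatch(["mlx_lm", "generate"], known, "0.0.0", lambda name: print("run", name))
    dispatch(["mlx_lm", "--version"], known, "0.0.0", lambda name: None)
```

Checking `subcommand in subcommands` first means a registered subcommand always wins, and `--version` is only reached when the first argument is not a known subcommand.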
