Releases · ml-explore/mlx-lm
v0.28.3
What's Changed
- Removing the deprecated wandb params by @Goekdeniz-Guelmez in #524
- Memory efficient ssm by @awni in #525
- Fix lora MoEs by @awni in #522
- Fix bailing moe by @awni in #521
- GPT2 Batching Fix by @shepardxia in #529
- Remove act loss and add temp in DWQ by @awni in #500
- Cleanup and simplify model I/O by @awni in #532
- Fix: Add future annotations import to qwen3_next.py for Python 3.9 compatibility by @mzau in #533
- Add lfm2 moe by @Blaizzy in #537
- Fix example command to quantize a model using GPTQ by @felladrin in #539
- minor typing issues by @mercush in #540
- Fix cuda install by @awni in #542
- Add Qwen3-VL language model implementation by @vincentamato in #547
- Fix mask for batched SSM by @awni in #546
- Added gradient accumulation to training loop by @dotvignesh in #511 (sketch below)
- Support data parallel eval for generation tasks by @awni in #549
- Optimize Bailing MoE by @kernelpool in #550
- Adding jamba by @Goekdeniz-Guelmez in #544
- Add Qwen3-VL (Dense) language model implementation by @vincentamato in #553
- LLM Benchmarks by @awni in #552
- Add support for nanochat by @dnakov in #554
- Version bump by @awni in #558
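
Gradient accumulation (#511, above) sums gradients over several micro-batches and applies the optimizer once per window, trading steps for memory. A minimal sketch of the technique in plain MLX, not mlx-lm's internal code; `model`, `loss_fn`, and `batches` are placeholders:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
from mlx.utils import tree_map

def train(model, loss_fn, batches, accum_steps=4, lr=1e-5):
    optimizer = optim.Adam(learning_rate=lr)
    loss_and_grad = nn.value_and_grad(model, loss_fn)
    accum = None
    for step, batch in enumerate(batches, start=1):
        loss, grads = loss_and_grad(model, *batch)
        # Keep a running sum of gradients across micro-batches.
        accum = grads if accum is None else tree_map(mx.add, accum, grads)
        if step % accum_steps == 0:
            # Average and apply once per accumulation window.
            optimizer.update(model, tree_map(lambda g: g / accum_steps, accum))
            mx.eval(model.parameters(), optimizer.state)
            accum = None
```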
New Contributors
- @shepardxia made their first contribution in #529
- @mzau made their first contribution in #533
- @felladrin made their first contribution in #539
- @mercush made their first contribution in #540
- @dotvignesh made their first contribution in #511
Full Changelog: v0.28.2...v0.28.3
v0.28.2
What's Changed
- fix bailing moe by @awni in #514
- Fix batching for models with nested cache structures by @kernelpool in #510
- Fix: Correct weight masking for zero-computation experts in LongCat Flash MoE by @kernelpool in #508
- Simplify to_lora to not hardcode model types by @awni in #515
- Add Olmo3 by @Goekdeniz-Guelmez in #445
- Make mixed quantization affect attention in DeepSeek V3, others by @n8sh1 in #506 (example below)
- Add Apriel 1.5 by @ivanfioravanti in #520
- feat: Refactor granitemoehybrid to support dense and non-hybrid variants by @gabe-l-hart in #518
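
The mixed-quantization work in #506 (above) goes through `mlx_lm.convert`'s `quant_predicate` hook, which returns per-layer quantization settings. A hedged sketch; the predicate signature mirrors mlx-lm's `mixed_quant_predicate_builder`, and the repo name and bit widths are illustrative:

```python
from mlx_lm import convert

def quant_predicate(path, module, config):
    # Keep attention projections at higher precision; everything else
    # gets the default 4-bit settings.
    if "self_attn" in path:
        return {"bits": 6, "group_size": 64}
    return True

convert(
    "mistralai/Mistral-7B-Instruct-v0.3",  # illustrative source model
    mlx_path="mistral-7b-mixed-quant",
    quantize=True,
    q_bits=4,
    quant_predicate=quant_predicate,
)
```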
New Contributors
- @kernelpool made their first contribution in #510
Full Changelog: v0.28.1...v0.28.2
v0.28.1
What's Changed
- Fix quant predicate by @awni in #485
- Fix passing argument model_config in utils.load() by @ariahw in #494 (example below)
- Fix KV cache quantization for hybrid models by @awni in #495
- fix for LFM2 by @awni in #493
- fix loading for qwen2 VL by @awni in #491
- Enable training for qwen3 next by @awni in #496
- Add batch support for sliding window cache by @awni in #487
- qwen3 next batching by @awni in #478
- Add Falcon H1 by @Blaizzy in #231
- Fix RotatingKVCache update by @awni in #503
- Add Code World Model support by @dnakov in #505
- Use depends in pipeline parallel by @awni in #483
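
Among these fixes, #494 (above) restores the `model_config` argument in `utils.load()`, whose entries are merged over the model's config.json. A short example; the repo and the overridden key are illustrative:

```python
from mlx_lm import load, generate

# Entries in model_config override the model's config.json (#494);
# rope_scaling here is just an example override.
model, tokenizer = load(
    "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
    model_config={"rope_scaling": None},
)
print(generate(model, tokenizer, prompt="Hello", max_tokens=32))
```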
Full Changelog: v0.28.0...v0.28.1
v0.28.0
What's Changed
- Allow fp8 by @awni in #431
- Avoid cache-trimming crash in server for longcat chat and baichuan_m1 by @n8sh1 in #434
- Fix hunyuan v1 dense by @awni in #440
- Changes needed to facilitate batching by @awni in #430
- remove manual conv class in mamba1 by @Goekdeniz-Guelmez in #436
- adding Kwai-Klear/Klear-46B-A2.5B-Instruct by @Goekdeniz-Guelmez in #437
- Add lille 130m by @Goekdeniz-Guelmez in #429
- model: GraniteMoeHybrid by @gabe-l-hart in #442
- fix server paths by @awni in #448
- sdpa with sinks by @awni in #418
- fix(quantization): Parameterize hardcoded group_size in mixed_quant_predicate_builder by @squaredice in #449
- Adding Ling Mini by @Goekdeniz-Guelmez in #450
- Adding Qwen3 Next by @Goekdeniz-Guelmez in #441
- Faster ssm by @awni in #451
- Update bitnet, nemotron h to use built-in relu2 from MLX by @Goekdeniz-Guelmez in #446
- fix qwen3 next by @Goekdeniz-Guelmez in #453
- Adding GLM by @Goekdeniz-Guelmez in #457
- Add an introduction to the default LLM in README.md by @aopstudio in #461
- Fix `TypeError: Model.__call__() got an unexpected keyword argument 'mask'` for qwen2_vl, mistral3 by @neilmehta24 in #464
- Add groups to ssm kernel and update more models by @awni in #456
- Fix gemma3 window mask by @awni in #465
- Batch generation by @awni in #443
- Batch support for mamba-style models by @awni in #468
- fix: handle cache offset safely for mamba error by @ivanfioravanti in #472
- Adds LLaMA 4 text model implementation in MLX by @robbiemu in #469
- Allow sampler to work with batched_generate by @N8python in #473 (example below)
- Adding support for mamba2 by @Goekdeniz-Guelmez in #392
- Fix llama4 text and make trainable by @Goekdeniz-Guelmez in #474
- Extends quantization predicate with config by @robbiemu in #476
- Gated-Delta Fused Kernel (Qwen3Next) by @ivanfioravanti in #454
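
#443 and #473 (above) add batched generation that accepts the same sampler objects `mlx_lm.sample_utils.make_sampler` builds for single-stream use. A sketch with an illustrative model repo; the batched entry point takes the sampler the same way:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative repo
# Temperature plus nucleus sampling; the same sampler can be handed to
# the batched generation path from #443.
sampler = make_sampler(temp=0.7, top_p=0.9)
print(generate(model, tokenizer, prompt="Write a haiku about GPUs.",
               sampler=sampler, max_tokens=64))
```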
New Contributors
- @gabe-l-hart made their first contribution in #442
- @squaredice made their first contribution in #449
- @aopstudio made their first contribution in #461
- @robbiemu made their first contribution in #469
Full Changelog: v0.27.1...v0.28.0
v0.27.1
What's Changed
- Fix mlx_lm.perplexity seed w/ np.random.seed to ensure determinism across runs by @N8python in #415
- server allow specifying seed by @n8sh1 in #414 (example below)
- Adding ibm Granite MoE by @Goekdeniz-Guelmez in #413
- Add self.args and self.model_type to the model class for mlx-lm-lora by @Goekdeniz-Guelmez in #422
- add Apertus from Swiss AI by @Goekdeniz-Guelmez in #421
- Add nemotron h by @Goekdeniz-Guelmez in #407
- Adding longcat flash by @Goekdeniz-Guelmez in #423
- Fix Nemotron H loading error by @Goekdeniz-Guelmez in #426
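
The seed option from #414 (above) is reachable through the server's OpenAI-compatible endpoint. A hedged sketch assuming a server started with `mlx_lm.server --port 8080`; the `seed` field name follows the PR title:

```python
import json
from urllib.request import Request, urlopen

payload = {
    "model": "mlx-community/Qwen3-4B-4bit",  # illustrative model
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "seed": 0,  # fixed sampling seed for reproducible output (#414)
}
req = Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urlopen(req))["choices"][0]["message"]["content"])
```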
Full Changelog: v0.27.0...v0.27.1
v0.27.0
What's Changed
- Add `mlx_lm.perplexity` by @N8python in #397
- Benchmark script by @awni in #396
- Don't reload default model by @awni in #400
- only apply lm_head to the last token by @awni in #406
- Fix prompt cache corruption when generation is interrupted by @dojoteef in #405 (example below)
- support mxfp4 by @awni in #385
- Version bump by @awni in #410
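
The prompt-cache fix in #405 (above) matters whenever a cache is reused across calls. A minimal example of that pattern with the public API; the repo name is illustrative:

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative repo
# One cache reused across turns; #405 fixes corruption when a
# generation using it is interrupted mid-stream.
prompt_cache = make_prompt_cache(model)
for question in ["What is MLX?", "Who maintains it?"]:
    print(generate(model, tokenizer, prompt=question,
                   prompt_cache=prompt_cache, max_tokens=64))
```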
Full Changelog: v0.26.4...v0.27.0
v0.26.4
What's Changed
- Revert symmetric kl by @awni in #359
- Adding bailing_moe (ling-lite, -plus, -coder) by @Goekdeniz-Guelmez in #369
- Add SwanLab experiment tracking support for MLX by @ShaohonChen in #317
- Fix gpt-oss lora nan by @awni in #370
- Properly tie embeddings and lm head for gemma3 by @awni in #373
- Fix distributed evaluate by @angeloskath in #368
- Add SSE keepalive to stop client disconnects during prompt processing by @dysangel in #362
- Add LFM2-VL model implementation by @christian-lms in #378
- Adding trust_remote_code=True for training by @Goekdeniz-Guelmez in #383 (example below)
- Remove comma and add Muon by @Goekdeniz-Guelmez in #381
- Add Muon to the LoRA layer utils by @Goekdeniz-Guelmez in #382
- Make KL and JS metal kernels only if metal is available by @vsabolcec in #387
- fix sampling with small top k by @awni in #388
- fix window attention mask by @awni in #390
- Add support for ByteDance Seed-OSS-36B-Instruct model by @dnakov in #391
- Add Qwen2-VL model implementation by @vincentamato in #384
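
#383 (above) threads `trust_remote_code` through training; at load time the same option is passed via `tokenizer_config`, as the README documents. The repo name below is a placeholder for a model that ships custom tokenizer code:

```python
from mlx_lm import load

model, tokenizer = load(
    "org/custom-model",  # placeholder: a repo requiring remote code
    tokenizer_config={"trust_remote_code": True},
)
```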
New Contributors
- @ShaohonChen made their first contribution in #317
- @dysangel made their first contribution in #362
- @dnakov made their first contribution in #391
- @vincentamato made their first contribution in #384
Full Changelog: v0.26.3...v0.26.4
v0.26.3
What's Changed
- Resolve streaming last token error and correct total token usage by @zenyr in #342
- Fix NameError in loglikelihood_rolling method by @snellingio in #339
- fix error on unsupported response type in server by @emmanuel-ferdman in #344
- Add --trust-remote-code cli option by @dojoteef in #319
- Add validation set for DWQ by @awni in #343
- feat: add --confirm-run-unsafe-code CLI option to allow execution of untrusted code by @ivanfioravanti in #348
- Allow per model quant config by @awni in #349
- Add gpt_oss model by @christian-lms in #354
- Jensen-Shannon divergence loss kernel by @vsabolcec in #352 (reference sketch below)
- Route the gpt_oss to fused sdpa by @angeloskath in #356
- Hunyuan V1 Dense model support by @ivanfioravanti in #351
- Add Additional Features of GPT-OSS Model : Lora, Alternating attention, MoE Support by @Shashikant86 in #357
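
For reference, the fused loss from #352 (above) computes the Jensen-Shannon divergence JS(P, Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = (P + Q)/2. A plain MLX version of the same quantity, a sketch rather than the fused Metal kernel:

```python
import mlx.core as mx

def js_divergence(logp, logq):
    # Inputs are log-probabilities over the last axis.
    p, q = mx.exp(logp), mx.exp(logq)
    logm = mx.log(0.5 * (p + q))
    # JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = mx.sum(p * (logp - logm), axis=-1)
    kl_qm = mx.sum(q * (logq - logm), axis=-1)
    return 0.5 * (kl_pm + kl_qm)
```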
New Contributors
- @zenyr made their first contribution in #342
- @snellingio made their first contribution in #339
- @emmanuel-ferdman made their first contribution in #344
- @dojoteef made their first contribution in #319
- @vsabolcec made their first contribution in #352
- @Shashikant86 made their first contribution in #357
Full Changelog: v0.26.2...v0.26.3
v0.26.2
What's Changed
- chore: fix gemma3n intermediate_size config by @mzbac in #332
- Add GLM4.5 by @awni in #333
- Changed GLM-4 MoE support for DWQ quantization by @ivanfioravanti in #336
- Add system prompt to chat script by @jussikuosa in #334 (example below)
- Initialize empty cache when cache=None for Gemma-3n models by @brchristian in #323
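
#334 (above) adds a system prompt to the chat script; the equivalent with the Python API goes through the tokenizer's chat template. The repo name is illustrative:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative repo
messages = [
    {"role": "system", "content": "Answer in one short sentence."},
    {"role": "user", "content": "What is mlx-lm?"},
]
# apply_chat_template folds the system prompt into the model's format.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=64))
```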
New Contributors
- @mzbac made their first contribution in #332
- @jussikuosa made their first contribution in #334
- @brchristian made their first contribution in #323
Full Changelog: v0.26.1...v0.26.2
v0.26.1
What's Changed
- Add dsv3 for lora by @awni in #284
- GPTQ quantization by @awni in #279
- Fix MoE fine tuning by @awni in #288
- fix hunyuan by @awni in #286
- Allow trust_remote_code in convert.py by @christian-lms in #289
- Type Signature Fixes by @MattBeton in #290
- Fix gemma3n config load bug by @neilmehta24 in #292
- kimi k2 by @awni in #293
- Add LFM2 by @Blaizzy in #291
- feat: DWQ for Hunyuan-A13B-Instruct and trust_remote_code argument by @ivanfioravanti in #303
- fix: update import for huggingface model in evaluate.py by @ivanfioravanti in #275
- Fix ddp workers loading the same data by @angeloskath in #294
- Fix server finish reason by @awni in #307
- Add support for SGD & Adafactor by @N8python in #306 (example below)
- Allow empty prompt with input_embeddings by @will-lms in #308
- fix naive detokenizer by @awni in #312
- add exaone4 by @awni in #310
- add v1/models/repo_id by @awni in #313
- Update W&B logging crash in MLX-LM-LORA by @Goekdeniz-Guelmez in #316
- Fix DSV3 training by @awni in #324
- Lora works with cuda backend by @awni in #330
- Adding Muon Optimizer by @Goekdeniz-Guelmez in #325
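
#306 (above) wires SGD and Adafactor into the training scripts; both optimizers come from `mlx.optimizers`. A toy update step showing how they plug into an MLX training loop; the model and data are placeholders:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

model = nn.Linear(4, 1)  # placeholder model
optimizer = optim.Adafactor(learning_rate=1e-3)
# or: optimizer = optim.SGD(learning_rate=1e-2, momentum=0.9)

def loss_fn(m, x, y):
    return nn.losses.mse_loss(m(x), y)

x, y = mx.random.normal((8, 4)), mx.random.normal((8, 1))
loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
optimizer.update(model, grads)
mx.eval(model.parameters(), optimizer.state)
```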
New Contributors
- @christian-lms made their first contribution in #289
- @MattBeton made their first contribution in #290
- @N8python made their first contribution in #306
Full Changelog: v0.26.0...v0.26.1