Releases · ml-explore/mlx-lm
v0.28.3
What's Changed
- Removing the deprecated wandb params by @Goekdeniz-Guelmez in #524
- Memory efficient ssm by @awni in #525
- Fix lora MoEs by @awni in #522
- Fix bailing moe by @awni in #521
- GPT2 Batching Fix by @shepardxia in #529
- Remove act loss and add temp in DWQ by @awni in #500
- Cleanup and simplify model I/O by @awni in #532
- Fix: Add future annotations import to qwen3_next.py for Python 3.9 compatibility by @mzau in #533
- Add lfm2 moe by @Blaizzy in #537
- Fix example command to quantize a model using GPTQ by @felladrin in #539
- minor typing issues by @mercush in #540
- Fix cuda install by @awni in #542
- Add Qwen3-VL language model implementation by @vincentamato in #547
- Fix mask for batched SSM by @awni in #546
- Added gradient accumulation to training loop by @dotvignesh in #511 (sketch below)
- Support data parallel eval for generation tasks by @awni in #549
- Optimize Bailing MoE by @kernelpool in #550
- Adding jamba by @Goekdeniz-Guelmez in #544
- Add Qwen3-VL (Dense) language model implementation by @vincentamato in #553
- LLM Benchmarks by @awni in #552
- Add support for nanochat by @dnakov in #554
- Version bump by @awni in #558
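
Gradient accumulation (#511, above) sums gradients over several micro-batches and applies the optimizer once per window, trading steps for memory. A minimal sketch of the technique in plain MLX, not mlx-lm's internal code; `model`, `loss_fn`, and `batches` are placeholders:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
from mlx.utils import tree_map

def train(model, loss_fn, batches, accum_steps=4, lr=1e-5):
    optimizer = optim.Adam(learning_rate=lr)
    loss_and_grad = nn.value_and_grad(model, loss_fn)
    accum = None
    for step, batch in enumerate(batches, start=1):
        loss, grads = loss_and_grad(model, *batch)
        # Keep a running sum of gradients across micro-batches.
        accum = grads if accum is None else tree_map(mx.add, accum, grads)
        if step % accum_steps == 0:
            # Average and apply once per accumulation window.
            optimizer.update(model, tree_map(lambda g: g / accum_steps, accum))
            mx.eval(model.parameters(), optimizer.state)
            accum = None
```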
New Contributors
- @shepardxia made their first contribution in #529
- @mzau made their first contribution in #533
- @felladrin made their first contribution in #539
- @mercush made their first contribution in #540
- @dotvignesh made their first contribution in #511
Full Changelog: v0.28.2...v0.28.3
v0.28.2
What's Changed
- fix bailing moe by @awni in #514
- Fix batching for models with nested cache structures by @kernelpool in #510
- Fix: Correct weight masking for zero-computation experts in LongCat Flash MoE by @kernelpool in #508
- Simplify to_lora to not hardcode model types by @awni in #515
- Add Olmo3 by @Goekdeniz-Guelmez in #445
- Make mixed quantization affect attention in DeepSeek V3, others by @n8sh1 in #506 (example below)
- Add Apriel 1.5 by @ivanfioravanti in #520
- feat: Refactor granitemoehybrid to support dense and non-hybrid variants by @gabe-l-hart in #518
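
The mixed-quantization work in #506 (above) goes through `mlx_lm.convert`'s `quant_predicate` hook, which returns per-layer quantization settings. A hedged sketch; the predicate signature mirrors mlx-lm's `mixed_quant_predicate_builder`, and the repo name and bit widths are illustrative:

```python
from mlx_lm import convert

def quant_predicate(path, module, config):
    # Keep attention projections at higher precision; everything else
    # gets the default 4-bit settings.
    if "self_attn" in path:
        return {"bits": 6, "group_size": 64}
    return True

convert(
    "mistralai/Mistral-7B-Instruct-v0.3",  # illustrative source model
    mlx_path="mistral-7b-mixed-quant",
    quantize=True,
    q_bits=4,
    quant_predicate=quant_predicate,
)
```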
New Contributors
- @kernelpool made their first contribution in #510
Full Changelog: v0.28.1...v0.28.2
v0.28.1
What's Changed
- Fix quant predicate by @awni in #485
- Fix passing argument model_config in utils.load() by @ariahw in #494 (example below)
- Fix KV cache quantization for hybrid models by @awni in #495
- fix for LFM2 by @awni in #493
- fix loading for qwen2 VL by @awni in #491
- Enable training for qwen3 next by @awni in #496
- Add batch support for sliding window cache by @awni in #487
- qwen3 next batching by @awni in #478
- Add Falcon H1 by @Blaizzy in #231
- Fix RotatingKVCache update by @awni in #503
- Add Code World Model support by @dnakov in #505
- Use depends in pipeline parallel by @awni in #483
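
Among these fixes, #494 (above) restores the `model_config` argument in `utils.load()`, whose entries are merged over the model's config.json. A short example; the repo and the overridden key are illustrative:

```python
from mlx_lm import load, generate

# Entries in model_config override the model's config.json (#494);
# rope_scaling here is just an example override.
model, tokenizer = load(
    "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
    model_config={"rope_scaling": None},
)
print(generate(model, tokenizer, prompt="Hello", max_tokens=32))
```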
Full Changelog: v0.28.0...v0.28.1
v0.28.0
What's Changed
- Allow fp8 by @awni in #431
- Avoid cache-trimming crash in server for longcat chat and baichuan_m1 by @n8sh1 in #434
- Fix hunyuan v1 dense by @awni in #440
- Changes needed to facilitate batching by @awni in #430
- remove manual conv class in mamba1 by @Goekdeniz-Guelmez in #436
- adding Kwai-Klear/Klear-46B-A2.5B-Instruct by @Goekdeniz-Guelmez in #437
- Add lille 130m by @Goekdeniz-Guelmez in #429
- model: GraniteMoeHybrid by @gabe-l-hart in #442
- fix server paths by @awni in #448
- sdpa with sinks by @awni in #418
- fix(quantization): Parameterize hardcoded group_size in mixed_quant_predicate_builder by @squaredice in #449
- Adding Ling Mini by @Goekdeniz-Guelmez in #450
- Adding Qwen3 Next by @Goekdeniz-Guelmez in #441
- Faster ssm by @awni in #451
- Update bitnet, nemotron h to use built-in relu2 from MLX by @Goekdeniz-Guelmez in #446
- fix qwen3 next by @Goekdeniz-Guelmez in #453
- Adding GLM by @Goekdeniz-Guelmez in #457
- Add an introduction to the default LLM in README.md by @aopstudio in #461
- Fix `TypeError: Model.__call__() got an unexpected keyword argument 'mask'` for qwen2_vl, mistral3 by @neilmehta24 in #464
- Add groups to ssm kernel and update more models by @awni in #456
- Fix gemma3 window mask by @awni in #465
- Batch generation by @awni in #443
- Batch support for mamba-style models by @awni in #468
- fix: handle cache offset safely for mamba error by @ivanfioravanti in #472
- Adds LLaMA 4 text model implementation in MLX by @robbiemu in #469
- Allow sampler to work with batched_generate by @N8python in #473 (example below)
- Adding support for mamba2 by @Goekdeniz-Guelmez in #392
- Fix llama4 text and make trainable by @Goekdeniz-Guelmez in #474
- Extends quantization predicate with config by @robbiemu in #476
- Gated-Delta Fused Kernel (Qwen3Next) by @ivanfioravanti in #454
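
#443 and #473 (above) add batched generation that accepts the same sampler objects `mlx_lm.sample_utils.make_sampler` builds for single-stream use. A sketch with an illustrative model repo; the batched entry point takes the sampler the same way:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative repo
# Temperature plus nucleus sampling; the same sampler can be handed to
# the batched generation path from #443.
sampler = make_sampler(temp=0.7, top_p=0.9)
print(generate(model, tokenizer, prompt="Write a haiku about GPUs.",
               sampler=sampler, max_tokens=64))
```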
New Contributors
- @gabe-l-hart made their first contribution in #442
- @squaredice made their first contribution in #449
- @aopstudio made their first contribution in #461
- @robbiemu made their first contribution in #469
Full Changelog: v0.27.1...v0.28.0
v0.27.1
What's Changed
- Fix mlx_lm.perplexity seed w/ np.random.seed to ensure determinism across runs by @N8python in #415
- server allow specifying seed by @n8sh1 in #414 (example below)
- Adding ibm Granite MoE by @Goekdeniz-Guelmez in #413
- Add self.args and self.model_type to the model class for mlx-lm-lora by @Goekdeniz-Guelmez in #422
- add Apertus from Swiss AI by @Goekdeniz-Guelmez in #421
- Add nemotron h by @Goekdeniz-Guelmez in #407
- Adding longcat flash by @Goekdeniz-Guelmez in #423
- Fix Nemotron H loading error by @Goekdeniz-Guelmez in #426
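
The seed option from #414 (above) is reachable through the server's OpenAI-compatible endpoint. A hedged sketch assuming a server started with `mlx_lm.server --port 8080`; the `seed` field name follows the PR title:

```python
import json
from urllib.request import Request, urlopen

payload = {
    "model": "mlx-community/Qwen3-4B-4bit",  # illustrative model
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "seed": 0,  # fixed sampling seed for reproducible output (#414)
}
req = Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urlopen(req))["choices"][0]["message"]["content"])
```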
Full Changelog: v0.27.0...v0.27.1
v0.27.0
What's Changed
- Add `mlx_lm.perplexity` by @N8python in #397
- Benchmark script by @awni in #396
- Don't reload default model by @awni in #400
- only apply lm_head to the last token by @awni in #406
- Fix prompt cache corruption when generation is interrupted by @dojoteef in #405 (example below)
- support mxfp4 by @awni in #385
- Version bump by @awni in #410
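
The prompt-cache fix in #405 (above) matters whenever a cache is reused across calls. A minimal example of that pattern with the public API; the repo name is illustrative:

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative repo
# One cache reused across turns; #405 fixes corruption when a
# generation using it is interrupted mid-stream.
prompt_cache = make_prompt_cache(model)
for question in ["What is MLX?", "Who maintains it?"]:
    print(generate(model, tokenizer, prompt=question,
                   prompt_cache=prompt_cache, max_tokens=64))
```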
Full Changelog: v0.26.4...v0.27.0
v0.26.4
What's Changed
- Revert symmetric kl by @awni in #359
- Adding bailing_moe (ling-lite, -plus, -coder) by @Goekdeniz-Guelmez in #369
- Add SwanLab experiment tracking support for MLX by @ShaohonChen in #317
- Fix gpt-oss lora nan by @awni in #370
- Properly tie embeddings and lm head for gemma3 by @awni in #373
- Fix distributed evaluate by @angeloskath in #368
- Add SSE keepalive to stop client disconnects during prompt processing by @dysangel in #362
- Add LFM2-VL model implementation by @christian-lms in #378
- Adding trust_remote_code=True for training by @Goekdeniz-Guelmez in #383 (example below)
- Remove comma and add Muon by @Goekdeniz-Guelmez in #381
- Add Muon to the LoRA layer utils by @Goekdeniz-Guelmez in #382
- Make KL and JS metal kernels only if metal is available by @vsabolcec in #387
- fix sampling with small top k by @awni in #388
- fix window attention mask by @awni in #390
- Add support for ByteDance Seed-OSS-36B-Instruct model by @dnakov in #391
- Add Qwen2-VL model implementation by @vincentamato in #384
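
#383 (above) threads `trust_remote_code` through training; at load time the same option is passed via `tokenizer_config`, as the README documents. The repo name below is a placeholder for a model that ships custom tokenizer code:

```python
from mlx_lm import load

model, tokenizer = load(
    "org/custom-model",  # placeholder: a repo requiring remote code
    tokenizer_config={"trust_remote_code": True},
)
```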
New Contributors
- @ShaohonChen made their first contribution in #317
- @dysangel made their first contribution in #362
- @dnakov made their first contribution in #391
- @vincentamato made their first contribution in #384
Full Changelog: v0.26.3...v0.26.4
v0.26.3
What's Changed
- Resolve streaming last token error and correct total token usage by @zenyr in #342
- Fix NameError in loglikelihood_rolling method by @snellingio in #339
- fix error on unsupported response type in server by @emmanuel-ferdman in #344
- Add --trust-remote-code cli option by @dojoteef in #319
- Add validation set for DWQ by @awni in #343
- feat: add --confirm-run-unsafe-code CLI option to allow execution of untrusted code by @ivanfioravanti in #348
- Allow per model quant config by @awni in #349
- Add gpt_oss model by @christian-lms in #354
- Jensen-Shannon divergence loss kernel by @vsabolcec in #352 (reference sketch below)
- Route the gpt_oss to fused sdpa by @angeloskath in #356
- Hunyuan V1 Dense model support by @ivanfioravanti in #351
- Add Additional Features of GPT-OSS Model : Lora, Alternating attention, MoE Support by @Shashikant86 in #357
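
For reference, the fused loss from #352 (above) computes the Jensen-Shannon divergence JS(P, Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = (P + Q)/2. A plain MLX version of the same quantity, a sketch rather than the fused Metal kernel:

```python
import mlx.core as mx

def js_divergence(logp, logq):
    # Inputs are log-probabilities over the last axis.
    p, q = mx.exp(logp), mx.exp(logq)
    logm = mx.log(0.5 * (p + q))
    # JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = mx.sum(p * (logp - logm), axis=-1)
    kl_qm = mx.sum(q * (logq - logm), axis=-1)
    return 0.5 * (kl_pm + kl_qm)
```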
New Contributors
- @zenyr made their first contribution in #342
- @snellingio made their first contribution in #339
- @emmanuel-ferdman made their first contribution in #344
- @dojoteef made their first contribution in #319
- @vsabolcec made their first contribution in #352
- @Shashikant86 made their first contribution in #357
Full Changelog: v0.26.2...v0.26.3
v0.26.2
What's Changed
- chore: fix gemma3n intermediate_size config by @mzbac in #332
- Add GLM4.5 by @awni in #333
- Changed GLM-4 MoE support for DWQ quantization by @ivanfioravanti in #336
- Add system prompt to chat script by @jussikuosa in #334 (example below)
- Initialize empty cache when cache=None for Gemma-3n models by @brchristian in #323
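
#334 (above) adds a system prompt to the chat script; the equivalent with the Python API goes through the tokenizer's chat template. The repo name is illustrative:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative repo
messages = [
    {"role": "system", "content": "Answer in one short sentence."},
    {"role": "user", "content": "What is mlx-lm?"},
]
# apply_chat_template folds the system prompt into the model's format.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=64))
```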
New Contributors
- @mzbac made their first contribution in #332
- @jussikuosa made their first contribution in #334
- @brchristian made their first contribution in #323
Full Changelog: v0.26.1...v0.26.2
v0.26.1
What's Changed
- Add dsv3 for lora by @awni in #284
- GPTQ quantization by @awni in #279
- Fix MoE fine tuning by @awni in #288
- fix hunyuan by @awni in #286
- Allow trust_remote_code in convert.py by @christian-lms in #289
- Type Signature Fixes by @MattBeton in #290
- Fix gemma3n config load bug by @neilmehta24 in #292
- kimi k2 by @awni in #293
- Add LFM2 by @Blaizzy in #291
- feat: DWQ for Hunyuan-A13B-Instruct and trust_remote_code argument by @ivanfioravanti in #303
- fix: update import for huggingface model in evaluate.py by @ivanfioravanti in #275
- Fix ddp workers loading the same data by @angeloskath in #294
- Fix server finish reason by @awni in #307
- Add support for SGD & Adafactor by @N8python in #306 (example below)
- Allow empty prompt with input_embeddings by @will-lms in #308
- fix naive detokenizer by @awni in #312
- add exaone4 by @awni in #310
- add v1/models/repo_id by @awni in #313
- Update W&B logging crash in MLX-LM-LORA by @Goekdeniz-Guelmez in #316
- Fix DSV3 training by @awni in #324
- Lora works with cuda backend by @awni in #330
- Adding Muon Optimizer by @Goekdeniz-Guelmez in #325
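
#306 (above) wires SGD and Adafactor into the training scripts; both optimizers come from `mlx.optimizers`. A toy update step showing how they plug into an MLX training loop; the model and data are placeholders:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

model = nn.Linear(4, 1)  # placeholder model
optimizer = optim.Adafactor(learning_rate=1e-3)
# or: optimizer = optim.SGD(learning_rate=1e-2, momentum=0.9)

def loss_fn(m, x, y):
    return nn.losses.mse_loss(m(x), y)

x, y = mx.random.normal((8, 4)), mx.random.normal((8, 1))
loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
optimizer.update(model, grads)
mx.eval(model.parameters(), optimizer.state)
```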
New Contributors
- @christian-lms made their first contribution in #289
- @MattBeton made their first contribution in #290
- @N8python made their first contribution in #306
Full Changelog: v0.26.0...v0.26.1