Releases · modelscope/mcore-bridge · GitHub

21 Jun 18:33

Jintao-Huang

Patch release v1.5.1 Latest

Full Changelog: v1.5.0...v1.5.1

16 Jun 15:38

Jintao-Huang

v1.5.0

New Features

Reduced GPU memory usage of the lm_head when training the generative_reranker task: only the logits at the positive/negative token positions are extracted, rather than computing the full logits.
Dropped compatibility with megatron-core 0.15.
Fixed several bugs.
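
The memory saving in the first item can be illustrated with a small framework-free sketch. It reads "positive/negative token positions" as the vocabulary indices of the positive and negative tokens, which is an assumption, and the function names (`full_logits`, `pos_neg_logits`) are hypothetical, not part of mcore-bridge. The idea is that projecting the hidden state onto two rows of the lm_head weight gives the same two values as computing every vocabulary logit and indexing afterwards, while producing 2 values per position instead of vocab_size.

```python
def matvec(rows, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * h for w, h in zip(row, vec)) for row in rows]


def full_logits(hidden, lm_head_weight):
    # Baseline: one logit per vocabulary entry (vocab_size values).
    return matvec(lm_head_weight, hidden)


def pos_neg_logits(hidden, lm_head_weight, positive_id, negative_id):
    # Sketch of the optimization: project onto only the two rows needed.
    selected = [lm_head_weight[positive_id], lm_head_weight[negative_id]]
    return matvec(selected, hidden)


# Toy data: hidden size 3, vocabulary size 4 (illustrative values only).
hidden = [0.5, -1.0, 2.0]
lm_head_weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
positive_id, negative_id = 3, 1  # hypothetical token ids

full = full_logits(hidden, lm_head_weight)
small = pos_neg_logits(hidden, lm_head_weight, positive_id, negative_id)
```

Here `small` holds the same two values as `full[positive_id]` and `full[negative_id]`, but the intermediate result never grows with the vocabulary size.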

What's Changed

optimize generative_reranker memory by @Jintao-Huang in #115
update mm_gpt_model input_ids by @Jintao-Huang in #117
Fix NPU LoRA for MindSpeed MoE grouped linear by @addsubmuldiv in #118
[bugfix] fix embedding sharded_state_dict by @Jintao-Huang in #119
update mcore requirements by @Jintao-Huang in #120
update requirements by @Jintao-Huang in #122
update requirements by @Jintao-Huang in #123
[bugfix] fix linear_decoupled_in_proj CP by @Jintao-Huang in #124

Full Changelog: v1.4.3...v1.5.0

Contributors

addsubmuldiv and Jintao-Huang

07 Jun 14:57

Jintao-Huang

v1.4.3

New Features

Added model_type support for gemma4_unified; added multimodal support for kimi_k25.
Added the language_model_only parameter. When enabled, only the language model component is created, and only language-model weights are loaded and saved.
Fixed several bugs.
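
As a rough illustration of what "only loads and saves language model weights" could look like, the sketch below filters a weight dictionary by key prefix. Everything in it is hypothetical: the `language_model.` prefix, the example keys, and the helper name are assumptions made for illustration and are not taken from mcore-bridge.

```python
def select_language_model_weights(state_dict, prefix="language_model."):
    # Keep only entries whose key starts with the (hypothetical) prefix.
    return {k: v for k, v in state_dict.items() if k.startswith(prefix)}


# Toy checkpoint with placeholder values instead of tensors.
checkpoint = {
    "language_model.embedding.weight": "emb",
    "language_model.decoder.layers.0.weight": "layer0",
    "visual.patch_embed.weight": "vision",
}

lm_only = select_language_model_weights(checkpoint)
```

In this toy case `lm_only` keeps the two `language_model.` entries and drops the vision entry.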

What's Changed

[bugfix] fix: clamp num_tokens=0 in MTP loss & add normalized scale for MTP per token loss by @YaoweiFan in #104
[bugfix] fix tie_word_embeddings by @Jintao-Huang in #105
[bugfix] fix deepseek-v4 dev branch by @Jintao-Huang in #107
[model] support gemma4_unified by @Jintao-Huang in #108
update batch_p2p_comm by @Jintao-Huang in #111
support language_model_only by @Jintao-Huang in #112
support kimi_k25 mm by @Jintao-Huang in #113
update mla rope mcore>=0.18 (0.15-0.18 compat) by @Jintao-Huang in #114
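
The first fix in the list above (#104) concerns a per-token loss whose token count can be zero. A minimal sketch of the clamping idea, assuming the loss is a sum divided by the number of valid tokens, is shown below. The function name is hypothetical, and the normalized scale mentioned in the same PR title is not modeled here.

```python
def per_token_loss(loss_sum, num_tokens):
    # Clamp the denominator to at least 1 so that a batch with no valid
    # tokens yields 0.0 instead of raising ZeroDivisionError.
    return loss_sum / max(num_tokens, 1)


empty_batch = per_token_loss(0.0, 0)
normal_batch = per_token_loss(6.0, 3)
```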

New Contributors

@YaoweiFan made their first contribution in #104

Full Changelog: v1.4.2...v1.4.3

Contributors

Jintao-Huang and YaoweiFan

31 May 12:05

Jintao-Huang

v1.4.2

New Features

Added model_type support for bailing_hybrid.
Fixed abnormal loss for olmoe/bailing_moe when TP > 1.

What's Changed

[bugfix] fix bug by @Jintao-Huang in #99
[bugfix] fix qwen3_next norm sp by @Jintao-Huang in #100
[model] Support bailing_hybrid by @Jintao-Huang in #85
refactor olmoe by @Jintao-Huang in #101
[bugfix] fix npu GDN by @Jintao-Huang in #103

Full Changelog: v1.4.1...v1.4.2

Contributors

Jintao-Huang

27 May 15:23

Jintao-Huang

v1.4.1

New Features

Added model_type support for gemma4 and deepseek_v4.
Added a minimal example to the README showing how to create a model with Mcore-Bridge, run a forward pass, and compute the loss.
Added compatibility with both the main and dev branches of megatron-core.

What's Changed

[model] Support gemma4 by @Jintao-Huang in #56
[docs] update readme by @Jintao-Huang in #84
compat megatron dev branch by @Jintao-Huang in #87
[model] support gemma4 padding_free by @Jintao-Huang in #88
[docs] update docs by @Jintao-Huang in #89
update gemma4 rope by @Jintao-Huang in #90
refactor MLA by @Jintao-Huang in #91
compat mtp megatron_core main branch by @Jintao-Huang in #92
[model] Support deepseek-v4 by @Jintao-Huang in #86
[bugfix] fix bugs by @Jintao-Huang in #95
[model] support deepseek v4 mtp by @Jintao-Huang in #93
Support fp4 blockwise load by @Jintao-Huang in #96
[bugfix] fix gdn conv1d by @Jintao-Huang in #97
update lora add by @Jintao-Huang in #98

Full Changelog: v1.4.0...v1.4.1

Contributors

Jintao-Huang

17 May 15:50

Jintao-Huang

v1.4.0

New Features

Added model_type support for bailing_moe and qwen3_asr.
Qwen3-Next now runs with Mcore-GDN by default, which enables sequence packing, FP8, and CP.
Refactored transformer_block / transformer_layer with an inheritable design to simplify the integration of new models.
Added compatibility with Python 3.13.
Added support for saving and loading LoRA weights for MoE models whose experts are organized in grouped form in transformers. (Note: these LoRA weights cannot be loaded directly via transformers, but they can be loaded via Megatron to continue training.)
Added padding_mask support, fixing an issue where moe_aux_loss incorrectly computed routing loss on padding tokens when padding_free=False.
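
To show why padding tokens distort the routing loss, here is a framework-free sketch of a masked load-balancing loss. It uses the common Switch-Transformer-style formulation (number of experts times the sum over experts of token fraction times mean router probability) with top-1 routing, which is an assumption and not necessarily what Megatron computes. The function name and the mask convention (True marks a padding token) are also assumptions.

```python
def load_balancing_loss(router_probs, expert_choice, padding_mask, num_experts):
    """Switch-style auxiliary loss computed over non-padding tokens only.

    router_probs:  per-token list of probabilities over experts
    expert_choice: per-token index of the selected (top-1) expert
    padding_mask:  per-token bool, True where the token is padding
    """
    valid = [i for i, is_pad in enumerate(padding_mask) if not is_pad]
    n = max(len(valid), 1)  # avoid dividing by zero if everything is padding
    loss = 0.0
    for e in range(num_experts):
        frac_tokens = sum(1 for i in valid if expert_choice[i] == e) / n
        mean_prob = sum(router_probs[i][e] for i in valid) / n
        loss += frac_tokens * mean_prob
    return num_experts * loss


# Two real tokens routed evenly, plus one padding token sent to expert 0.
router_probs = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
expert_choice = [0, 1, 0]

masked = load_balancing_loss(router_probs, expert_choice, [False, False, True], 2)
unmasked = load_balancing_loss(router_probs, expert_choice, [False, False, False], 2)
```

With the mask applied, the two real tokens are perfectly balanced and the loss takes its balanced value of 1.0. Without it, the padding token is counted as load on expert 0 and the loss rises, penalizing an imbalance that the real tokens do not have.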

What's Changed

[bugfix] fix MTP & mcore 0.15 (NPU) by @Jintao-Huang in #67
compat python 3.13 by @Jintao-Huang in #68
compat lint py313 by @Jintao-Huang in #69
compat lint py3.13 by @Jintao-Huang in #70
[model] support bailing by @Jintao-Huang in #55
update gpt_model by @Jintao-Huang in #71
refactor transformer_block by @Jintao-Huang in #72
[bugfix] fix tie_word_embeddings by @Jintao-Huang in #74
[bugfix] fix qwen3_vl by @Jintao-Huang in #73
remove hf_grouped lora error by @Jintao-Huang in #75
[model] support qwen3_next gdn by @Jintao-Huang in #76
compat megatron.core 0.18 by @Jintao-Huang in #77
[model] support qwen3_asr by @Jintao-Huang in #78
Support padding mask by @Jintao-Huang in #79
compat peft 0.19 by @Jintao-Huang in #80
[readme] Update readme by @Jintao-Huang in #81
[docs] update readme by @Jintao-Huang in #82
[bugfix] fix minimax qk_norm sp by @Jintao-Huang in #83

Full Changelog: v1.3.0...v1.4.0

Contributors

Jintao-Huang

12 May 14:41

Jintao-Huang

Patch release v1.3.2

Full Changelog: v1.3.1...v1.3.2

10 May 05:29

Jintao-Huang

Patch release v1.3.1

Full Changelog: v1.3.0...v1.3.1

07 May 02:51

Jintao-Huang

v1.3.0

New Features

Added model_type support for kimi_k25, hy_v3, and llava_onevision.
mlp_padding_free is now compatible with Sequence Parallelism.
Dropped support for megatron-core versions 0.12 through 0.14.

What's Changed

[docs] update readme by @Jintao-Huang in #49
update requirements by @Jintao-Huang in #51
npu qwen3.5 megatron padding_free fix by @addsubmuldiv in #50
[model] support kimi_k25 by @Jintao-Huang in #52
[model] support hy_v3 by @Jintao-Huang in #53
Add support for LLaVA-OneVision-1.5 model by @randydl in #54
[bugfix] fix torch_dtype by @Jintao-Huang in #57
fix qwen3_next by @Jintao-Huang in #58
remove mcore0.12-mcore0.14 by @Jintao-Huang in #59
fix kwargs by @Jintao-Huang in #61
[megatron] support mlp_padding_free & sp; refactor TransformerLayer by @Jintao-Huang in #62
[bugfix] fix gather_from_sp by @Jintao-Huang in #63
update transformers by @Jintao-Huang in #65
update requirements by @Jintao-Huang in #66

New Contributors

@randydl made their first contribution in #54

Full Changelog: v1.2.0...v1.3.0

Contributors

addsubmuldiv, randydl, and Jintao-Huang

05 May 13:51

Jintao-Huang

Patch release v1.2.3

Full Changelog: v1.2.2...v1.2.3
