Releases: modelscope/mcore-bridge

Patch release v1.5.1

21 Jun 18:33

v1.5.0

16 Jun 15:38

New Features

  1. Reduced GPU memory usage of the lm_head when training the generative_reranker task: logits are now extracted only at the positive/negative token positions instead of being computed in full.
  2. Dropped compatibility with megatron-core 0.15.
  3. Fixed several bugs.
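
The memory saving in item 1 can be illustrated with a minimal sketch. It assumes that "positions" refers to the vocabulary indices of the positive and negative tokens, and it uses plain Python lists in place of tensors. The names `lm_head_weight`, `positive_id`, and `negative_id` are hypothetical and are not taken from the mcore-bridge code.

```python
# Sketch only: project the hidden state onto two vocabulary rows
# instead of the whole vocabulary. All names are hypothetical.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def full_logits(hidden, lm_head_weight):
    # One logit per vocabulary entry: vocab_size dot products.
    return [dot(row, hidden) for row in lm_head_weight]

def reranker_logits(hidden, lm_head_weight, positive_id, negative_id):
    # Only the two rows the reranker loss needs: 2 dot products.
    return (
        dot(lm_head_weight[positive_id], hidden),
        dot(lm_head_weight[negative_id], hidden),
    )

# Toy lm_head with a vocabulary of 4 tokens and hidden size 2.
lm_head_weight = [
    [0.1, 0.2],
    [0.5, -0.3],
    [-0.4, 0.9],
    [0.0, 1.0],
]
hidden = [1.0, 2.0]

pos, neg = reranker_logits(hidden, lm_head_weight, positive_id=1, negative_id=2)
reference = full_logits(hidden, lm_head_weight)
```

The two values match the corresponding entries of the full projection, but the intermediate result has 2 entries per sequence position rather than one per vocabulary token, which is where the saving would come from under this assumption.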

What's Changed

Full Changelog: v1.4.3...v1.5.0

v1.4.3

07 Jun 14:57

New Features

  1. Added model_type support for gemma4_unified; added multimodal support for kimi_k25.
  2. Added the language_model_only parameter. When enabled, only the language model component is created, and only language-model weights are loaded and saved.
  3. Fixed several bugs.
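
As a rough illustration of what loading or saving only language-model weights could look like, the sketch below filters a checkpoint dictionary by key prefix. The prefix `language_model.` and the key names are assumptions made for the example; the release notes do not say how mcore-bridge names or selects these weights.

```python
# Sketch only: hypothetical key prefix, real checkpoints may differ.
LANGUAGE_MODEL_PREFIX = "language_model."

def select_language_model_weights(state_dict, prefix=LANGUAGE_MODEL_PREFIX):
    """Keep only entries whose key starts with the language-model prefix."""
    return {k: v for k, v in state_dict.items() if k.startswith(prefix)}

# Toy multimodal checkpoint with one vision entry.
checkpoint = {
    "language_model.embedding.weight": [0.1, 0.2],
    "language_model.decoder.layers.0.mlp.weight": [0.3],
    "visual.patch_embed.weight": [0.4],
}
lm_only = select_language_model_weights(checkpoint)
```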

What's Changed

Full Changelog: v1.4.2...v1.4.3

v1.4.2

31 May 12:05

New Features

  1. Added model_type support for bailing_hybrid.
  2. Fixed abnormal loss for olmoe/bailing_moe when TP > 1.

What's Changed

Full Changelog: v1.4.1...v1.4.2

v1.4.1

27 May 15:23

New Features

  1. Added model_type support for: gemma4, deepseek_v4.
  2. Added a minimal example to the README showing how to create a model with Mcore-Bridge, run a forward pass, and compute the loss.
  3. Added compatibility with the megatron-core main and dev branches.

What's Changed

Full Changelog: v1.4.0...v1.4.1

v1.4.0

17 May 15:50

New Features

  1. Added model_type support for bailing_moe and qwen3_asr.
  2. Added support for running Qwen3-Next with Mcore-GDN (the default), which enables sequence packing, FP8, and context parallelism (CP).
  3. Refactored transformer_block / transformer_layer with an inheritable design to simplify the integration of new models.
  4. Added compatibility with Python 3.13.
  5. Added support for saving and loading LoRA weights of MoE models whose experts are organized in grouped form in transformers. (Note: these LoRA weights cannot be loaded directly with transformers, but they can be loaded with Megatron to continue training.)
  6. Added padding_mask support, fixing an issue where moe_aux_loss incorrectly computed routing loss on padding tokens when padding_free=False.
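
The padding_mask fix in item 6 can be sketched as follows. The example assumes a Switch-Transformer-style load-balancing loss with top-1 routing, which may differ from the loss Megatron actually computes, and it assumes that `True` in the mask marks a padding token. The function name and mask convention are hypothetical.

```python
# Sketch only: a Switch-style auxiliary loss, not the Megatron implementation.

def load_balancing_loss(router_probs, padding_mask=None):
    # router_probs: one list of expert probabilities per token.
    # padding_mask: one bool per token, True marks padding (assumed convention).
    num_experts = len(router_probs[0])
    if padding_mask is None:
        padding_mask = [False] * len(router_probs)
    # Drop padding tokens before computing any routing statistics.
    real = [p for p, is_pad in zip(router_probs, padding_mask) if not is_pad]
    n = len(real)
    if n == 0:
        return 0.0
    token_fraction = [0.0] * num_experts  # share of tokens routed to each expert
    mean_prob = [0.0] * num_experts       # mean router probability per expert
    for probs in real:
        top1 = max(range(num_experts), key=probs.__getitem__)
        token_fraction[top1] += 1.0 / n
        for e in range(num_experts):
            mean_prob[e] += probs[e] / n
    return num_experts * sum(f * p for f, p in zip(token_fraction, mean_prob))

# Two real tokens followed by two padding tokens that all favor expert 0.
router_probs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
padding_mask = [False, False, True, True]

loss_unmasked = load_balancing_loss(router_probs)
loss_masked = load_balancing_loss(router_probs, padding_mask)
```

In this toy batch the real tokens are perfectly balanced across the two experts, so the masked loss sits at its balanced value, while counting the padding tokens makes the routing look skewed toward expert 0 and inflates the loss.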

What's Changed

Full Changelog: v1.3.0...v1.4.0

Patch release v1.3.2

12 May 14:41

Patch release v1.3.1

10 May 05:29

v1.3.0

07 May 02:51

New Features

  1. Added model_type support: kimi_k25, hy_v3, llava_onevision.
  2. mlp_padding_free is now compatible with Sequence Parallelism.
  3. Removed support for megatron-core versions 0.12 to 0.14.

What's Changed

Full Changelog: v1.2.0...v1.3.0

Patch release v1.2.3

05 May 13:51
