Support LLaMA training in NeMo format #926

youth123 · 2025-11-19T12:20:03Z

PR Category

Train

PR Types

New Features

PR Description

Supports loading and saving checkpoints in the NeMo zarr format
Supports training with packed sequences
Fixes wandb finalization failing when the latest_checkpointed_iteration file cannot be found
Fixes LoRA not supporting layernorm weight loading and the NeMo zarr format
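
For illustration, the wandb finalization fix amounts to tolerating a missing tracker file. The snippet below is a minimal sketch under that assumption, not the code in this PR: `read_latest_iteration` is a hypothetical helper, and it assumes the tracker lives directly under the save directory and holds a plain integer.

```python
from pathlib import Path
from typing import Optional

# Tracker file name as listed in the save layout of this PR.
TRACKER_NAME = "latest_checkpointed_iteration.txt"


def read_latest_iteration(save_dir: str) -> Optional[int]:
    """Return the iteration stored in the tracker file, or None.

    Hypothetical helper: returns None instead of raising when the
    tracker file is missing or does not contain an integer.
    """
    tracker = Path(save_dir) / TRACKER_NAME
    if not tracker.is_file():
        # No checkpoint has been written yet; the caller can skip its work.
        return None
    text = tracker.read_text().strip()
    try:
        return int(text)
    except ValueError:
        # Non-numeric content is treated as "no iteration available".
        return None
```

A caller in the finalization path could then skip checkpoint-related logging whenever the function returns `None`.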

The checkpoint file format is as follows:
load zarr format:
-context
-weights
-module.decoder.xxx._extra_state
-module.decoder.xxx.weight
-optimizer.state.fp32_param.xxx.weight
-optimizer.state.fp32_param.xxx.weight.sync
common.pt
metadata.json

save zarr format:
-iter_xxx
-module.decoder.xxx._extra_state
-module.decoder.xxx.weight
-optimizer.state.fp32_param.xxx.weight
-optimizer.state.fp32_param.xxx.weight.sync
common.pt
metadata.json
latest_checkpointed_iteration.txt
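
A quick sanity check of a saved checkpoint might look like the sketch below. It is not part of this PR. The nesting is an assumption, since the listing above does not show indentation: `common.pt` and the metadata file are assumed to sit inside the iteration directory, with the tracker file next to it. The name `metadata.json` is also assumed. `missing_entries` is a hypothetical helper, and the iteration directory name is passed in by the caller rather than guessed.

```python
from pathlib import Path
from typing import List


def missing_entries(save_dir: str, iter_dir_name: str) -> List[str]:
    """List expected checkpoint entries that are absent under save_dir.

    Assumed layout (not confirmed by the PR text):
      save_dir/latest_checkpointed_iteration.txt
      save_dir/<iter_dir_name>/common.pt
      save_dir/<iter_dir_name>/metadata.json
    """
    root = Path(save_dir)
    expected = [
        root / "latest_checkpointed_iteration.txt",
        root / iter_dir_name / "common.pt",
        root / iter_dir_name / "metadata.json",
    ]
    # Report paths relative to save_dir, with forward slashes on every OS.
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

An empty list means all three entries were found. The per-tensor zarr directories are not checked because their names depend on the model.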

The comparison of NeMo and FlagScale under different distributed strategies is as follows (comparison figures not included):
lora: dp2, tp2, pp2
full: dp2, tp2, pp2


CLAassistant · 2025-11-19T12:20:10Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

youth123 added 2 commits November 18, 2025 20:00

support nemo llama 70b lora train

c856936

During wandb finalization, the latest_checkpointed_iteration file may…

28e7862

… not exists

youth123 requested review from aoyulong, heavyrain-lzy and zhaoyinglia as code owners November 19, 2025 12:20

add peft config to 70b

61bb788
