Your question
I'm using `tools/checkpoint/convert.py` to convert a Llama model to the mcore model format for training. `tools/checkpoint/loader_mcore.py` supports loading a model with virtual pipeline parallelism, but `tools/checkpoint/saver_mcore.py` doesn't support saving one.
Is there another way to do this conversion, or do I need to modify `saver_mcore.py` to support it? Maybe by adding args like `target_num_layers_per_virtual_pipeline_stage` and `target_virtual_pipeline_model_parallel_size`?
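To make the request concrete, here is a rough sketch of the layer-to-chunk mapping a saver would need in order to write a virtual pipeline checkpoint. It assumes the usual interleaved layout, where each pipeline rank holds several non-contiguous model chunks and the layer count divides evenly. The function name `virtual_pipeline_layer_ranges` is made up for illustration and is not part of `saver_mcore.py` or any existing tool.

```python
def virtual_pipeline_layer_ranges(num_layers, pp_size, vp_size):
    """Map (pp_rank, vp_rank) -> range of global layer indices.

    Hypothetical helper, assuming the interleaved layout: layers are
    split into pp_size * vp_size equal chunks, and chunk vp_rank on
    pipeline rank pp_rank starts at
        vp_rank * (num_layers // vp_size) + pp_rank * layers_per_chunk.
    """
    if num_layers % (pp_size * vp_size) != 0:
        raise ValueError(
            "num_layers must be divisible by pp_size * vp_size"
        )

    layers_per_chunk = num_layers // (pp_size * vp_size)
    layers_per_vp_group = num_layers // vp_size

    mapping = {}
    for vp_rank in range(vp_size):
        for pp_rank in range(pp_size):
            start = vp_rank * layers_per_vp_group + pp_rank * layers_per_chunk
            mapping[(pp_rank, vp_rank)] = range(start, start + layers_per_chunk)
    return mapping


if __name__ == "__main__":
    # 8 layers, 2 pipeline stages, 2 virtual stages per pipeline rank.
    for key, layers in sorted(virtual_pipeline_layer_ranges(8, 2, 2).items()):
        print(key, list(layers))
```

With 8 layers, 2 pipeline stages and 2 virtual stages, pipeline rank 0 would hold layers [0, 1] and [4, 5], and rank 1 would hold [2, 3] and [6, 7]. A saver with virtual pipeline support would presumably need to write each of these chunks separately for every pipeline rank.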
This discussion was converted from issue #1212 on October 23, 2024 21:04.