[QUESTION] How to convert a Hugging Face checkpoint and also use PP > 1 or TP > 1 #1094
Unanswered. sambar1729 asked this question in Q&A.
Replies: 1 comment
Hi @sambar1729, by any chance, did you find a way to convert a checkpoint from PP=1, TP=1 to PP>1 and/or TP>1? I am facing a similar issue right now.
Your question
I want to load a Hugging Face (HF) checkpoint into Megatron-LM and then continue training from it. For the training step I will need TP > 1 or PP > 1, given the model size and the GPU memory I have. So when I convert the HF checkpoint to the Megatron format, the TP and PP values of the converted checkpoint need to match the ones I will train with.
However, right now the conversion scripts from HF to mcore seem to support only PP = 1 and TP = 1 (I am hoping I am mistaken here). How do I use the conversion scripts in `tools/checkpoints/convert.py` so that I can use TP > 1 and/or PP > 1?

Thanks.
Edit:
I am guessing this is already answered in the negative (that is, there is currently no way to do this conversion) by #296 (comment). Are there any updates on this?
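To make concrete what "converting to TP > 1 or PP > 1" has to do, here is a minimal, purely illustrative sketch. It is not Megatron-LM's converter code; the function names `split_rows`, `split_cols`, and `layers_per_stage` are hypothetical. It assumes the usual convention that a linear weight is stored as `[out_features, in_features]`, that column-parallel layers are split along the output dimension, that row-parallel layers are split along the input dimension, and that layers are divided evenly and contiguously across pipeline stages. A real converter also has to handle details such as fused QKV weights and embedding padding, which this sketch ignores.

```python
# Illustrative only: hypothetical helpers, not part of Megatron-LM.

def split_rows(weight, tp):
    """Column-parallel style: split the output dimension (rows) into tp equal shards."""
    rows = len(weight)
    if rows % tp != 0:
        raise ValueError("number of rows must be divisible by tp")
    step = rows // tp
    return [weight[r * step:(r + 1) * step] for r in range(tp)]


def split_cols(weight, tp):
    """Row-parallel style: split the input dimension (columns) into tp equal shards."""
    cols = len(weight[0])
    if cols % tp != 0:
        raise ValueError("number of columns must be divisible by tp")
    step = cols // tp
    return [[row[r * step:(r + 1) * step] for row in weight] for r in range(tp)]


def layers_per_stage(num_layers, pp):
    """Assign contiguous layer indices to each pipeline stage (assumes an even split)."""
    if num_layers % pp != 0:
        raise ValueError("num_layers must be divisible by pp")
    step = num_layers // pp
    return [list(range(s * step, (s + 1) * step)) for s in range(pp)]


if __name__ == "__main__":
    # A toy 4x4 weight matrix stored as [out_features, in_features].
    weight = [[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]]
    print(split_rows(weight, 2))    # one shard of rows per TP rank
    print(split_cols(weight, 2))    # one shard of columns per TP rank
    print(layers_per_stage(4, 2))   # layer indices owned by each PP stage
```

In other words, a TP=1, PP=1 checkpoint holds every full weight in one place, while a TP > 1 and/or PP > 1 checkpoint needs each rank to receive only its slice of each weight and only the layers of its stage. That resharding is what the conversion step would have to perform.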