[QUESTION] Need some clarification regarding tensor parallelism on Multimodal example #1197
Unanswered
luisfrentzen asked this question in Q&A
Hi, I'm new to the framework and trying to learn how to use it. I'm currently following the multimodal guide in the examples section, and I have some questions I'd like clarified. I hope that's okay, and I'm sorry if the answers should have been obvious.
While converting a checkpoint to Megatron format with the guide in docs/llama_mistral.md, it says that the TP parameter must be set correctly; for example, Llama-3-70B must have TP = 8. Where does that number come from? What if I want to use a TP value smaller or larger than 8 for Llama-3-70B? Would that work?
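
For context, my current guess after reading around is that TP has to divide the model's head counts evenly. Here is a minimal sketch of that check in plain Python, assuming Llama-3-70B's published config (64 query heads, 8 KV heads, hidden size 8192); the helper is my own, not anything from Megatron:

```python
# Minimal sketch (my own helper, not Megatron code): why TP = 8 seems to be
# the natural upper bound for Llama-3-70B, given its config of
# 64 query heads, 8 KV heads, and hidden size 8192.

def valid_tp_sizes(num_query_heads: int, num_kv_heads: int, hidden_size: int):
    """Return TP degrees that split every sharded dimension evenly."""
    candidates = range(1, num_kv_heads + 1)
    return [
        tp for tp in candidates
        if num_query_heads % tp == 0   # query heads are split across TP ranks
        and num_kv_heads % tp == 0     # so are the KV heads (GQA groups)
        and hidden_size % tp == 0      # column/row-parallel linear shards
    ]

print(valid_tp_sizes(64, 8, 8192))  # -> [1, 2, 4, 8]; TP > 8 would not divide the 8 KV heads
```

If that reasoning is right, TP = 1, 2, or 4 would also be mathematically valid, and 8 is presumably chosen so each shard fits in one GPU's memory, but I'd appreciate confirmation.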

Does having target TP = 4 mean that the model weights must be sharded across exactly 4 GPUs, no more and no fewer?
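
To make that question concrete, this is the picture I have in mind, as a toy sketch in plain PyTorch (not Megatron's actual sharding code):

```python
import torch

# Toy sketch (my own code): what I think "target TP = 4" means, i.e. each
# weight matrix is cut into 4 shards, one per tensor-parallel rank.
hidden = 8192
weight = torch.randn(hidden, hidden)

tp = 4
shards = torch.chunk(weight, tp, dim=0)  # one shard per TP rank
print([s.shape for s in shards])  # -> 4 x torch.Size([2048, 8192])
```

Is that roughly accurate, and does it imply the checkpoint can only ever be loaded on exactly 4 GPUs per replica without reconverting?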
What is the difference between the tensor parallelism and pipeline parallelism arguments?
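
My current understanding, which may well be wrong, is that TP splits each layer's weight matrices across GPUs while PP assigns contiguous groups of layers to different GPUs, and the two together determine how many data-parallel replicas fit in a job. A small sketch of the arithmetic I have in mind, ignoring other dimensions like context or expert parallelism (the variable names are mine):

```python
# Sketch of how I understand the parallelism arguments to interact
# (ignoring context/expert parallelism; variable names are my own):

world_size = 16  # total number of GPUs, e.g. 2 nodes x 8 GPUs

tensor_parallel_size = 4    # each layer's weight matrices split over 4 GPUs
pipeline_parallel_size = 2  # the layer stack split into 2 sequential stages

assert world_size % (tensor_parallel_size * pipeline_parallel_size) == 0
data_parallel_size = world_size // (tensor_parallel_size * pipeline_parallel_size)

print(data_parallel_size)  # -> 2 full model replicas, each spanning 4 x 2 = 8 GPUs
```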
It would be great if someone could provide a bit of explanation, or point me to any resource that answers these questions. Thank you very much.