[QUESTION] Need some clarification regarding tensor parallelism on Multimodal example #1197
Unanswered
luisfrentzen asked this question in Q&A
Hi, I'm new to the framework and trying to learn how to use it. I'm currently following the multimodal guide in the examples section, and I have some questions I'd like clarified. I hope that's okay, and I'm sorry if the answers should have been obvious.
While converting a checkpoint to Megatron format with the guide in docs/llama_mistral.md, it says that the TP parameter must be set correctly; for example, Llama-3-70B must have TP = 8. Where does that number come from? What if I want to use a TP value smaller or larger than 8 for Llama-3-70B? Would that work?
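
For context, my current guess after reading around is that TP has to divide the model's head counts evenly. Here is a minimal sketch of that check in plain Python, assuming Llama-3-70B's published config (64 query heads, 8 KV heads, hidden size 8192); the helper is my own, not anything from Megatron:

```python
# Minimal sketch (my own helper, not Megatron code): why TP = 8 seems to be
# the natural upper bound for Llama-3-70B, given its config of
# 64 query heads, 8 KV heads, and hidden size 8192.

def valid_tp_sizes(num_query_heads: int, num_kv_heads: int, hidden_size: int):
    """Return TP degrees that split every sharded dimension evenly."""
    candidates = range(1, num_kv_heads + 1)
    return [
        tp for tp in candidates
        if num_query_heads % tp == 0   # query heads are split across TP ranks
        and num_kv_heads % tp == 0     # so are the KV heads (GQA groups)
        and hidden_size % tp == 0      # column/row-parallel linear shards
    ]

print(valid_tp_sizes(64, 8, 8192))  # -> [1, 2, 4, 8]; TP > 8 would not divide the 8 KV heads
```

If that reasoning is right, TP = 1, 2, or 4 would also be mathematically valid, and 8 is presumably chosen so each shard fits in one GPU's memory, but I'd appreciate confirmation.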

Does having target TP = 4 mean that the model weights must be sharded across exactly 4 GPUs, no more and no fewer?
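
To make that question concrete, this is the picture I have in mind, as a toy sketch in plain PyTorch (not Megatron's actual sharding code):

```python
import torch

# Toy sketch (my own code): what I think "target TP = 4" means, i.e. each
# weight matrix is cut into 4 shards, one per tensor-parallel rank.
hidden = 8192
weight = torch.randn(hidden, hidden)

tp = 4
shards = torch.chunk(weight, tp, dim=0)  # one shard per TP rank
print([s.shape for s in shards])  # -> 4 x torch.Size([2048, 8192])
```

Is that roughly accurate, and does it imply the checkpoint can only ever be loaded on exactly 4 GPUs per replica without reconverting?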
What is the difference between the tensor parallelism and pipeline parallelism arguments?
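
My current understanding, which may well be wrong, is that TP splits each layer's weight matrices across GPUs while PP assigns contiguous groups of layers to different GPUs, and the two together determine how many data-parallel replicas fit in a job. A small sketch of the arithmetic I have in mind, ignoring other dimensions like context or expert parallelism (the variable names are mine):

```python
# Sketch of how I understand the parallelism arguments to interact
# (ignoring context/expert parallelism; variable names are my own):

world_size = 16  # total number of GPUs, e.g. 2 nodes x 8 GPUs

tensor_parallel_size = 4    # each layer's weight matrices split over 4 GPUs
pipeline_parallel_size = 2  # the layer stack split into 2 sequential stages

assert world_size % (tensor_parallel_size * pipeline_parallel_size) == 0
data_parallel_size = world_size // (tensor_parallel_size * pipeline_parallel_size)

print(data_parallel_size)  # -> 2 full model replicas, each spanning 4 x 2 = 8 GPUs
```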
It would be great if someone could provide a bit of explanation, or point me to any resource that answers these questions. Thank you very much.