[QUESTION] Variable tensor shape when using pipeline parallelism (PP) #1072
Unanswered
KookHoiKim asked this question in Q&A
I am working with the LLaVA code and I have a question about sequence length when using pipeline parallelism.
In my understanding, the tensor shape for recv/send is fixed using args.seq_length in pipeline_parallel/schedules.py.
If padding tokens make up most of the input, this becomes very inefficient in terms of memory and execution speed.
Is there any way to use a variable input length when using pipeline parallelism?
Thanks.
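For context, here is a minimal sketch of what that fixed-shape behavior looks like, assuming the usual (seq, batch, hidden) activation layout and an already-initialized torch.distributed process group. The function and variable names are illustrative, not the exact Megatron-LM code:

```python
import torch
import torch.distributed as dist

def pipeline_tensor_shape(seq_length, micro_batch_size, hidden_size):
    # Megatron-style activations travel between pipeline stages as (seq, batch, hidden),
    # with seq taken from the static args.seq_length rather than the actual batch.
    return (seq_length, micro_batch_size, hidden_size)

def recv_forward(shape, src_rank, dtype=torch.float16, device="cuda"):
    # The recv buffer is pre-allocated from the *static* shape, so a heavily
    # padded micro-batch still pays the full memory and communication cost.
    buffer = torch.empty(shape, dtype=dtype, device=device)
    dist.recv(buffer, src=src_rank)
    return buffer

def send_forward(output_tensor, dst_rank):
    # The sender must produce exactly the agreed-upon static shape, otherwise
    # the matching recv on the next stage would mismatch.
    dist.send(output_tensor.contiguous(), dst=dst_rank)
```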
Replies: 1 comment
Any update on this? It looks like Megatron-LM's pipeline scheduling does not support variable input lengths, because it calculates the send/recv tensor shapes here. In practice, pipeline parallelism does not need to have this limitation. If we were to support variable sequence lengths, how would we do it? Would we peek at the first element of the data_iterator, get its sequence length, and then adjust the recv/send tensor shapes accordingly?
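One way the shape handshake could look, sketched with plain torch.distributed: the sending stage first transmits the actual runtime shape, and the receiving stage allocates its buffer from that instead of args.seq_length. The function names and control flow below are assumptions, not Megatron-LM's implementation; newer Megatron-Core versions appear to expose a variable_seq_lengths-style config option along these lines, so it is worth checking your version.

```python
import torch
import torch.distributed as dist

def send_with_shape(tensor, dst_rank):
    # 1) Send the runtime shape as a small int64 tensor.
    shape = torch.tensor(tensor.shape, dtype=torch.int64, device=tensor.device)
    dist.send(shape, dst=dst_rank)
    # 2) Send the activation itself with its true (variable) sequence length.
    dist.send(tensor.contiguous(), dst=dst_rank)

def recv_with_shape(src_rank, ndim=3, dtype=torch.float16, device="cuda"):
    # 1) Receive the shape first.
    shape = torch.empty(ndim, dtype=torch.int64, device=device)
    dist.recv(shape, src=src_rank)
    # 2) Allocate exactly what is needed and receive the activation.
    buffer = torch.empty(tuple(shape.tolist()), dtype=dtype, device=device)
    dist.recv(buffer, src=src_rank)
    return buffer
```

The trade-off is one extra small message per micro-batch and the loss of statically pre-allocated recv buffers, in exchange for not padding every micro-batch up to args.seq_length.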