[QUESTION] Variable tensor shape when using pipeline parallelism (PP) #1072
Unanswered
KookHoiKim asked this question in Q&A
I am working with the LLaVA code and I have a question about sequence length when using pipeline parallelism.
In my understanding, the tensor shape for recv/send is fixed using args.seq_length in pipeline_parallel/schedules.py.
If padding tokens make up most of the input, this becomes very inefficient in terms of memory and execution speed.
Is there any way to use a variable input length when using pipeline parallelism?
Thanks.
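For context, here is a minimal sketch of what that fixed-shape behavior looks like, assuming the usual (seq, batch, hidden) activation layout and an already-initialized torch.distributed process group. The function and variable names are illustrative, not the exact Megatron-LM code:

```python
import torch
import torch.distributed as dist

def pipeline_tensor_shape(seq_length, micro_batch_size, hidden_size):
    # Megatron-style activations travel between pipeline stages as (seq, batch, hidden),
    # with seq taken from the static args.seq_length rather than the actual batch.
    return (seq_length, micro_batch_size, hidden_size)

def recv_forward(shape, src_rank, dtype=torch.float16, device="cuda"):
    # The recv buffer is pre-allocated from the *static* shape, so a heavily
    # padded micro-batch still pays the full memory and communication cost.
    buffer = torch.empty(shape, dtype=dtype, device=device)
    dist.recv(buffer, src=src_rank)
    return buffer

def send_forward(output_tensor, dst_rank):
    # The sender must produce exactly the agreed-upon static shape, otherwise
    # the matching recv on the next stage would mismatch.
    dist.send(output_tensor.contiguous(), dst=dst_rank)
```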
Replies: 1 comment
Any update on this? It looks like Megatron-LM's pipeline scheduling does not support variable input lengths, because it calculates the send/recv tensor shapes here. In practice, pipeline parallelism does not need to have this limitation. If we were to support variable sequence lengths, how would we do it? Would we peek at the first element of the data_iterator, get its sequence length, and then adjust the recv/send tensor shapes accordingly?
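One way the shape handshake could look, sketched with plain torch.distributed: the sending stage first transmits the actual runtime shape, and the receiving stage allocates its buffer from that instead of args.seq_length. The function names and control flow below are assumptions, not Megatron-LM's implementation; newer Megatron-Core versions appear to expose a variable_seq_lengths-style config option along these lines, so it is worth checking your version.

```python
import torch
import torch.distributed as dist

def send_with_shape(tensor, dst_rank):
    # 1) Send the runtime shape as a small int64 tensor.
    shape = torch.tensor(tensor.shape, dtype=torch.int64, device=tensor.device)
    dist.send(shape, dst=dst_rank)
    # 2) Send the activation itself with its true (variable) sequence length.
    dist.send(tensor.contiguous(), dst=dst_rank)

def recv_with_shape(src_rank, ndim=3, dtype=torch.float16, device="cuda"):
    # 1) Receive the shape first.
    shape = torch.empty(ndim, dtype=torch.int64, device=device)
    dist.recv(shape, src=src_rank)
    # 2) Allocate exactly what is needed and receive the activation.
    buffer = torch.empty(tuple(shape.tolist()), dtype=dtype, device=device)
    dist.recv(buffer, src=src_rank)
    return buffer
```

The trade-off is one extra small message per micro-batch and the loss of statically pre-allocated recv buffers, in exchange for not padding every micro-batch up to args.seq_length.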