Question with forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline #1192
Unanswered · Hongjie1Chu asked this question in Q&A
Replies: 1 comment
Marking as stale. No activity in 60 days.
I ran into a question while using the Megatron pipeline schedule `forward_backward_pipelining_without_interleaving`. In this schedule, each pipeline stage runs its forward pass through `forward_step`:
```python
output_tensor = forward_step(forward_step_func, data_iterator, model, input_tensor, losses_reduced)
```
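For context, my mental model of where this call sits in the non-interleaved schedule is roughly the following. This is a condensed sketch, not the real schedule code; `recv_forward`/`send_forward` are stubs standing in for the point-to-point communication between stages, and the warmup/1F1B/cooldown phases are collapsed into one loop:

```python
# Rough outline of the forward half of the schedule as I read it (sketch only).

def recv_forward():
    """Stub: would receive activations from the previous pipeline stage (None on stage 0)."""
    return None

def send_forward(output_tensor):
    """Stub: would send activations to the next pipeline stage (no-op on the last stage)."""
    pass

def run_forward_passes(forward_step, forward_step_func, data_iterator, model,
                       losses_reduced, num_microbatches):
    for _ in range(num_microbatches):
        input_tensor = recv_forward()
        # The line I quoted above: every stage calls forward_step with data_iterator.
        output_tensor = forward_step(forward_step_func, data_iterator, model,
                                     input_tensor, losses_reduced)
        send_forward(output_tensor)
    # (The real schedule interleaves backward passes here: warmup / steady-state 1F1B / cooldown.)
```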
In a pipeline, the input to each stage's forward pass should be the output of the previous stage. However, in `megatron/schedule.py`, `forward_step` contains the following:
```python
unwrapped_model.set_input_tensor(input_tensor)
output_tensor, loss_func = forward_step_func(data_iterator, model)
```
This looks as if every stage still pulls data from the dataset through `data_iterator` and processes it, which seems to contradict the idea of pipelining, where each stage should consume the previous stage's output. Could you please explain the rationale behind this design?
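To make the question concrete, here is my current guess at how `set_input_tensor` might feed into the model forward on each stage. This is an assumption on my part, not verified against the model code, and the `pre_process` flag and class are illustrative stand-ins, not Megatron's actual implementation:

```python
# My guess (sketch only, not Megatron code) at how set_input_tensor is consumed.

class PipelineStageSketch:
    """Illustrative stand-in for one pipeline stage of the model."""

    def __init__(self, pre_process):
        self.pre_process = pre_process    # True only on the first pipeline stage
        self.input_tensor = None

    def set_input_tensor(self, input_tensor):
        # Called by the schedule right before forward_step_func runs.
        self.input_tensor = input_tensor

    def forward(self, tokens):
        if self.pre_process:
            # First stage: the batch read from data_iterator is actually embedded.
            hidden_states = f"embed({tokens})"
        else:
            # Later stages: the received activations would replace the embedding output,
            # so the tokens from data_iterator would not drive the computation here.
            hidden_states = self.input_tensor
        return f"layers({hidden_states})"
```

If that reading is right, only the first stage would really consume the batch for the forward computation, but I could not confirm this from the schedule code alone, hence the question.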
Code in pretrain_gpt.py:
Here are my results:

My configuration:
```bash
GPUS_PER_NODE=4
# Change for multinode config
MASTER_ADDR=172.20.20.220
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=data/my-gpt2_text_document
CHECKPOINT_PATH=model/model_optim_rng.pt
MODEL_PATH=model/output/pp

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

python -m torch.distributed.launch $DISTRIBUTED_ARGS \
    pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 4 \
    --num-layers 12 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --micro-batch-size 16 \
    --global-batch-size 64 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --train-iters 1
```
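For reference, my back-of-the-envelope numbers for this run, assuming the usual `global_batch = micro_batch * num_microbatches * data_parallel_size` relation (my own arithmetic, not output from Megatron):

```python
gpus_per_node, nnodes = 4, 1
tensor_parallel, pipeline_parallel = 1, 4

world_size = gpus_per_node * nnodes                                  # 4
data_parallel = world_size // (tensor_parallel * pipeline_parallel)  # 1

micro_batch, global_batch = 16, 64
num_microbatches = global_batch // (micro_batch * data_parallel)     # 4 microbatches per iteration
```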