Question with forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline #1192
Unanswered · Hongjie1Chu asked this question in Q&A
Replies: 1 comment
Marking as stale. No activity in 60 days.
I ran into a question while using the Megatron pipeline schedule `forward_backward_pipelining_without_interleaving`. In this schedule, each pipeline stage runs its forward pass through `forward_step`:
```python
output_tensor = forward_step(forward_step_func, data_iterator, model, input_tensor, losses_reduced)
```
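For context, my mental model of where this call sits in the non-interleaved schedule is roughly the following. This is a condensed sketch, not the real schedule code; `recv_forward`/`send_forward` are stubs standing in for the point-to-point communication between stages, and the warmup/1F1B/cooldown phases are collapsed into one loop:

```python
# Rough outline of the forward half of the schedule as I read it (sketch only).

def recv_forward():
    """Stub: would receive activations from the previous pipeline stage (None on stage 0)."""
    return None

def send_forward(output_tensor):
    """Stub: would send activations to the next pipeline stage (no-op on the last stage)."""
    pass

def run_forward_passes(forward_step, forward_step_func, data_iterator, model,
                       losses_reduced, num_microbatches):
    for _ in range(num_microbatches):
        input_tensor = recv_forward()
        # The line I quoted above: every stage calls forward_step with data_iterator.
        output_tensor = forward_step(forward_step_func, data_iterator, model,
                                     input_tensor, losses_reduced)
        send_forward(output_tensor)
    # (The real schedule interleaves backward passes here: warmup / steady-state 1F1B / cooldown.)
```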
In a pipeline, the input to each stage's forward pass should be the output of the previous stage. However, in `megatron/schedule.py`, `forward_step` contains the following:
```python
unwrapped_model.set_input_tensor(input_tensor)
output_tensor, loss_func = forward_step_func(data_iterator, model)
```
This looks as if every stage still pulls data from the dataset through `data_iterator` and processes it, which seems to contradict the idea of pipelining, where each stage should consume the previous stage's output. Could you please explain the rationale behind this design?
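To make the question concrete, here is my current guess at how `set_input_tensor` might feed into the model forward on each stage. This is an assumption on my part, not verified against the model code, and the `pre_process` flag and class are illustrative stand-ins, not Megatron's actual implementation:

```python
# My guess (sketch only, not Megatron code) at how set_input_tensor is consumed.

class PipelineStageSketch:
    """Illustrative stand-in for one pipeline stage of the model."""

    def __init__(self, pre_process):
        self.pre_process = pre_process    # True only on the first pipeline stage
        self.input_tensor = None

    def set_input_tensor(self, input_tensor):
        # Called by the schedule right before forward_step_func runs.
        self.input_tensor = input_tensor

    def forward(self, tokens):
        if self.pre_process:
            # First stage: the batch read from data_iterator is actually embedded.
            hidden_states = f"embed({tokens})"
        else:
            # Later stages: the received activations would replace the embedding output,
            # so the tokens from data_iterator would not drive the computation here.
            hidden_states = self.input_tensor
        return f"layers({hidden_states})"
```

If that reading is right, only the first stage would really consume the batch for the forward computation, but I could not confirm this from the schedule code alone, hence the question.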
Code in pretrain_gpt.py:
Here are my results:

My configuration:
```bash
GPUS_PER_NODE=4
# Change for multinode config
MASTER_ADDR=172.20.20.220
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=data/my-gpt2_text_document
CHECKPOINT_PATH=model/model_optim_rng.pt
MODEL_PATH=model/output/pp

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

python -m torch.distributed.launch $DISTRIBUTED_ARGS \
    pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 4 \
    --num-layers 12 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --micro-batch-size 16 \
    --global-batch-size 64 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --train-iters 1
```
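For reference, my back-of-the-envelope numbers for this run, assuming the usual `global_batch = micro_batch * num_microbatches * data_parallel_size` relation (my own arithmetic, not output from Megatron):

```python
gpus_per_node, nnodes = 4, 1
tensor_parallel, pipeline_parallel = 1, 4

world_size = gpus_per_node * nnodes                                  # 4
data_parallel = world_size // (tensor_parallel * pipeline_parallel)  # 1

micro_batch, global_batch = 16, 64
num_microbatches = global_batch // (micro_batch * data_parallel)     # 4 microbatches per iteration
```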