Replies: 3 comments 1 reply
- arguments: throughput log:
- Can anyone help? Thanks.
- I am trying to run the 70B model on 16 GPUs, but I keep getting OOM errors. How did you manage to do it?
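A rough way to see why OOM is likely on 16 GPUs is to estimate the per-GPU model state. The sketch below assumes about 70e9 parameters split evenly across TP x PP ranks, bf16 parameters with fp32 gradients, and the Adam optimizer: 18 bytes per parameter without the distributed optimizer, or 6 + 12/DP bytes with it. The function name `model_state_gib` is made up for illustration, and activations, buffers, and fragmentation are not counted.

```python
def model_state_gib(n_params, tp, pp, dp, distributed_optimizer=True):
    """Rough per-GPU memory (GiB) for weights, gradients and Adam state.

    Assumes bf16 parameters, fp32 gradients and fp32 optimizer state,
    with parameters split evenly across tensor and pipeline ranks.
    Activations and temporary buffers are NOT included.
    """
    params_per_gpu = n_params / (tp * pp)
    if distributed_optimizer:
        # 2 bytes (bf16 weight) + 4 bytes (fp32 grad) stay on every rank;
        # the 12 bytes of optimizer state are sharded across DP ranks.
        bytes_per_param = 6 + 12 / dp
    else:
        bytes_per_param = 18
    return params_per_gpu * bytes_per_param / 2**30


# Assumed: ~70e9 parameters, same layout as in the question (TP=8, PP=2, DP=1).
estimate = model_state_gib(70e9, tp=8, pp=2, dp=1)
print(f"model state per GPU: {estimate:.1f} GiB")
```

Under these assumptions the estimate is roughly 73 GiB per GPU before any activations. With DP=1 the distributed optimizer has nothing to shard across, so it does not reduce this number.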
Your question
Machine: 2 nodes * 8 A100
TP=8
PP=2
DP=1
CP=1
seq_length=4096
micro_batch_size=1
global_batch_size=1
Enabled: activation recomputation, flash attention, distributed optimizer
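For anyone checking the layout above, here is a minimal sketch of the arithmetic. It assumes the usual relation world size = TP x PP x CP x DP and a non-interleaved 1F1B pipeline schedule, where the idle ("bubble") fraction is (PP - 1) / (m + PP - 1) for m microbatches per step. The function name `check_layout` is made up for illustration and is not part of Megatron-LM.

```python
def check_layout(num_nodes, gpus_per_node, tp, pp, dp, cp,
                 micro_batch_size, global_batch_size):
    """Sanity-check a parallel layout and estimate the pipeline bubble."""
    world_size = num_nodes * gpus_per_node
    if tp * pp * cp * dp != world_size:
        raise ValueError("TP * PP * CP * DP must equal the number of GPUs")
    if global_batch_size % (micro_batch_size * dp) != 0:
        raise ValueError("global batch must be divisible by micro batch * DP")

    # Microbatches that flow through the pipeline per optimizer step.
    num_microbatches = global_batch_size // (micro_batch_size * dp)
    # Fraction of a step each stage is idle (non-interleaved 1F1B, assumed).
    bubble_fraction = (pp - 1) / (num_microbatches + pp - 1)
    return {
        "world_size": world_size,
        "num_microbatches": num_microbatches,
        "bubble_fraction": bubble_fraction,
    }


# Values copied from the configuration in the question.
layout = check_layout(num_nodes=2, gpus_per_node=8, tp=8, pp=2, dp=1, cp=1,
                      micro_batch_size=1, global_batch_size=1)
print(layout)
```

With global_batch_size=1 there is only one microbatch per step, so under these assumptions each of the two pipeline stages sits idle about half of the time. That may be worth keeping in mind when reading the throughput numbers.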
Megatron version: core_v0.7.0
Thanks for your help!