[QUESTION] What's the internal difference for training when setting only "fp8-format" or setting "fp8-format"+"bf16" #1116
Unanswered · asked by dong-liuliu in Q&A
Replies: 2 comments, 2 replies
Only "fp8-format": FP32 + FP8. You could consider FP8 as an additional feature on top of your current training recipe (BF16 or no BF16).
Marking as stale. No activity in 60 days.
Your question
I'm trying to train GPT/LLaMA on top of Megatron-LM, but I'm confused about FP8 performance.
Setting the fp8-format parameters together with "--bf16" performs much better than the same setup without "--bf16". What is the difference between the two configurations inside Megatron-LM?
When fp8 and bf16 are both set, will Megatron-LM move some computation to BF16 where that is more efficient, and to FP8 where it gives higher throughput?