Replies: 2 comments
Flash attention has provided a deterministic flag since v2.4. For FA version >= 2.4,
Is the NCCL algorithm deterministic?
Issue Description:
I read the information about reproducibility, which mentions using `--deterministic-mode` by setting `NCCL_ALGO` and `NVTE_ALLOW_NONDETERMINISTIC_ALGO=0`, and not using `--use-flash-attn`, to achieve deterministic training.

I tested Megatron on a dual-node setup (TP=2, PP=2) with eight A800 GPUs per node, training for 50 iterations. I repeated this configuration over multiple runs and checked whether the saved models were identical each time by comparing the parameters one by one. I found that setting `NVTE_ALLOW_NONDETERMINISTIC_ALGO=0` alone was enough to produce identical model parameters across runs, so it seems to be the only setting that matters for reproducibility in my tests. Conversely, when this environment variable was not set, the saved model parameters differed after each run.

Questions:

1. Do `NCCL_ALGO` and `--use-flash-attn` cause non-deterministic training results?
2. `NCCL_ALGO` defaults to None. In this case, how does NCCL choose the algorithm, and how can I know which algorithm is being selected?

Environment Details:

- `NVTE_ALLOW_NONDETERMINISTIC_ALGO=0`

Thank you for your assistance.
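For reference, the parameter-by-parameter comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the script used in the tests: it assumes each checkpoint has already been flattened into a mapping from parameter name to the raw bytes of that tensor, and the names `fingerprint`, `diff_params`, `run_1`, and `run_2` are hypothetical. How the bytes are extracted from a real Megatron checkpoint is left out.

```python
import hashlib
from array import array
from typing import Dict, List, Mapping


def fingerprint(params: Mapping[str, bytes]) -> Dict[str, str]:
    """Map each parameter name to the SHA-256 digest of its raw bytes."""
    return {name: hashlib.sha256(raw).hexdigest() for name, raw in params.items()}


def diff_params(run_a: Mapping[str, bytes], run_b: Mapping[str, bytes]) -> List[str]:
    """Return the sorted names of parameters that are missing from one run
    or whose bytes differ between the two runs."""
    fp_a, fp_b = fingerprint(run_a), fingerprint(run_b)
    all_names = fp_a.keys() | fp_b.keys()
    return sorted(name for name in all_names if fp_a.get(name) != fp_b.get(name))


# Toy stand-ins for two saved checkpoints (hypothetical names and values).
run_1 = {
    "layer.weight": array("f", [0.5, -1.25]).tobytes(),
    "layer.bias": array("f", [0.0]).tobytes(),
}
run_2 = {
    "layer.weight": array("f", [0.5, -1.25]).tobytes(),
    "layer.bias": array("f", [1e-7]).tobytes(),  # tiny drift in one value
}

print(diff_params(run_1, run_1))  # no differences expected
print(diff_params(run_1, run_2))  # only the bias should be reported
```

The comparison is deliberately bitwise rather than tolerance-based: for a determinism check, any difference in the stored bytes counts as a mismatch, however small the numerical drift.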