Faster NMT training for machine.py #227
Conversation
(I'm still working on trying to update transformers to be consistent with silnlp + potential speed-ups.)
Codecov Report
❌ Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #227      +/- ##
==========================================
- Coverage   91.11%   90.96%    -0.16%
==========================================
  Files         334      334
  Lines       21742    21431     -311
==========================================
- Hits        19810    19494     -316
- Misses       1932     1937       +5

View full report in Codecov by Sentry.
I've updated transformers, so this is ready for review. No appreciable difference with the update. I also updated the settings.yaml in machine.py with parameters that I updated in Serval, including: 1) those relevant to the …
It would be good to use the recent update that allows you to disable attention in HuggingFaceNmtModelFactory. This should allow SDPA to work, which will improve performance. All that should be necessary is to pass output_attentions=False to the constructor for HuggingFaceNmtEngine.
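For illustration, a minimal sketch of that suggestion; the checkpoint name and language codes are placeholders, and it assumes HuggingFaceNmtEngine forwards extra keyword arguments to the underlying transformers model:

```python
from machine.translation.huggingface import HuggingFaceNmtEngine

# Disabling attention output lets transformers use the faster SDPA attention
# implementation, since attention weights no longer need to be materialized
# and returned with each generation step.
engine = HuggingFaceNmtEngine(
    "facebook/nllb-200-distilled-600M",  # placeholder checkpoint
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
    output_attentions=False,
)
result = engine.translate("Hello, world!")
```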
Reviewable status: 0 of 5 files reviewed, all discussions resolved
Are you referring to the update Peter made? I.e., should I rebase, re-run, and check the train time? I thought that was only affecting inferencing. Does SDPA also need to be enabled for training separately?
Force-pushed from d3e7fd8 to 30cdddc (Compare)
It does only affect inferencing, but it would still speed up the overall build.
@ddaspit reviewed 2 of 4 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
pyproject.toml
line 32 at r4 (raw file):
extraPaths = ["tests"]
reportMissingModuleSource = false
reportMissingImports = false
It is usually better to disable this on a specific line.
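For reference, a sketch of the line-level alternative using pyright's standard suppression comment (the module name below is hypothetical):

```python
# Suppress the check only where it fires, instead of disabling the rule
# project-wide in pyproject.toml (hypothetical untyped module for illustration):
import some_untyped_module  # pyright: ignore[reportMissingModuleSource]
```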
machine/translation/huggingface/hugging_face_nmt_model_trainer.py
line 270 at r4 (raw file):
# as the first generated token. We ask the user to explicitly provide this
# as --forced_bos_token argument.
forced_bos_token_id = tokenizer.convert_tokens_to_ids(self._tgt_lang)
# model.config.forced_bos_token_id = forced_bos_token_id
Why was this commented out?
OK, makes sense. Done.
Reviewable status: 4 of 6 files reviewed, 2 unresolved discussions (waiting on @ddaspit)
pyproject.toml
line 32 at r4 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
It is usually better to disable this on a specific line.
Sorry - this was a temporary change I meant to revert when I was trying to get the debugger (which mysteriously died) to work, I think.
machine/translation/huggingface/hugging_face_nmt_model_trainer.py
line 270 at r4 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Why was this commented out?
There was a warning saying not to set it in the config, only in generation_config. But I think that was with different library versions as I was trying to update dependencies. Undone.
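For context, newer transformers versions warn when generation-time settings like forced_bos_token_id are written to model.config and expect them on model.generation_config instead; a minimal sketch, with an illustrative checkpoint and target language:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Newer transformers versions want generation-time settings here rather than
# on model.config, which triggers a deprecation warning.
model.generation_config.forced_bos_token_id = tokenizer.convert_tokens_to_ids("fra_Latn")
```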
Reviewable status: 4 of 6 files reviewed, 2 unresolved discussions (waiting on @ddaspit)
pyproject.toml
line 32 at r4 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Sorry - this was a temporary change I meant to revert when I was trying to get the debugger (which mysteriously died) to work, I think.
Never mind - it was for the macOS CI build 💡. This is what happens when you're working on too many things at once haha. I went ahead and added it in-line.
@ddaspit reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
Addresses sillsdev/serval#708 by:
- Adding auto_grad_acc=True from silnlp. Using fewer gradient accumulation steps when possible causes significant speed-ups, but it only occasionally seems possible given memory constraints (see the first sketch below).
- Only updating the progress every (max_steps/100) steps. This causes substantial speed-ups: 1.75 hrs versus 3.25 hrs (doing no progress updating is 1.5 hrs). This seemed to be a happy medium - if you think a different compromise is ideal, let me know (see the second sketch below).

Overall, in a training run where auto_grad_acc doesn't do anything other than arrive at the default values:
Numbers are all approximate. They vary by about 5 minutes across multiple runs. Test job was training on a complete NT.
(As you can see, SILNLP is still slightly faster but this may be related to the number of examples because of differences in key terms processing. See sillsdev/serval#751).
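A hedged sketch of the auto_grad_acc idea borrowed from silnlp: try the fewest gradient accumulation steps (largest per-device batch) first and back off on out-of-memory errors. The helper name, step candidates, and build_trainer factory are illustrative assumptions, not machine.py's actual implementation:

```python
import torch

def pick_grad_accum_steps(build_trainer, effective_batch_size=32):
    # Fewer accumulation steps means larger per-device batches and faster
    # training, so probe those first and fall back when memory runs out.
    for accum_steps in (1, 2, 4, 8):
        try:
            trainer = build_trainer(
                per_device_train_batch_size=effective_batch_size // accum_steps,
                gradient_accumulation_steps=accum_steps,
            )
            trainer.train()  # in practice a short probe run would suffice
            return accum_steps
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release the failed attempt's memory
    raise RuntimeError("no batch configuration fit in GPU memory")
```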
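And a sketch of the throttled progress reporting, written as a standard transformers TrainerCallback; the class and the report callable are illustrative, not the exact machine.py code:

```python
from transformers import TrainerCallback

class ThrottledProgressCallback(TrainerCallback):
    """Report training progress only every max_steps // 100 steps."""

    def __init__(self, report):
        self._report = report  # e.g. a callable that updates a progress status

    def on_step_end(self, args, state, control, **kwargs):
        interval = max(1, state.max_steps // 100)
        if state.global_step % interval == 0 or state.global_step >= state.max_steps:
            self._report(state.global_step, state.max_steps)
```

Registered with trainer.add_callback(...), this reduces per-step reporting overhead to roughly one update per percent of training.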