Add gradient_accumulation_steps to pretrain/train API #743
tetelias does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it. |
|
Thanks for the contribution @tetelias! Could you sign the CLA? |
|
@liopeer I signed the CLA and fixed the failure in one of the tests. Do the workflows need to be approved again? |
liopeer left a comment:
Thanks, that's really high quality work. If all the checks succeed, this is ready to merge!
|
/review |
|
@tetelias It looks like there are still issues with the CLA. Can you quickly check again if it is really signed? |
|
One thought: should Right now it's:
but the effective batch size is really:
Since One caveat though, |
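For readers following the batch-size point above, the usual arithmetic behind gradient accumulation can be sketched as follows. This is only an illustration, not code from this PR: the function name `effective_batch_size` is hypothetical, and it assumes `batch_size` means the number of samples processed per forward/backward pass across all devices.

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int = 1) -> int:
    """Hypothetical helper: samples contributing to a single optimizer step.

    Assumes batch_size is the number of samples per forward/backward pass
    (summed over all devices), and that gradients from
    gradient_accumulation_steps consecutive passes are accumulated before
    the optimizer updates the weights.
    """
    if batch_size < 1 or gradient_accumulation_steps < 1:
        raise ValueError("batch_size and gradient_accumulation_steps must be >= 1")
    return batch_size * gradient_accumulation_steps


# Example: 128 samples per pass, accumulated over 4 passes.
print(effective_batch_size(128, 4))
```

With the default of one accumulation step, the effective batch size is simply `batch_size`.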
What has changed and why?
Summary
Adds gradient_accumulation_steps to the pretrain/train API as a convenience alias for PyTorch Lightning's accumulate_grad_batches. This makes the pretraining API more consistent with the task-specific training APIs while preserving the existing trainer_args escape hatch.
Changes
gradient_accumulation_steps parameter
Trainer(accumulate_grad_batches=...)
Example
Equivalent to:
Reasoning
This resolves #35
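The alias behaviour described in the summary could look roughly like the sketch below. It is not the implementation from this PR: the helper name `resolve_trainer_args` is hypothetical, and the conflict rule (raise an error when both the alias and `trainer_args["accumulate_grad_batches"]` are given) is an assumption suggested only by the name of the added conflict test.

```python
from typing import Any, Dict, Optional


def resolve_trainer_args(
    gradient_accumulation_steps: Optional[int],
    trainer_args: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: fold the alias into the Trainer keyword arguments."""
    # Copy so the caller's dictionary is never mutated.
    resolved = dict(trainer_args or {})
    if gradient_accumulation_steps is None:
        # Alias not used: trainer_args passes through unchanged.
        return resolved
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be >= 1")
    if "accumulate_grad_batches" in resolved:
        # Assumed rule: setting the value in both places is ambiguous.
        raise ValueError(
            "Set either gradient_accumulation_steps or "
            "trainer_args['accumulate_grad_batches'], not both."
        )
    resolved["accumulate_grad_batches"] = gradient_accumulation_steps
    return resolved


# The resulting dictionary would then be unpacked into the Lightning Trainer,
# e.g. Trainer(**resolved), which accepts accumulate_grad_batches.
print(resolve_trainer_args(4, {"max_epochs": 10}))
```

Raising on a conflict, rather than silently preferring one value, keeps the `trainer_args` escape hatch usable while making accidental double configuration visible.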
How has it been tested?
In tests/_commands/test_train_helpers.py, test_get_trainer was updated, and test_get_trainer_gradient_accumulation and test_get_trainer_gradient_accumulation_conflict were added.
All of:
tests
ruff check .
ruff format .
pre-commit run --all-files
pass without errors.
Did you update CHANGELOG.md?
Did you update the documentation?