Add gradient_accumulation_steps to pretrain/train API#743

Open
tetelias wants to merge 11 commits into
lightly-ai:mainfrom
tetelias:gradient-accumulation

Conversation

@tetelias

What has changed and why?

Summary

Adds gradient_accumulation_steps to the pretrain/train API as a convenience alias for PyTorch Lightning's accumulate_grad_batches.

This makes the pretraining API more consistent with the task-specific training APIs while preserving the existing trainer_args escape hatch.

Changes

  • add gradient_accumulation_steps parameter
  • map to Trainer(accumulate_grad_batches=...)
  • add validation/conflict handling
  • add tests
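
The mapping and conflict handling listed above could look roughly like the sketch below. This is not the code from this PR: the helper name `resolve_accumulate_grad_batches`, the choice to raise `ValueError` when both settings are given, and the fallback to 1 (the PyTorch Lightning default) are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def resolve_accumulate_grad_batches(
    gradient_accumulation_steps: Optional[int],
    trainer_args: Optional[Dict[str, Any]],
) -> int:
    """Hypothetical helper: choose the value for Trainer(accumulate_grad_batches=...)."""
    from_trainer_args = (trainer_args or {}).get("accumulate_grad_batches")

    # Conflict handling (assumed behavior): refuse to guess when both are set.
    if gradient_accumulation_steps is not None and from_trainer_args is not None:
        raise ValueError(
            "Set either gradient_accumulation_steps or "
            "trainer_args['accumulate_grad_batches'], not both."
        )

    # Use whichever value was provided, falling back to 1 (no accumulation).
    if gradient_accumulation_steps is not None:
        value = gradient_accumulation_steps
    elif from_trainer_args is not None:
        value = from_trainer_args
    else:
        value = 1

    # Validation: only positive integers make sense here (bool is excluded
    # explicitly because it is a subclass of int).
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"Expected a positive integer, got {value!r}.")
    return value
```

Under these assumptions, `resolve_accumulate_grad_batches(4, None)` and `resolve_accumulate_grad_batches(None, {"accumulate_grad_batches": 4})` both resolve to 4, while passing both at once raises.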

Example

lightly_train.pretrain(
    ...,
    batch_size=8,
    gradient_accumulation_steps=4,
)

Equivalent to:

lightly_train.pretrain(
    ...,
    batch_size=8,
    trainer_args={
        "accumulate_grad_batches": 4,
    },
)
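
For intuition on what `batch_size=8` with four accumulation steps amounts to, here is a small pure-Python sketch. It assumes the micro-batch gradients are averaged before the optimizer step and uses a toy one-parameter model (`mse_grad` is a made-up helper, not part of lightly-train). Under that assumption, four accumulated batches of 8 give the same gradient as one batch of 32.

```python
import random


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(32)]
ys = [3.0 * x + random.gauss(0.0, 0.1) for x in xs]
w = 0.5  # current value of the toy model's single weight

# One large batch of 32 samples.
full_grad = mse_grad(w, xs, ys)

# Four micro-batches of 8 samples, averaged before a single optimizer step.
batch_size, accumulation_steps = 8, 4
accumulated_grad = 0.0
for step in range(accumulation_steps):
    start, stop = step * batch_size, (step + 1) * batch_size
    accumulated_grad += mse_grad(w, xs[start:stop], ys[start:stop]) / accumulation_steps

# The two gradients agree up to floating-point rounding.
assert abs(full_grad - accumulated_grad) < 1e-9
```

The equality holds here because every micro-batch has the same size and the loss is a plain mean; layers that depend on batch statistics (such as batch normalization) would still see batches of 8.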

Reasoning

This resolves #35

How has it been tested?

In tests/_commands/test_train_helpers.py, test_get_trainer was updated, and test_get_trainer_gradient_accumulation and test_get_trainer_gradient_accumulation_conflict were added.
All of:
tests
ruff check .
ruff format .
pre-commit run --all-files
pass without errors.

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

Did you update the documentation?

  • Yes
  • Not needed (internal change without effects for user)

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@CLAassistant

CLAassistant commented May 25, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ liopeer
❌ tetelias


tetelias seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@liopeer

liopeer commented May 25, 2026

Contributor

Thanks for the contribution @tetelias! Could you sign the CLA?

@tetelias

Author

@liopeer I signed the CLA and fixed the failure in one of the tests. Do the workflows need to be approved again?

@liopeer liopeer left a comment

Contributor

Thanks, that's really high quality work. If all the checks succeed, this is ready to merge!

@liopeer

liopeer commented Jun 16, 2026

Contributor

/review

@liopeer liopeer left a comment

Contributor

LGTM!

@liopeer

liopeer commented Jun 16, 2026

Contributor

@tetelias It looks like there are still issues with the CLA. Can you quickly check again if it is really signed?

@DLemming

DLemming commented Jun 25, 2026

One thought: should accumulate_grad_batches also be included when computing global_batch_size?

Right now it's:

global_batch_size = args.batch_size * args.devices

but the effective batch size is really:

effective_batch_size = args.batch_size * args.devices * args.acc_grad_batches

Since global_batch_size is used for automatic LR scaling, exposing accumulate_grad_batches without accounting for it means users get a larger effective batch size while the LR is still scaled for the smaller one. Because lightly-train is a high-level, worry-free wrapper around lightly, as a user I would expect the learning rate scaling to account for gradient accumulation when simulating larger effective batch sizes.

One caveat, though: global_batch_size is reused for non-LR purposes (e.g. steps_per_epoch = dataset_size // self.global_batch_size in dinov2.py:675, or throughput logging). Multiplying it globally would therefore have side effects, so it's probably safer to pass gradient_accumulation_steps through to the lr_scale line specifically.
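
A minimal sketch of that suggestion follows. It assumes a linear LR scaling rule and a reference batch size of 256; both are illustrative assumptions, since the scaling rule lightly-train actually applies is not shown in this thread. The function name `batch_sizes_and_lr_scale` and its parameters are hypothetical.

```python
def batch_sizes_and_lr_scale(
    batch_size,
    devices,
    gradient_accumulation_steps=1,
    reference_batch_size=256,  # assumed reference value, for illustration only
):
    """Hypothetical helper: keep global_batch_size as is, scale the LR separately."""
    # Samples processed per forward/backward pass across all devices.
    # Left unchanged so that steps_per_epoch and throughput logging are unaffected.
    global_batch_size = batch_size * devices

    # Samples contributing to each optimizer step.
    effective_batch_size = global_batch_size * gradient_accumulation_steps

    # Assumed linear scaling rule, applied to the effective batch size only.
    lr_scale = effective_batch_size / reference_batch_size
    return global_batch_size, effective_batch_size, lr_scale


global_bs, effective_bs, lr_scale = batch_sizes_and_lr_scale(
    batch_size=8, devices=2, gradient_accumulation_steps=4
)
# steps_per_epoch still depends on global_batch_size alone.
steps_per_epoch = 1600 // global_bs
```

With these inputs the global batch size stays at 16 while the LR is scaled for an effective batch size of 64, so only the learning rate reacts to the accumulation setting.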

Development

Successfully merging this pull request may close these issues.

[FEAT] Gradient accumulation

4 participants