
LoRA training QoL improvements: UI progress bar, deterministic seeding, make gradient checkpointing optional #8668


Open
wants to merge 1 commit into master

Conversation

spacepxl (Contributor)

Adding the UI progress bar lets users see training progress in the UI (obviously), but it also makes it possible to cancel training mid-run.
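
The hookup is small. A minimal sketch of the idea, assuming a hypothetical `run_one_step` closure and the standard `comfy.utils.ProgressBar` helper (progress updates go through the server's global progress hook, which is also where a pending prompt interrupt gets raised, so training can be cancelled):

```python
# Minimal sketch (not the exact PR diff) of reporting per-step progress from
# the training loop. `run_one_step` is a hypothetical closure that performs a
# single optimizer step and returns the loss.
from comfy.utils import ProgressBar

def train(steps, run_one_step):
    pbar = ProgressBar(steps)
    for i in range(steps):
        loss = run_one_step(i)
        # Pushes progress to the UI; the global progress hook also checks for
        # a pending interrupt, which is what makes cancellation possible.
        pbar.update_absolute(i + 1, steps)
```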

Gradient checkpointing, especially with this many checkpoints, is computationally expensive and not necessary when memory isn't a constraint. I left it enabled by default, but disabling it is a free speed boost:

got prompt
Added gradient checkpoints to 51 modules
Requested to load BaseModel
0 models unloaded.
loaded completely 9.5367431640625e+25 1656.400333404541 True
Training LoRA: 100%|████████████████████████████████████████████████████| 100/100 [00:50<00:00,  1.99it/s, loss=0.0382]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:05<00:00,  5.99it/s]
Requested to load AutoencoderKL
loaded completely 21470.762142181396 159.55708122253418 True
Prompt executed in 57.28 seconds

got prompt
Requested to load BaseModel
0 models unloaded.
loaded completely 9.5367431640625e+25 1656.400333404541 True
Training LoRA: 100%|████████████████████████████████████████████████████| 100/100 [00:30<00:00,  3.27it/s, loss=0.0376]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:05<00:00,  5.98it/s]
Requested to load AutoencoderKL
loaded completely 21470.762142181396 159.55708122253418 True
Prompt executed in 37.95 seconds
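
The toggle itself is cheap to expose. A rough sketch of the idea, assuming a hypothetical `checkpointing` node input and a list of target modules (illustrative only, not the actual node code):

```python
# Rough sketch: only monkey-patch module forwards with gradient checkpointing
# when the option is enabled. With it disabled, activations are kept in memory
# instead of being recomputed in the backward pass, trading VRAM for speed.
import torch.utils.checkpoint

def maybe_add_checkpointing(modules, checkpointing=True):
    if not checkpointing:
        return  # free speed boost when memory isn't a constraint
    for module in modules:
        original_forward = module.forward
        def make_forward(fwd):
            def forward(*args, **kwargs):
                return torch.utils.checkpoint.checkpoint(
                    fwd, *args, use_reentrant=False, **kwargs)
            return forward
        module.forward = make_forward(original_forward)
    print(f"Added gradient checkpoints to {len(modules)} modules")
```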

As for seeding, I replaced the unused generator: instead, I temporarily store the global RNG states, seed everything, and restore the states after training is finished. This seeds the weight initialization without needing to pass a generator all over the place. The RNG of weight initialization matters quite a bit: if it's allowed to be random, workflows that incorporate LoRA training directly (instead of loading a trained file) would be impossible to reproduce. It also seeds timestep sampling, which is the main factor driving training loss at small batch sizes.
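
Concretely, the save/seed/restore pattern looks roughly like this (a minimal sketch, not the PR diff; `run_training` is a hypothetical callable standing in for the training loop):

```python
# Stash the global RNG states, seed everything so weight init and timestep
# sampling are reproducible, train, then restore the states so the rest of
# the workflow is unaffected.
import random
import numpy as np
import torch

def train_with_seed(seed, run_training):
    py_state = random.getstate()
    np_state = np.random.get_state()
    torch_state = torch.random.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)  # also seeds all CUDA devices
        return run_training()
    finally:
        random.setstate(py_state)
        np.random.set_state(np_state)
        torch.random.set_rng_state(torch_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)
```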

With this change, fp32 training is now fully deterministic, although bf16 training is still partially nondeterministic, and I wasn't able to track down the cause of that. I'm guessing it could be related to stochastic rounding?

spacepxl (Contributor, Author) commented Jun 25, 2025

Some examples of the degree of run-to-run variance from bf16 training:

[image: bf16 training run-to-run variance examples]

Each row is the same seed, just forced to rerun.

comfyanonymous (Owner)

@KohakuBlueleaf what do you think?

KohakuBlueleaf (Contributor)

> @KohakuBlueleaf what do you think?

Not sure about the UI part, but the others are easy.

Will do them after I finish the refactor (which will affect seeding).
