
LoRA training QoL improvements: UI progress bar, deterministic seeding, make gradient checkpointing optional #8668


Open
wants to merge 1 commit into master

Conversation

spacepxl (Contributor)

Adding the UI progress bar lets users see training progress in the UI (obviously), but it also makes it possible to cancel training mid-run.
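
The hookup is small. A minimal sketch of the idea, assuming a hypothetical `run_one_step` closure and the standard `comfy.utils.ProgressBar` helper (progress updates go through the server's global progress hook, which is also where a pending prompt interrupt gets raised, so training can be cancelled):

```python
# Minimal sketch (not the exact PR diff) of reporting per-step progress from
# the training loop. `run_one_step` is a hypothetical closure that performs a
# single optimizer step and returns the loss.
from comfy.utils import ProgressBar

def train(steps, run_one_step):
    pbar = ProgressBar(steps)
    for i in range(steps):
        loss = run_one_step(i)
        # Pushes progress to the UI; the global progress hook also checks for
        # a pending interrupt, which is what makes cancellation possible.
        pbar.update_absolute(i + 1, steps)
```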

Gradient checkpointing, especially with this many checkpoints, is computationally expensive and not necessary when memory isn't a constraint. I left it enabled by default, but disabling it is a free speed boost:

got prompt
Added gradient checkpoints to 51 modules
Requested to load BaseModel
0 models unloaded.
loaded completely 9.5367431640625e+25 1656.400333404541 True
Training LoRA: 100%|████████████████████████████████████████████████████| 100/100 [00:50<00:00,  1.99it/s, loss=0.0382]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:05<00:00,  5.99it/s]
Requested to load AutoencoderKL
loaded completely 21470.762142181396 159.55708122253418 True
Prompt executed in 57.28 seconds

got prompt
Requested to load BaseModel
0 models unloaded.
loaded completely 9.5367431640625e+25 1656.400333404541 True
Training LoRA: 100%|████████████████████████████████████████████████████| 100/100 [00:30<00:00,  3.27it/s, loss=0.0376]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:05<00:00,  5.98it/s]
Requested to load AutoencoderKL
loaded completely 21470.762142181396 159.55708122253418 True
Prompt executed in 37.95 seconds
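
The toggle itself is cheap to expose. A rough sketch of the idea, assuming a hypothetical `checkpointing` node input and a list of target modules (illustrative only, not the actual node code):

```python
# Rough sketch: only monkey-patch module forwards with gradient checkpointing
# when the option is enabled. With it disabled, activations are kept in memory
# instead of being recomputed in the backward pass, trading VRAM for speed.
import torch.utils.checkpoint

def maybe_add_checkpointing(modules, checkpointing=True):
    if not checkpointing:
        return  # free speed boost when memory isn't a constraint
    for module in modules:
        original_forward = module.forward
        def make_forward(fwd):
            def forward(*args, **kwargs):
                return torch.utils.checkpoint.checkpoint(
                    fwd, *args, use_reentrant=False, **kwargs)
            return forward
        module.forward = make_forward(original_forward)
    print(f"Added gradient checkpoints to {len(modules)} modules")
```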

As for seeding, I replaced the unused generator: instead, I temporarily store the global RNG states, seed everything, and restore the states after training is finished. This seeds the weight initialization without needing to pass a generator all over the place. The RNG of weight initialization matters quite a bit: if it's allowed to be random, workflows that incorporate LoRA training directly (instead of loading a trained file) would be impossible to reproduce. It also seeds timestep sampling, which is the main factor driving training loss at small batch sizes.
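
Concretely, the save/seed/restore pattern looks roughly like this (a minimal sketch, not the PR diff; `run_training` is a hypothetical callable standing in for the training loop):

```python
# Stash the global RNG states, seed everything so weight init and timestep
# sampling are reproducible, train, then restore the states so the rest of
# the workflow is unaffected.
import random
import numpy as np
import torch

def train_with_seed(seed, run_training):
    py_state = random.getstate()
    np_state = np.random.get_state()
    torch_state = torch.random.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)  # also seeds all CUDA devices
        return run_training()
    finally:
        random.setstate(py_state)
        np.random.set_state(np_state)
        torch.random.set_rng_state(torch_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)
```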

With this change, fp32 training is now fully deterministic, although bf16 training is still partially nondeterministic, and I wasn't able to track down the cause of that. I'm guessing it could be related to stochastic rounding?

spacepxl (Contributor, Author) commented Jun 25, 2025

Some examples of the degree of run-to-run variance from bf16 training:

[image: bf16 training run-to-run variance examples]

Each row is the same seed, just forced to rerun.

comfyanonymous (Owner)

@KohakuBlueleaf what do you think?

KohakuBlueleaf (Contributor)

> @KohakuBlueleaf what do you think?

Not sure about the UI part, but the others are easy.

Will do them after I finish the refactor (which will affect seeding).
