Description
There are a few cases where I find myself wondering if we should make it more explicit how we can extend the train loop design to be friendlier for callbacks, so they don't have to cheat to get things like the loss and so on. Further, packages like FluxTraining.jl show that we lack a set of preexisting callbacks that don't need to be rewritten.
So keeping this in mind, I think using `pullback` instead of `gradient` would be a step towards that, as well as not optimising before a prehook has checked for callback conditions etc. This should also fall in nicely with how we want to set up schedulers. I would also want to figure out where distributed and multi-GPU training fall in this, so we know how to proceed.
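A minimal sketch of what this could look like, assuming a hypothetical `prehook` keyword and a `train_step!` helper (neither is existing Flux API); the point is that `pullback` lets a hook see the loss before any backward pass or optimiser work happens:

```julia
using Flux, Zygote

# Sketch only: `prehook` is a hypothetical argument, not an existing Flux API.
# With `pullback`, the loss is computed first, a hook can inspect it, and the
# backward pass + optimiser update only run if the hook says to proceed.
function train_step!(loss, ps, opt, batch; prehook = l -> true)
    l, back = Zygote.pullback(() -> loss(batch...), ps)  # forward pass, keep the pullback
    prehook(l) || return l                               # callback sees the loss before any update
    gs = back(one(l))                                     # backward pass
    Flux.Optimise.update!(opt, ps, gs)                    # optimiser step after the prehook
    return l
end
```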
We don't necessarily want to return the losses etc, but perhaps a slightly more trained model? This would fall in line with how Optimisers.jl is looking as well.
cc @lorenzoh
Activity
CarloLucibello commented on Jan 12, 2021
Here I totally second @ToucheSir's opinion expressed on Discourse:

We should deprecate `train!` entirely and let higher-level packages such as FluxTraining.jl handle this.

DhairyaLGandhi commented on Jan 12, 2021
The loop is meant to make training easier, and we have seen it being useful in many cases. I am with you in wanting to do away with it, but I also want to make sure the rest of the tooling is available, and actually make it more powerful without sacrificing performance. I don't think there is an ideal loop, but one that can hold most cases, with the tooling to extend it and a guiding principle around best practices for the various bits of training, is definitely worth having.
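For context, the `train!` interface under discussion (roughly as it stood in Flux 0.11 at the time of this thread) looks like the following; the `cb` keyword takes a zero-argument function, which is why callbacks have to "cheat" to see the loss:

```julia
using Flux

# The existing entry point being debated: loop over `data`, take a gradient of
# `loss`, update the parameters, and call `cb()` after each batch.
model = Dense(10, 1)
ps = Flux.params(model)
data = [(rand(Float32, 10), rand(Float32, 1)) for _ in 1:10]
loss(x, y) = Flux.Losses.mse(model(x), y)

Flux.train!(loss, ps, data, ADAM(); cb = () -> @show loss(data[1]...))
```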
ToucheSir commented on Jan 12, 2021
I don't mind having something like `train!` around, but perhaps the question is how many people are using it because that's all they need for their model, and how many are using it because it looks like what you should be using. For example, Keras has `fit`, so `train!` must == `fit` and thus I should use only `train!` (with the implication that custom loops are scary and for advanced users only).

I agree with this, but "hold most cases" is a slippery slope into something like https://github.com/lorenzoh/FluxTraining.jl/blob/master/src/train.jl. Again, there's the question of intent/scope. If Flux is fine with being an opinionated, multilevel framework, then something like `train!` makes perfect sense. If Flux is looking to be less opinionated on higher-level APIs or eschew them altogether, then `train!` sticks out like a sore thumb.

atiyo commented on Jan 12, 2021
While there might not be a single ideal loop, I think it should be possible to have customisation that fits in nicely with `train!` while maintaining easy-to-use defaults. Frameworks like PyTorchLightning are quite high level, but allow for custom training loops, for example.

For something similar in Flux, we could introduce a new method for `train!` that accepts a function which describes a single training step.

I have no strong feelings about the above, but thought I would raise it since PyTorchLightning's abstractions for training loops seem to have attracted some fans. The fact that PyTorchLightning's abstractions are not native to PyTorch might indicate the value in having separate high- and low-level libraries.
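A rough sketch of that suggestion, with all names hypothetical (this is not an existing Flux method): the loop, data iteration, and default callback stay in the helper, while the user supplies the per-batch step:

```julia
using Flux

# Hypothetical `train!`-style helper that delegates the actual update to a
# user-supplied `step` function, keeping the loop and callback handling simple.
function train_with_step!(step, model, data, opt; cb = () -> nothing)
    ps = Flux.params(model)
    for batch in data
        step(ps, opt, batch)   # user-defined: how one training step is performed
        cb()                   # same zero-argument callback hook as `Flux.train!`
    end
end

# Example step a user might pass in (standard supervised update):
# step(ps, opt, (x, y)) = Flux.Optimise.update!(opt, ps,
#     gradient(() -> Flux.Losses.mse(model(x), y), ps))
```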
ToucheSir commented on Jan 12, 2021
That's what something like FluxTraining is doing already. You may want to have a look at the extensive back-and-forths we've had about training loop design on Zulip.
darsnack commented on Jan 14, 2021
I agree with @CarloLucibello here. There are so many ways to design a training loop that it's going to be impossible to handle every case. `train!` really serves as a pedagogical example for how each piece of the training iteration comes together in Flux.

FluxTraining itself relies on other packages for pieces of the full workflow. Trying to put even a simple portion of that into Flux seems like asking for maintenance overheads that we can't service. It also doesn't add much value to the Flux ecosystem. Julia users have no qualms about installing Flux + FluxTraining (or any other high-level package). Multiple packages with the correct abstractions is how our ecosystem works.
lorenzoh commented on Jan 15, 2021
I also think that this doesn't need to be something Flux.jl handles. I would rather have a full-fledged solution and I think that is out-of-scope for Flux.jl itself considering the complexity of FluxTraining.jl.
+1 this, composable packages are the way to go where possible.
CarloLucibello commented on Jan 16, 2021
We could adopt a gentle deprecation path, since FluxTraining.jl is not ready for debut (or is it?): remove `train!` from the docs and model-zoo examples for Flux v0.12, deprecate it in v0.13, and remove it later.

DhairyaLGandhi commented on Jan 16, 2021
I definitely think that adding the correct abstractions is an important bit. FluxTraining.jl is a very opinionated package in terms of training routines, so it's harder to justify it as a catch-all. Its flexibility, imo, should come from making the callbacks consistent and more easily available to be used directly with the same kind of semantics as Flux. I feel there is benefit to having the train function in, because it describes the semantics we expect and is sufficient for most models, but we need to message it appropriately: suggest that it might be used in multiple ways, or that the for loop is a first-class API that may be preferred by different packages and training routines, and provide examples showing it.
DhairyaLGandhi commented on Jan 16, 2021
This might mean that we flesh out the docs or the function and point to more directly catered packages in the ecosystem. I don't see how that takes away from the composable nature of the Julia ecosystem; rather, it formalizes how we have built the abstractions so far.
lorenzoh commented on Jan 16, 2021
Regarding FluxTraining.jl: if you want to do standard supervised learning it already works great and has a similar feature set to fastai's training loop (barring mixed-precision training and advanced data augmentation schemes).
It is also possible to write custom training loops for things like GAN training, though not always elegantly, due to how the state is set up. So there is some more design work to be done to make it possible to support other training paradigms cleanly. I haven't tackled that yet, since I am doing pretty basic supervised training in my current work; maybe someone who works more actively with other training paradigms like GANs and self-supervised learning can weigh in on what is missing from FluxTraining.jl to support those use cases.
darsnack commented on Jan 16, 2021
This is something that I completely agree with. More complete examples with for loops would be a good idea.

`train!` is not a for loop. It is a loop internally, but the interface exposed to users is a single function. This is why there is a need for callbacks at all. The user has no access to the for loop, so callbacks are necessary to allow the user an entry point.

The implementation of a callback in simple for-loop semantics is a function call. Doing anything more complicated would only make the for loop less simple.
More documentation for how users can write their own `train!` loops seems like the answer here, instead of designing a callback system.
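As an illustration of that point, here is a hand-written loop where the "callback" is just an ordinary function call; the model, optimiser, data, and `log_metrics` below are placeholders, not anything prescribed by Flux:

```julia
using Flux

model = Chain(Dense(10, 32, relu), Dense(32, 1))
opt = ADAM()
ps = Flux.params(model)
data = [(rand(Float32, 10, 16), rand(Float32, 1, 16)) for _ in 1:100]  # dummy batches

log_metrics(epoch, l) = @info "epoch $epoch" loss = l   # the "callback" is just a function

for epoch in 1:3
    for (x, y) in data
        gs = gradient(() -> Flux.Losses.mse(model(x), y), ps)
        Flux.Optimise.update!(opt, ps, gs)
    end
    # called exactly where `train!`'s `cb` keyword would fire, but with full access to state
    log_metrics(epoch, Flux.Losses.mse(model(first(data)[1]), first(data)[2]))
end
```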
lorenzoh commented on Jan 16, 2021
Agreed, `train!` is only a small improvement API-wise but a large restriction in extensibility compared to writing a simple for-loop.