code refactor : making key steps modular train_step() #1650
Conversation
Summary: The refactor creates separate functions for the key steps within the train_step function, to allow easy changes in future work. It creates 3 new functions:
1. gradient_computation: computes gradients via accumulation
2. run_optimizer_Step: runs all optimizer steps
3. compute_global_loss: calculates the global loss when logging

The diff also creates a separate function should_continue_training() so that additional termination criteria can be added in the future without changing the train function.

Rollback Plan:

Differential Revision: D81185673

This pull request was exported from Phabricator. Differential Revision: D81185673
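The structure described above can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration, not torchtitan code: the helper names mirror the PR (with `run_optimizer_step` in conventional casing), but the model is a toy scalar quadratic and the optimizer is plain SGD.

```python
# Hypothetical sketch of the refactor: the monolithic train_step is split
# into the named helpers from the PR. A scalar quadratic loss stands in
# for the real model; none of torchtitan's internals are reproduced here.

class Trainer:
    def __init__(self, lr=0.1, max_steps=50):
        self.w = 5.0          # single scalar "parameter"
        self.lr = lr
        self.step = 0
        self.max_steps = max_steps

    def gradient_computation(self, batches):
        # Helper 1: accumulate gradients over micro-batches.
        grad = 0.0
        for target in batches:
            grad += 2 * (self.w - target) / len(batches)
        return grad

    def run_optimizer_step(self, grad):
        # Helper 2: apply the optimizer update (plain SGD here).
        self.w -= self.lr * grad
        self.step += 1

    def compute_global_loss(self, batches):
        # Helper 3: loss aggregated across batches, used for logging.
        return sum((self.w - t) ** 2 for t in batches) / len(batches)

    def should_continue_training(self):
        # Termination criterion isolated so new criteria can be added
        # without touching the train loop itself.
        return self.step < self.max_steps

    def train_step(self, batches):
        grad = self.gradient_computation(batches)
        self.run_optimizer_step(grad)
        return self.compute_global_loss(batches)

    def train(self, batches):
        loss = None
        while self.should_continue_training():
            loss = self.train_step(batches)
        return loss
```

With this layout, `train()` and `train_step()` never change when a single component does; a variant only replaces the relevant helper.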
@Shagun-G Can you provide more context to justify the PR? I currently cannot see a strong benefit in creating 3 further sub-functions. This makes the train logic less straightforward.
@fegin Thank you for taking a look at the PR. The goal of this PR is to divide the main steps of training within the train_step function into further components. This avoids code duplication when building on the training loop: key components such as gradient computation and the optimizer step can be reused or modified by changing only their corresponding functions instead of the entire train step. Examples include:
I am open to suggestions and to making changes, but the main goal was to make it easy to develop on top of torchtitan's training algorithms by inheriting code instead of duplicating it.
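The inheritance-based reuse argued for here can be illustrated with a self-contained, hypothetical example (the class names and the momentum update below are stand-ins, not torchtitan code): once the optimizer step is a separate method, a variant trainer overrides only that one method and inherits the rest of the loop.

```python
# Hypothetical illustration of reuse by inheritance: a variant trainer
# swaps out only the optimizer update, reusing train_step and the
# gradient computation unchanged. Toy scalar problem: minimize (w - 1)**2.

class BaseTrainer:
    def __init__(self):
        self.w = 4.0  # scalar parameter

    def gradient_computation(self):
        return 2 * (self.w - 1.0)

    def run_optimizer_step(self, grad):
        self.w -= 0.1 * grad  # plain SGD

    def train_step(self):
        self.run_optimizer_step(self.gradient_computation())


class MomentumTrainer(BaseTrainer):
    """Inherits train_step and gradient_computation; overrides the update."""

    def __init__(self):
        super().__init__()
        self.velocity = 0.0

    def run_optimizer_step(self, grad):
        # SGD with momentum (coefficient 0.5 chosen arbitrarily here).
        self.velocity = 0.5 * self.velocity + grad
        self.w -= 0.1 * self.velocity
```

Without the refactor, `MomentumTrainer` would have to copy the whole `train_step` body just to change the update rule.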
Thank you for the PR and sharing interesting research work!
I agree that breaking the trainer into smaller pieces can enable more flexible experiments. However, there is a tradeoff between flexibility and readability / succinctness. In torchtitan we do bias toward simplicity, especially when such changes would benefit only a subset of users.
I do think for the frontier work you mentioned, it might be worth having a separate train script, especially as your #1652 has landed. Note that even within torchtitan we host multiple train.py, e.g. https://github.com/pytorch/torchtitan/blob/main/torchtitan/experiments/flux/train.py
torchtitan is evolving, and we might find such changes to be necessary in the future, but right now we won't accept this PR.
@tianyu-l Thank you for the comments. I agree on the tradeoff and will be happy to revisit this in the future.