Stack 2/2: Adding early dropout to drop path feature using linear decay. #642
alxmrs wants to merge 7 commits into
@codex May I have a review?
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5f89d148f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
@codex may I have another review from you?
💡 Codex Review
Reviewed commit: 74a4b659f0
    # Early stochastic depth: decay drop path rate over training.
    if isinstance(self.model, BaseModel):
        self.model.set_epoch(epoch - 1)
Align drop-path decay with configured epoch count
Calling set_epoch(epoch - 1) shifts the early stochastic-depth schedule by one epoch, so with drop_path_epochs = N the drop-path rate does not reach zero until epoch N+1 (e.g., at epoch 10 it is still nonzero when configured for 10). This conflicts with the new config/docs that say the rate decays to zero over the specified number of epochs and can skew experiment reproducibility when users tune drop_path_epochs expecting an exact cutoff.
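The shift described in this comment can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `linear_drop_path_rate`, `BASE_RATE`, and the formula `base_rate * max(0, 1 - epoch_index / decay_epochs)` are assumptions chosen to match the behaviour the comment describes, not code taken from this PR.

```python
def linear_drop_path_rate(base_rate: float, epoch_index: int, decay_epochs: int) -> float:
    """Hypothetical schedule: decay linearly from base_rate to 0 over decay_epochs.

    Assumes decay_epochs > 0; the real implementation may differ.
    """
    remaining = max(0.0, 1.0 - epoch_index / decay_epochs)
    return base_rate * remaining


BASE_RATE = 0.2          # assumed initial drop-path rate
DROP_PATH_EPOCHS = 10    # corresponds to drop_path_epochs = N in the comment

# Compare the two ways of calling set_epoch for a 1-indexed training epoch.
for epoch in (1, 10, 11):
    shifted = linear_drop_path_rate(BASE_RATE, epoch - 1, DROP_PATH_EPOCHS)
    aligned = linear_drop_path_rate(BASE_RATE, epoch, DROP_PATH_EPOCHS)
    print(f"epoch {epoch:2d}: set_epoch(epoch - 1) -> {shifted:.3f}, "
          f"set_epoch(epoch) -> {aligned:.3f}")
```

Under these assumptions, passing `epoch - 1` leaves a small nonzero rate at epoch 10 and reaches zero only at epoch 11, while passing `epoch` reaches zero at epoch 10 but already applies a reduced rate during the first epoch. Which of the two is preferable depends on how the training loop numbers its epochs.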
@codex may I have your review?
@fomo-bot can you rebase this on the latest main given the changes that happened on the (previous) base PR please?
…r decay. # Conflicts: # src/ocean_emulators/train.py
Rebased Codex work
This paper [1] and the corresponding repo [2] suggest that dropout is even more effective when it is used only early in training. The paper reports that early dropout in ConvNext UNets works better than either using (s.d.) dropout throughout training or not using dropout at all.
Thus, this PR adds an enhancement to #641 that lets users set the number of epochs over which the dropout rate decays linearly to zero. This gives us a simple way to specify early dropout in our experiments. According to [2], it provides just as much benefit as more complex early-dropout schemes while minimizing the number of things we need to configure.
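As a rough illustration of how such a setting might be wired up, here is a minimal sketch. Only `drop_path_epochs` and `set_epoch` are names that appear in this PR's discussion; `DropPathConfig`, `ToyModel`, `drop_path_rate`, and `current_drop_path_rate` are hypothetical, and treating `None` as "keep the rate constant" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DropPathConfig:
    """Hypothetical config; only drop_path_epochs is named in this PR."""
    drop_path_rate: float = 0.1              # assumed name for the initial rate
    drop_path_epochs: Optional[int] = None   # assumed: None keeps the rate constant


class ToyModel:
    """Stand-in for a model exposing set_epoch(); not the repo's BaseModel."""

    def __init__(self, config: DropPathConfig) -> None:
        self.config = config
        self.current_drop_path_rate = config.drop_path_rate

    def set_epoch(self, epoch: int) -> None:
        """Update the active drop-path rate for a zero-based epoch index."""
        n = self.config.drop_path_epochs
        if n is None:
            return  # early dropout disabled: rate stays constant
        # Linear decay, clamped at zero once the configured epochs have passed.
        fraction_left = max(0.0, 1.0 - epoch / n)
        self.current_drop_path_rate = self.config.drop_path_rate * fraction_left


model = ToyModel(DropPathConfig(drop_path_rate=0.1, drop_path_epochs=4))
rates = []
for epoch in range(6):
    model.set_epoch(epoch)  # called once at the start of each epoch
    rates.append(model.current_drop_path_rate)
print(rates)
```

The single integer is the only extra knob: the rate starts at its configured value, falls linearly, and stays at zero for the rest of training. The sketch assumes `drop_path_epochs` is positive when set.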