DLWP Indexing and memory consumption fix #859
Closed
Conversation
Commits
- add WeightedOceanMSE to criterion
- …aining - should improve coupled stability
- Update indices in constant coupler.
- Gaussian Noise to Coupled Training
- add 'Multi_SymmetricConvNeXtBlock'
- Update workflow.
- …ence_fix Fix the training and inference problem in nvidia modulus
- Fix memory leak in coupled timeseries
- Rebase physicsnemo

(Each commit signed off by: root <[email protected]>.)
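The WeightedOceanMSE criterion named in the first commit is defined in the diff rather than in this conversation. As a rough illustration of the idea (an MSE that weights grid cells by an ocean mask), here is a minimal PyTorch sketch; the constructor signature, mask shape, and normalization are assumptions, not the PR's actual implementation:

```python
import torch
import torch.nn as nn


class WeightedOceanMSE(nn.Module):
    """Sketch of an MSE weighted by a per-grid-cell mask (e.g. land-sea).

    Hypothetical: the criterion added in this PR may differ in
    signature, mask semantics, and normalization.
    """

    def __init__(self, weights: torch.Tensor):
        super().__init__()
        # Register as a buffer so the mask follows .to(device) calls.
        self.register_buffer("weights", weights)

    def forward(self, prediction: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        sq_err = (prediction - target) ** 2
        # Weighted average: weights of shape (H, W) broadcast over the
        # batch/channel dims, down-weighting or masking out land cells.
        return (sq_err * self.weights).mean() / self.weights.mean().clamp_min(1e-8)


# Usage (shapes illustrative): weight ocean cells 1, land cells 0.
ocean_mask = torch.ones(32, 64)
criterion = WeightedOceanMSE(ocean_mask)
loss = criterion(torch.randn(4, 2, 32, 64), torch.randn(4, 2, 32, 64))
```

Zeroing land cells keeps them from contributing to the loss at all, which is plausibly the mechanism behind "should improve coupled stability" in the commit list above.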
/blossom-ci
Don't have much to comment on with this. If it passes CI, I would say merge unless there is something specific you want me to look at @daviddpruitt
* Add CELU activation function (NVIDIA#851)
* refactor: updating naming of a few files (modulus -> physicsnemo) (NVIDIA#850)
* Various CorrDiff optimizations for a drastic increase in training efficiency (NVIDIA#809)
  * multi-GPU training supported for the CorrDiff optimization
  * enable mixed precision for validation
  * clean up the codebase for the optimizations
  * add amp_mode-aware model architecture
  * add None checking for params
  * revise the datatype casting schema
  * add test cases for the CorrDiff optimizations
  * revise from_checkpoint; update tests and CHANGELOG
  * lint and format code properly
  * add multi-GPU optimization
  * rebase changes and update tests and configs
  * merge ResidualLoss and refactor layer and UNet init based on PR review
  * update layers.py with a robust apex import
  * address incompatibility between dynamo and patching, retaining the same optimization performance with torch.compile
  * update tests
  * update changelog
  * initialize global_index directly on device
  * formatting
* Catch improper use of patch gradient accumulation (NVIDIA#868)
  * update train.py to catch improper use of patch gradient accumulation
  * fix compile of the regression model in train.py
  * remove unused imports
  * change the patch gradient accumulation logic

Signed-off-by: Neal Pan <[email protected]>, jialusui1102 <[email protected]>, Charlelie Laurent <[email protected]>
Co-authored-by: Yang-yang Tan, Carmelo Gonzales, Oliver Hennigh, nekobytz, Alicia Sui, jialusui1102, Charlelie Laurent (all <[email protected]>)
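One line item from the rebase above, "initialize global_index directly on device", reflects a general PyTorch pattern: building index tensors directly on the target device avoids a host-side allocation followed by a host-to-device copy. A minimal sketch with illustrative names and shapes, not the actual CorrDiff code:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Slower: builds the index on the CPU, then copies it to the GPU.
global_index_cpu = torch.arange(0, 1024).reshape(32, 32).to(device)

# Faster: allocates and fills the index directly on the target device,
# skipping the intermediate host tensor and the transfer.
global_index = torch.arange(0, 1024, device=device).reshape(32, 32)
```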
@daviddpruitt has this been covered by #879?
Add conditional loss for precip training and option to disable skip connection in Symmetric ConvNeXt block
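The skip-connection toggle described here is implemented in the diff. As a hypothetical sketch of the pattern, the block below is a generic ConvNeXt-style residual block with an optional identity path; the specific layers are assumptions, not the DLWP implementation:

```python
import torch
import torch.nn as nn


class SymmetricConvNeXtBlockSketch(nn.Module):
    """Illustrative ConvNeXt-style block with a toggleable residual path.

    Hypothetical sketch: the real Symmetric ConvNeXt block in this PR
    may use different convolutions, norms, and activations.
    """

    def __init__(self, channels: int, use_skip: bool = True):
        super().__init__()
        self.use_skip = use_skip
        self.block = nn.Sequential(
            # Depthwise spatial mixing, as in ConvNeXt.
            nn.Conv2d(channels, channels, kernel_size=7, padding=3, groups=channels),
            nn.GroupNorm(1, channels),  # LayerNorm-like over channels
            # Pointwise expand -> activate -> project.
            nn.Conv2d(channels, 4 * channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(4 * channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        # The PR adds an option to disable this residual connection.
        return x + out if self.use_skip else out


# Usage: disabling the residual path, as the new option allows.
block = SymmetricConvNeXtBlockSketch(channels=64, use_skip=False)
out = block(torch.randn(1, 64, 32, 32))
```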
Relevant changes included in #879
PhysicsNeMo Pull Request
Description
Fixes indexing issues in the couplers and alleviates excessive CPU memory consumption in the dataloader.
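The dataloader change itself lives in the diff. For context, one common cause of large CPU memory consumption in PyTorch dataloaders is each worker holding the full timeseries (or references into it) in resident memory; a typical mitigation, sketched below with hypothetical names, is to memory-map the data and copy out only the requested slice in `__getitem__`:

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TimeseriesDatasetSketch(Dataset):
    """Hypothetical sketch of the copy-out pattern, not the DLWP loader."""

    def __init__(self, path: str):
        # Memory-map the array so the OS pages data in on demand
        # instead of loading the whole timeseries into RAM.
        self.data = np.load(path, mmap_mode="r")

    def __len__(self) -> int:
        return self.data.shape[0]

    def __getitem__(self, idx: int) -> torch.Tensor:
        # np.array(...) copies just this sample out of the memory map,
        # so the returned tensor holds no reference to the full array.
        sample = np.array(self.data[idx])
        return torch.from_numpy(sample)


# Usage (path illustrative):
# loader = torch.utils.data.DataLoader(TimeseriesDatasetSketch("data.npy"), num_workers=4)
```

Because each returned tensor owns its own copy, no reference into the memory map escapes the worker, and the OS can evict pages that are no longer needed.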
Checklist
Dependencies