@dakhare-creator
Contributor

PhysicsNeMo Pull Request

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

Dakhare crash

- Validation function to track model performance on test dataset is added in physicsnemo/examples/structural_mechanics/crash/train.py

- validate_every_n_epochs and save_ckpt_every_n_epochs are added in config/training/default.yaml to set how often the validation function is called and how often a checkpoint is saved
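
For readers unfamiliar with this kind of setting, here is a minimal sketch of how two "every n epochs" options could gate validation and checkpointing in a training loop. Only the two option names come from this PR; the helper names `is_due` and `plan_epochs`, the 1-based epoch counting, and the modulo condition are assumptions for illustration and may differ from what train.py actually does.

```python
from typing import List, Tuple


def is_due(epoch: int, every_n: int) -> bool:
    """Return True when a periodic action falls on this (1-based) epoch.

    Hypothetical helper: a non-positive interval disables the action.
    """
    return every_n > 0 and epoch % every_n == 0


def plan_epochs(
    num_epochs: int,
    validate_every_n_epochs: int,
    save_ckpt_every_n_epochs: int,
) -> List[Tuple[int, List[str]]]:
    """List the actions that would run on each epoch under the two settings."""
    schedule = []
    for epoch in range(1, num_epochs + 1):
        actions = ["train"]  # training runs on every epoch
        if is_due(epoch, validate_every_n_epochs):
            actions.append("validate")
        if is_due(epoch, save_ckpt_every_n_epochs):
            actions.append("save_ckpt")
        schedule.append((epoch, actions))
    return schedule


if __name__ == "__main__":
    for epoch, actions in plan_epochs(4, validate_every_n_epochs=2, save_ckpt_every_n_epochs=4):
        print(epoch, actions)
```

Keeping the two intervals separate lets validation run more often than checkpointing, which is useful when checkpoints are large.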
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR adds validation functionality to the structural mechanics crash simulation training example. The main changes include: (1) adding validation dataset creation and distributed sampling in train.py, (2) implementing a validation loop that computes time-step-wise MSE loss and aggregates results across distributed ranks, (3) adding validation configuration parameters to control validation frequency and checkpoint saving, and (4) refactoring the inference code to use a unified sample object interface instead of passing individual graph components separately.

The validation implementation follows distributed training best practices by properly handling data sampling, metric aggregation, and logging only on rank 0. The changes integrate cleanly with the existing training pipeline and tensorboard logging infrastructure, providing essential model monitoring capabilities for the crash simulation example.
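
As an illustration of the aggregation step described above, the following is a minimal, framework-free sketch of one common way to combine time-step-wise MSE across ranks: each rank accumulates per-step sums of squared errors and element counts, the sums and counts are added across ranks (what a SUM all-reduce would do), and the division happens last. All function names here are hypothetical, the all-reduce is simulated with plain lists, and this is not claimed to match the code in train.py.

```python
from typing import List, Sequence, Tuple

# A rollout is indexed as [time step][node value].
Rollout = Sequence[Sequence[float]]


def local_sq_error_sums(
    preds: Sequence[Rollout], targets: Sequence[Rollout], num_steps: int
) -> Tuple[List[float], List[int]]:
    """Accumulate per-time-step squared-error sums and element counts on one rank."""
    sums = [0.0] * num_steps
    counts = [0] * num_steps
    for pred, target in zip(preds, targets):
        for t in range(num_steps):
            for p, y in zip(pred[t], target[t]):
                sums[t] += (p - y) ** 2
                counts[t] += 1
    return sums, counts


def all_reduce_sum(per_rank: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for a SUM all-reduce: element-wise sum over the ranks' vectors."""
    return [sum(values) for values in zip(*per_rank)]


def stepwise_mse(
    per_rank_sums: Sequence[Sequence[float]],
    per_rank_counts: Sequence[Sequence[int]],
) -> List[float]:
    """Divide globally summed errors by globally summed counts, per time step."""
    total_sums = all_reduce_sum(per_rank_sums)
    total_counts = all_reduce_sum(per_rank_counts)
    return [s / c if c else float("nan") for s, c in zip(total_sums, total_counts)]


if __name__ == "__main__":
    # Rank 0 holds one sample, rank 1 holds two (2 time steps, 2 nodes each).
    r0 = local_sq_error_sums([[[1, 2], [3, 4]]], [[[0, 2], [3, 2]]], num_steps=2)
    r1 = local_sq_error_sums(
        [[[0, 0], [0, 0]], [[1, 1], [1, 1]]],
        [[[0, 0], [2, 0]], [[1, 1], [1, 1]]],
        num_steps=2,
    )
    print(stepwise_mse([r0[0], r1[0]], [r0[1], r1[1]]))
```

Reducing sums and counts separately, rather than averaging per-rank means, keeps the result correct when ranks hold different numbers of validation samples.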

PR Description Notes:

  • The PR description is largely empty with only unchecked checklist items
  • No standalone description of changes provided
  • No linked issues or changelog updates mentioned
  • Missing information about new dependencies or testing coverage

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| examples/structural_mechanics/crash/train.py | 4/5 | Added comprehensive validation functionality with distributed sampling, MSE computation, and tensorboard logging |
| examples/structural_mechanics/crash/conf/training/default.yaml | 5/5 | Added validation configuration parameters for sample count, validation frequency, and checkpoint saving |
| examples/structural_mechanics/crash/inference.py | 4/5 | Refactored model forward pass to use unified sample object interface instead of separate graph components |

3 files reviewed, 3 comments

@mnabian
Collaborator

mnabian commented Nov 3, 2025

/blossom-ci

@mnabian
Collaborator

mnabian commented Nov 10, 2025

/blossom-ci

@mnabian mnabian self-requested a review November 10, 2025 22:16
Collaborator

@mnabian mnabian left a comment

LGTM! Thanks for addressing the comments!

@mnabian mnabian merged commit f8fd198 into NVIDIA:main Nov 10, 2025
1 check passed
coreyjadams added a commit that referenced this pull request Nov 14, 2025
* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
phsyicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix improt

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <[email protected]>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from the https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <[email protected]>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

---------

Co-authored-by: Mohammad Amin Nabian <[email protected]>
Co-authored-by: Yongming Ding <[email protected]>
Co-authored-by: ram-cherukuri <[email protected]>
Co-authored-by: Deepak Akhare <[email protected]>
Co-authored-by: Sai Krishnan Chandrasekar <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
coreyjadams added a commit that referenced this pull request Nov 18, 2025
* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
phsyicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix improt

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <[email protected]>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from the https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <[email protected]>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Fixed minor bug in shape validation in SongUNet (#1230)

Signed-off-by: Charlelie Laurent <[email protected]>

* Add Zarr reader for Crash (#1228)

* Add Zarr reader for Crash

* Update README

* Update validation logic of point data in Zarr reader

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add a test for 2D feature arrays

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

---------

Signed-off-by: Charlelie Laurent <[email protected]>
Co-authored-by: Mohammad Amin Nabian <[email protected]>
Co-authored-by: Yongming Ding <[email protected]>
Co-authored-by: ram-cherukuri <[email protected]>
Co-authored-by: Deepak Akhare <[email protected]>
Co-authored-by: Sai Krishnan Chandrasekar <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <[email protected]>