Commit 611a029

coreyjadams, mnabian, dreamtalen, ram-cherukuri, and dakhare-creator authored
Refactor (#1234)
* Move filesystems and version_check to core
* Fix version check tests
* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.
* Move modules and meta to core. Move registry to core. No tests fixed yet.
* Add missing init files
* Update build system and specify some deps.
* Reorganize tests.
* Update init files
* Clean up neighbor tools.
* Update testing
* Fix compat tests
* Move core model tests to tests/core/
* Add import lint config
* Relocate layers
* Move graphcast utils into model directory
* Relocating util functionalities.
* Further clean up and organize tests.
* utils tests are passing now
* Cleaning up distributed tests
* Patching tests working again in nn
* Fix sdf test
* Fix zenith angle tests
* Some organization of tests. Checkpoints is moved into utils.
* Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty.
* Most nn tests are passing
* Further cleanup. Getting there!
* Remove constants file
* Add import linting to pre-commit.
* Update crash readme (#1212)
* update license headers- second try
* update readme
* Bump multi-storage-client to v0.33.0 with rust client (#1156)
* Move gnn layers and start to fix several model tests.
* AFNO is now passing.
* Rnn models passing.
* Fix import
* Healpix tests are working
* Domino and unet working
* Add jaxtyping to requirements.txt for crash sample (#1218)
* update license headers- second try
* Update requirements.txt
* Updating to address some test issues
* Replace 'License' link with 'Dev blog' link (#1215)
  Co-authored-by: Corey adams <[email protected]>
* MGN tests passing again
* Most graphcast tests passing again
* Move nd conv layers.
* update fengwu and pangu
* Update sfno and pix2pix test
* update tests for figconvnet, swinrnn, superresnet
* updating more models to pass
* Update distributed tests, now passing.
* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)
* validation added: works for multi-node job.
* rename and rearrange validation function
* validate_every_n_epochs, save_ckpt_every_n_epochs added in config
* corrected bug (args of model) in inference
* args in validation code updated
* val path added and args name changed
* validation split added -> write_vtp=False
* fixed inference bug
* bug fix: write_vtp
* Domain parallel tests now passing.
* Fix active learning imports so tests pass in refactor
* Fix some metric imports
* Remove deploy package
* Remove unused test file
* unmigrate these files ... again?
* Update import linter.
* Add saikrishnanc-nv to github actors (#1225)
* Integrate Curator instructions to the Crash example (#1213)
* Integrate Curator instructions
* Update docs
* Formatting changes
* Adding code of conduct (#1214)
* Adding code of conduct
  Adopting the code of conduct from the https://www.contributor-covenant.org/
* Update CODE_OF_CONDUCT.MD
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Create .markdownlintignore
* Revise README for PhysicsNeMo resources and guidance
  Updated the 'Getting Started' section and added new resources for learning AI Physics.
* Update README.md
---------
Co-authored-by: Mohammad Amin Nabian <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <[email protected]>
* Cleaning up diffusion models. Not quite done yet.
* Restore deleted files
* Updating more tests.
* Fixed minor bug in shape validation in SongUNet (#1230)
  Signed-off-by: Charlelie Laurent <[email protected]>
* Add Zarr reader for Crash (#1228)
* Add Zarr reader for Crash
* Update README
* Update validation logic of point data in Zarr reader
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update examples/structural_mechanics/crash/zarr_reader.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add a test for 2D feature arrays
* Update examples/structural_mechanics/crash/zarr_reader.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Further updates to tests. Datapipes almost working.
* update import paths
* Starting to clean up dependency tree.
* Add AR RT and OT schemes to Crash FIGConvNet (#1232)
* Add AR and OT schemes for FIGConvNet
* Add tests
* Soothe the linter
* Fix the tests
* Fixing and adjusting a broad suite of tests.
* Update test/domain_parallel/conftest.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Minor fix
---------
Signed-off-by: Charlelie Laurent <[email protected]>
Co-authored-by: Mohammad Amin Nabian <[email protected]>
Co-authored-by: Yongming Ding <[email protected]>
Co-authored-by: ram-cherukuri <[email protected]>
Co-authored-by: Deepak Akhare <[email protected]>
Co-authored-by: Sai Krishnan Chandrasekar <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <[email protected]>
Co-authored-by: Alexey Kamenev <[email protected]>
1 parent 3cb9a02 commit 611a029

File tree

72 files changed

+905
-554
lines changed


.importlinter

Lines changed: 6 additions & 0 deletions
@@ -98,3 +98,9 @@ name = Prevent Non-listed external imports in physicsnemo nn
 type = forbidden_import
 container = physicsnemo.nn
 dependency_group = nn
+
+[importlinter:contract:physicsnemo-models-external-imports]
+name = Prevent Non-listed external imports in physicsnemo models
+type = forbidden_import
+container = physicsnemo.models
+dependency_group = models
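
The new contract's name says it should prevent external imports in `physicsnemo.models` that are not on an approved list. As a rough illustration of that idea only (this is not how import-linter or the `forbidden_import` contract type is implemented, and the allowlist contents below are hypothetical), a minimal standard-library sketch could look like this:

```python
import ast


def external_imports(source: str) -> set[str]:
    """Return the top-level package names imported absolutely by `source`."""
    roots: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped
            roots.add(node.module.split(".")[0])
    return roots


def forbidden_imports(source: str, allowed: set[str], internal: str = "physicsnemo") -> set[str]:
    """Imports that are neither on the allowlist nor part of the package itself."""
    return external_imports(source) - allowed - {internal}


# Hypothetical allowlist standing in for the "models" dependency group.
ALLOWED_FOR_MODELS = {"torch", "numpy"}

# Hypothetical module source; `something` is a placeholder name.
example = (
    "import torch\n"
    "import scipy.sparse\n"
    "from physicsnemo.nn import something\n"
    "from . import utils\n"
)
violations = forbidden_imports(example, ALLOWED_FOR_MODELS)
print(sorted(violations))
```

Here only `scipy` is reported, because `torch` is on the assumed allowlist, `physicsnemo` is internal, and the relative import is ignored.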

.pre-commit-config.yaml

Lines changed: 6 additions & 4 deletions
@@ -54,7 +54,9 @@ repos:
       - id: check-added-large-files
         args: [--maxkb=5000]

-  - repo: https://github.com/seddonym/import-linter
-    rev: v2.5.2
-    hooks:
-      - id: import-linter
+  # This should be enabled once all dependencies are cleared.
+  # For now, check status with `lint-imports`
+  # - repo: https://github.com/seddonym/import-linter
+  #   rev: v2.5.2
+  #   hooks:
+  #     - id: import-linter

examples/structural_mechanics/crash/README.md

Lines changed: 7 additions & 4 deletions
@@ -19,14 +19,14 @@ For an in-depth comparison between the Transolver and MeshGraphNet models and th

 <p align="center">
   <img src="../../../docs/img/crash/crash_case4_reduced.gif" alt="Crash animation" width="80%" />
-
+
 </p>

 ### Crushcan Modeling

 <p align="center">
   <img src="../../../docs/img/crash/crushcan.gif" alt="Crushcan animation" width="80%" />
-
+
 </p>

 ## Quickstart
@@ -238,7 +238,10 @@ conf/
 │   ├── mgn_time_conditional.yaml
 │   ├── transolver_autoregressive_rollout_training.yaml
 │   ├── transolver_one_step_rollout.yaml
-│   └── transolver_time_conditional.yaml
+│   ├── transolver_time_conditional.yaml
+│   ├── figconvunet_autoregressive_rollout_training.yaml
+│   ├── figconvunet_one_step_rollout.yaml
+│   └── figconvunet_time_conditional.yaml
 ├── training/default.yaml # training hyperparameters
 └── inference/default.yaml # inference options
 ```
@@ -495,7 +498,7 @@ run_post_processing.sh can automate all evaluation tasks across runs.

 - AMP is enabled by default in training; it reduces memory and accelerates matmuls on modern GPUs.
 - For multi-GPU training, use `torchrun --standalone --nproc_per_node=<NUM_GPUS> train.py`.
-- For DDP, prefer `torchrun --standalone --nproc_per_node=<NUM_GPUS> train.py`.
+- For DDP, prefer `torchrun --standalone --nproc_per_node=<NUM_GPUS> train.py`.

 ## Troubleshooting / FAQ

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
+# SPDX-FileCopyrightText: All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+_target_: rollout.FIGConvUNetAutoregressiveRolloutTraining
+_convert_: all
+
+# Input/output channels
+in_channels: 5 # velocity(3) + features(F) + time(1)
+out_channels: 3 # acceleration (xyz)
+
+# Architecture
+kernel_size: 3
+hidden_channels: [16, 16, 16] # channels at each level
+num_levels: 2 # number of down/up levels
+num_down_blocks: 1
+num_up_blocks: 1
+mlp_channels: [256, 256]
+
+# Spatial domain
+aabb_max: [2.0, 2.0, 2.0]
+aabb_min: [-2.0, -2.0, -2.0]
+voxel_size: null
+
+# Grid resolutions (factorized implicit grids)
+# Format: [memory_format, resolution_tuple]
+resolution_memory_format_pairs:
+  - [b_xc_y_z, [2, 64, 64]]
+  - [b_yc_x_z, [64, 2, 64]]
+  - [b_zc_x_y, [64, 64, 2]]
+
+# Position encoding
+use_rel_pos: true
+use_rel_pos_embed: true
+pos_encode_dim: 16
+
+# Communication and sampling
+communication_types: ["sum"]
+to_point_sample_method: "graphconv"
+neighbor_search_type: "knn"
+knn_k: 16
+reductions: ["mean"]
+
+use_scalar_output: false
+has_input_features: true
+
+# Pooling (for global features if needed)
+pooling_type: "max"
+pooling_layers: [2]
+
+# Rollout parameters
+num_time_steps: ${training.num_time_steps}
+dt: 5e-3
+initial_vel: 9.22
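
The channel comment in this config (`velocity(3) + features(F) + time(1)`) can be checked with a small sketch. It assumes F = 1, which `in_channels: 5` implies but the config does not state. The helper names are hypothetical. The time normalization mirrors the `t / (rollout_steps - 1)` expression in the autoregressive rollout class in `rollout.py` from this commit.

```python
def normalized_time(t: int, rollout_steps: int) -> float:
    """Map step index t to [0, 1]; a single-step rollout maps to 0.0."""
    return 0.0 if rollout_steps <= 1 else t / (rollout_steps - 1)


def feature_row(vel_norm, features, time=None):
    """Concatenate per-node inputs in the order velocity, features, time."""
    row = list(vel_norm) + list(features)
    if time is not None:
        row.append(time)
    return row


# Assumption: F = 1 extra per-node feature (the value 0.7 is made up).
with_time = feature_row([0.1, -0.2, 0.3], [0.7], normalized_time(2, 5))
without_time = feature_row([0.1, -0.2, 0.3], [0.7])
print(len(with_time), len(without_time))
```

Under that assumption the row with a time channel has 5 entries and the row without has 4, matching the `in_channels` values of the autoregressive and one-step configs respectively.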
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
+# SPDX-FileCopyrightText: All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+_target_: rollout.FIGConvUNetOneStepRollout
+_convert_: all
+
+# Input/output channels
+in_channels: 4 # velocity(3) + features (F)
+out_channels: 3 # next step position (xyz)
+
+# Architecture
+kernel_size: 3
+hidden_channels: [16, 16, 16] # channels at each level
+num_levels: 2 # number of down/up levels
+num_down_blocks: 1
+num_up_blocks: 1
+mlp_channels: [256, 256]
+
+# Spatial domain
+aabb_max: [2.0, 2.0, 2.0]
+aabb_min: [-2.0, -2.0, -2.0]
+voxel_size: null
+
+# Grid resolutions (factorized implicit grids)
+# Format: Uses res_mem_pair resolver (memory_format_enum, resolution_tuple)
+resolution_memory_format_pairs:
+  - [b_xc_y_z, [2, 64, 64]]
+  - [b_yc_x_z, [64, 2, 64]]
+  - [b_zc_x_y, [64, 64, 2]]
+
+# Position encoding
+use_rel_pos: true
+use_rel_pos_embed: true
+pos_encode_dim: 16
+
+# Communication and sampling
+communication_types: ["sum"]
+to_point_sample_method: "graphconv"
+neighbor_search_type: "knn"
+knn_k: 16
+reductions: ["mean"]
+
+use_scalar_output: false
+has_input_features: true
+
+# Pooling (for global features if needed)
+pooling_type: "max"
+pooling_layers: [2]
+
+# Rollout parameters
+num_time_steps: ${training.num_time_steps}
+dt: 5e-3
+initial_vel: 9.22
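
The `resolution_memory_format_pairs` entries each use a coarse resolution of 2 along one axis and 64 along the other two. A quick worked count, as a sketch, shows how many cells the three grids cover in total compared with a single dense grid. The dense 64 x 64 x 64 comparison is an assumption made here for illustration; the config does not mention one, and the count ignores channels and how the model actually uses the grids.

```python
from math import prod

# Resolutions copied from resolution_memory_format_pairs in the config above.
factorized = {
    "b_xc_y_z": (2, 64, 64),
    "b_yc_x_z": (64, 2, 64),
    "b_zc_x_y": (64, 64, 2),
}

cells_per_grid = {name: prod(res) for name, res in factorized.items()}
factorized_cells = sum(cells_per_grid.values())

# Assumption: the dense counterpart takes the largest resolution on every axis.
dense_res = tuple(max(res[i] for res in factorized.values()) for i in range(3))
dense_cells = prod(dense_res)

print(cells_per_grid)
print(factorized_cells, dense_cells)
```

Each grid has 8,192 cells, so the three together hold 24,576 cells against 262,144 for the assumed dense grid.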

examples/structural_mechanics/crash/conf/model/figconvunet_time_conditional.yaml

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,9 @@ neighbor_search_type: "knn"
 knn_k: 16
 reductions: ["mean"]

+use_scalar_output: false
+has_input_features: true
+
 # Pooling (for global features if needed)
 pooling_type: "max"
 pooling_layers: [2]

examples/structural_mechanics/crash/rollout.py

Lines changed: 168 additions & 0 deletions
@@ -467,3 +467,171 @@ def step_fn(verts, feats):
             outputs.append(y_t)

         return torch.stack(outputs, dim=0)  # [T, N, 3]
+
+
+class FIGConvUNetOneStepRollout(FIGConvUNet):
+    """
+    FIGConvUNet with one-step rollout for crash simulation.
+
+    - Training: teacher forcing (uses GT positions at each step)
+    - Inference: autoregressive (uses predictions)
+    """
+
+    def __init__(self, *args, **kwargs):
+        self.dt: float = kwargs.pop("dt", 5e-3)
+        self.initial_vel: torch.Tensor = kwargs.pop("initial_vel")
+        self.rollout_steps: int = kwargs.pop("num_time_steps") - 1
+        super().__init__(*args, **kwargs)
+
+    def forward(self, sample: SimSample, data_stats: dict) -> torch.Tensor:
+        """
+        Args:
+            Sample: SimSample containing node_features and node_target
+            data_stats: dict containing normalization stats
+        Returns:
+            [T, N, 3] rollout of predicted positions
+        """
+        inputs = sample.node_features
+        x0 = inputs["coords"]  # initial pos [N, 3]
+        features = inputs.get("features", x0.new_zeros((x0.size(0), 0)))  # [N, F]
+
+        # Ground truth sequence [T, N, 3]
+        N = x0.size(0)
+        gt_seq = torch.cat(
+            [x0.unsqueeze(0), sample.node_target.view(N, -1, 3).transpose(0, 1)],
+            dim=0,
+        )
+
+        outputs: list[torch.Tensor] = []
+        # First step: backstep to create y_-1
+        y_t0 = gt_seq[0] - self.initial_vel * self.dt
+        y_t1 = gt_seq[0]
+
+        for t in range(self.rollout_steps):
+            # In training mode (except first step), use ground truth positions
+            if self.training and t > 0:
+                y_t0, y_t1 = gt_seq[t - 1], gt_seq[t]
+
+            # Prepare vertices for FIGConvUNet: [1, N, 3]
+            vertices = y_t1.unsqueeze(0)  # [1, N, 3]
+
+            vel = (y_t1 - y_t0) / self.dt
+            vel_norm = (vel - data_stats["node"]["norm_vel_mean"]) / (
+                data_stats["node"]["norm_vel_std"] + EPS
+            )
+
+            # [1, N, 3 + F]
+            fx_t = torch.cat([vel_norm, features], dim=-1).unsqueeze(0)
+
+            def step_fn(verts, feats):
+                out, _ = super(FIGConvUNetOneStepRollout, self).forward(
+                    vertices=verts, features=feats
+                )
+                return out
+
+            if self.training:
+                outf = ckpt(
+                    step_fn,
+                    vertices,
+                    fx_t,
+                    use_reentrant=False,
+                ).squeeze(0)  # [N, 3]
+            else:
+                outf = step_fn(vertices, fx_t).squeeze(0)  # [N, 3]
+
+            acc = (
+                outf * data_stats["node"]["norm_acc_std"]
+                + data_stats["node"]["norm_acc_mean"]
+            )
+            vel_pred = self.dt * acc + vel
+            y_t2_pred = self.dt * vel_pred + y_t1
+
+            outputs.append(y_t2_pred)
+
+            if not self.training:
+                # autoregressive update for inference
+                y_t0, y_t1 = y_t1, y_t2_pred
+
+        return torch.stack(outputs, dim=0)  # [T, N, 3]
+
+
+class FIGConvUNetAutoregressiveRolloutTraining(FIGConvUNet):
+    """
+    FIGConvUNet with autoregressive rollout training for crash simulation.
+
+    Predicts sequence by autoregressively updating velocity and position
+    using predicted accelerations. Supports gradient checkpointing during training.
+    """
+
+    def __init__(self, *args, **kwargs):
+        self.dt: float = kwargs.pop("dt")
+        self.initial_vel: torch.Tensor = kwargs.pop("initial_vel")
+        self.rollout_steps: int = kwargs.pop("num_time_steps") - 1
+        super().__init__(*args, **kwargs)
+
+    def forward(self, sample: SimSample, data_stats: dict) -> torch.Tensor:
+        """
+        Args:
+            sample: SimSample containing node_features and node_target
+            data_stats: dict containing normalization stats
+        Returns:
+            [T, N, 3] rollout of predicted positions
+        """
+        inputs = sample.node_features
+        coords = inputs["coords"]  # [N, 3]
+        features = inputs.get("features", coords.new_zeros((coords.size(0), 0)))
+        N = coords.size(0)
+        device = coords.device
+
+        # Initial states
+        y_t1 = coords  # [N, 3]
+        y_t0 = y_t1 - self.initial_vel * self.dt  # backstep using initial velocity
+
+        outputs: list[torch.Tensor] = []
+        for t in range(self.rollout_steps):
+            time_t = 0.0 if self.rollout_steps <= 1 else t / (self.rollout_steps - 1)
+            time_t = torch.tensor([time_t], device=device, dtype=torch.float32)
+
+            # Velocity normalization
+            vel = (y_t1 - y_t0) / self.dt
+            vel_norm = (vel - data_stats["node"]["norm_vel_mean"]) / (
+                data_stats["node"]["norm_vel_std"] + EPS
+            )
+
+            # Prepare vertices for FIGConvUNet: [1, N, 3]
+            vertices = y_t1.unsqueeze(0)  # [1, N, 3]
+
+            # Prepare features: vel_norm + features + time [N, 3+F+1]
+            fx_t = torch.cat(
+                [vel_norm, features, time_t.expand(N, 1)], dim=-1
+            )  # [N, 3+F+1]
+            fx_t = fx_t.unsqueeze(0)  # [1, N, 3+F+1]
+
+            def step_fn(verts, feats):
+                out, _ = super(FIGConvUNetAutoregressiveRolloutTraining, self).forward(
+                    vertices=verts, features=feats
+                )
+                return out
+
+            if self.training:
+                outf = ckpt(
+                    step_fn,
+                    vertices,
+                    fx_t,
+                    use_reentrant=False,
+                ).squeeze(0)  # [N, 3]
+            else:
+                outf = step_fn(vertices, fx_t).squeeze(0)  # [N, 3]
+
+            # De-normalize acceleration
+            acc = (
+                outf * data_stats["node"]["norm_acc_std"]
+                + data_stats["node"]["norm_acc_mean"]
+            )
+            vel = self.dt * acc + vel
+            y_t2 = self.dt * vel + y_t1
+
+            outputs.append(y_t2)
+            y_t1, y_t0 = y_t2, y_t1
+
+        return torch.stack(outputs, dim=0)  # [T, N, 3]
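
Both rollout classes share the same time-stepping pattern: backstep once with the initial velocity, estimate velocity by finite difference, update velocity with the predicted acceleration, then update position with the new velocity. The sketch below isolates that pattern in plain Python on one scalar coordinate per node, with no normalization or checkpointing. The function `rollout` and the callback `predict_acc` are hypothetical stand-ins for the model call, not part of the commit. Passing `gt_seq` mimics the teacher forcing of the one-step class in training mode; omitting it gives the autoregressive update.

```python
from typing import Callable, Optional, Sequence

Vec = list[float]


def rollout(
    x0: Vec,
    initial_vel: float,
    dt: float,
    steps: int,
    predict_acc: Callable[[Vec, Vec], Vec],
    gt_seq: Optional[Sequence[Vec]] = None,
) -> list[Vec]:
    """Integrate positions step by step.

    If gt_seq is given, every step after the first restarts from ground-truth
    positions (teacher forcing); otherwise predictions are fed back in.
    """
    y_prev = [x - initial_vel * dt for x in x0]  # backstep to create y_{-1}
    y_curr = list(x0)
    outputs: list[Vec] = []
    for t in range(steps):
        if gt_seq is not None and t > 0:
            y_prev, y_curr = list(gt_seq[t - 1]), list(gt_seq[t])
        vel = [(c - p) / dt for c, p in zip(y_curr, y_prev)]  # finite difference
        acc = predict_acc(y_curr, vel)
        vel_next = [v + dt * a for v, a in zip(vel, acc)]  # velocity first
        y_next = [c + dt * v for c, v in zip(y_curr, vel_next)]  # then position
        outputs.append(y_next)
        if gt_seq is None:
            y_prev, y_curr = y_curr, y_next  # autoregressive update
    return outputs


# Toy check with a constant acceleration of 4.0 (made-up numbers).
free = rollout([0.0], initial_vel=2.0, dt=0.5, steps=3,
               predict_acc=lambda y, v: [4.0] * len(y))

# Teacher forcing with zero acceleration and a made-up ground-truth sequence.
forced = rollout([0.0], initial_vel=2.0, dt=0.5, steps=2,
                 predict_acc=lambda y, v: [0.0] * len(y),
                 gt_seq=[[0.0], [10.0], [20.0]])
print(free, forced)
```

Because the position update uses the already-updated velocity, this is a semi-implicit (symplectic) Euler step rather than a fully explicit one.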
