Merged

Changes from 8 commits (of 25)
c1a0c48
update license headers- second try
mnabian Oct 24, 2025
0690510
Merge branch 'NVIDIA:main' into main
mnabian Oct 24, 2025
c16dcc7
Merge branch 'NVIDIA:main' into main
mnabian Oct 28, 2025
13af4ae
modularize reader
mnabian Oct 28, 2025
53838e2
configurable features, node features as a dict of tensors
mnabian Oct 29, 2025
074d4d1
update readme
mnabian Oct 29, 2025
cc4bbc7
formatting
mnabian Oct 29, 2025
5eb7148
add vtp reader, update readme
mnabian Oct 29, 2025
5c23e50
add crash animation
mnabian Oct 29, 2025
5a59fee
readme reformatting
mnabian Oct 29, 2025
f719022
update readme
mnabian Oct 29, 2025
57ee2b9
add roof crash and crushcan results
mnabian Oct 30, 2025
055546a
readme formatting
mnabian Oct 30, 2025
241d13f
add roof crash and crushcan results
mnabian Oct 30, 2025
6527320
add roof crash and crushcan results
mnabian Oct 30, 2025
be8b07e
readme references
mnabian Oct 30, 2025
b2ade88
Update examples/structural_mechanics/crash/vtp_reader.py
mnabian Oct 30, 2025
ef5226f
Update examples/structural_mechanics/crash/vtp_reader.py
mnabian Oct 30, 2025
c6be395
Update examples/structural_mechanics/crash/README.md
mnabian Oct 30, 2025
65b1f59
Update examples/structural_mechanics/crash/vtp_reader.py
mnabian Oct 30, 2025
90cef44
Update examples/structural_mechanics/crash/vtp_reader.py
mnabian Oct 30, 2025
7449a25
Update examples/structural_mechanics/crash/vtp_reader.py
mnabian Oct 30, 2025
8ae9056
Update examples/structural_mechanics/crash/vtp_reader.py
mnabian Oct 30, 2025
900db79
address review comments
mnabian Oct 30, 2025
9479c1e
formatting
mnabian Oct 30, 2025
106 changes: 105 additions & 1 deletion examples/structural_mechanics/crash/README.md
@@ -1,5 +1,5 @@
<!-- markdownlint-disable -->
# Machine Learning Surrogates for Automotive Crash Dynamics
# Machine Learning Surrogates for Automotive Crash Dynamics 🧱💥🚗💨

## Problem Overview

@@ -101,6 +101,110 @@ python inference.py
Predicted meshes are written as .vtp files under
./predicted_vtps/, and can be opened using ParaView.

## Datapipe: how inputs are constructed and normalized

The datapipe is responsible for turning raw crash runs (LS-DYNA, Abaqus, or other solver outputs) into model-ready tensors and statistics. It does three things in a predictable, repeatable way: it reads and filters the raw data, it constructs inputs and targets with a stable interface, and it computes the statistics required to normalize both positions and features. This section explains what the datapipe returns, how to configure it, and what models should expect to receive at training and inference time.

At a high level, each sample corresponds to one crash run. The datapipe loads the full deformation trajectory for that run, and emits exactly two items: inputs x and targets y. Inputs are a dictionary with two entries. The first entry, 'coords', is a [N, 3] tensor that contains the positions at the first timestep (t0) for all retained nodes. The second entry, 'features', is a [N, F] tensor that contains the concatenation of all node-wise features configured for this experiment. The order of columns in 'features' matches the order you provide in the configuration. This means if your configuration lists features as [thickness, Y_modulus], then column 0 will always be thickness and column 1 will always be Y_modulus. Targets y are the remaining positions from t1 to tT flattened along the feature dimension, so y has shape [N, (T-1)*3].
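
As a concrete illustration, here is a minimal sketch of one sample from the model's point of view (the sizes are made up; only the key names and shapes follow the description above):

```python
import torch

# Illustrative sizes: N nodes, T timesteps, F concatenated feature columns.
N, T, F = 5000, 40, 1

x = {
    "coords": torch.zeros(N, 3),     # positions at t0 for all retained nodes
    "features": torch.zeros(N, F),   # node features, columns ordered as in the config
}
y = torch.zeros(N, (T - 1) * 3)      # positions at t1..tT, flattened to one row per node

assert x["coords"].shape == (N, 3)
assert y.shape == (N, (T - 1) * 3)
```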

Configuration lives under `conf/datapipe/`. There are two datapipe variants: one for graph-based models and one for point-cloud models. Both accept the same core options, and both expose a `features` list. The `features` list is the single source of truth for what goes into the 'features' tensor and in which order. If you do not want any features, set `features: []` and the datapipe will return an empty [N, 0] tensor for 'features' while keeping 'coords' intact. If you add more features later, the datapipe will preserve their order and update the per-dimension statistics automatically.

Under the hood the datapipe reads node positions over time from LS-DYNA (via `d3plot_reader.py` or any compatible reader you configure). For each run it constructs a fixed number of time steps, selects and reindexes the active nodes, and optionally builds graph connectivity. It also computes statistics necessary for normalization. Position statistics include per-axis means and standard deviations, as well as normalized velocity and acceleration statistics used by autoregressive rollouts. Feature statistics are computed column-wise on the concatenated 'features' tensor. During dataset creation the datapipe normalizes the position trajectory using position means and standard deviations and normalizes every column of 'features' using feature means and standard deviations. The resulting tensors are numerically stable and consistent across training and evaluation. The statistics are written under `./stats/` as `node_stats.json` and `feature_stats.json` during training, and then read back in evaluation or inference.
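
The normalization itself is ordinary per-column standardization. A rough sketch of the idea (the helper below is illustrative, not the datapipe's actual code, and the exact layout of the stats files may differ):

```python
import json
import torch

def standardize(t: torch.Tensor, mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    # Per-column standardization; a small epsilon guards against zero variance.
    return (t - mean) / (std + 1e-8)

# Column-wise feature statistics over all nodes of the training runs.
features = torch.randn(5000, 2)                       # e.g. [thickness, Y_modulus]
f_mean, f_std = features.mean(dim=0), features.std(dim=0)
features_n = standardize(features, f_mean, f_std)

# Statistics are persisted so evaluation/inference reuses the training values,
# conceptually like ./stats/feature_stats.json.
stats = {"mean": f_mean.tolist(), "std": f_std.tolist()}
print(json.dumps(stats))
```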

Readers are configurable through Hydra. A reader is any callable that returns `(srcs, dsts, point_data)`, where `point_data` is a list of records—one per run. Each record must include 'coords' as a [T, N, 3] array and one array per configured feature name. Arrays for features can be [N] or [N, K]; the datapipe will promote [N] to [N, 1] and then concatenate all feature arrays in the order declared in the configuration to form 'features'. If you are using graph-based models, the `srcs` and `dsts` arrays will be used to build a PyG `Data` object with symmetric edges and self-loops, and initial edge features are computed from positions at t0 (displacements and distances). If you are using point-cloud models, graph connectivity is ignored but the remainder of the pipeline is identical.

Models should consume the two-part input without guessing column indices. Positions are always available in `x['coords']` and every node-wise feature is already concatenated in `x['features']`. If you need to separate features later—for example to log per-feature metrics—you can do so deterministically because the order of columns in `x['features']` exactly matches the `features` list in the configuration. For time-conditional models, you can pass the full `x['features']` to your functional input; for autoregressive models, you can concatenate `x['features']` to the normalized velocity (and time, if used) to form the model input at each rollout step.

Finally, the datapipe is designed to be resilient to the “no features” case. If you set `features: []`, the 'features' tensor simply has width zero. Statistics are computed correctly (zero-length mean and unit standard deviation) and concatenations degrade gracefully to the original position-only behavior. This makes it easy to start simple and then scale up to richer feature sets without revisiting model-side code or the data normalization logic.
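
A tiny illustration of why the width-zero case composes cleanly with the rest of the pipeline:

```python
import torch

N = 100
features = torch.empty(N, 0)        # features: [] -> a [N, 0] tensor
velocity = torch.randn(N, 3)        # e.g. normalized velocity at one rollout step

model_input = torch.cat([velocity, features], dim=-1)
assert model_input.shape == (N, 3)  # concatenating a [N, 0] block is a no-op
```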

For completeness, the datapipe also records a lightweight name-to-column map called `_feature_slices`. It associates each configured feature name with its [start, end) slice in `x['features']`. You typically won’t need it if you just consume the full `features` tensor, but it enables reliable, reproducible slicing by name for diagnostics or logging.
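
A sketch of how such a map can be built and consumed (the construction below is illustrative; only the `_feature_slices` name and the [start, end) convention come from the datapipe):

```python
import torch

feature_names = ["thickness", "Y_modulus"]   # order as declared in the config
widths = {"thickness": 1, "Y_modulus": 1}    # per-feature column counts

# Build name -> (start, end) slices in config order.
_feature_slices, start = {}, 0
for name in feature_names:
    _feature_slices[name] = (start, start + widths[name])
    start += widths[name]

features = torch.randn(5000, start)
s, e = _feature_slices["thickness"]
thickness = features[:, s:e]                 # deterministic slicing by name
```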

## Reader: built-in d3plot and VTP readers and how to add your own

The reader is the component that actually opens the raw simulation outputs and produces the arrays the datapipe consumes. It is intentionally thin and swappable via Hydra so you can adapt the pipeline to LS‑DYNA exports, Abaqus exports, or your own internal formats without touching the rest of the code.

### Built-in d3plot reader

The default reader is implemented in `d3plot_reader.py`. It searches the data directory for subfolders that contain a `d3plot` file and treats each such folder as one “run.” For each run it opens the `d3plot` with `lasso.dyna.D3plot` and extracts node coordinates, time-varying displacements, element connectivity, and part identifiers. If an LS‑DYNA keyword (`.k`) file is present, it parses the shell section definitions to obtain per-part thickness values, then converts those into per-node thickness by averaging the values of incident elements. To avoid contaminating training with rigid content, the reader classifies nodes as structural or wall based on a displacement variation threshold and drops wall nodes. After filtering, it builds a compact node index, remaps connectivity, and, if you are training a graph model, collects undirected edges from the remapped shell elements. It can optionally save one VTP file per time step so you can visually inspect the trajectories, or write predictions to those files during inference.
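
The element-to-node thickness transfer can be pictured as a plain incidence average. A rough sketch under assumed array layouts (not the reader's actual code):

```python
import numpy as np

def element_thickness_to_nodes(
    elements: np.ndarray, elem_thickness: np.ndarray, num_nodes: int
) -> np.ndarray:
    """Average per-element thickness over the elements incident to each node.

    elements:        [E, 4] node indices of shell elements (illustrative layout)
    elem_thickness:  [E]    thickness assigned to each element via its part/section
    """
    node_sum = np.zeros(num_nodes)
    node_count = np.zeros(num_nodes)
    for elem, t in zip(elements, elem_thickness):
        node_sum[elem] += t
        node_count[elem] += 1
    node_count[node_count == 0] = 1   # isolated nodes keep thickness 0
    return node_sum / node_count

elements = np.array([[0, 1, 2, 3], [2, 3, 4, 5]])
node_thickness = element_thickness_to_nodes(elements, np.array([1.5, 2.5]), num_nodes=6)
```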

The reader then assembles the per-run record expected by the datapipe. Positions are returned under the key `'coords'` as a float array of shape `[T, N, 3]`, where T is the number of time steps and N is the number of retained nodes after filtering and remapping. Feature arrays are returned one per configured feature name; for example, if your datapipe configuration lists `features: [thickness, Y_modulus]`, the reader should provide a `'thickness'` array with shape `[N]` or `[N, 1]` and a `'Y_modulus'` array with shape `[N]` or `[N, K]`. The datapipe promotes 1D arrays to 2D and concatenates all provided feature arrays in the order given by the configuration to form the final `'features'` block supplied to the model.

If you use the graph datapipe, the edge list is produced by walking the filtered shell elements and collecting unique boundary node pairs; the datapipe then symmetrizes these edges and adds self-loops when constructing the PyG `Data` object. If you use the point‑cloud datapipe, the edge outputs are ignored but the rest of the record shape is the same, so you can swap between model families by changing configuration only.
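
In isolation, the symmetrization-plus-self-loop step looks roughly like this (illustrative sketch; the real construction happens inside the datapipe when the PyG `Data` object is assembled):

```python
import torch

srcs = torch.tensor([0, 1, 2])   # unique undirected pairs collected from shell elements
dsts = torch.tensor([1, 2, 3])
num_nodes = 4

# Symmetrize: add the reversed direction for every edge.
src_sym = torch.cat([srcs, dsts])
dst_sym = torch.cat([dsts, srcs])

# Add one self-loop per node.
loops = torch.arange(num_nodes)
edge_index = torch.stack(
    [torch.cat([src_sym, loops]), torch.cat([dst_sym, loops])], dim=0
)  # shape [2, 2E + N], the layout expected by PyG's Data.edge_index
```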

### Built-in VTP reader (PolyData)

In addition to `d3plot`, a lightweight VTP reader is provided in `vtp_reader.py`. It treats each `.vtp` file in a directory as a separate run and expects point displacements to be stored as vector arrays in `poly.point_data` with names like `displacement_t0.000`, `displacement_t0.005`, … (a more permissive fallback of any `displacement_t*` is also supported). The reader:

- loads the reference coordinates from `poly.points`
- builds absolute positions per timestep as `[t0: coords, t>0: coords + displacement_t]`
- extracts cell connectivity from the PolyData faces and converts it to unique edges
- returns `(srcs, dsts, point_data)` where `point_data` contains `'coords': [T, N, 3]`

By default, the VTP reader does not attach additional features; it is compatible with `features: []`. If your `.vtp` files include additional per‑point arrays you would like to model (e.g., thickness or modulus), extend the reader to add those arrays to each run’s record using keys that match your `features` list. The datapipe will then concatenate them in the configured order.
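
A rough sketch of the displacement-to-position reconstruction, assuming the file is opened with `pyvista` (array names follow the convention above; edge extraction and the permissive-name fallback are omitted):

```python
import numpy as np
import pyvista as pv

poly = pv.read("run_000.vtp")              # one .vtp file == one run (path illustrative)
coords = np.asarray(poly.points)           # [N, 3] reference coordinates

# Collect displacement arrays in time order; assumes consistently padded names
# like displacement_t0.000, displacement_t0.005, ...
disp_names = sorted(n for n in poly.point_data.keys() if n.startswith("displacement_t"))

positions = [coords]                        # t0: reference coordinates
for name in disp_names[1:]:                 # t>0: coords + displacement_t
    positions.append(coords + np.asarray(poly.point_data[name]))

coords_t = np.stack(positions, axis=0)      # [T, N, 3], the 'coords' entry of the record
```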

Example Hydra configuration for the VTP reader:

```yaml
# conf/reader/vtp.yaml
_target_: vtp_reader.Reader
```

Select it in `conf/config.yaml`:

```yaml
defaults:
- datapipe: point_cloud
- model: transolver_time_conditional
- training: default
- inference: default
- reader: vtp
```

And set `features` to empty (or to the names you add in your extended reader) in `conf/datapipe/point_cloud.yaml` or `conf/datapipe/graph.yaml`:

```yaml
features: [] # or [thickness, Y_modulus] if your reader provides them
```

### Write your own reader

To write your own reader, implement a Hydra‑instantiable function or class whose call returns a three‑tuple `(srcs, dsts, point_data)`. The first two entries are lists of integer arrays describing edges per run (they can be empty lists if you are not producing a graph), and `point_data` is a list of Python dicts with one dict per run. Each dict must contain `'coords'` as a `[T, N, 3]` array and one array per feature name listed in `conf/datapipe/*.yaml` under `features`. Feature arrays can be `[N]` or `[N, K]` and should use the same node indexing as `'coords'`. For convenience, a simple class reader can accept the Hydra `split` argument (e.g., "train" or "test") and decide whether to save VTP frames, but this is optional.
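
A minimal sketch of a custom reader that satisfies this contract (the `.npz` layout and field names are purely hypothetical; only the return tuple and the record keys come from the contract above):

```python
import glob
import os

import numpy as np

class MyReader:
    """Hypothetical reader: one .npz file per run with 'coords' [T, N, 3] and 'thickness' [N]."""

    def __call__(self, data_dir: str, num_samples: int, split: str | None = None, logger=None, **kwargs):
        srcs, dsts, point_data = [], [], []
        for path in sorted(glob.glob(os.path.join(data_dir, "*.npz")))[:num_samples]:
            run = np.load(path)
            point_data.append(
                {
                    "coords": run["coords"].astype(np.float32),        # [T, N, 3]
                    "thickness": run["thickness"].astype(np.float32),  # matches features: [thickness]
                }
            )
            srcs.append(np.empty(0, dtype=np.int64))  # no graph: empty edge lists per run
            dsts.append(np.empty(0, dtype=np.int64))
        return srcs, dsts, point_data
```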

As a starting point, your YAML can point to the reader class by its dotted path:

```yaml
# conf/reader/my_reader.yaml
_target_: my_reader.MyReader
# any constructor kwargs here, e.g. thresholds or unit conversions
```

Then, in `conf/config.yaml`, select the reader by adding or overriding `- reader: my_reader`. The datapipe will call your reader with `data_dir`, `num_samples`, `split`, and an optional `logger`, and will expect the tuple described above. Provided you populate `'coords'` and the configured feature arrays per run, the rest of the pipeline—normalization, batching, graph construction, and model rollout—will work without code changes.

A note on reader signatures and future‑proofing: the datapipe currently passes `data_dir`, `num_samples`, `split`, and `logger` when invoking the reader, and may pass additional keys in the future. To stay resilient, implement your reader with optional parameters and a catch‑all `**kwargs`.

For a class reader, use this signature in `__call__`:

```python
class MyReader:
    def __init__(self, some_option: float = 1.0):
        self.some_option = some_option

    def __call__(
        self,
        data_dir: str,
        num_samples: int,
        split: str | None = None,
        logger=None,
        **kwargs,
    ):
        ...
```

With this pattern, your reader will keep working even if the framework adds new optional arguments later.

## Postprocessing and Evaluation

The postprocessing/ folder provides scripts for quantitative and qualitative evaluation:
1 change: 1 addition & 0 deletions examples/structural_mechanics/crash/conf/config.yaml
@@ -25,6 +25,7 @@ experiment_desc: "unified training recipe for crash models"
run_desc: "unified training recipe for crash models"

defaults:
- reader: vtp #d3plot
- datapipe: point_cloud # will be overridden by model configs
- model: transolver_autoregressive_rollout_training
- training: default
@@ -20,4 +20,4 @@ name: crash_train
split: train
num_samples: ${training.num_training_samples}
num_steps: ${training.num_time_steps}
wall_node_disp_threshold: 1.0
features: [thickness]
@@ -18,4 +18,4 @@ _target_: datapipe.CrashPointCloudDataset
data_dir: ${training.raw_data_dir}
num_samples: ${training.num_training_samples}
num_steps: ${training.num_time_steps}
wall_node_disp_threshold: 1.0
features: [thickness]
19 changes: 19 additions & 0 deletions examples/structural_mechanics/crash/conf/reader/d3plot.yaml
@@ -0,0 +1,19 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

_target_: d3plot_reader.Reader
wall_node_disp_threshold: 1.0

17 changes: 17 additions & 0 deletions examples/structural_mechanics/crash/conf/reader/vtp.yaml
@@ -0,0 +1,17 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

_target_: vtp_reader.Reader
32 changes: 29 additions & 3 deletions examples/structural_mechanics/crash/d3plot_reader.py
@@ -404,8 +404,34 @@ def process_d3plot_data(
            write_vtp,
            logger,
        )
        point_data_all.append(
            {"mesh_pos": mesh_pos_all, "thickness": filtered_thickness}
        )
        point_data_all.append({"coords": mesh_pos_all, "thickness": filtered_thickness})

    return srcs, dsts, point_data_all


class Reader:
    """
    Reader for LS-DYNA d3plot files.

    Args:
        wall_node_disp_threshold: threshold for filtering wall nodes
    """

    def __init__(self, wall_node_disp_threshold: float = 1.0):
        self.wall_node_disp_threshold = wall_node_disp_threshold

    def __call__(
        self,
        data_dir: str,
        num_samples: int,
        split: str,
        logger=None,
    ):
        write_vtp = False if split == "train" else True
        return process_d3plot_data(
            data_dir=data_dir,
            num_samples=num_samples,
            wall_node_disp_threshold=self.wall_node_disp_threshold,
            write_vtp=write_vtp,
            logger=logger,
        )