Commit 7277097

Add Zarr reader for Crash (#1228)
* Add Zarr reader for Crash * Update README * Update validation logic of point data in Zarr reader Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add a test for 2D feature arrays * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 parent 1a52284 commit 7277097

File tree

5 files changed: +736 −18 lines

examples/structural_mechanics/crash/README.md

Lines changed: 104 additions & 17 deletions
@@ -36,7 +36,7 @@ For an in-depth comparison between the Transolver and MeshGraphNet models and th
 ```yaml
 # conf/config.yaml
 defaults:
-  - reader: vtp # or d3plot, or your custom reader
+  - reader: vtp # vtp, zarr, d3plot, or your custom reader
   - datapipe: point_cloud # or graph
   - model: transolver_time_conditional # or an MGN variant
   - training: default
@@ -47,7 +47,7 @@ defaults:
 2) Point to your datasets and core training knobs.
 
 - `conf/training/default.yaml`:
-  - `raw_data_dir`: path to TRAIN runs (folder of run folders for d3plot, or folder of .vtp files for VTP)
+  - `raw_data_dir`: path to TRAIN runs (folder of run folders for d3plot, folder of .vtp files for VTP, or folder of .zarr stores for Zarr)
   - `num_time_steps`: number of frames to use per run
   - `num_training_samples`: how many runs to load

@@ -77,6 +77,7 @@ features: [thickness] # or [] for no features; preserve order if adding more
 4) Reader‑specific options (optional).
 
 - d3plot: `conf/reader/d3plot.yaml` → `wall_node_disp_threshold`
+- VTP and Zarr readers have no additional options (they read pre-processed data)
 
 5) Model config: ensure input dimensions match your features.
 
@@ -127,26 +128,38 @@ This will install:
 [PhysicsNeMo-Curator](https://github.com/NVIDIA/physicsnemo-curator).
 Using `PhysicsNeMo-Curator`, crash simulation data from LS-DYNA can be processed into training-ready formats easily.
 
-Currently, this can be used to preprocess d3plot files into VTP.
+PhysicsNeMo-Curator can preprocess d3plot files into **VTP** (for visualization and smaller datasets) or **Zarr** (for large-scale ML training).
 
 ### Quick Start
 
 Install PhysicsNeMo-Curator following
 [these instructions](https://github.com/NVIDIA/physicsnemo-curator?tab=readme-ov-file#installation-and-usage).
 
-Process your LS-DYNA data:
+Process your LS-DYNA data to **VTP format**:
 
 ```bash
 export PYTHONPATH=$PYTHONPATH:examples &&
-physicsnemo-curator-etl \
-  --config-dir=examples/config \
-  --config-name=crash_etl \
-  etl.source.input_dir=/data/crash_sims/ \
-  etl.sink.output_dir=/data/crash_processed_vtp/ \
+physicsnemo-curator-etl \
+  --config-dir=examples/structural_mechanics/crash/config \
+  --config-name=crash_etl \
+  serialization_format=vtp \
+  etl.source.input_dir=/data/crash_sims/ \
+  serialization_format.sink.output_dir=/data/crash_vtp/ \
   etl.processing.num_processes=4
 ```
 
-This will process all LS-DYNA runs in `/data/crash_sims/` and output VTP files to `/data/crash_processed_vtp/`.
+Or process to **Zarr format** for large-scale training:
+
+```bash
+export PYTHONPATH=$PYTHONPATH:examples &&
+physicsnemo-curator-etl \
+  --config-dir=examples/structural_mechanics/crash/config \
+  --config-name=crash_etl \
+  serialization_format=zarr \
+  etl.source.input_dir=/data/crash_sims/ \
+  serialization_format.sink.output_dir=/data/crash_zarr/ \
+  etl.processing.num_processes=4
+```
 
 ### Input Data Structure
 
@@ -165,7 +178,7 @@ crash_sims/
 
 ### Output Formats
 
-#### VTP Format (Recommended for this example)
+#### VTP Format
 
 Produces single VTP file per run with all timesteps as displacement fields:
 
@@ -179,10 +192,33 @@ crash_processed_vtp/
 Each VTP contains:
 - Reference coordinates at t=0
 - Displacement fields: `displacement_t0.000`, `displacement_t0.005`, etc.
-- Node thickness values
+- Node thickness and other point data features
 
 This format is directly compatible with the VTP reader in this example.
 
+#### Zarr Format
+
+Produces one Zarr store per run with pre-computed graph structure:
+
+```
+crash_processed_zarr/
+├── Run100.zarr/
+│   ├── mesh_pos    # (timesteps, nodes, 3) - temporal positions
+│   ├── thickness   # (nodes,) - node features
+│   └── edges       # (num_edges, 2) - pre-computed graph connectivity
+├── Run101.zarr/
+└── ...
+```
+
+Each Zarr store contains:
+- `mesh_pos`: Full temporal trajectory (no displacement reconstruction needed)
+- `thickness`: Per-node features
+- `edges`: Pre-computed edge connectivity (no edge rebuilding during training)
+
+**NOTE:** All heavy preprocessing (node filtering, edge building, thickness computation) is done once during curation using PhysicsNeMo-Curator. The reader simply loads pre-computed arrays.
+
+This format is directly compatible with the Zarr reader in this example.
+
 ## Training
 
 Training is managed via Hydra configurations located in conf/.
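As a cross-check of the Zarr layout above, here is a minimal sketch of how a loader might validate one run's arrays before training. The arrays below are hypothetical stand-ins; in practice each run would be read from disk, e.g. with `zarr.open("Run100.zarr", mode="r")`:

```python
import numpy as np

# Hypothetical stand-ins for arrays read from one run's .zarr store,
# e.g. root = zarr.open("Run100.zarr", mode="r"); mesh_pos = root["mesh_pos"][:]
mesh_pos = np.random.rand(6, 120, 3)              # (timesteps, nodes, 3)
thickness = np.random.rand(120)                   # (nodes,)
edges = np.random.randint(0, 120, size=(480, 2))  # (num_edges, 2)

def validate_run(mesh_pos, edges, features):
    """Check one run against the layout the Zarr reader expects."""
    T, N, D = mesh_pos.shape
    assert D == 3, "mesh_pos must be (timesteps, nodes, 3)"
    assert edges.ndim == 2 and edges.shape[1] == 2, "edges must be (num_edges, 2)"
    assert edges.min() >= 0 and edges.max() < N, "edge indices must reference valid nodes"
    for name, arr in features.items():
        # feature arrays may be [N] or [N, K], indexed like mesh_pos nodes
        assert arr.shape[0] == N, f"{name} must have one entry per node"
    return T, N, edges.shape[0]

T, N, E = validate_run(mesh_pos, edges, {"thickness": thickness})
print(T, N, E)  # 6 120 480
```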
@@ -277,14 +313,15 @@ If you use the graph datapipe, the edge list is produced by walking the filtered
 
 ### Built‑in VTP reader (PolyData)
 
-In addition to `d3plot`, a lightweight VTP reader is provided in `vtp_reader.py`. It treats each `.vtp` file in a directory as a separate run and expects point displacements to be stored as vector arrays in `poly.point_data` with names like `displacement_t0.000`, `displacement_t0.005`, … (a more permissive fallback of any `displacement_t*` is also supported). The reader:
+A lightweight VTP reader is provided in `vtp_reader.py`. It treats each `.vtp` file in a directory as a separate run and expects point displacements to be stored as vector arrays in `poly.point_data` with names like `displacement_t0.000`, `displacement_t0.005`, … (a more permissive fallback of any `displacement_t*` is also supported). The reader:
 
 - loads the reference coordinates from `poly.points`
 - builds absolute positions per timestep as `[t0: coords, t>0: coords + displacement_t]`
 - extracts cell connectivity from the PolyData faces and converts it to unique edges
-- returns `(srcs, dsts, point_data)` where `point_data` contains `'coords': [T, N, 3]`
+- extracts all point data fields dynamically (e.g., thickness, modulus)
+- returns `(srcs, dsts, point_data)` where `point_data` contains `'coords': [T, N, 3]` and all feature arrays
 
-By default, the VTP reader does not attach additional features; it is compatible with `features: []`. If your `.vtp` files include additional per‑point arrays you would like to model (e.g., thickness or modulus), extend the reader to add those arrays to each run’s record using keys that match your `features` list. The datapipe will then concatenate them in the configured order.
+The VTP reader dynamically extracts all non-displacement point data fields from the VTP file and makes them available to the datapipe. If your `.vtp` files include additional per‑point arrays (e.g., thickness or modulus), simply add their names to the `features` list in your datapipe config.
 
 Example Hydra configuration for the VTP reader:
 
@@ -304,12 +341,58 @@ defaults:
   - reader: vtp
 ```
 
-And set `features` to empty (or to the names you add in your extended reader) in `conf/datapipe/point_cloud.yaml` or `conf/datapipe/graph.yaml`:
+And configure features in `conf/datapipe/point_cloud.yaml` or `conf/datapipe/graph.yaml`:
 
 ```yaml
-features: [] # or [thickness, Y_modulus] if your reader provides them
+features: [thickness] # or [] for no features
 ```
 
+### Built‑in Zarr reader
+
+A Zarr reader is provided in `zarr_reader.py`. It reads pre-processed Zarr stores created by PhysicsNeMo-Curator, where all heavy computation (node filtering, edge building, thickness computation) has already been done during the ETL pipeline. The reader:
+
+- loads pre-computed temporal positions directly from `mesh_pos` (no displacement reconstruction)
+- loads pre-computed edges (no connectivity-to-edge conversion needed)
+- dynamically extracts all point data fields (thickness, etc.) from the Zarr store
+- returns `(srcs, dsts, point_data)` similar to the VTP reader
+
+Data layout expected by the Zarr reader:
+- `<DATA_DIR>/*.zarr/` (each `.zarr` directory is treated as one run)
+- Each Zarr store must contain:
+  - `mesh_pos`: `[T, N, 3]` temporal positions
+  - `edges`: `[E, 2]` pre-computed edge connectivity
+  - Feature arrays (e.g., `thickness`): `[N]` or `[N, K]` per-node features
+
+Example Hydra configuration for the Zarr reader:
+
+```yaml
+# conf/reader/zarr.yaml
+_target_: zarr_reader.Reader
+```
+
+Select it in `conf/config.yaml`:
+
+```yaml
+defaults:
+  - reader: zarr # Options are: vtp, d3plot, zarr
+  - datapipe: point_cloud # will be overridden by model configs
+  - model: transolver_autoregressive_rollout_training
+  - training: default
+  - inference: default
+  - _self_
+```
+
+And configure features in `conf/datapipe/graph.yaml`:
+
+```yaml
+features: [thickness] # Must match fields stored in Zarr
+```
+
+**Recommended workflow:**
+1. Use PhysicsNeMo-Curator to preprocess d3plot → VTP or Zarr once
+2. Use the corresponding reader for all training/validation
+3. Optionally use the d3plot reader for quick prototyping on raw data
+
 ### Data layout expected by readers
 
 - d3plot reader (`d3plot_reader.py`):
@@ -320,6 +403,10 @@ features: [] # or [thickness, Y_modulus] if your reader provides them
   - `<DATA_DIR>/*.vtp` (each `.vtp` is treated as one run)
   - Displacements stored as 3‑component arrays in point_data with names like `displacement_t0.000`, `displacement_t0.005`, ... (fallback accepts any `displacement_t*`).
 
+- Zarr reader (`zarr_reader.py`):
+  - `<DATA_DIR>/*.zarr/` (each `.zarr` directory is treated as one run)
+  - Contains pre-computed `mesh_pos`, `edges`, and feature arrays
+
 ### Write your own reader
 
 To write your own reader, implement a Hydra‑instantiable function or class whose call returns a three‑tuple `(srcs, dsts, point_data)`. The first two entries are lists of integer arrays describing edges per run (they can be empty lists if you are not producing a graph), and `point_data` is a list of Python dicts with one dict per run. Each dict must contain `'coords'` as a `[T, N, 3]` array and one array per feature name listed in `conf/datapipe/*.yaml` under `features`. Feature arrays can be `[N]` or `[N, K]` and should use the same node indexing as `'coords'`. For convenience, a simple class reader can accept the Hydra `split` argument (e.g., "train" or "test") and decide whether to save VTP frames, but this is optional.
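The reader contract described above can be sketched as a minimal custom reader. Everything here is hypothetical (the class name, the synthetic random data, the chain-graph edges); a real reader would load actual simulation files instead:

```python
import numpy as np

class RandomReader:
    """Hypothetical Hydra-instantiable reader producing synthetic runs.

    Calling it returns (srcs, dsts, point_data): one edge-index pair and
    one dict per run, where each dict holds 'coords' as [T, N, 3] plus
    one array per configured feature name.
    """

    def __init__(self, num_runs=2, num_steps=5, num_nodes=10, split="train"):
        self.num_runs = num_runs
        self.num_steps = num_steps
        self.num_nodes = num_nodes
        self.split = split  # e.g. "train" or "test"; using it is optional

    def __call__(self):
        srcs, dsts, point_data = [], [], []
        for _ in range(self.num_runs):
            # simple chain graph i -> i+1 (empty lists would also be valid
            # when no graph is needed, e.g. for point-cloud datapipes)
            srcs.append(np.arange(self.num_nodes - 1))
            dsts.append(np.arange(1, self.num_nodes))
            point_data.append({
                "coords": np.random.rand(self.num_steps, self.num_nodes, 3),
                "thickness": np.random.rand(self.num_nodes),  # features: [thickness]
            })
        return srcs, dsts, point_data

reader = RandomReader()
srcs, dsts, point_data = reader()
```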

examples/structural_mechanics/crash/conf/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ experiment_desc: "unified training recipe for crash models"
 run_desc: "unified training recipe for crash models"
 
 defaults:
-  - reader: vtp #d3plot
+  - reader: vtp # Options are: vtp, d3plot, zarr
   - datapipe: point_cloud # will be overridden by model configs
   - model: transolver_autoregressive_rollout_training
   - training: default
Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
+# SPDX-FileCopyrightText: All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+_target_: zarr_reader.Reader
+_convert_: all
