Commit cf342fd

Domino finetune (#1082)
* Spelling + Grammar Fixes (#1050)
* Adds fixes
* Fixes extra line
* first commit for domino-nim finetuning
* finetuning recipe refactored and benchmarked
* Update README.md
* Update README.md
* Update README.md
* Update README.md with changes
* merge conflicts
* Update examples_cfd.rst
* Update CHANGELOG.md
* fixing CI issues
* fixing CI issues in Readme
* adding mistakenly deleted domino example
* removing eos scripts and checkpoint

---------

Co-authored-by: Peter Sharpe <[email protected]>
1 parent b1ef1f2 commit cf342fd

File tree

16 files changed

+6006
-94
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -33,6 +33,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
3333
- Enabled TransformerEngine backend in the `transolver` model.
3434
- Added a new example for external_aerodynamics: training `transolver` on
3535
irregular mesh data for DrivaerML surface data.
36+
- Added a new example for external aerodynamics for finetuning pretrained models.
3637

3738
### Changed
3839

README.md

Lines changed: 90 additions & 93 deletions
Large diffs are not rendered by default.

docs/examples_cfd.rst

Lines changed: 2 additions & 1 deletion
@@ -24,4 +24,5 @@ Computational Fluid Dynamics (CFD) examples using PhysicsNeMo.
2424
examples/cfd/vortex_shedding_mesh_reduced/README.rst
2525
examples/cfd/darcy_transolver/README.rst
2626
examples/cfd/flow_reconstruction_diffusion/README.rst
27-
examples/cfd/datacenter/README.rst
27+
examples/cfd/datacenter/README.rst
28+
examples/cfd/external_aerodynamics/domino_nim_finetuning/README.rst
Lines changed: 325 additions & 0 deletions
@@ -0,0 +1,325 @@
1+
# DoMINO-Automotive-Aero NIM Fine-tuning
2+
3+
## Overview
4+
5+
This example showcases a **fine-tuning recipe** for the **DoMINO-Automotive-Aero NIM**,
6+
featuring an innovative **predictor-corrector approach** specifically designed for
7+
automotive CFD simulations.
8+
9+
**Accelerated Training**: Dramatically reduce training time by leveraging pre-trained
10+
models instead of starting from scratch
11+
12+
**Smart Transfer Learning**: Efficiently adapt powerful base models to new vehicle
13+
configurations and boundary conditions
14+
15+
**Predictor-Corrector Approach**: An approach that combines the strengths of
16+
pre-trained models with AI model based corrections
17+
18+
The predictor-corrector methodology is described below:
19+
20+
```text
21+
Y_finetuned = Y_predictor + Y_corrector
22+
```
23+
24+
**The Components:**
25+
26+
- **Y_predictor**: Output from the pre-trained DoMINO-Automotive-Aero NIM (frozen weights)
27+
- **Y_corrector**: A lightweight, trainable network that learns to correct prediction errors
28+
- **Y_finetuned**: The final enhanced prediction combining both components
29+
30+
> **💡 Core Insight**: The predictor leverages extensive pre-training to provide robust
31+
baseline predictions, while the corrector focuses on learning dataset-specific refinements.
32+
This division of labor leads to faster convergence and superior performance compared to
33+
training from scratch.
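
As a rough illustration of the predictor-corrector idea, the NumPy sketch below fits a corrector to the residual between ground truth and a frozen predictor, then sums the two outputs. Everything in it is a hypothetical stand-in: the synthetic data, the `predictor` and `corrector` functions, and the linear least-squares fit are assumptions for illustration, not the DoMINO models or the training loop used by this recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical): inputs x and "ground truth" y_true.
x = rng.uniform(-1.0, 1.0, size=(256, 3))
y_true = np.sin(x).sum(axis=1) + 0.3 * x[:, 0]


def predictor(inputs):
    """Frozen 'pre-trained' model: close to the truth but systematically off."""
    return np.sin(inputs).sum(axis=1)


def design_matrix(inputs):
    """Linear features plus a bias column for the toy corrector."""
    return np.hstack([inputs, np.ones((inputs.shape[0], 1))])


# The corrector is trained on the residual; the predictor is never updated.
y_predictor = predictor(x)
residual = y_true - y_predictor
weights, *_ = np.linalg.lstsq(design_matrix(x), residual, rcond=None)


def corrector(inputs):
    """Lightweight trainable part: here a linear model fitted to the residual."""
    return design_matrix(inputs) @ weights


# Y_finetuned = Y_predictor + Y_corrector
y_finetuned = predictor(x) + corrector(x)


def relative_l2(pred, true):
    return np.linalg.norm(pred - true) / np.linalg.norm(true)


err_predictor = relative_l2(y_predictor, y_true)
err_finetuned = relative_l2(y_finetuned, y_true)
print(f"predictor only: {err_predictor:.4f}, predictor + corrector: {err_finetuned:.2e}")
```

In this toy setting the residual is exactly linear, so the corrector removes it almost completely; on real CFD data the corrector only needs to model what the frozen predictor gets wrong, which is the motivation for the faster convergence described above.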
34+
35+
The finetuning example is validated on the OSS DrivAerML dataset with only 16 training
36+
and 8 testing samples. The results presented are preliminary and show encouraging
37+
results. A thorough investigation is underway to provide more concrete data points
38+
in terms of accuracy improvement and convergence acceleration.
39+
40+
### Key Features
41+
42+
- **Predictor-Corrector Approach**: Combines pre-trained models with learnable corrections
43+
- **Transfer Learning**: Efficient adaptation to new vehicle configurations and boundary
44+
conditions
45+
- **DrivAerML Integration**: Seamless integration with the DrivAerML dataset
46+
- **Modular Design**: Easy customization of both predictor and corrector models
47+
- **High Performance**: Optimized for multi-GPU training and inference
48+
49+
### Architecture Components
50+
51+
| Component | Description | Training Mode |
52+
|-----------|-------------|---------------|
53+
| **Predictor** | Pre-trained DoMINO-Automotive-Aero NIM | Frozen (Evaluation Only) |
54+
| **Corrector** | Custom DoMINO architecture | Trainable |
55+
| **Combined** | Predictor + Corrector outputs | End-to-End Inference |
56+
57+
## Code Structure
58+
59+
```bash
60+
domino_automotive_aero_nim_finetuning/
61+
├── src/ # Core Implementation
62+
│ ├── conf/ # Configuration Management
63+
│ │ ├── config.yaml # Main training configuration
64+
│ │ └── config_base_pred.yaml # Base prediction settings
65+
│ ├── model_base_predictor.py # DoMINO predictor architecture
66+
│ ├── train.py # Training pipeline
67+
│ ├── test.py # Testing & inference pipeline
68+
│ ├── generate_base_predictions.py # Base model predictions
69+
│ ├── process_data.py # Data preprocessing utilities
70+
│ └── openfoam_datapipe.py # VTK → NPY conversion
71+
├── nim_checkpoint/ # Pre-trained Models
72+
│ └── domino-drivesim-recent.pt # Pretrained model weights
73+
├── download_dataset_huggingface.sh # Automated dataset download
74+
└── README.md # This documentation
75+
```
76+
77+
## Dataset & Model Setup
78+
79+
### DrivAerML Dataset
80+
81+
The **DrivAerML** dataset provides comprehensive automotive CFD simulations with
82+
multiple vehicle configurations.
83+
The dataset may be found here: [DrivAerML Dataset](https://caemldatasets.org/drivaerml/)
84+
85+
| File Type | Description | Extension | Use Case |
86+
|-----------|-------------|-----------|----------|
87+
| **Geometry** | Vehicle STL meshes | `.stl` | 3D vehicle structure |
88+
| **Volume Fields** | 3D flow field data | `.vtu` | Velocity, pressure, turbulence |
89+
| **Surface Fields** | Vehicle surface data | `.vtp` | Wall pressure, shear stress |
90+
91+
### Dataset Download
92+
93+
```bash
94+
# Download specific runs (e.g., runs 1-32)
95+
./download_dataset_huggingface.sh -d ./drivaer_data -s 1 -e 32
96+
```
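
After the download finishes, it can be useful to confirm that every run contains the three file types listed in the table above. The sketch below is a minimal check under stated assumptions: it assumes each run lives in its own subdirectory of the download directory, and the file names used in the demo (`drivaer_1.stl` and so on) are hypothetical placeholders. Only the `.stl`, `.vtu`, and `.vtp` extensions come from this README.

```python
import tempfile
from pathlib import Path

# Extensions taken from the dataset table: geometry, volume and surface fields.
REQUIRED_EXTENSIONS = {".stl", ".vtu", ".vtp"}


def find_incomplete_runs(data_dir):
    """Return {run_directory_name: sorted list of missing extensions}."""
    incomplete = {}
    run_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for run_dir in run_dirs:
        present = {f.suffix.lower() for f in run_dir.rglob("*") if f.is_file()}
        missing = REQUIRED_EXTENSIONS - present
        if missing:
            incomplete[run_dir.name] = sorted(missing)
    return incomplete


# Demo on a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_1").mkdir()
    for name in ("drivaer_1.stl", "volume_1.vtu", "boundary_1.vtp"):
        (root / "run_1" / name).touch()
    (root / "run_2").mkdir()
    for name in ("drivaer_2.stl", "boundary_2.vtp"):  # volume file left out
        (root / "run_2" / name).touch()

    report = find_incomplete_runs(root)

print(report)
```

Pointing `find_incomplete_runs` at the directory passed to `-d` (for example `./drivaer_data`) gives a quick list of runs to re-download before starting Step 1.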
97+
98+
### DoMINO-Automotive-Aero NIM Checkpoint
99+
100+
Download the DoMINO-Automotive-Aero NIM checkpoint from NGC and add it to the
101+
checkpoint directory (`nim_checkpoint/`).
102+
103+
**Source**: [DoMINO Checkpoint](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/models/domino-drivsim)
104+
105+
**Note**: Requires NGC API key for access. See [NGC documentation](https://docs.nvidia.com/ngc/)
106+
for setup.
107+
108+
## Usage Guide
109+
110+
### Complete Fine-tuning Workflow
111+
112+
<!-- markdownlint-disable -->
113+
<div align="center">
114+
```mermaid
115+
graph TD
116+
A[Download Dataset and pre-trained DoMINO NIM] --> B[Generate Base Predictions]
117+
B --> C[Process Data VTP → NPY]
118+
C --> D[Configure Training]
119+
D --> E[Train Corrector Model]
120+
E --> F[Test & Evaluate]
121+
F --> G[Deploy Fine-tuned Model]
122+
```
123+
124+
</div>
125+
126+
### Step-by-Step Instructions
127+
128+
#### **Step 1: Generate Base Predictions**
129+
130+
Generate initial predictions using the pre-trained checkpoint. Modify the `eval` section in
131+
`config_base_pred.yaml` to specify the path to the downloaded checkpoint.
132+
133+
```bash
134+
# Run predictor model on dataset
135+
python src/generate_base_predictions.py
136+
137+
# Output: Predictions saved as VTP files with base model outputs
138+
```
139+
140+
#### **Step 2: Data Processing (VTP → NPY)**
141+
142+
Convert VTP prediction files to efficient NPY format for training:
143+
144+
```bash
145+
# Convert and preprocess data
146+
python src/process_data.py
147+
148+
# Output: Training-ready NPY files with predictor outputs + ground truth
149+
```
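
To make the shape of a "training-ready" sample more concrete, the sketch below stores predictor outputs and ground truth side by side in one NPY file and reads them back. It is only an illustration: the dictionary keys, the array shapes, and the choice of four surface channels are assumptions, and the actual layout written by `process_data.py` may differ.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_points = 1000  # hypothetical number of surface points in one sample

# Hypothetical sample layout: coordinates, frozen-predictor output, ground truth.
sample = {
    "surface_coordinates": rng.random((n_points, 3)).astype(np.float32),
    "surface_fields_predictor": rng.random((n_points, 4)).astype(np.float32),
    "surface_fields_truth": rng.random((n_points, 4)).astype(np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_1.npy")
    # A dict is saved as a 0-d object array, so loading needs allow_pickle=True.
    np.save(path, sample)
    loaded = np.load(path, allow_pickle=True).item()

# The quantity a corrector would be trained to reproduce in this layout.
residual_target = loaded["surface_fields_truth"] - loaded["surface_fields_predictor"]
print(residual_target.shape)
```

Keeping the predictor output in the same file as the ground truth means the frozen predictor does not have to be re-evaluated during corrector training.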
150+
151+
#### **Step 3: Train Corrector Model**
152+
153+
Train the corrector network to learn prediction refinements:
154+
155+
```bash
156+
# Start training with default configuration
157+
python src/train.py exp_tag=combined
158+
159+
# Custom configuration example
160+
python src/train.py \
161+
exp_tag=1 \
162+
project.name=Dataset_Finetune \
163+
model.volume_points_sample=16384 \
164+
model.surface_points_sample=16384 \
165+
train.epochs=500
166+
```
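
The `key=value` arguments in the commands above read like dotted configuration overrides applied on top of `config.yaml`. The pure-Python sketch below shows how such overrides could map onto a nested configuration. It is an assumption about the mechanism, not the recipe's actual parser, and the default values in `base_config` are placeholders rather than the values shipped in `config.yaml`.

```python
import copy


def parse_value(raw):
    """Convert an override value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Placeholder defaults (hypothetical, not taken from config.yaml).
base_config = {
    "exp_tag": "combined",
    "project": {"name": "default"},
    "model": {"volume_points_sample": 8192, "surface_points_sample": 8192},
    "train": {"epochs": 100},
}

final_config = apply_overrides(
    base_config,
    [
        "exp_tag=1",
        "project.name=Dataset_Finetune",
        "model.volume_points_sample=16384",
        "model.surface_points_sample=16384",
        "train.epochs=500",
    ],
)
print(final_config)
```

Each override touches only the key it names, so any setting not listed on the command line keeps the value from the configuration file.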
167+
168+
#### **Step 4: Test Fine-tuned Model**
169+
170+
Evaluate the combined predictor-corrector model:
171+
172+
```bash
173+
# Run inference on test dataset
174+
python src/test.py \
175+
exp_tag=1 \
176+
eval.checkpoint_name=DoMINO.0.500.pt \
177+
eval.save_path=/path/to/results \
178+
eval.test_path=/path/to/test_data
179+
```
180+
181+
The test script outputs the final predictions combining predictor + corrector,
182+
written to a VTP/VTU file.
183+
184+
## Benchmarking results on DrivAerML dataset
185+
186+
The finetuning recipe is benchmarked on a subset of the DrivAerML dataset.
187+
The finetuning is carried out on the first 24 samples from this dataset and
188+
compared against training from scratch with the DoMINO model on the same dataset.
189+
The DoMINO-Automotive-Aero NIM is trained on a dataset consisting of RANS
190+
simulations, while this DrivAerML dataset consists of high-fidelity, time-averaged
191+
LES simulations. The goal of this recipe is to demonstrate the finetuning of an
192+
existing model checkpoint to a new design space and physics and compare it against
193+
training from scratch.
194+
195+
Both models are evaluated at 50, 100, 200, 300, and 400 epochs to demonstrate
196+
faster convergence of the finetuned model to an acceptable accuracy as compared
197+
to training from scratch. 18 samples are used for training and 6 for validation.
198+
The results averaged over the validation set are presented in the table below
199+
and demonstrate that finetuning results in faster convergence (in fewer epochs)
200+
of results as compared to training from scratch.
201+
202+
<!-- markdownlint-disable -->
203+
<style type="text/css">
204+
.tg {border-collapse:collapse;border-spacing:0;}
205+
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
206+
overflow:hidden;padding:7px 16px;word-break:normal;}
207+
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
208+
font-weight:normal;overflow:hidden;padding:7px 16px;word-break:normal;}
209+
.tg .tg-baqh{text-align:center;vertical-align:top}
210+
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
211+
</style>
212+
<table class="tg"><thead>
213+
<tr>
214+
<th class="tg-c3ow" rowspan="2">Epochs</th>
215+
<th class="tg-baqh" colspan="4">Baseline Model L2 Error</th>
216+
<th class="tg-baqh" colspan="4">Fine-tuned Model L2 Error</th>
217+
</tr>
218+
<tr>
219+
<th class="tg-baqh">Velocity</th>
220+
<th class="tg-baqh">Vol. Pressure</th>
221+
<th class="tg-baqh">Surf. Pressure</th>
222+
<th class="tg-baqh">Wall Shear</th>
223+
<th class="tg-baqh">Velocity</th>
224+
<th class="tg-baqh">Vol. Pressure</th>
225+
<th class="tg-baqh">Surf. Pressure</th>
226+
<th class="tg-baqh">Wall Shear</th>
227+
</tr></thead>
228+
<tbody>
229+
<tr>
230+
<td class="tg-baqh">50</td>
231+
<td class="tg-baqh">0.521</td>
232+
<td class="tg-baqh">0.557</td>
233+
<td class="tg-baqh">0.546</td>
234+
<td class="tg-baqh">0.683</td>
235+
<td class="tg-baqh">0.342</td>
236+
<td class="tg-baqh">0.316</td>
237+
<td class="tg-baqh">0.374</td>
238+
<td class="tg-baqh">0.563</td>
239+
</tr>
240+
<tr>
241+
<td class="tg-baqh">100</td>
242+
<td class="tg-baqh">0.444</td>
243+
<td class="tg-baqh">0.474</td>
244+
<td class="tg-baqh">0.436</td>
245+
<td class="tg-baqh">0.613</td>
246+
<td class="tg-baqh">0.332</td>
247+
<td class="tg-baqh">0.307</td>
248+
<td class="tg-baqh">0.333</td>
249+
<td class="tg-baqh">0.473</td>
250+
</tr>
251+
<tr>
252+
<td class="tg-baqh">200</td>
253+
<td class="tg-baqh">0.405</td>
254+
<td class="tg-baqh">0.388</td>
255+
<td class="tg-baqh">0.386</td>
256+
<td class="tg-baqh">0.571</td>
257+
<td class="tg-baqh">0.313</td>
258+
<td class="tg-baqh">0.303</td>
259+
<td class="tg-baqh">0.312</td>
260+
<td class="tg-baqh">0.416</td>
261+
</tr>
262+
<tr>
263+
<td class="tg-baqh">300</td>
264+
<td class="tg-baqh">0.390</td>
265+
<td class="tg-baqh">0.365</td>
266+
<td class="tg-baqh">0.369</td>
267+
<td class="tg-baqh">0.563</td>
268+
<td class="tg-baqh">0.310</td>
269+
<td class="tg-baqh">0.301</td>
270+
<td class="tg-baqh">0.308</td>
271+
<td class="tg-baqh">0.406</td>
272+
</tr>
273+
<tr>
274+
<td class="tg-baqh">400</td>
275+
<td class="tg-baqh">0.380</td>
276+
<td class="tg-baqh">0.362</td>
277+
<td class="tg-baqh">0.365</td>
278+
<td class="tg-baqh">0.552</td>
279+
<td class="tg-baqh">0.309</td>
280+
<td class="tg-baqh">0.300</td>
281+
<td class="tg-baqh">0.307</td>
282+
<td class="tg-baqh">0.403</td>
283+
</tr>
284+
</tbody></table>
285+
286+
It must be noted that the training and validation accuracy for training from
287+
scratch can be improved as more samples are added and the same is the case
288+
with finetuning. The goal of this analysis is to demonstrate the benefits of
289+
finetuning from a pretrained model checkpoint as compared to training from
290+
scratch. A more comprehensive analysis correlating the training from scratch
291+
and finetuning accuracy with the dataset size will be carried out in the future.
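
The convergence claim can be checked directly against the numbers in the table. The sketch below copies the table values, computes the relative error reduction of the fine-tuned model at each epoch count, and finds the earliest fine-tuned checkpoint that is at or below the 400-epoch baseline on all four fields. The variable and field names are chosen here for illustration only.

```python
FIELDS = ("velocity", "vol_pressure", "surf_pressure", "wall_shear")

# L2 errors copied from the benchmarking table (epochs -> errors per field).
baseline = {
    50: (0.521, 0.557, 0.546, 0.683),
    100: (0.444, 0.474, 0.436, 0.613),
    200: (0.405, 0.388, 0.386, 0.571),
    300: (0.390, 0.365, 0.369, 0.563),
    400: (0.380, 0.362, 0.365, 0.552),
}
finetuned = {
    50: (0.342, 0.316, 0.374, 0.563),
    100: (0.332, 0.307, 0.333, 0.473),
    200: (0.313, 0.303, 0.312, 0.416),
    300: (0.310, 0.301, 0.308, 0.406),
    400: (0.309, 0.300, 0.307, 0.403),
}

# Relative reduction in error from fine-tuning, per epoch count and field.
reduction = {
    epoch: {
        field: (base - tuned) / base
        for field, base, tuned in zip(FIELDS, baseline[epoch], finetuned[epoch])
    }
    for epoch in baseline
}

# Earliest fine-tuned checkpoint matching the 400-epoch baseline on every field.
target = baseline[400]
first_matching_epoch = min(
    epoch
    for epoch, errors in finetuned.items()
    if all(tuned <= base for tuned, base in zip(errors, target))
)

for epoch in sorted(reduction):
    summary = ", ".join(f"{f}: {100 * r:.1f}%" for f, r in reduction[epoch].items())
    print(f"{epoch:>3} epochs -> {summary}")
print("first fine-tuned epoch matching the 400-epoch baseline:", first_matching_epoch)
```

With the table values, the fine-tuned model at 100 epochs is already at or below the 400-epoch baseline error on all four fields, while at 50 epochs it is below on velocity and volume pressure only.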
292+
293+
## Customization & Extensions
294+
295+
### Custom Model Architectures
296+
297+
The recipe is designed for easy customization:
298+
299+
| Component | File | Customization Level |
300+
|-----------|------|-------------------|
301+
| **Predictor** | `model_base_predictor.py` | **Pretrained Custom Model (or DoMINO NIM)** |
303+
| **Corrector** | Built-in DoMINO | **Fully Customizable Models** |
304+
| **Training** | `train.py` | **Configuration-driven** |
305+
| **Testing** | `test.py` | **Workflow Adaptable** |
306+
307+
### Integration Guidelines
308+
309+
The predictor-corrector approach is model-agnostic.
310+
311+
**To use custom architectures:**
312+
313+
1. **Custom Predictor**: Replace `model_base_predictor.py` with your pretrained model
314+
2. **Custom Corrector**: Modify the corrector architecture in training configuration
315+
3. **Maintain Interface**: Ensure input/output compatibility between components
316+
4. **Update Testing**: Adapt `test.py` for new model combinations
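
One way to read the "Maintain Interface" guideline is that predictor and corrector should accept the same inputs and return outputs of the same shape. The sketch below expresses that with a small wrapper. The `FieldModel` protocol, the `PredictorCorrector` class, and the toy models are hypothetical names introduced for illustration; they are not part of this recipe or of PhysicsNeMo.

```python
from typing import Protocol

import numpy as np


class FieldModel(Protocol):
    """Anything that maps points of shape (N, 3) to fields of shape (N, C)."""

    def __call__(self, points: np.ndarray) -> np.ndarray: ...


class PredictorCorrector:
    """Sum a frozen predictor and a trainable corrector, checking compatibility."""

    def __init__(self, predictor: FieldModel, corrector: FieldModel):
        self.predictor = predictor
        self.corrector = corrector

    def __call__(self, points: np.ndarray) -> np.ndarray:
        base = self.predictor(points)
        correction = self.corrector(points)
        if base.shape != correction.shape:
            raise ValueError(
                f"predictor output {base.shape} and corrector output "
                f"{correction.shape} are not compatible"
            )
        return base + correction


# Toy stand-ins for a custom predictor and corrector (4 output channels each).
def toy_predictor(points):
    return np.ones((points.shape[0], 4))


def toy_corrector(points):
    return 0.1 * np.ones((points.shape[0], 4))


model = PredictorCorrector(toy_predictor, toy_corrector)
output = model(np.zeros((5, 3)))
print(output.shape)
```

Failing early on a shape mismatch makes it easier to swap in a different predictor or corrector without silently broadcasting incompatible outputs.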
317+
318+
---
319+
320+
## Additional Resources
321+
322+
### Quick Links
323+
324+
- [DoMINO-Automotive-Aero NIM Docs](https://docs.nvidia.com/nim/physicsnemo/domino-automotive-aero/latest/overview.html)
325+
- [DrivAerML Dataset](https://caemldatasets.org/drivaerml/)
