|
| 1 | +# DoMINO-Automotive-Aero NIM Fine-tuning |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This example showcases a **fine-tuning recipe** for the **DoMINO-Automotive-Aero NIM**, |
| 6 | +featuring an innovative **predictor-corrector approach** specifically designed for |
| 7 | +automotive CFD simulations. |
| 8 | + |
| 9 | +**Accelerated Training**: Dramatically reduce training time by leveraging pre-trained |
| 10 | +models instead of starting from scratch |
| 11 | + |
| 12 | +**Smart Transfer Learning**: Efficiently adapt powerful base models to new vehicle |
| 13 | +configurations and boundary conditions |
| 14 | + |
| 15 | +**Predictor-Corrector Approach**: An approach that combines the strengths of |
| 16 | +pre-trained models with learned, model-based corrections |
| 17 | + |
| 18 | +The predictor-corrector methodology is described below: |
| 19 | + |
| 20 | +```text |
| 21 | +Y_finetuned = Y_predictor + Y_corrector |
| 22 | +``` |
| 23 | + |
| 24 | +**The Components:** |
| 25 | + |
| 26 | +- **Y_predictor**: Output from the pre-trained DoMINO-Automotive-Aero NIM (frozen weights) |
| 27 | +- **Y_corrector**: A lightweight, trainable network that learns to correct prediction errors |
| 28 | +- **Y_finetuned**: The final enhanced prediction combining both components |
| 29 | + |
| 30 | +> **💡 Core Insight**: The predictor leverages extensive pre-training to provide robust |
| 31 | +baseline predictions, while the corrector focuses on learning dataset-specific refinements. |
| 32 | +This division of labor leads to faster convergence and superior performance compared to |
| 33 | +training from scratch. |
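
The decomposition `Y_finetuned = Y_predictor + Y_corrector` can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the recipe's training code: `frozen_predictor` is a stand-in for the pre-trained model, the corrector is a hypothetical linear model fitted by least squares to the residual `y_true - y_predictor`, and the data is synthetic.

```python
import numpy as np


def frozen_predictor(x: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-trained model: fixed weights, never updated."""
    return 2.0 * x[:, :1]


def add_bias(x: np.ndarray) -> np.ndarray:
    """Append a constant column so the linear corrector can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_corrector(x: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Fit the (hypothetical) linear corrector to the predictor's residual."""
    weights, *_ = np.linalg.lstsq(add_bias(x), residual, rcond=None)
    return weights


def corrector(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Evaluate the trained corrector."""
    return add_bias(x) @ weights


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))
# Synthetic "new dataset": the predictor captures only part of the signal.
y_true = 2.0 * x[:, :1] + 0.5 * x[:, 1:2] - 0.3

y_predictor = frozen_predictor(x)                  # frozen, evaluation only
weights = fit_corrector(x, y_true - y_predictor)   # only the corrector is trained
y_finetuned = y_predictor + corrector(x, weights)  # Y_predictor + Y_corrector

err_predictor = np.linalg.norm(y_true - y_predictor) / np.linalg.norm(y_true)
err_finetuned = np.linalg.norm(y_true - y_finetuned) / np.linalg.norm(y_true)
print(f"predictor only: {err_predictor:.3f}, with corrector: {err_finetuned:.3e}")
```

Because the corrector only has to model the residual, it can be much smaller than the predictor. In the recipe itself the corrector is a DoMINO network rather than a linear fit.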
| 34 | + |
| 35 | +The finetuning example was validated on the OSS DrivAerML dataset with only 16 training |
| 36 | +and 8 testing samples. The results presented are preliminary but encouraging. |
| 37 | +A thorough investigation is underway to provide more concrete data points |
| 38 | +on accuracy improvement and convergence acceleration. |
| 39 | + |
| 40 | +### Key Features |
| 41 | + |
| 42 | +- **Predictor-Corrector Approach**: Combines pre-trained models with learnable corrections |
| 43 | +- **Transfer Learning**: Efficient adaptation to new vehicle configurations and boundary |
| 44 | +conditions |
| 45 | +- **DrivAerML Integration**: Seamless integration with the DrivAerML dataset |
| 46 | +- **Modular Design**: Easy customization of both predictor and corrector models |
| 47 | +- **High Performance**: Optimized for multi-GPU training and inference |
| 48 | + |
| 49 | +### Architecture Components |
| 50 | + |
| 51 | +| Component | Description | Training Mode | |
| 52 | +|-----------|-------------|---------------| |
| 53 | +| **Predictor** | Pre-trained DoMINO-Automotive-Aero NIM | Frozen (Evaluation Only) | |
| 54 | +| **Corrector** | Custom DoMINO architecture | Trainable | |
| 55 | +| **Combined** | Predictor + Corrector outputs | End-to-End Inference | |
| 56 | + |
| 57 | +## Code Structure |
| 58 | + |
| 59 | +```bash |
| 60 | +domino_automotive_aero_nim_finetuning/ |
| 61 | +├── src/ # Core Implementation |
| 62 | +│ ├── conf/ # Configuration Management |
| 63 | +│ │ ├── config.yaml # Main training configuration |
| 64 | +│ │ └── config_base_pred.yaml # Base prediction settings |
| 65 | +│ ├── model_base_predictor.py # DoMINO predictor architecture |
| 66 | +│ ├── train.py # Training pipeline |
| 67 | +│ ├── test.py # Testing & inference pipeline |
| 68 | +│ ├── generate_base_predictions.py # Base model predictions |
| 69 | +│ ├── process_data.py # Data preprocessing utilities |
| 70 | +│ └── openfoam_datapipe.py # VTK → NPY conversion |
| 71 | +├── nim_checkpoint/ # Pre-trained Models |
| 72 | +│ └── domino-drivesim-recent.pt # Pretrained model weights |
| 73 | +├── download_dataset_huggingface.sh # Automated dataset download |
| 74 | +└── README.md # This documentation |
| 75 | +``` |
| 76 | + |
| 77 | +## Dataset & Model Setup |
| 78 | + |
| 79 | +### DrivAerML Dataset |
| 80 | + |
| 81 | +The **DrivAerML** dataset provides comprehensive automotive CFD simulations with |
| 82 | +multiple vehicle configurations. |
| 83 | +The dataset may be found here: [DrivAerML Dataset](https://caemldatasets.org/drivaerml/) |
| 84 | + |
| 85 | +| File Type | Description | Extension | Use Case | |
| 86 | +|-----------|-------------|-----------|----------| |
| 87 | +| **Geometry** | Vehicle STL meshes | `.stl` | 3D vehicle structure | |
| 88 | +| **Volume Fields** | 3D flow field data | `.vtu` | Velocity, pressure, turbulence | |
| 89 | +| **Surface Fields** | Vehicle surface data | `.vtp` | Wall pressure, shear stress | |
| 90 | + |
| 91 | +### Dataset Download |
| 92 | + |
| 93 | +```bash |
| 94 | +# Download specific runs (e.g., runs 1-32) |
| 95 | +./download_dataset_huggingface.sh -d ./drivaer_data -s 1 -e 32 |
| 96 | +``` |
| 97 | + |
| 98 | +### DoMINO-Automotive-Aero NIM Checkpoint |
| 99 | + |
| 100 | +Download the DoMINO-Automotive-Aero NIM checkpoint from NGC and add it to the |
| 101 | +checkpoint directory (`nim_checkpoint/`). |
| 102 | + |
| 103 | +**Source**: [DoMINO Checkpoint](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/models/domino-drivsim) |
| 104 | + |
| 105 | +**Note**: Requires NGC API key for access. See [NGC documentation](https://docs.nvidia.com/ngc/) |
| 106 | +for setup. |
| 107 | + |
| 108 | +## Usage Guide |
| 109 | + |
| 110 | +### Complete Fine-tuning Workflow |
| 111 | + |
| 112 | +<!-- markdownlint-disable --> |
| 113 | +<div align="center"> |
| 114 | +```mermaid |
| 115 | +graph TD |
| 116 | + A[Download Dataset and pre-trained DoMINO NIM] --> B[Generate Base Predictions] |
| 117 | + B --> C[Process Data VTP → NPY] |
| 118 | + C --> D[Configure Training] |
| 119 | + D --> E[Train Corrector Model] |
| 120 | + E --> F[Test & Evaluate] |
| 121 | + F --> G[Deploy Fine-tuned Model] |
| 122 | +``` |
| 123 | + |
| 124 | +</div> |
| 125 | + |
| 126 | +### Step-by-Step Instructions |
| 127 | + |
| 128 | +#### **Step 1: Generate Base Predictions** |
| 129 | + |
| 130 | +Generate initial predictions using the pre-trained checkpoint. Edit the `eval` section of |
| 131 | +`config_base_pred.yaml` to specify the path to the downloaded checkpoint. |
| 132 | + |
| 133 | +```bash |
| 134 | +# Run predictor model on dataset |
| 135 | +python src/generate_base_predictions.py |
| 136 | + |
| 137 | +# Output: Predictions saved as VTP files with base model outputs |
| 138 | +``` |
| 139 | + |
| 140 | +#### **Step 2: Data Processing (VTP → NPY)** |
| 141 | + |
| 142 | +Convert VTP prediction files to efficient NPY format for training: |
| 143 | + |
| 144 | +```bash |
| 145 | +# Convert and preprocess data |
| 146 | +python src/process_data.py |
| 147 | + |
| 148 | +# Output: Training-ready NPY files with predictor outputs + ground truth |
| 149 | +``` |
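
As a rough illustration of what a training-ready NPY sample might contain, the sketch below stores predictor outputs next to ground truth in one file and reads them back. The key names, array shapes, and the file name `run_1.npy` are assumptions made for this example; the layout actually written by `process_data.py` may differ.

```python
import tempfile
from pathlib import Path

import numpy as np

n_points = 1000
rng = np.random.default_rng(0)

# Hypothetical sample layout: coordinates, frozen-predictor output, ground truth.
# Four surface channels are assumed here (pressure + three wall-shear components).
sample = {
    "surface_coordinates": rng.random((n_points, 3), dtype=np.float32),
    "surface_fields_predictor": rng.random((n_points, 4), dtype=np.float32),
    "surface_fields_truth": rng.random((n_points, 4), dtype=np.float32),
}

out_file = Path(tempfile.mkdtemp()) / "run_1.npy"
np.save(out_file, sample)  # a dict is stored as a pickled 0-d object array

# Loading a pickled object array requires allow_pickle=True; .item() unwraps the dict.
loaded = np.load(out_file, allow_pickle=True).item()

# The corrector's regression target is the residual the predictor leaves behind.
corrector_target = loaded["surface_fields_truth"] - loaded["surface_fields_predictor"]
print(corrector_target.shape)
```

Keeping the predictor output in the same file as the ground truth means the frozen model does not have to be re-evaluated at every training step.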
| 150 | + |
| 151 | +#### **Step 3: Train Corrector Model** |
| 152 | + |
| 153 | +Train the corrector network to learn prediction refinements: |
| 154 | + |
| 155 | +```bash |
| 156 | +# Start training with default configuration |
| 157 | +python src/train.py exp_tag=combined |
| 158 | + |
| 159 | +# Custom configuration example |
| 160 | +python src/train.py \ |
| 161 | + exp_tag=1 \ |
| 162 | + project.name=Dataset_Finetune \ |
| 163 | + model.volume_points_sample=16384 \ |
| 164 | + model.surface_points_sample=16384 \ |
| 165 | + train.epochs=500 |
| 166 | +``` |
| 167 | + |
| 168 | +#### **Step 4: Test Fine-tuned Model** |
| 169 | + |
| 170 | +Evaluate the combined predictor-corrector model: |
| 171 | + |
| 172 | +```bash |
| 173 | +# Run inference on test dataset |
| 174 | +python src/test.py \ |
| 175 | + exp_tag=1 \ |
| 176 | + eval.checkpoint_name=DoMINO.0.500.pt \ |
| 177 | + eval.save_path=/path/to/results \ |
| 178 | + eval.test_path=/path/to/test_data |
| 179 | +``` |
| 180 | + |
| 181 | +The test script outputs the final predictions, combining predictor and corrector, |
| 182 | +written to VTP/VTU files. |
| 183 | + |
| 184 | +## Benchmarking results on DrivAerML dataset |
| 185 | + |
| 186 | +The finetuning recipe is benchmarked on a subset of the DrivAerML dataset. |
| 187 | +The finetuning is carried out on the first 24 samples from this dataset and |
| 188 | +compared against training from scratch with the DoMINO model on the same dataset. |
| 189 | +The DoMINO-Automotive-Aero NIM is trained on a dataset consisting of RANS |
| 190 | +simulations, while this DrivAerML dataset consists of high-fidelity, time-averaged |
| 191 | +LES simulations. The goal of this recipe is to demonstrate the finetuning of an |
| 192 | +existing model checkpoint to a new design space and physics and compare it against |
| 193 | +training from scratch. |
| 194 | + |
| 195 | +Both models are evaluated at 50, 100, 200, 300, and 400 epochs to compare how |
| 196 | +quickly each reaches an acceptable accuracy. 18 samples are used for training |
| 197 | +and 6 for validation. The results, averaged over the validation set, are |
| 198 | +presented in the table below. At every evaluated epoch count, the finetuned |
| 199 | +model has a lower L2 error than the model trained from scratch on all four |
| 200 | +fields, indicating faster convergence in fewer epochs. |
| 201 | + |
| 202 | +<!-- markdownlint-disable --> |
| 203 | +<style type="text/css"> |
| 204 | +.tg {border-collapse:collapse;border-spacing:0;} |
| 205 | +.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; |
| 206 | + overflow:hidden;padding:7px 16px;word-break:normal;} |
| 207 | +.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; |
| 208 | + font-weight:normal;overflow:hidden;padding:7px 16px;word-break:normal;} |
| 209 | +.tg .tg-baqh{text-align:center;vertical-align:top} |
| 210 | +.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} |
| 211 | +</style> |
| 212 | +<table class="tg"><thead> |
| 213 | + <tr> |
| 214 | + <th class="tg-c3ow" rowspan="2">Epochs</th> |
| 215 | + <th class="tg-baqh" colspan="4">Baseline Model L2 Error</th> |
| 216 | + <th class="tg-baqh" colspan="4">Fine-tuned Model L2 Error</th> |
| 217 | + </tr> |
| 218 | + <tr> |
| 219 | + <th class="tg-baqh">Velocity</th> |
| 220 | + <th class="tg-baqh">Vol. Pressure</th> |
| 221 | + <th class="tg-baqh">Surf. Pressure</th> |
| 222 | + <th class="tg-baqh">Wall Shear</th> |
| 223 | + <th class="tg-baqh">Velocity</th> |
| 224 | + <th class="tg-baqh">Vol. Pressure</th> |
| 225 | + <th class="tg-baqh">Surf. Pressure</th> |
| 226 | + <th class="tg-baqh">Wall Shear</th> |
| 227 | + </tr></thead> |
| 228 | +<tbody> |
| 229 | + <tr> |
| 230 | + <td class="tg-baqh">50</td> |
| 231 | + <td class="tg-baqh">0.521</td> |
| 232 | + <td class="tg-baqh">0.557</td> |
| 233 | + <td class="tg-baqh">0.546</td> |
| 234 | + <td class="tg-baqh">0.683</td> |
| 235 | + <td class="tg-baqh">0.342</td> |
| 236 | + <td class="tg-baqh">0.316</td> |
| 237 | + <td class="tg-baqh">0.374</td> |
| 238 | + <td class="tg-baqh">0.563</td> |
| 239 | + </tr> |
| 240 | + <tr> |
| 241 | + <td class="tg-baqh">100</td> |
| 242 | + <td class="tg-baqh">0.444</td> |
| 243 | + <td class="tg-baqh">0.474</td> |
| 244 | + <td class="tg-baqh">0.436</td> |
| 245 | + <td class="tg-baqh">0.613</td> |
| 246 | + <td class="tg-baqh">0.332</td> |
| 247 | + <td class="tg-baqh">0.307</td> |
| 248 | + <td class="tg-baqh">0.333</td> |
| 249 | + <td class="tg-baqh">0.473</td> |
| 250 | + </tr> |
| 251 | + <tr> |
| 252 | + <td class="tg-baqh">200</td> |
| 253 | + <td class="tg-baqh">0.405</td> |
| 254 | + <td class="tg-baqh">0.388</td> |
| 255 | + <td class="tg-baqh">0.386</td> |
| 256 | + <td class="tg-baqh">0.571</td> |
| 257 | + <td class="tg-baqh">0.313</td> |
| 258 | + <td class="tg-baqh">0.303</td> |
| 259 | + <td class="tg-baqh">0.312</td> |
| 260 | + <td class="tg-baqh">0.416</td> |
| 261 | + </tr> |
| 262 | + <tr> |
| 263 | + <td class="tg-baqh">300</td> |
| 264 | + <td class="tg-baqh">0.390</td> |
| 265 | + <td class="tg-baqh">0.365</td> |
| 266 | + <td class="tg-baqh">0.369</td> |
| 267 | + <td class="tg-baqh">0.563</td> |
| 268 | + <td class="tg-baqh">0.310</td> |
| 269 | + <td class="tg-baqh">0.301</td> |
| 270 | + <td class="tg-baqh">0.308</td> |
| 271 | + <td class="tg-baqh">0.406</td> |
| 272 | + </tr> |
| 273 | + <tr> |
| 274 | + <td class="tg-baqh">400</td> |
| 275 | + <td class="tg-baqh">0.380</td> |
| 276 | + <td class="tg-baqh">0.362</td> |
| 277 | + <td class="tg-baqh">0.365</td> |
| 278 | + <td class="tg-baqh">0.552</td> |
| 279 | + <td class="tg-baqh">0.309</td> |
| 280 | + <td class="tg-baqh">0.300</td> |
| 281 | + <td class="tg-baqh">0.307</td> |
| 282 | + <td class="tg-baqh">0.403</td> |
| 283 | + </tr> |
| 284 | +</tbody></table> |
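
To make the comparison easier to read, the relative error reduction of the finetuned model over the baseline can be computed directly from the table. The snippet below is a small helper written for this README, not part of the recipe; the numbers are copied from the table above.

```python
# L2 errors from the table: (velocity, vol. pressure, surf. pressure, wall shear)
baseline = {
    50: (0.521, 0.557, 0.546, 0.683),
    100: (0.444, 0.474, 0.436, 0.613),
    200: (0.405, 0.388, 0.386, 0.571),
    300: (0.390, 0.365, 0.369, 0.563),
    400: (0.380, 0.362, 0.365, 0.552),
}
finetuned = {
    50: (0.342, 0.316, 0.374, 0.563),
    100: (0.332, 0.307, 0.333, 0.473),
    200: (0.313, 0.303, 0.312, 0.416),
    300: (0.310, 0.301, 0.308, 0.406),
    400: (0.309, 0.300, 0.307, 0.403),
}
fields = ("Velocity", "Vol. Pressure", "Surf. Pressure", "Wall Shear")


def relative_reduction(base: float, tuned: float) -> float:
    """Fraction by which finetuning lowers the baseline error."""
    return (base - tuned) / base


reductions = {
    epoch: tuple(
        relative_reduction(b, f)
        for b, f in zip(baseline[epoch], finetuned[epoch])
    )
    for epoch in baseline
}

for epoch, row in reductions.items():
    cells = ", ".join(f"{name}: {100 * r:.1f}%" for name, r in zip(fields, row))
    print(f"{epoch:>3} epochs -> {cells}")
```

One observation that follows from the table: for velocity and volume pressure, the finetuned model at 50 epochs (0.342 and 0.316) already has a lower error than the baseline at 400 epochs (0.380 and 0.362).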
| 285 | + |
| 286 | +Note that the accuracy of both training from scratch and finetuning can be |
| 287 | +improved by adding more samples. The goal of this analysis is to demonstrate |
| 288 | +the benefits of finetuning from a pretrained model checkpoint as compared to |
| 289 | +training from scratch. A more comprehensive analysis correlating the accuracy |
| 290 | +of training from scratch and of finetuning with the dataset size will be |
| 291 | +carried out in the future. |
| 292 | + |
| 293 | +## Customization & Extensions |
| 294 | + |
| 295 | +### Custom Model Architectures |
| 296 | + |
| 297 | +The recipe is designed for easy customization: |
| 298 | + |
| 299 | +| Component | File | Customization Level | |
| 300 | +|-----------|------|-------------------| |
| 301 | +| **Predictor** | `model_base_predictor.py` | **Pretrained Custom Model (or DoMINO NIM)** |
| 303 | +| **Corrector** | Built-in DoMINO | **Fully Customizable Models** | |
| 304 | +| **Training** | `train.py` | **Configuration-driven** | |
| 305 | +| **Testing** | `test.py` | **Workflow Adaptable** | |
| 306 | + |
| 307 | +### Integration Guidelines |
| 308 | + |
| 309 | +The predictor-corrector approach is model-agnostic. |
| 310 | + |
| 311 | +**To use custom architectures:** |
| 312 | + |
| 313 | +1. **Custom Predictor**: Replace `model_base_predictor.py` with your pretrained model |
| 314 | +2. **Custom Corrector**: Modify the corrector architecture in training configuration |
| 315 | +3. **Maintain Interface**: Ensure input/output compatibility between components |
| 316 | +4. **Update Testing**: Adapt `test.py` for new model combinations |
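
The "Maintain Interface" guideline can be made concrete with a small shape check. The sketch below is one possible way to express it and is not code from this recipe: the `FieldModel` protocol and the `combine` helper are hypothetical names, and the models are assumed to map point coordinates to per-point field values.

```python
from typing import Protocol

import numpy as np


class FieldModel(Protocol):
    """Hypothetical interface shared by predictor and corrector."""

    def __call__(self, points: np.ndarray) -> np.ndarray:
        """Map (n_points, 3) coordinates to (n_points, n_fields) values."""
        ...


def combine(
    predictor: FieldModel, corrector: FieldModel, points: np.ndarray
) -> np.ndarray:
    """Return predictor + corrector output, failing early on a shape mismatch."""
    y_predictor = predictor(points)
    y_corrector = corrector(points)
    if y_predictor.shape != y_corrector.shape:
        raise ValueError(
            f"predictor output {y_predictor.shape} does not match "
            f"corrector output {y_corrector.shape}"
        )
    return y_predictor + y_corrector


# Toy stand-ins: any callables with matching output shapes can be combined.
def toy_predictor(points: np.ndarray) -> np.ndarray:
    return np.ones((points.shape[0], 4))


def toy_corrector(points: np.ndarray) -> np.ndarray:
    return 0.1 * np.ones((points.shape[0], 4))


points = np.zeros((5, 3))
y_finetuned = combine(toy_predictor, toy_corrector, points)
print(y_finetuned.shape)
```

Checking shapes at the point where the two outputs are summed surfaces an incompatible custom predictor or corrector immediately, rather than as a broadcasting error later in testing.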
| 317 | + |
| 318 | +--- |
| 319 | + |
| 320 | +## Additional Resources |
| 321 | + |
| 322 | +### Quick Links |
| 323 | + |
| 324 | +- [DoMINO-Automotive-Aero NIM Docs](https://docs.nvidia.com/nim/physicsnemo/domino-automotive-aero/latest/overview.html) |
| 325 | +- [DrivAerML Dataset](https://caemldatasets.org/drivaerml/) |