Sofiia Chorna `@sofiia-chorna <https://github.com/sofiia-chorna>`_
This example demonstrates fine-tuning the PET-MAD model with `metatrain
- <https://github.com/metatensor/metatrain>`_ on a small dataset of ethanol structures.
+ <https://github.com/metatensor/metatrain>`_, as well as a two-stage training strategy
+ from scratch.
We cover two examples: 1) simple fine-tuning of the pretrained PET-MAD model to adapt it
to a specialized task by retraining it on a domain-specific dataset, and 2) a two-stage
training strategy that first trains on non-conservative forces for efficiency, followed
by fine-tuning on conservative forces to ensure physical consistency.
- PET-MAD is a universal machine-learning forcefield trained on `the MAD dataset
- <https://arxiv.org/abs/2506.19674>`_ that aims to incorporate a very high degree of
- structural diversity. It uses `the Point-Edge Transformer (PET)
+ `PET-MAD <https://arxiv.org/abs/2503.14118>`_ is a universal machine-learning forcefield
+ trained on `the MAD dataset <https://arxiv.org/abs/2506.19674>`_ that aims to
+ incorporate a very high degree of structural diversity. The model is based on `the
+ Point-Edge Transformer (PET)
<https://proceedings.neurips.cc/paper_files/paper/2023/file/fb4a7e3522363907b26a86cc5be627ac-Paper-Conference.pdf>`_,
an unconstrained architecture that achieves symmetry compliance through data
augmentation during training.
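
As an aside, the augmentation idea itself is simple: a sketch (not the actual PET
training code) of rotating one training structure consistently, assuming its target
forces are stored under ``atoms.arrays["forces"]``, could look like this::

    from scipy.spatial.transform import Rotation

    def random_rotation_augment(atoms):
        # Energies are invariant under rotation, while positions and forces
        # transform with the same rotation matrix.
        rotated = atoms.copy()
        R = Rotation.random().as_matrix()
        rotated.set_positions(atoms.get_positions() @ R.T)
        if "forces" in atoms.arrays:
            rotated.arrays["forces"] = atoms.arrays["forces"] @ R.T
        return rotated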
#
# DFT-calculated energies often contain systematic shifts due to the choice of
# functional, basis set, or pseudopotentials. If left uncorrected, such shifts can
- # mislead the fine-tuning process. We apply a linear correction based on atomic
- # compositions to align our fine-tuning dataset with PET-MAD energy reference. First, we
- # define a helper function to load reference energies from PET-MAD.
+ # mislead the fine-tuning process.
+ #
+ # In this example we use a subset of ethanol structures sampled from the `rMD17
+ # dataset <https://doi.org/10.48550/arXiv.2007.09593>`_, computed at the PBE/def2-SVP
+ # level of theory, which differs from MAD, where PBEsol is used. We apply a linear
+ # correction based on atomic compositions to align our fine-tuning dataset with the
+ # PET-MAD energy reference. First, we define a helper function to load reference
+ # energies from PET-MAD.
def load_reference_energies(checkpoint_path):
@@ -95,8 +101,8 @@ def load_reference_energies(checkpoint_path):
# %%
#
- # The dataset is composed of 100 structures of ethanol. We fit a linear model based on
- # atomic compositions to apply energy correction.
+ # For demonstration, the dataset consists of only 100 ethanol structures. We fit a
+ # linear model based on atomic compositions to apply the energy correction.
dataset = ase.io.read("data/ethanol.xyz", index=":", format="extxyz")
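
# %%
#
# The correction applied in the full script is not shown in this excerpt. As a rough
# sketch of the idea (not necessarily the exact code used here), one can fit per-element
# baseline energies to the DFT data by least squares and shift every structure onto the
# PET-MAD atomic references. The snippet assumes the DFT energy of each structure is
# stored in ``atoms.info["energy"]`` and that ``petmad_reference`` is a hypothetical
# dictionary mapping atomic numbers to PET-MAD per-atom reference energies (for example,
# built from the helper defined above)::
#
#     import numpy as np
#
#     species = sorted({z for atoms in dataset for z in atoms.get_atomic_numbers()})
#
#     # Composition matrix: one row per structure, one column per chemical species.
#     composition = np.array(
#         [[(atoms.get_atomic_numbers() == z).sum() for z in species]
#          for atoms in dataset]
#     )
#     dft_energies = np.array([atoms.info["energy"] for atoms in dataset])
#
#     # Least-squares fit of per-element baselines to the DFT energies. For a
#     # single-composition dataset the per-element values are not unique, but the
#     # fitted baseline of each structure is.
#     dft_baseline, *_ = np.linalg.lstsq(composition, dft_energies, rcond=None)
#
#     # Swap the fitted DFT baseline for the PET-MAD atomic reference energies.
#     petmad_baseline = np.array([petmad_reference[z] for z in species])
#     corrected = (
#         dft_energies - composition @ dft_baseline + composition @ petmad_baseline
#     )
#
#     for atoms, energy in zip(dataset, corrected):
#         atoms.info["energy"] = energy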
@@ -237,18 +243,15 @@ def display_loss(csv_file):
# %%
#
- # The result of running the fine-tuning for 1000 epoches in visualised with
- # ``chemiscope`` below:
+ # The result of running the fine-tuning for 1000 epochs on 1000 structures is
+ # visualised with ``chemiscope`` below:
import chemiscope  # noqa: E402
chemiscope.show_input("full_finetune_example.chemiscope.json")
# %%
#
- # The fine-tuning learning curves for 1000 epoches displayed on the figure below. For
- # comparison, we also train a model from scratch on the same dataset:
-
# The figure below shows the training and validation losses over 1000 epochs of
# fine-tuning. For comparison, we also train the model from scratch on the same dataset.
# As expected, the pretrained model achieves better accuracy.
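
# %%
#
# The ``display_loss`` helper defined earlier in the script (its body falls outside the
# hunks shown here) handles this plotting. Conceptually, it only needs to read the CSV
# log written during training and draw the loss columns against the epoch; a generic
# sketch, assuming the first column of the log is the epoch counter:

import matplotlib.pyplot as plt  # noqa: E402
import pandas as pd  # noqa: E402


def plot_loss_curves(csv_file):
    # Coerce every column to numeric so non-numeric entries simply plot as gaps.
    log = pd.read_csv(csv_file).apply(pd.to_numeric, errors="coerce")
    epoch = log.iloc[:, 0]
    fig, ax = plt.subplots()
    for column in log.columns[1:]:
        ax.plot(epoch, log[column], label=column)
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.set_yscale("log")
    ax.legend()
    return fig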
@@ -261,11 +264,14 @@ def display_loss(csv_file):
# Two-stage training strategy
# ---------------------------
#
- # This approach accelerates training by first using non-conservative forces, which
- # avoids costly backpropagation, then fine-tuning on conservative forces to ensure
- # physical consistency. Non-conservative forces can lead to pathological behavior (see
- # `preprint <https://arxiv.org/abs/2412.11569>`_), but this strategy mitigates such
- # issues while maintaining efficiency.
+ # As discussed in `this paper <https://arxiv.org/abs/2412.11569>`_, while conservative
+ # MLIPs are generally better suited for physically accurate simulations, hybrid models
+ # that support direct non-conservative force predictions can accelerate both training
+ # and inference. We demonstrate this practical compromise through a two-stage approach:
+ # we first train a model to predict non-conservative forces directly (which avoids the
+ # cost of backpropagation) and then fine-tune its energy head to produce conservative
+ # forces. Although non-conservative forces can lead to unphysical behavior, this
+ # two-stage strategy balances efficiency and physical reliability.
#
# Non-conservative force training
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
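
# %%
#
# To make the efficiency argument of the previous paragraph concrete, the toy ``torch``
# sketch below (not the PET architecture, and not part of the metatrain setup) contrasts
# the two ways of obtaining forces: conservative forces require an extra backward pass
# through the energy model, while a direct force head only needs the forward pass.

import torch  # noqa: E402


class ToyEnergyModel(torch.nn.Module):
    """Toy model with an energy output and a direct (non-conservative) force head."""

    def __init__(self):
        super().__init__()
        self.energy_net = torch.nn.Sequential(
            torch.nn.Linear(3, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
        )
        self.force_head = torch.nn.Linear(3, 3)

    def forward(self, positions):
        energy = self.energy_net(positions).sum()
        direct_forces = self.force_head(positions)  # no gradient computation needed
        return energy, direct_forces


toy_model = ToyEnergyModel()
toy_positions = torch.randn(9, 3, requires_grad=True)  # e.g. one ethanol molecule

toy_energy, toy_direct_forces = toy_model(toy_positions)
# Conservative forces F = -dE/dR require backpropagation through the energy network.
toy_conservative_forces = -torch.autograd.grad(toy_energy, toy_positions)[0]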
@@ -357,7 +363,8 @@ def display_loss(csv_file):
# %%
#
# After fine-tuning for 50 epochs, the updated parity plots show improved force
- # predictions (left) with conservative forces.
+ # predictions (left) with conservative forces. The grayscale points in the background
+ # correspond to the predicted forces from the previous step.
#
# .. image:: c_ft_res.png
#    :align: center
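
# %%
#
# Parity plots like the ones above only require the reference and predicted force
# components; a generic sketch, assuming ``forces_ref`` and ``forces_pred`` are
# ``(N, 3)`` arrays collected from the test structures and from a model evaluation:

import matplotlib.pyplot as plt  # noqa: E402, F811


def plot_force_parity(forces_ref, forces_pred):
    fig, ax = plt.subplots()
    ax.scatter(forces_ref.ravel(), forces_pred.ravel(), s=4, alpha=0.5)
    lims = [forces_ref.min(), forces_ref.max()]
    ax.plot(lims, lims, color="black", linestyle="--")  # ideal y = x line
    ax.set_xlabel("reference force components")
    ax.set_ylabel("predicted force components")
    return fig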