Animation showing the two modes of our model: independent sampling by diffusion denoising (left) and molecular dynamics simulation (right).
This repository contains the complete codebase for training and evaluating energy-based diffusion models for molecular dynamics simulations. Through a Fokker-Planck-based regularization scheme, our approach enables a single model to perform both independent sampling via diffusion denoising and continuous molecular dynamics simulations.
We introduce a Fokker-Planck-based regularization to train an energy-based diffusion model with stable, self-consistent scores near the data distribution. This regularization ensures that the learned score function corresponds to a consistent energy function, enabling the model to perform both generative sampling and accurate energy estimation.
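To make the consistency idea concrete, the sketch below shows (in JAX) how a scalar energy network yields a score function via automatic differentiation, so that the score is the gradient of a single energy by construction. This is an illustrative toy, not the repository's architecture or training objective; the network layout and the names `energy_fn` and `params` are placeholder assumptions.

```python
# Minimal JAX sketch (not the repository's implementation): an energy-based
# score model where the score is the negative gradient of a scalar energy.
import jax
import jax.numpy as jnp


def energy_fn(params, x, t):
    """Toy scalar energy U_theta(x, t): a tiny MLP over [x, t]."""
    h = jnp.concatenate([x, jnp.atleast_1d(t)])
    h = jnp.tanh(params["W1"] @ h + params["b1"])
    return jnp.dot(params["w2"], h) + params["b2"]


# Score s_theta(x, t) = -grad_x U_theta(x, t). By construction this score is
# the gradient of a single scalar field, which is the self-consistency
# property the Fokker-Planck regularization is meant to preserve.
score_fn = jax.grad(lambda params, x, t: -energy_fn(params, x, t), argnums=1)

# Randomly initialized placeholder parameters for a 2-D toy system.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
dim, hidden = 2, 32
params = {
    "W1": 0.1 * jax.random.normal(k1, (hidden, dim + 1)),
    "b1": jnp.zeros(hidden),
    "w2": 0.1 * jax.random.normal(k2, (hidden,)),
    "b2": jnp.zeros(()),
}

x = jnp.array([0.5, -0.2])
print(score_fn(params, x, 0.1))  # 2-D score vector at (x, t)
```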
We provide minimal working implementations in Jupyter notebooks:
- JAX version (recommended) - Faster and closer to this repository's implementation
- PyTorch version - Alternative implementation
Both notebooks demonstrate how to reproduce figures similar to the one shown above and how to perform molecular dynamics simulations with a diffusion model on the Müller-Brown potential.
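For reference, here is a minimal standalone sketch (in JAX, independent of the notebooks) of overdamped Langevin dynamics on the Müller-Brown potential; in the notebooks, the force comes from the trained diffusion model rather than the analytic potential. The step size `dt` and temperature `kT` below are illustrative assumptions, not values taken from the notebooks.

```python
# Minimal sketch: Euler-Maruyama integration of overdamped Langevin dynamics
# on the Mueller-Brown potential (standard published parameters).
import jax
import jax.numpy as jnp

A  = jnp.array([-200.0, -100.0, -170.0, 15.0])
a  = jnp.array([-1.0, -1.0, -6.5, 0.7])
b  = jnp.array([0.0, 0.0, 11.0, 0.6])
c  = jnp.array([-10.0, -10.0, -6.5, 0.7])
x0 = jnp.array([1.0, 0.0, -0.5, -1.0])
y0 = jnp.array([0.0, 0.5, 1.5, 1.0])


def mueller_brown(pos):
    """Mueller-Brown potential U(x, y) as a sum of four Gaussian-like terms."""
    x, y = pos
    return jnp.sum(A * jnp.exp(a * (x - x0) ** 2
                               + b * (x - x0) * (y - y0)
                               + c * (y - y0) ** 2))


# Force F(x) = -grad U(x); a learned score could be plugged in here instead.
force = jax.grad(lambda pos: -mueller_brown(pos))


def langevin_step(pos, key, dt=1e-5, kT=15.0):
    """One Euler-Maruyama step of overdamped Langevin dynamics."""
    noise = jax.random.normal(key, pos.shape)
    return pos + dt * force(pos) + jnp.sqrt(2.0 * kT * dt) * noise


# Short trajectory starting near one of the minima.
pos = jnp.array([-0.55, 1.45])
for key in jax.random.split(jax.random.PRNGKey(0), 1000):
    pos = langevin_step(pos, key)
print(pos)
```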
All dependencies are managed with pixi, which ensures fully reproducible environments across different systems.
To set up the environment, run:
```bash
pixi install --frozen
```

To activate the environment, run:

```bash
pixi shell
```

If you are on an amd64 system (e.g. a Linux machine), you can use the docker image to run the code. To build the docker image, run:

```bash
docker build -t scoremd .
```

To run the docker container, run:

```bash
docker run -it --rm -v $(pwd)/outputs:/workspace/outputs -v $(pwd)/storage:/workspace/storage -v $(pwd)/multirun:/workspace/multirun scoremd python train.py ...
```

If you prefer using your own dependency manager (e.g., conda, pip), you can install the dependencies listed in pyproject.toml with your preferred tool.
We use Hydra for configuration management. You can override any configuration via command-line arguments or configuration files.
Train on example toy systems using the provided configurations:
```bash
python train.py dataset=double_well +architecture=mlp/small_potential
python train.py dataset=double_well_2d +architecture=mlp/small_potential
```

Outputs will be saved to the outputs/ directory.
Important
This repository does not contain all datasets directly.
Training data for the toy systems and alanine dipeptide will be downloaded automatically.
For the dipeptides, you can download the datasets from this release and place them into the ./storage directory (one subfolder per dataset, e.g. ./storage/minipeptides/, ./storage/deshaw/).
Data for the fast-folder systems can be requested from D. E. Shaw Research, as described in the original paper.
If you do not have access to the fast-folder data, this release also provides dummy data generated by our models, which is sufficient for inference.
We provide pre-trained model weights for all models presented in the paper. For detailed instructions on downloading and using these models, please refer to INFERENCE.md.
To reproduce the results from our paper, see TRAIN.md for the exact training commands used for each model and dataset.
For implementation details and benchmarking against your own methods, we provide evaluation scripts in the evaluation directory.
Feel free to open an issue if you encounter any problems or have questions.
If you find our work useful, please cite:
```bibtex
@article{plainer2025consistent,
  author = {Plainer, Michael and Wu, Hao and Klein, Leon and G{\"u}nnemann, Stephan and No{\'e}, Frank},
  title  = {Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models},
  eprint = {arXiv:2506.17139},
  year   = {2025},
}
```

