## Optimizing Molecular Dynamics Weights with Machine Learning Tools

**By James Holton, with contributions from Karson Chrispens, Steve, and Marcus Collins**

In our latest round of diffuse scattering experiments, we ran into an intriguing optimization problem that feels a lot like training a neural network.
### The Scientific Setup

For each 3D pixel in reciprocal space (indexed by **h**), we have:

* **Observed data**, $y(h)$, from experiment
* **Predicted data**, $x(h)$, computed from molecular dynamics (MD) trajectories

We evaluate agreement using the **Pearson correlation coefficient**, where $\langle \cdot \rangle$ denotes an average over all pixels **h**:

$$
CC = \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}{\sqrt{\langle x^2 \rangle - \langle x \rangle^2}\sqrt{\langle y^2 \rangle - \langle y \rangle^2}}
$$
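To make the formula concrete, the NumPy sketch below evaluates $CC$ term by term exactly as written. `pearson_cc` is a name chosen here for illustration, not a function from the prototype, and the data are synthetic. The result is checked against `np.corrcoef`.

```python
import numpy as np


def pearson_cc(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson CC from the moment formula; averages <.> run over all pixels h."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)   # <xy> - <x><y>
    sx = np.sqrt(np.mean(x**2) - np.mean(x) ** 2)    # sqrt(<x^2> - <x>^2)
    sy = np.sqrt(np.mean(y**2) - np.mean(y) ** 2)    # sqrt(<y^2> - <y>^2)
    return float(cov / (sx * sy))


# Synthetic check against NumPy's built-in correlation matrix.
rng = np.random.default_rng(0)
y_obs = rng.normal(size=1000)
x_pred = 0.8 * y_obs + 0.6 * rng.normal(size=1000)
print(pearson_cc(x_pred, y_obs), np.corrcoef(x_pred, y_obs)[0, 1])
```

The one-pass moment form mirrors the equation. For intensities with a large constant offset, however, subtracting the means before multiplying is numerically safer, and that is the route `np.corrcoef` takes.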
Each prediction $x(h)$ is derived from the **structure factors** $F(h, t)$ at the time points $t$ of the MD simulation. It is the variance of the structure factor over the trajectory:

$$
x(h) = \langle F(h,t)^2 \rangle_t - \langle F(h,t) \rangle_t^2
$$
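In array form, $x(h)$ is the population variance of $F$ along the time axis. The minimal sketch below assumes the structure factors for $N$ pixels and $T$ frames are stored in a real-valued array of shape `(N, T)`. The array layout, the name `diffuse_from_frames`, and the toy numbers are all illustrative assumptions.

```python
import numpy as np


def diffuse_from_frames(F: np.ndarray) -> np.ndarray:
    """x(h) = <F(h,t)^2>_t - <F(h,t)>_t^2 for F of shape (n_pixels, n_frames).

    Assumes real-valued F. For complex structure factors, use
    np.mean(np.abs(F)**2, axis=1) - np.abs(F.mean(axis=1))**2 instead.
    """
    F = np.asarray(F, dtype=float)
    return np.mean(F**2, axis=1) - np.mean(F, axis=1) ** 2


# Toy example: 3 pixels observed over 4 MD frames (hypothetical numbers).
F = np.array([[1.0, 1.0, 1.0, 1.0],   # constant over time: no diffuse signal
              [0.0, 2.0, 0.0, 2.0],   # fluctuates between 0 and 2
              [1.0, 2.0, 3.0, 4.0]])
print(diffuse_from_frames(F))  # values: 0.0, 1.0, 1.25
```

For real `F` this is the same as `F.var(axis=1)`, which uses NumPy's default `ddof=0`.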
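The goal is to assign a **weight** $w_t$ to each time point $t$ so as to maximize $CC$. Replacing the time averages with weighted sums gives the reweighted prediction below. It reduces to $x(h)$ for uniform weights $w_t = 1/T$ over $T$ frames:

$$
x'(h) = \sum_t w_t F(h,t)^2 - \left( \sum_t w_t F(h,t) \right)^2
$$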
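The weighted version is just two matrix-vector products, so the cost grows linearly with the number of frames.

This sketch assumes the weights are normalized to sum to 1, which the equation leaves implicit. That normalization is a real modeling choice, not a harmless convention. Scaling every $w_t$ by $c$ multiplies the first term by $c$ but the second by $c^2$. This changes the shape of $x'(h)$, not just its overall scale (to which $CC$ would be blind). `weighted_diffuse` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np


def weighted_diffuse(F: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x'(h) = sum_t p_t F(h,t)^2 - (sum_t p_t F(h,t))^2 with p = w / sum(w).

    F has shape (n_pixels, n_frames); w has shape (n_frames,).
    The normalization of w to p is an assumption of this sketch.
    """
    F = np.asarray(F, dtype=float)
    p = np.asarray(w, dtype=float)
    p = p / p.sum()              # assumed normalization: sum_t p_t = 1
    first = (F**2) @ p           # sum_t p_t F(h,t)^2
    mean = F @ p                 # sum_t p_t F(h,t)
    return first - mean**2


rng = np.random.default_rng(1)
F = rng.normal(size=(500, 8))    # synthetic: 500 pixels, 8 frames
uniform = np.ones(8)
# Uniform weights reproduce the plain variance over the trajectory.
print(np.allclose(weighted_diffuse(F, uniform), F.var(axis=1)))

# Zero weights switch frames off; here only the first half of the trajectory is kept.
half = np.r_[np.ones(4), np.zeros(4)]
print(np.allclose(weighted_diffuse(F, half), F[:, :4].var(axis=1)))
```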
If we can find the optimal weights, we can identify which regions of the trajectory best match experimental reality, potentially distinguishing “good” frames from those that detract from agreement.

### Community Brainstorming

**Steve** questioned whether CC is the right target at all; a likelihood might better capture the physics.

**Karson Chrispens** proposed using machine learning frameworks such as **JAX** or **PyTorch** to treat the weights as trainable parameters. By backpropagating through the Pearson correlation, an optimizer such as Adam could efficiently learn the optimal weights.

**James Holton** suspected this approach could outperform traditional non-linear least-squares optimization, and he shared example MTZ datasets for testing.

**Steve** also mentioned using a **genetic algorithm** if the weights were binary (0 or 1), though he acknowledged that the continuous formulation might not have a unique minimum.
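For the binary case Steve described, a small genetic algorithm is one way to search over frame selections. The sketch below is a generic textbook GA with tournament selection, uniform crossover, bit-flip mutation, and elitism. It is not code from the discussion. Each selected frame gets equal weight, and the data are synthetic. The population size, generation count, and mutation rate are arbitrary choices.

```python
import numpy as np


def pearson_cc(x, y):
    """Pearson CC via mean-centered vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def fitness(mask, F, y):
    """CC for a binary frame selection; every selected frame gets equal weight."""
    if mask.sum() < 2:
        return -1.0  # fewer than 2 frames gives zero variance everywhere: score as worst
    x = F[:, mask.astype(bool)].var(axis=1)  # x'(h) with weight 1/k on each of k frames
    return pearson_cc(x, y)


def genetic_search(F, y, pop_size=40, n_gen=60, p_mut=0.05, seed=0):
    """Generic GA over binary masks: tournaments, uniform crossover, bit flips, elitism."""
    rng = np.random.default_rng(seed)
    n_frames = F.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_frames))
    pop[0] = 1  # seed with "all frames on", i.e. the ordinary unweighted average
    scores = np.array([fitness(m, F, y) for m in pop])

    def tournament():
        picks = rng.choice(pop_size, size=3)  # best of 3 random individuals
        return pop[picks[np.argmax(scores[picks])]]

    for _ in range(n_gen):
        children = [pop[np.argmax(scores)].copy()]  # elitism: carry the best forward
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            child = np.where(rng.random(n_frames) < 0.5, a, b)  # uniform crossover
            flip = rng.random(n_frames) < p_mut
            children.append(np.where(flip, 1 - child, child))   # bit-flip mutation
        pop = np.array(children)
        scores = np.array([fitness(m, F, y) for m in pop])

    best = int(np.argmax(scores))
    return pop[best], float(scores[best])


# Synthetic test: "observed" data built from the first 6 of 12 frames.
rng = np.random.default_rng(2)
F = rng.normal(size=(300, 12))
y = F[:, :6].var(axis=1)
mask, cc = genetic_search(F, y)
print(mask, round(cc, 4))
```

Two design choices make the result safe. The population is seeded with the all-frames mask, and the best individual is kept each generation. Together these guarantee the result is never worse than the plain unweighted average, although nothing guarantees the global optimum.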
### Prototyping the Optimizer

Karson quickly implemented a JAX-based prototype using **reciprocalspaceship** for MTZ I/O and **optax** for optimization. The loss function was simply $-CC$, and the weights were constrained to the interval (0, 1) via a sigmoid.

When tested on toy datasets and real MTZ files, the optimizer:

* Recovered approximately **50:50** weights for mixtures of two “ground-truth” structures.
* Produced sensible intermediate values when one or both inputs were “wrong.”
* Converged robustly from different initializations.

Example output for a ground-truth mixture:

```
Final weights: [0.46, 0.54]
Final CC: 1.0000
```

And for mismatched data:

```
Final weights: [0.76, 0.24]
Final CC: 0.78
```
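The optimization loop can also be sketched without any ML framework by writing the gradient of $-CC$ out by hand. The chain rule runs through four stages: the Pearson formula, the weighted variance $x'(h)$, the normalization, and the sigmoid.

This is not Karson's code. His prototype used JAX, where `jax.grad` derives the gradient automatically and `optax.adam` supplies the optimizer. Here both steps are spelled out in NumPy. Normalizing the sigmoid outputs to sum to 1 is an assumption of this sketch. The learning rate, step count, and synthetic "true" weights are illustrative.

```python
import numpy as np


def sigmoid(theta):
    return 1.0 / (1.0 + np.exp(-theta))


def forward(theta, F, y):
    """Loss -CC and intermediates for sigmoid weights normalized to sum to 1."""
    w = sigmoid(theta)                     # each raw weight lies in (0, 1)
    p = w / w.sum()                        # assumption: normalize so sum_t p_t = 1
    m = F @ p                              # sum_t p_t F(h,t)
    x = (F**2) @ p - m**2                  # reweighted prediction x'(h)
    xc, yc = x - x.mean(), y - y.mean()
    sx, sy = np.sqrt(np.mean(xc**2)), np.sqrt(np.mean(yc**2))
    cc = np.mean(xc * yc) / (sx * sy)
    return -cc, (w, p, m, xc, yc, sx, sy, cc)


def grad_loss(theta, F, y):
    """Hand-derived gradient of -CC w.r.t. theta (what jax.grad would produce)."""
    loss, (w, p, m, xc, yc, sx, sy, cc) = forward(theta, F, y)
    g_x = (yc / (sx * sy) - cc * xc / sx**2) / len(y)   # dCC/dx'(h)
    g_p = (F**2).T @ g_x - 2.0 * F.T @ (m * g_x)         # through x'(h)
    g_w = (g_p - g_p @ p) / w.sum()                      # through p = w / sum(w)
    return loss, -(g_w * w * (1.0 - w))                  # through sigmoid; minus for -CC


def fit_weights(F, y, lr=0.05, steps=500, b1=0.9, b2=0.999, eps=1e-8):
    """Plain Adam updates on theta; optax.adam fills this role in a JAX version."""
    theta = np.zeros(F.shape[1])           # sigmoid(0) = 0.5, so all frames start equal
    mom, vel = np.zeros_like(theta), np.zeros_like(theta)
    for k in range(1, steps + 1):
        _, g = grad_loss(theta, F, y)
        mom = b1 * mom + (1 - b1) * g
        vel = b2 * vel + (1 - b2) * g**2
        m_hat, v_hat = mom / (1 - b1**k), vel / (1 - b2**k)  # bias correction
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    w = sigmoid(theta)
    return w / w.sum(), float(-forward(theta, F, y)[0])


# Synthetic check: 400 pixels, 6 frames, hypothetical "true" weights.
rng = np.random.default_rng(3)
F = rng.normal(size=(400, 6))
p_true = np.array([0.3, 0.3, 0.3, 0.05, 0.05, 0.0])
y = (F**2) @ p_true - (F @ p_true) ** 2
p_fit, cc_fit = fit_weights(F, y)
print(np.round(p_fit, 2), round(cc_fit, 4))
```

A hand-written gradient is mainly useful for checking against finite differences. In practice, the autodiff route used in the prototype is less error-prone, especially once the objective changes (for example, to a likelihood).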
### Discussion

**Marcus Collins** noted that this approach resembles computing **Boltzmann-like factors** for each configuration. He suggested that PyTorch could be an equally good (and more widely used) platform, and he cautioned that the Pearson CC may not be the optimal objective function.

Karson confirmed that JAX runs efficiently on GPUs and planned to scale the approach to larger datasets by stacking multiple MTZ files.
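Marcus's Boltzmann analogy suggests an alternative parametrization. Treat each trainable parameter as an effective energy $E_t$ and set $w_t = e^{-E_t/kT} / \sum_s e^{-E_s/kT}$. This softmax keeps the weights positive and summing to 1 without a separate sigmoid. Read in reverse, a set of fitted weights maps to relative effective energies $E_t = -kT \ln w_t$, up to an additive constant.

This is an illustration of the analogy, not what the prototype does (it used a sigmoid). The energies and $kT$ below are hypothetical.

```python
import numpy as np


def boltzmann_weights(energies, kT=1.0):
    """w_t = exp(-E_t / kT) / Z, shifted by the minimum energy for numerical stability."""
    e = np.asarray(energies, dtype=float) / kT
    b = np.exp(-(e - e.min()))   # the shift cancels in the ratio and prevents overflow
    return b / b.sum()


def effective_energies(weights, kT=1.0):
    """Inverse map E_t = -kT ln w_t, reported relative to the lowest-energy frame."""
    e = -kT * np.log(np.asarray(weights, dtype=float))
    return e - e.min()


w = boltzmann_weights([0.0, 1.0, 2.0])
print(np.round(w, 3))                      # each unit of kT lowers the weight by a factor of e
print(np.round(effective_energies(w), 6))  # recovers 0, 1, 2
```

Subtracting the minimum energy before exponentiating is the usual max-shift trick for a stable softmax.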
### Where This Might Go Next

This prototype demonstrates that **gradient-based optimization** can efficiently identify the contribution of different MD frames to observed diffuse scattering patterns. Future directions include:

* Expanding to full MD trajectories with thousands of frames.
* Experimenting with alternative objectives (e.g., likelihood, cross-entropy).
* Incorporating **crystal symmetry** and **resolution weighting**.
* Exploring physical interpretations of the resulting weights.
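One possible starting point for resolution weighting is to replace the plain averages $\langle \cdot \rangle$ in the $CC$ formula with weighted averages over pixels. The sketch below assumes a per-pixel weight $v(h)$. It uses a hypothetical scheme in which every resolution shell (equal-width bins in $1/d$) contributes the same total weight, so the more numerous high-resolution pixels do not dominate. Neither the scheme nor the function names come from the discussion, and the data are synthetic.

```python
import numpy as np


def weighted_cc(x, y, v):
    """Pearson CC with per-pixel weights v(h) >= 0 in place of the plain averages <.>."""
    x, y, v = [np.asarray(a, dtype=float).ravel() for a in (x, y, v)]
    v = v / v.sum()
    xc, yc = x - v @ x, y - v @ y          # subtract weighted means
    return float((v @ (xc * yc)) / np.sqrt((v @ xc**2) * (v @ yc**2)))


def equal_shell_weights(d_spacing, n_shells=10):
    """Hypothetical scheme: equal-width shells in s = 1/d, each with the same total weight."""
    s = 1.0 / np.asarray(d_spacing, dtype=float)
    edges = np.linspace(s.min(), s.max(), n_shells + 1)
    shell = np.clip(np.digitize(s, edges) - 1, 0, n_shells - 1)
    counts = np.bincount(shell, minlength=n_shells)
    return 1.0 / counts[shell]             # a shell with k pixels gives each pixel 1/k


# Synthetic data: d-spacings in Å and a noisy "prediction" of y.
rng = np.random.default_rng(4)
d = rng.uniform(1.5, 20.0, size=2000)
y = rng.normal(size=2000)
x = y + rng.normal(size=2000)
print(weighted_cc(x, y, np.ones_like(x)), np.corrcoef(x, y)[0, 1])  # equal for uniform v
print(weighted_cc(x, y, equal_shell_weights(d)))
```

With uniform weights, `weighted_cc` reduces to the ordinary Pearson CC, so it can serve as a drop-in replacement for the objective.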
### Code and Data

Karson’s implementation, `pearson_target.py`, is available [here](https://github.com/k-chrispens/simulation_timeseries_optim), and the test MTZ data can be downloaded [here](http://bl831.als.lbl.gov/~jamesh/pickup/diffUSE_CC_opt_test.tgz).

---

**TL;DR:** By treating MD frame weights as trainable parameters in a differentiable Pearson correlation objective, we can use ML optimizers like Adam to rapidly identify which parts of a trajectory best explain experimental diffuse scattering. This turns a brute-force search into a smooth, data-driven optimization problem.
