-
Notifications
You must be signed in to change notification settings - Fork 716
Feature non linear time averaged msd #5066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,25 @@ | ||
######## Documentation | ||
# | ||
# Issue: The default '_conclude_simple()' function previously calculated time-averaged mean squared displacement considering linear timespacing between the frames even when a dump file with non-linear timestep gap is provided | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be better if these details were in the PR info instead - this is where a reviewer usually looks for them (and ultimately we will likely merge without this). |
||
# | ||
# Modified by Sirsha Ganguly | ||
# | ||
# Modification: | ||
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggest something more descriptive here - _conclude_nonlinear() instead of _conclude_modified() |
||
# Note: This new function is generalized for non-linear dumps. | ||
# Moreover, if you use this function to calculate for linear timedump this gives same result as '_conclude_simple()' which is implemented in the default version. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for checking this, it's really what we need to discuss the most here. Ultimately I think the end target is this PR will be to end up with just one method. Having alternate code paths adds to user difficulty and maintenance costs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just taking things as they are now (before we get into the weeds).Could you comment on performance? Do the two methods run at similar speeds with the same inputs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if someone uses the fft based method instead with non linear times? Does that fail? |
||
# So, I believe the algorithm in '_conclude_modified()' can potentially be used to calculate time-averaged msd in future. | ||
# For now it only executes when the dump is non-linear. [ This is determined by the added function is_linear() ] | ||
# It is tested with a .dump file of a Coarse Grained system of single type | ||
# | ||
# General Algorithm of '_conclude_modified()' to calculate time-averaged msd: | ||
# It creates a dictionary/hash-map with the key being the time difference between the frames (Delta t) and value being a list containing the frame to frame MSD values [msd1, msd2, .....] for that particular Delta t | ||
# Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]} | ||
# The reference frame is changed with each iteration and each time a new Key (i.e, Delta t) is found it is appended in the dictionary/hash-map | ||
# If a duplicate Delta_t is found it is added to the value (msd list) of that specefic Key in the dictionary/hash-map | ||
# Lastly in the dictionary/hash-map for each Delta_t the msd values inside the list are averaged | ||
# | ||
|
||
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- | ||
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 | ||
# | ||
|
@@ -263,7 +285,17 @@ | |
del Doi | ||
|
||
|
||
|
||
class EinsteinMSD(AnalysisBase): | ||
|
||
@staticmethod | ||
def is_linear(sequence): # This function is to check if the timesteps in the array is linear or not! If it's linear default MDA code will carry-out other wise the modified code carries! | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of comments, write doc strings. |
||
if len(sequence) < 2: | ||
return True | ||
step = sequence[1] - sequence[0] | ||
return all(sequence[i+1] - sequence[i] == step for i in range(len(sequence) - 1)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So right now, it's failing the test. My guess is that it's going down the wrong test path, because it's interpreting whatever data is being passed in by the test as not actually being linear (maybe something like 1.0000 gets interpreted at 0.999999). A couple of options here:
|
||
|
||
|
||
r"""Class to calculate Mean Squared Displacement by the Einstein relation. | ||
|
||
Parameters | ||
|
@@ -386,7 +418,10 @@ | |
if self.fft: | ||
self._conclude_fft() | ||
else: | ||
self._conclude_simple() | ||
if self.is_linear([ts.time for ts in self._trajectory]): | ||
self._conclude_simple() | ||
else: | ||
self._conclude_modified() # New modified code | ||
|
||
def _conclude_simple(self): | ||
r"""Calculates the MSD via the simple "windowed" algorithm.""" | ||
|
@@ -421,3 +456,46 @@ | |
positions[:, n, :] | ||
) | ||
self.results.timeseries = self.results.msds_by_particle.mean(axis=1) | ||
|
||
|
||
def _conclude_modified(self): | ||
from collections import defaultdict | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. imports at top of file, import collections then use |
||
print("The Dump file has non-linear time spacing") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Eventually, remove all |
||
|
||
dump_times = [ts.time for ts in self._trajectory] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This iterates through the whole trajectory and may already be expensive! Maybe unavoidable but keep it in mind. |
||
# print(dump_times) | ||
n_frames = len(dump_times) | ||
n_atoms = self.n_particles | ||
positions = np.zeros((n_frames, n_atoms, 3)) | ||
positions = self._position_array.astype(np.float64) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This overwrites |
||
|
||
msd_dict = defaultdict(list) # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]} | ||
|
||
# Looping over all the frames as if the referenced gets shifted frame to frame | ||
for i in range(n_frames): | ||
for j in range(i + 1, n_frames): | ||
delta_t = dump_times[j] - dump_times[i] | ||
|
||
# Compute displacement and squared displacement | ||
disp = positions[j] - positions[i] | ||
squared_disp = np.sum(disp ** 2, axis=1) | ||
msd = np.mean(squared_disp) | ||
|
||
# Store MSD under corresponding Δt | ||
msd_dict[delta_t].append(msd) | ||
|
||
msd_dict[0] = [0] | ||
|
||
# Prepare averaged results | ||
delta_t_values = [] | ||
avg_msds = [] | ||
for delta_t in sorted(msd_dict.keys()): | ||
msd_list = msd_dict[delta_t] | ||
avg_msd = np.mean(msd_list) | ||
delta_t_values.append(delta_t) | ||
avg_msds.append(avg_msd) | ||
|
||
self.results.timeseries = delta_t_values | ||
self.results.msds_by_particle = avg_msds | ||
|
||
# self.results.timeseries = self.results.msds_by_particle.mean(axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to do this later once code has been settled on, but documents exist as module docstring, or API docstring (see the sections encompassed by """ below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Irfan said: The comments will not be in the final PR. The first line of the file is always
and then comes the standard comment block. The only history information (eg who modified the file or added something) that we keep inside the code is versionchanged/versionadded markup in the doc strings. Everything else is known to be available through the git history and the information in the PR/issue.
Could you please
This will make it a lot easier to discuss the intention of the PR separate from the implementation.
Information on the algorithm (basically what's important for a user to know) will go into the doc string in a Notes section (we follow the numpy doc string standard and see https://userguide.mdanalysis.org/stable/contributing_code.html#guidelines-for-docstrings for MDAnalysis specific guidelines on docs).