Skip to content

Feature non linear time averaged msd #5066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion package/AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -258,7 +258,7 @@ Chronological list of authors
- Abdulrahman Elbanna
- Tulga-Erdene Sodjargal
- Gareth Elliott

- Sirsha Ganguly

External code
-------------
Expand Down
80 changes: 79 additions & 1 deletion package/MDAnalysis/analysis/msd.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,25 @@
######## Documentation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to do this later once code has been settled on, but documents exist as module docstring, or API docstring (see the sections encompassed by """ below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Irfan said: The comments will not be in the final PR. The first line of the file is always

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-

and then comes the standard comment block. The only history information (eg who modified the file or added something) that we keep inside the code is versionchanged/versionadded markup in the doc strings. Everything else is known to be available through the git history and the information in the PR/issue.

Could you please

  • remove the comments
  • copy them to the PR information

This will make it a lot easier to discuss the intention of the PR separate from the implementation.

Information on the algorithm (basically what's important for a user to know) will go into the doc string in a Notes section (we follow the numpy doc string standard and see https://userguide.mdanalysis.org/stable/contributing_code.html#guidelines-for-docstrings for MDAnalysis specific guidelines on docs).

#
# Issue: The default '_conclude_simple()' function previously calculated time-averaged mean squared displacement considering linear timespacing between the frames even when a dump file with non-linear timestep gap is provided
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if these details were in the PR info instead - this is where a reviewer usually looks for them (and ultimately we will likely merge without this).

#
# Modified by Sirsha Ganguly
#
# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest something more descriptive here - _conclude_nonlinear() instead of _conclude_modified()

# Note: This new function is generalized for non-linear dumps.
# Moreover, if you use this function to calculate for linear timedump this gives same result as '_conclude_simple()' which is implemented in the default version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking this, it's really what we need to discuss the most here. Ultimately I think the end target is this PR will be to end up with just one method. Having alternate code paths adds to user difficulty and maintenance costs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just taking things as they are now (before we get into the weeds).Could you comment on performance? Do the two methods run at similar speeds with the same inputs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if someone uses the fft based method instead with non linear times? Does that fail?

# So, I believe the algorithm in '_conclude_modified()' can potentially be used to calculate time-averaged msd in future.
# For now it only executes when the dump is non-linear. [ This is determined by the added function is_linear() ]
# It is tested with a .dump file of a Coarse Grained system of single type
#
# General Algorithm of '_conclude_modified()' to calculate time-averaged msd:
# It creates a dictionary/hash-map with the key being the time difference between the frames (Delta t) and value being a list containing the frame to frame MSD values [msd1, msd2, .....] for that particular Delta t
# Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}
# The reference frame is changed with each iteration and each time a new Key (i.e, Delta t) is found it is appended in the dictionary/hash-map
# If a duplicate Delta_t is found it is added to the value (msd list) of that specefic Key in the dictionary/hash-map
# Lastly in the dictionary/hash-map for each Delta_t the msd values inside the list are averaged
#

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
Expand Down Expand Up @@ -263,7 +285,17 @@
del Doi



class EinsteinMSD(AnalysisBase):

@staticmethod
def is_linear(sequence): # This function is to check if the timesteps in the array is linear or not! If it's linear default MDA code will carry-out other wise the modified code carries!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of comments, write doc strings.

if len(sequence) < 2:
return True

Check warning on line 294 in package/MDAnalysis/analysis/msd.py

View check run for this annotation

Codecov / codecov/patch

package/MDAnalysis/analysis/msd.py#L294

Added line #L294 was not covered by tests
step = sequence[1] - sequence[0]
return all(sequence[i+1] - sequence[i] == step for i in range(len(sequence) - 1))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So right now, it's failing the test. My guess is that it's going down the wrong test path, because it's interpreting whatever data is being passed in by the test as not actually being linear (maybe something like 1.0000 gets interpreted at 0.999999). A couple of options here:

  1. Add a keyword like nonlinear that controls the logic, rather than determining the logic from the steps.
  2. Using something like assert_almost_equal so floating point errors don't cause the logic to give the wrong answer (i.e. that they are not equally spaced when they actually are).



r"""Class to calculate Mean Squared Displacement by the Einstein relation.

Parameters
Expand Down Expand Up @@ -386,7 +418,10 @@
if self.fft:
self._conclude_fft()
else:
self._conclude_simple()
if self.is_linear([ts.time for ts in self._trajectory]):
self._conclude_simple()
else:
self._conclude_modified() # New modified code

def _conclude_simple(self):
r"""Calculates the MSD via the simple "windowed" algorithm."""
Expand Down Expand Up @@ -421,3 +456,46 @@
positions[:, n, :]
)
self.results.timeseries = self.results.msds_by_particle.mean(axis=1)


def _conclude_modified(self):
from collections import defaultdict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports at top of file,

import collections

then use collections.defaultdict

print("The Dump file has non-linear time spacing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, remove all print – if you want to show status messages, use the logger.


dump_times = [ts.time for ts in self._trajectory]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iterates through the whole trajectory and may already be expensive!

Maybe unavoidable but keep it in mind.

# print(dump_times)
n_frames = len(dump_times)
n_atoms = self.n_particles
positions = np.zeros((n_frames, n_atoms, 3))
positions = self._position_array.astype(np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overwrites positions. Delete the line above?? Not sure what the intent was here.


msd_dict = defaultdict(list) # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}

# Looping over all the frames as if the referenced gets shifted frame to frame
for i in range(n_frames):
for j in range(i + 1, n_frames):
delta_t = dump_times[j] - dump_times[i]

# Compute displacement and squared displacement
disp = positions[j] - positions[i]
squared_disp = np.sum(disp ** 2, axis=1)
msd = np.mean(squared_disp)

# Store MSD under corresponding Δt
msd_dict[delta_t].append(msd)

msd_dict[0] = [0]

# Prepare averaged results
delta_t_values = []
avg_msds = []
for delta_t in sorted(msd_dict.keys()):
msd_list = msd_dict[delta_t]
avg_msd = np.mean(msd_list)
delta_t_values.append(delta_t)
avg_msds.append(avg_msd)

self.results.timeseries = delta_t_values
self.results.msds_by_particle = avg_msds

# self.results.timeseries = self.results.msds_by_particle.mean(axis=1)
Loading