Skip to content

Feature non linear time averaged msd #5066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

gitsirsha
Copy link

@gitsirsha gitsirsha commented Jun 15, 2025

Fixes #5028

Changes made in this Pull Request:

  • msd.py script was modified to support time-averaged mean square displacement calculation for dump files with non-linear time spacings.

PR Checklist

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5066.org.readthedocs.build/en/5066/

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

Copy link

codecov bot commented Jun 15, 2025

Codecov Report

Attention: Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.14%. Comparing base (0dc5a1b) to head (f6e5ddf).

Files with missing lines Patch % Lines
package/MDAnalysis/analysis/msd.py 94.28% 1 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (0dc5a1b) and HEAD (f6e5ddf). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (0dc5a1b) HEAD (f6e5ddf)
2 1
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5066      +/-   ##
===========================================
- Coverage    93.62%   86.14%   -7.49%     
===========================================
  Files          177      177              
  Lines        22001    22036      +35     
  Branches      3114     3119       +5     
===========================================
- Hits         20599    18982    -1617     
- Misses         947     2604    +1657     
+ Partials       455      450       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@mrshirts mrshirts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments on the code.

# Modified by Sirsha Ganguly
#
# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest something more descriptive here - _conclude_nonlinear() instead of _conclude_modified()

if len(sequence) < 2:
return True
step = sequence[1] - sequence[0]
return all(sequence[i+1] - sequence[i] == step for i in range(len(sequence) - 1))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So right now, it's failing the test. My guess is that it's going down the wrong test path, because it's interpreting whatever data is being passed in by the test as not actually being linear (maybe something like 1.0000 gets interpreted at 0.999999). A couple of options here:

  1. Add a keyword like nonlinear that controls the logic, rather than determining the logic from the steps.
  2. Using something like assert_almost_equal so floating point errors don't cause the logic to give the wrong answer (i.e. that they are not equally spaced when they actually are).

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some initial questions before we get into the weeds. I'm keen for us to end up with a single method that works for both use cases if possible.

There are probably things we can do to make that happen, but I first went to get a good idea of where things are at in this current state.

@@ -1,3 +1,25 @@
######## Documentation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to do this later once code has been settled on, but documents exist as module docstring, or API docstring (see the sections encompassed by """ below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Irfan said: The comments will not be in the final PR. The first line of the file is always

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-

and then comes the standard comment block. The only history information (eg who modified the file or added something) that we keep inside the code is versionchanged/versionadded markup in the doc strings. Everything else is known to be available through the git history and the information in the PR/issue.

Could you please

  • remove the comments
  • copy them to the PR information

This will make it a lot easier to discuss the intention of the PR separate from the implementation.

Information on the algorithm (basically what's important for a user to know) will go into the doc string in a Notes section (we follow the numpy doc string standard and see https://userguide.mdanalysis.org/stable/contributing_code.html#guidelines-for-docstrings for MDAnalysis specific guidelines on docs).

@@ -1,3 +1,25 @@
######## Documentation
#
# Issue: The default '_conclude_simple()' function previously calculated time-averaged mean squared displacement considering linear timespacing between the frames even when a dump file with non-linear timestep gap is provided
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if these details were in the PR info instead - this is where a reviewer usually looks for them (and ultimately we will likely merge without this).

# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps
# Note: This new function is generalized for non-linear dumps.
# Moreover, if you use this function to calculate for linear timedump this gives same result as '_conclude_simple()' which is implemented in the default version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking this, it's really what we need to discuss the most here. Ultimately I think the end target is this PR will be to end up with just one method. Having alternate code paths adds to user difficulty and maintenance costs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just taking things as they are now (before we get into the weeds).Could you comment on performance? Do the two methods run at similar speeds with the same inputs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if someone uses the fft based method instead with non linear times? Does that fail?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. That's a good start for a discussion.

Initially it would really help to have the big picture up front, namely a description at a high level what you want to accomplish and how.

Additionally

  • Describe the algorithm that you're implementing.
  • How does it differ from the existing one (without FFT)?
  • Can you provide a rough performance comparison between "linear" and "nonlinear"? (I understand that applying the algorithm for equally spaced frames to a trajectory with variably spaced frames is incorrect but I want to get a sense of the necessary added computational complexity.)

I added other comments throughout but initially the big picture is more important because we are trying to figure out if there are ways to re-use code: the fewer code is duplicated, the lower the maintenance burden that we incur, and the easier it is to test and maintain correctness.

@@ -1,3 +1,25 @@
######## Documentation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Irfan said: The comments will not be in the final PR. The first line of the file is always

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-

and then comes the standard comment block. The only history information (eg who modified the file or added something) that we keep inside the code is versionchanged/versionadded markup in the doc strings. Everything else is known to be available through the git history and the information in the PR/issue.

Could you please

  • remove the comments
  • copy them to the PR information

This will make it a lot easier to discuss the intention of the PR separate from the implementation.

Information on the algorithm (basically what's important for a user to know) will go into the doc string in a Notes section (we follow the numpy doc string standard and see https://userguide.mdanalysis.org/stable/contributing_code.html#guidelines-for-docstrings for MDAnalysis specific guidelines on docs).

class EinsteinMSD(AnalysisBase):

@staticmethod
def is_linear(sequence): # This function is to check if the timesteps in the array is linear or not! If it's linear default MDA code will carry-out other wise the modified code carries!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of comments, write doc strings.



def _conclude_modified(self):
from collections import defaultdict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports at top of file,

import collections

then use collections.defaultdict


def _conclude_modified(self):
from collections import defaultdict
print("The Dump file has non-linear time spacing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, remove all print – if you want to show status messages, use the logger.

from collections import defaultdict
print("The Dump file has non-linear time spacing")

dump_times = [ts.time for ts in self._trajectory]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iterates through the whole trajectory and may already be expensive!

Maybe unavoidable but keep it in mind.

n_frames = len(dump_times)
n_atoms = self.n_particles
positions = np.zeros((n_frames, n_atoms, 3))
positions = self._position_array.astype(np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overwrites positions. Delete the line above?? Not sure what the intent was here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for MDAnalysis packages dealing with non-linear time dump
4 participants