InterRDF_s is significantly slower than running multiple InterRDF instances sequentially #5067

Open
@gitzhangch

Description

Expected behavior

I have been working on electrolyte simulations with GROMACS 2025.1. Because I work mostly in Python, I analyze the trajectories with MDAnalysis rather than gmx rdf. When I need several RDFs, I use InterRDF_s, expecting it to be faster than computing each pair separately with InterRDF.

Actual behavior

When analyzing RDF pairs on a trajectory produced by GROMACS 2025.1, InterRDF_s is significantly slower than running separate InterRDF instances one after another, even though it is designed for batch processing.
I ran both classes on the same pair of atom groups. With verbose=True, the progress bar reports 3.32 it/s for InterRDF_s and 141.48 it/s for InterRDF, so InterRDF_s is roughly 43 times slower. With the help of AI, I profiled the run at function level with cProfile. The profile showed a very large number of calls to numpy.histogram, which appears to be what slows InterRDF_s down.
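To illustrate why the number of numpy.histogram calls matters, here is a minimal sketch that bins the same distances in two ways: one call per distance value, and a single call over the whole array. It does not use MDAnalysis and is not a copy of the InterRDF_s code. The function names, the random test data, and the bin settings (75 bins over 0 to 15) are assumptions chosen for illustration.

```python
import time

import numpy as np


def histogram_per_value(distances, nbins=75, rdf_range=(0.0, 15.0)):
    """Bin distances with one np.histogram call per value (many small calls)."""
    counts = np.zeros(nbins, dtype=np.int64)
    for d in distances:
        c, _ = np.histogram(d, bins=nbins, range=rdf_range)
        counts += c
    return counts


def histogram_batched(distances, nbins=75, rdf_range=(0.0, 15.0)):
    """Bin all distances with a single np.histogram call."""
    counts, _ = np.histogram(distances, bins=nbins, range=rdf_range)
    return counts


# Hypothetical data standing in for the pair distances of one frame.
rng = np.random.default_rng(0)
distances = rng.uniform(0.0, 15.0, size=2000)

t0 = time.perf_counter()
slow = histogram_per_value(distances)
t1 = time.perf_counter()
fast = histogram_batched(distances)
t2 = time.perf_counter()

# Both approaches must give identical bin counts.
assert np.array_equal(slow, fast)
print(f"per-value calls: {t1 - t0:.4f} s, single call: {t2 - t1:.4f} s")
```

Both functions return the same counts, so any difference in the printed timings comes from the per-call overhead of numpy.histogram rather than from the binning itself.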

Code to reproduce the behavior

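import MDAnalysis as mda
from MDAnalysis.analysis import rdf

# Load the GROMACS topology and trajectory
u = mda.Universe("s4_pr.tpr", "s4_pr.trr")

# Select the two atom groups for the RDF
li_sel = u.select_atoms("resname LI")
an_sel = u.select_atoms("resname AN")

# Single-pair RDF with InterRDF (about 141 it/s)
rdf_calc = rdf.InterRDF(li_sel, an_sel, verbose=True)
rdf_calc.run(stop=50)

# Same pair with InterRDF_s (about 3.3 it/s)
rdf_calcs = rdf.InterRDF_s(u, [[li_sel, an_sel]], verbose=True)
rdf_calcs.run(stop=50)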
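The function-level analysis mentioned above can be collected with a small helper like the following. This is a sketch using only the standard library modules cProfile and pstats. The helper name profile_call and the stand-in workload busy_work are hypothetical and exist only so that the example runs without the trajectory files.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def busy_work(n):
    """Stand-in workload (hypothetical) so the sketch is self-contained."""
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10_000)
print(report)
```

With the script above, the same helper could be applied as profile_call(rdf_calcs.run, stop=50) and profile_call(rdf_calc.run, stop=50) to compare the ncalls column for numpy.histogram between the two classes.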

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)")
    2.9.0
  • Which version of Python (python -V)?
    Python 3.11.11 (numpy==2.2.0)
  • Which operating system?
    Ubuntu 22
