Description
I'm just passing along an anonymous report that the net charge of amino acids read in from some AMBER-related topologies appears to cause some confusion (apparently this came up as a peer review criticism). I don't know if we are just "working as intended" here, if there might be a small casting/data type issue, or if we just need to improve the docs a little. They didn't provide me with a programmatic reproducer, but I did get a brief verbal description on the whiteboard. I've tried to reconstruct it as a minimal reproducer below, where some charges look a bit more sensible in float32/rounded/truncated format, though something may still be off in some cases.
This might normally look like a fairly minor matter, but in certain machine learning workflows the net charge on the residues can apparently be a problematic feature.
import numpy as np
from MDAnalysisTests.datafiles import PRM
import MDAnalysis as mda

u = mda.Universe(PRM)
for res in u.residues:
    manual_charge = 0
    for atom in res.atoms:
        manual_charge += np.float32(atom.charge)
    print(res.resname, res.charge, manual_charge)
ALA 1.0000000558793545 0.9999998
GLU -0.9999999655410647 -1.0
PHE 3.888271749019623e-08 5.9604645e-08
HIE 1.3504177331924438e-08 5.9604645e-08
ARG 0.9999999335850589 1.0
TRP 2.0954757928848267e-08 5.9604645e-08
SER 1.862645149230957e-08 0.0
SER 1.862645149230957e-08 0.0
TYR 7.799826562404633e-09 0.0
MET 3.8533471524715424e-08 5.9604645e-08
VAL 4.0978193283081055e-08 5.9604645e-08
HIE 1.3504177331924438e-08 5.9604645e-08
TRP 2.0954757928848267e-08 5.9604645e-08
LYS -6.891787052154541e-08 -1.1920929e-07
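For what it's worth, here is a rough sanity check on the numbers above. It is only a sketch, and it assumes the deviations are plain float32 rounding noise, which I have not confirmed. The nonzero values in the `manual_charge` column equal 2**-24 and 2**-23 to the digits printed, i.e. float32 rounding steps. The `FLOAT32_EPS` tolerance and the `deviation_from_integer` helper are my own illustrative choices, not anything MDAnalysis defines, and the charges are copied by hand from a few rows of the output.

```python
import math

# float32 has a 24-bit significand, so its machine epsilon is 2**-23.
# Using it as a tolerance is an assumption made for illustration only.
FLOAT32_EPS = 2.0 ** -23

# (resname, res.charge) pairs copied from a few rows of the output above.
reported = [
    ("ALA", 1.0000000558793545),
    ("GLU", -0.9999999655410647),
    ("PHE", 3.888271749019623e-08),
    ("ARG", 0.9999999335850589),
    ("LYS", -6.891787052154541e-08),
]


def deviation_from_integer(charge):
    """Return the distance between a charge and its nearest integer."""
    return abs(charge - round(charge))


for resname, charge in reported:
    dev = deviation_from_integer(charge)
    # Print the nearest integer charge, the deviation, and whether the
    # deviation is below the (assumed) float32 tolerance.
    print(resname, round(charge), dev, dev < FLOAT32_EPS)
```

If that reading is right, comparing with a tolerance or rounding the residue charge before using it as a feature might be enough on the user side, but that still leaves open whether the docs should say so.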
Are we "ok" here, or do we need a small tweak somewhere?