OpenMS · cbielow · Mar 5, 2025 · Feb 7, 2025 · Feb 7, 2025 · Feb 7, 2025
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -46,6 +46,10 @@
     'chemrole',
 ]
 
+## warn about invalid references (e.g. invalid class names)
+nitpicky = True
+nitpick_ignore = []
+
 autosummary_generate = True
 autosummary_imported_members = True
 remove_from_toctrees = [

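Review note on the `conf.py` change: with `nitpicky = True`, Sphinx warns about every cross-reference it cannot resolve, and `nitpick_ignore` is the place to silence known exceptions. The fragment below is only a sketch of how the empty list might be filled later; both targets are hypothetical placeholders, not names taken from this PR.

```python
# Sketch of a Sphinx conf.py fragment; the ignored targets are hypothetical.
nitpicky = True  # warn about every cross-reference that cannot be resolved

# Each entry is a (reference type, target) tuple that Sphinx should not warn about.
nitpick_ignore = [
    ("py:class", "SomeUndocumentedClass"),   # hypothetical class name
    ("py:meth", "Algorithm.getParameters"),  # hypothetical method target
]
```

Keeping the list short and explicit means new broken references still surface as warnings.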
diff --git a/docs/source/user_guide/adduct_detection.rst b/docs/source/user_guide/adduct_detection.rst
@@ -3,9 +3,10 @@ Adduct Detection
 
 In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio.
 This happens by addition of protons in positive mode or loss of protons in negative mode. Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid.
-Depending on the size and chemical compsition, multiple adducts can bind leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is much higher.
+Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics, where analytes are smaller, the number of charges is typically low (one or two), whereas in proteomics it is potentially higher.
+
 Furthermore, analytes can loose functional groups during ionization, e.g. a neutral water loss.
-Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times.
+Since the ionization happens after liquid chromatography, different adducts for an analyte have almost identical retention times.
 
 .. image:: img/adduct_detection.png
 

diff --git a/docs/source/user_guide/algorithms.rst b/docs/source/user_guide/algorithms.rst
@@ -7,13 +7,17 @@ Many signal processing algorithms follow a similar pattern in OpenMS.
 
   algorithm = NameOfTheAlgorithmClass()
   exp = MSExperiment()
+
   # populate exp, for example load from file
+  # ...
+
+  # run the algorithm on data
   algorithm.filterExperiment(exp)
 
 In many cases, the processing algorithms have a set of parameters that can be
-adjusted. These are accessible through :py:meth:`~.Algorithm.getParameters()` and yield a
+adjusted. These are accessible through ``Algorithm.getParameters()`` and yield a
 :py:class:`~.Param` object (see `Parameter handling <parameter_handling.html>`_) which can
-be manipulated. After changing parameters, one can use :py:meth:`~.Algorithm.setParameters()` to
+be manipulated. After changing parameters, one can use ``Algorithm.setParameters()`` to
 propagate the new parameters to the algorithm:
 
 .. code-block:: output
@@ -24,15 +28,16 @@ propagate the new parameters to the algorithm:
   algorithm.setParameters(param)
 
   exp = MSExperiment()
+
   # populate exp, for example load from file
+  # ...
+
   algorithm.filterExperiment(exp)
 
 Since they work on a single :py:class:`~.MSExperiment` object, little input is needed to
 execute a filter directly on the data. Examples of filters that follow this
 pattern are :py:class:`~.GaussFilter`, :py:class:`~.SavitzkyGolayFilter` as well as the spectral filters
-:py:class:`~.BernNorm`, :py:class:`~.MarkerMower`, :py:class:`~.NLargest`, :py:class:`~.Normalizer`,
-:py:class:`~.ParentPeakMower`, :py:class:`~.Scaler`, :py:class:`~.SpectraMerger`, :py:class:`~.SqrtMower`,
-:py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`.
+:py:class:`~.NLargest`, :py:class:`~.Normalizer`, :py:class:`~.SpectraMerger`, :py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`.
 
 Using the same example file as before, we can execute a :py:class:`~.GaussFilter` on our test data as follows:
 

diff --git a/docs/source/user_guide/centroiding.rst b/docs/source/user_guide/centroiding.rst
@@ -33,11 +33,18 @@ Let's zoom in on an isotopic pattern in profile mode and plot it.
     plt.plot(
         profile_spectra[0].get_peaks()[0], profile_spectra[0].get_peaks()[1]
     )  # plot the first spectrum
-
+    plt.show()
+
 .. image:: img/profile_data.png
 
-Because of the limited resolution of MS instruments m/z measurements are not of unlimited precision.
-Consequently, peak  shapes spreads in the m/z dimension and resemble a gaussian distribution.
+Due to the limited resolution of mass spectrometry (MS) instruments, m/z measurements exhibit a certain spread
+when multiple copies of a molecule are measured. Even with identical mass and charge, the copies are recorded with 
+slight deviations in the m/z dimension. Consequently, peak shapes in this dimension adopt a Gaussian-like distribution.
+The number of copies correlates with the peak height (or rather peak volume).
+
+A single peptide species, e.g. "DFPIANGER" at charge 2, typically consists of various molecular
+entities that differ in the number of neutrons, leading to an isotopic distribution and resulting in multiple peaks.
+
 Using the :py:class:`~.PeakPickerHiRes` algorithm, we can convert data from profile to centroided mode. Usually, not much information is lost
 by storing only centroided data. Thus, many algorithms and tools assume that centroided data is provided.
 
@@ -55,8 +62,9 @@ by storing only centroided data. Thus, many algorithms and tools assume that cen
     plt.stem(
         centroided_spectra[0].get_peaks()[0], centroided_spectra[0].get_peaks()[1]
     )  # plot as vertical lines
-
+    plt.show()
+
 .. image:: img/centroided_data.png
 
 After centroiding, a single m/z value for every isotopic peak is retained. By plotting the centroided data as stem plot
-we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k) were present in the profile data.
+we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k units on the y-axis) were present in the profile data.
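Review note on the centroiding text: the idea of collapsing a Gaussian-like profile peak into a single m/z value can be sketched without pyOpenMS. The code below uses a plain intensity-weighted mean, which is an assumption chosen for simplicity and not the method implemented by `PeakPickerHiRes`; the helper names and peak parameters are hypothetical.

```python
import math


def gaussian_profile(center, sigma, height, n_points=21, span=4.0):
    """Sample a Gaussian peak on an evenly spaced m/z grid around its center."""
    step = 2 * span * sigma / (n_points - 1)
    mzs = [center - span * sigma + i * step for i in range(n_points)]
    ints = [height * math.exp(-0.5 * ((mz - center) / sigma) ** 2) for mz in mzs]
    return mzs, ints


def centroid(mzs, ints):
    """Collapse one profile peak into a single (m/z, summed intensity) pair."""
    total = sum(ints)
    return sum(mz * i for mz, i in zip(mzs, ints)) / total, total


mzs, ints = gaussian_profile(center=509.7518, sigma=0.01, height=1000.0)
mz, area = centroid(mzs, ints)
print(f"centroid m/z: {mz:.4f}, summed intensity: {area:.1f}")
```

The summed intensity stands in for the peak volume mentioned in the text, which is the quantity that correlates with the number of measured ion copies.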
diff --git a/docs/source/user_guide/charge_isotope_deconvolution.rst b/docs/source/user_guide/charge_isotope_deconvolution.rst
@@ -3,65 +3,99 @@ Charge and Isotope Deconvolution
 
 A single mass spectrum contains measurements of one or more analytes and the
 m/z values recorded for these analytes. Most analytes produce multiple signals
-in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally
-occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most
-organic molecules, most analytes produce a so-called isotopic pattern with a
-monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one
-carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.
-Note that also other elements can contribute to the isotope pattern, see the 
-`chemistry section <chemistry.html>`_ for further details.
+in the mass spectrometer, due to the natural abundance of heavy isotopes.
+The dominant heavy isotope in proteins is carbon :math:`13` (naturally
+occurring at ca. :math:`1.1\%` frequency). Other elements such as hydrogen also have heavy isotopes, but
+they contribute to a much lesser extent, since these heavy isotopes are of very low abundance,
+e.g. hydrogen :math:`2` (deuterium) occurs at a frequency of only :math:`0.0156\%`.
+
+All analytes produce a so-called isotopic pattern, consisting of a
+monoisotopic peak and a first isotopic peak (exactly one
+extra neutron in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons) etc.
+With increasing mass, the monoisotopic peak becomes diminishingly small, up to the point where it is no longer detectable
+(although this is rarely the case with peptides and more problematic for whole proteins in top-down approaches).
+
+By definition, the monoisotopic peak is the peak which contains only the most abundant isotope of each element.
+For peptides and proteins, the constituent atoms are C, H, N, O, P and S, where incidentally, the
+most abundant isotope is also the lightest isotope. Hence, for peptides and proteins, the monoisotopic peak is always
+the lightest peak in an isotopic distribution.
+
+See the `chemistry section <chemistry.html>`_ for further details on isotope abundances and how to compute isotope patterns.
 
 In addition, each analyte may appear in more than one charge state and adduct
-state, a singly charge analyte :chem:`[M +H]+` may be accompanied by a doubly
+state: a singly charged analyte :chem:`[M +H]+` may be accompanied by a doubly
 charged analyte :chem:`[M +2H]++` or a sodium adduct :chem:`[M +Na]+`. In the case of a
-multiply charged peptide, the isotopic traces are spaced by ``PROTON_MASS /
+multiply charged peptide, the isotopic traces are spaced by ``NEUTRON_MASS /
 charge_state`` which is often close to :math:`0.5\ m/z` for doubly charged analytes,
 :math:`0.33\ m/z` for triply charged analytes etc. Note: tryptic peptides often appear
-at least doubly charged, while small molecules often carry a single charge but
-can have adducts other than hydrogen.
+either singly charged (when ionized with :term:`MALDI`) or doubly charged (when ionized with :term:`ESI`).
+Higher charges are also possible, but are usually connected to incomplete tryptic digestion with missed cleavages.
+Small molecules in metabolomics often carry a single charge but can have adducts other than hydrogen.
 
 Single Peak Example
 *********************************
 
+Let's compute the isotope distribution of the peptide ``DFPIANGER`` using the classes :py:class:`~.AASequence` and 
+:py:class:`~.EmpiricalFormula`. Then we use the :py:class:`~.Deisotoper` to find the monoisotopic peak:
+
 .. code-block:: python
     :linenos:
 
     import pyopenms as oms
 
-    charge = 2
     seq = oms.AASequence.fromString("DFPIANGER")
+    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
+
+    ## get isotopic distribution for two additional hydrogens (which carry the charge)
+    charge = 2
     seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
     isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
-    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
 
     # Append isotopic distribution to spectrum
     s = oms.MSSpectrum()
-    for iso in isotopes.getContainer():
-        iso.setMZ(iso.getMZ() / charge)
+    for iso in isotopes.getContainer():  # the container holds masses, not m/z ...
+        iso.setMZ(iso.getMZ() / charge)  # ... even though the accessor is called '.getMZ()'
         s.push_back(iso)
         print("Isotope", iso.getMZ(), ":", iso.getIntensity())
 
+    # deisotope with 10 ppm mass tolerance
     oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)
 
     for p in s:
-        print(p.getMZ(), p.getIntensity())
+        print("Mono peaks:", p.getMZ(), p.getIntensity())
+
+which will print:
+
+
+.. code-block:: output
+    :linenos:
+
+    [M+H]+ weight: 1018.495240604071
+    Isotope 509.75180710055 : 0.5680345296859741
+    Isotope 510.25348451945 : 0.3053518533706665
+    Isotope 510.75516193835 : 0.09806874394416809
+    Isotope 511.25683935725004 : 0.023309258744120598
+    Isotope 511.75851677615003 : 0.0044969217851758
+    Isotope 512.2601941950501 : 0.000738693168386817
+    Mono peaks: 1018.496337734329 0.5680345296859741
 
 
 Note that the algorithm presented here as some heuristics built into it, such
 as assuming that the isotopic peaks will decrease after the first isotopic
-peak. This heuristic can be tuned by changing the parameter
-``use_decreasing_model`` and ``start_intensity_check``. In this case, the
-second isotopic peak  is the highest in intensity and the
-``start_intensity_check`` parameter needs to be set to 3. 
+peak. This heuristic can be tuned by setting the parameter
+``use_decreasing_model`` to ``False``.
+For more fine-grained control use ``start_intensity_check`` and leave ``use_decreasing_model = True`` (see :py:class:`~.Deisotoper` --> C++ documentation).
+Let's look at a very heavy peptide, whose isotopic distribution is dominated by the first and second isotopic peaks.
 
 .. code-block:: python
     :linenos:
 
-    charge = 4
     seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER")
+    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
+
+    charge = 4
     seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
     isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))
-    print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
 
     # Append isotopic distribution to spectrum
     s = oms.MSSpectrum()
@@ -73,9 +107,9 @@ second isotopic peak  is the highest in intensity and the
     min_charge = 1
     min_isotopes = 2
     max_isotopes = 10
-    use_decreasing_model = True
-    start_intensity_check = 3
-    oms.Deisotoper.deisotopeAndSingleCharge(
+    use_decreasing_model = True   # require intensities to decrease along the isotopic pattern ...
+    start_intensity_check = 3     # ... but only from the third peak on, so the first two peaks may be less intense
+    oms.Deisotoper.deisotopeAndSingleCharge( ## a function with all parameters exposed
         s,
         10,
         True,
@@ -90,10 +124,26 @@ second isotopic peak  is the highest in intensity and the
         use_decreasing_model,
         start_intensity_check,
         False,
+        True
     )
     for p in s:
-        print(p.getMZ(), p.getIntensity())
+        print("Mono peaks:", p.getMZ(), p.getIntensity())
 
+.. code-block:: output
+    :linenos:
+
+    [M+H]+ weight: 4016.927437824572
+    Isotope 1004.9878653713499 : 0.10543462634086609
+    Isotope 1005.2387040808 : 0.22646738588809967
+    Isotope 1005.48954279025 : 0.25444599986076355
+    Isotope 1005.7403814996999 : 0.19825772941112518
+    Isotope 1005.9912202091499 : 0.12000058591365814
+    Isotope 1006.2420589185999 : 0.05997777357697487
+    Isotope 1006.49289762805 : 0.025713207200169563
+    Isotope 1006.7437363375 : 0.009702674113214016
+    Mono peaks: 4016.9296320850867 0.10543462634086609
+
+This successfully recovers the monoisotopic peak, even though it is not the most abundant peak.
 
 Full Spectral De-Isotoping
 **************************
@@ -107,6 +157,7 @@ state:
     :linenos:
 
     from urllib.request import urlretrieve
+    import matplotlib.pyplot as plt
 
     gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
     urlretrieve(gh + "/src/data/BSA1.mzML", "BSA1.mzML")
@@ -130,6 +181,7 @@ state:
         use_decreasing_model,
         start_intensity_check,
         False,
+        True
     )
 
     print(e[214].size())
@@ -147,7 +199,15 @@ state:
         if p.getIntensity() > 0.25 * maxvalue:
             print(p.getMZ(), p.getIntensity())
 
-
+    unpicked_peak_data = e[214].get_peaks()
+    plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False)
+    plt.show()
+
+    picked_peak_data = s.get_peaks()
+    plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False)
+    plt.show()
+
+
 which produces the following output
 
 .. code-block:: output
@@ -159,7 +219,7 @@ which produces the following output
   974.4589691256419 3215808.75
 
 As we can see, the algorithm has reduced :math:`140` peaks to :math:`41` deisotoped peaks. It
-also has identified a molecule at :math:`974.45\ m/z` as the most intense peak in the
+also has identified a molecule with a singly charged mass of :math:`974.45\ Da` as the most intense peak in the
 data (base peak).
 
 Visualization

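Review note on the deconvolution text: the relation between isotopic peak spacing, charge state and the reported singly charged mass can be checked by hand with the m/z values from the two example outputs in this diff. The sketch below is plain Python, not the `Deisotoper` implementation. The helper names are hypothetical, and the spacing constant is assumed to be the mass difference between carbon 13 and carbon 12.

```python
PROTON_MASS = 1.007276467   # Da
ISOTOPE_SPACING = 1.003355  # Da, assumed: mass difference between 13C and 12C


def charge_from_spacing(mz_a, mz_b):
    """Estimate the charge state from two neighbouring isotopic peaks."""
    return round(ISOTOPE_SPACING / abs(mz_b - mz_a))


def singly_charged_mass(mz, charge):
    """Convert an m/z value at the given charge to the [M+H]+ mass."""
    return mz * charge - (charge - 1) * PROTON_MASS


# Monoisotopic and first isotopic peak of the two example peptides.
examples = [
    (509.75180710055, 510.25348451945),
    (1004.9878653713499, 1005.2387040808),
]
for mono, first in examples:
    z = charge_from_spacing(mono, first)
    print(z, round(singly_charged_mass(mono, z), 4))
```

The resulting masses agree with the "Mono peaks" lines in the example outputs, which is why the deisotoped spectrum reports singly charged masses rather than the original m/z values.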
diff --git a/docs/source/user_guide/chemistry.rst b/docs/source/user_guide/chemistry.rst
@@ -5,6 +5,8 @@ OpenMS has representations for various chemical concepts including molecular
 formulas, isotopes, ribonucleotide and amino acid sequences as well as common
 modifications of amino acids or ribonucleotides.
 
+For an introduction to isotope patterns, see `Charge and Isotope Deconvolution <charge_isotope_deconvolution.html>`_.
+
 Constants
 ---------