4 changes: 4 additions & 0 deletions docs/source/conf.py
@@ -46,6 +46,10 @@
'chemrole',
]

## warn about invalid references (e.g. invalid class names)
nitpicky = True
nitpick_ignore = []

autosummary_generate = True
autosummary_imported_members = True
remove_from_toctrees = [
5 changes: 3 additions & 2 deletions docs/source/user_guide/adduct_detection.rst
@@ -3,9 +3,10 @@ Adduct Detection

In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio.
This happens by addition of protons in positive mode or loss of protons in negative mode. Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid.
Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is much higher.
Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is potentially higher.

Furthermore, analytes can lose functional groups during ionization, e.g. a neutral water loss.
Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times.
Since the ionization happens after liquid chromatography, different adducts for an analyte have almost identical retention times.
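As a rough illustration of the adducts described above, the following sketch computes the expected m/z of a few ion species from a neutral monoisotopic mass. The adduct table, the helper name ``adduct_mz`` and the glucose example mass are assumptions made for this illustration and are not part of the pyOpenMS API; the mass shifts are approximate values.

```python
# Approximate monoisotopic mass shifts in Da (assumed values for illustration).
PROTON = 1.007276       # H+
SODIUM_ION = 22.989221  # Na+ (sodium atom minus one electron)
WATER = 18.010565       # H2O, for a neutral water loss

# Adduct name -> (total mass shift in Da, number of charges).
ADDUCTS = {
    "[M+H]+": (PROTON, 1),
    "[M+2H]2+": (2 * PROTON, 2),
    "[M+Na]+": (SODIUM_ION, 1),
    "[M+H-H2O]+": (PROTON - WATER, 1),
}


def adduct_mz(neutral_mass, adduct):
    """Return the m/z of a neutral mass observed as the given adduct."""
    shift, charge = ADDUCTS[adduct]
    return (neutral_mass + shift) / charge


glucose = 180.06339  # approximate neutral monoisotopic mass of C6H12O6
for name in ADDUCTS:
    print(name, round(adduct_mz(glucose, name), 4))
```

All of these species would be expected at nearly the same retention time, which is the property adduct detection relies on when grouping them.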

.. image:: img/adduct_detection.png

15 changes: 10 additions & 5 deletions docs/source/user_guide/algorithms.rst
@@ -7,13 +7,17 @@ Many signal processing algorithms follow a similar pattern in OpenMS.

algorithm = NameOfTheAlgorithmClass()
exp = MSExperiment()

# populate exp, for example load from file
# ...

# run the algorithm on data
algorithm.filterExperiment(exp)

In many cases, the processing algorithms have a set of parameters that can be
adjusted. These are accessible through :py:meth:`~.Algorithm.getParameters()` and yield a
adjusted. These are accessible through ``Algorithm.getParameters()`` and yield a
:py:class:`~.Param` object (see `Parameter handling <parameter_handling.html>`_) which can
be manipulated. After changing parameters, one can use :py:meth:`~.Algorithm.setParameters()` to
be manipulated. After changing parameters, one can use ``Algorithm.setParameters()`` to
propagate the new parameters to the algorithm:

.. code-block:: output
@@ -24,15 +28,16 @@ propagate the new parameters to the algorithm:
algorithm.setParameters(param)

exp = MSExperiment()

# populate exp, for example load from file
# ...

algorithm.filterExperiment(exp)

Since they work on a single :py:class:`~.MSExperiment` object, little input is needed to
execute a filter directly on the data. Examples of filters that follow this
pattern are :py:class:`~.GaussFilter`, :py:class:`~.SavitzkyGolayFilter` as well as the spectral filters
:py:class:`~.BernNorm`, :py:class:`~.MarkerMower`, :py:class:`~.NLargest`, :py:class:`~.Normalizer`,
:py:class:`~.ParentPeakMower`, :py:class:`~.Scaler`, :py:class:`~.SpectraMerger`, :py:class:`~.SqrtMower`,
:py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`.
:py:class:`~.NLargest`, :py:class:`~.Normalizer`, :py:class:`~.SpectraMerger`, :py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`.

Using the same example file as before, we can execute a :py:class:`~.GaussFilter` on our test data as follows:

18 changes: 13 additions & 5 deletions docs/source/user_guide/centroiding.rst
@@ -33,11 +33,18 @@ Let's zoom in on an isotopic pattern in profile mode and plot it.
plt.plot(
profile_spectra[0].get_peaks()[0], profile_spectra[0].get_peaks()[1]
) # plot the first spectrum

plt.show()

.. image:: img/profile_data.png

Because of the limited resolution of MS instruments m/z measurements are not of unlimited precision.
Consequently, peak shapes spreads in the m/z dimension and resemble a gaussian distribution.
Due to the limited resolution of mass spectrometry (MS) instruments, m/z measurements exhibit a certain spread
when multiple copies of a molecule are measured. Even with identical mass and charge, the copies are recorded with
slight deviations in the m/z dimension. Consequently, peak shapes in this dimension adopt a Gaussian-like distribution.
The number of copies correlates with the peak height (or rather peak volume).

A single peptide species, e.g. "DPFINAGER" at charge 2, typically consists of various molecular
entities that differ in the number of neutrons, leading to an isotopic distribution and resulting in multiple peaks.

Using the :py:class:`~.PeakPickerHiRes` algorithm, we can convert data from profile to centroided mode. Usually, not much information is lost
by storing only centroided data. Thus, many algorithms and tools assume that centroided data is provided.

@@ -55,8 +62,9 @@ by storing only centroided data. Thus, many algorithms and tools assume that cen
plt.stem(
centroided_spectra[0].get_peaks()[0], centroided_spectra[0].get_peaks()[1]
) # plot as vertical lines

plt.show()

.. image:: img/centroided_data.png

After centroiding, a single m/z value for every isotopic peak is retained. By plotting the centroided data as a stem plot
we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k) were present in the profile data.
we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k units on the y-axis) were present in the profile data.
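To make the idea of centroiding concrete, here is a toy sketch that collapses one profile peak into a single m/z value using an intensity-weighted mean. This is a deliberately simple stand-in and not the method used by :py:class:`~.PeakPickerHiRes`; the function name ``weighted_centroid`` and the sample data are hypothetical.

```python
def weighted_centroid(mz_values, intensities):
    """Collapse one profile peak into (centroid m/z, summed intensity)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("peak has no signal")
    centroid = sum(mz * i for mz, i in zip(mz_values, intensities)) / total
    return centroid, total


# A small, symmetric, Gaussian-like profile peak (made-up data points).
profile_mz = [500.24, 500.25, 500.26, 500.27, 500.28]
profile_int = [10.0, 60.0, 100.0, 60.0, 10.0]

mz, intensity = weighted_centroid(profile_mz, profile_int)
print(round(mz, 4), intensity)
```

The summed intensity plays the role of the peak volume mentioned earlier; real peak pickers typically also have to find peak boundaries and handle noise, which this sketch ignores.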
116 changes: 88 additions & 28 deletions docs/source/user_guide/charge_isotope_deconvolution.rst
@@ -3,65 +3,99 @@ Charge and Isotope Deconvolution

A single mass spectrum contains measurements of one or more analytes and the
m/z values recorded for these analytes. Most analytes produce multiple signals
in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally
occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most
organic molecules, most analytes produce a so-called isotopic pattern with a
monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one
carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.
Note that also other elements can contribute to the isotope pattern, see the
`chemistry section <chemistry.html>`_ for further details.
in the mass spectrometer, due to the natural abundance of heavy isotopes.
The most abundant heavy isotope in proteins is carbon :math:`13` (naturally
occurring at ca. :math:`1.1\%` frequency). Other elements such as hydrogen also have heavy isotopes, but
they contribute to a much lesser extent, since these heavy isotopes have a very low abundance,
e.g. hydrogen :math:`2` (deuterium) occurs at a frequency of only :math:`0.0156\%`.

All analytes produce a so-called isotopic pattern, consisting of a
monoisotopic peak and a first isotopic peak (exactly one
extra neutron in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons) etc.
With higher mass, the monoisotopic peak becomes diminishingly small, up to the point where it is no longer detectable
(although this is rarely the case with peptides and more problematic for whole proteins in Top-Down approaches).

By definition, the monoisotopic peak is the peak which contains only isotopes of the most abundant type.
For peptides and proteins, the constituent atoms are C,H,N,O,P,S, where incidentally, the
most abundant isotope is also the lightest isotope. Hence, for peptides and proteins, the monoisotopic peak is always
the lightest peak in an isotopic distribution.
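The shrinking monoisotopic peak can be illustrated with a carbon-only model: if each carbon atom is independently :chem:`^{13}C` with a probability of about 1.1%, the number of heavy carbons in a molecule follows a binomial distribution. The sketch below is a simplification that ignores all other elements; the function name ``carbon_isotope_pattern`` and the carbon counts 50 and 200 are arbitrary choices for illustration.

```python
from math import comb

P_C13 = 0.011  # natural abundance of carbon 13, ca. 1.1 %


def carbon_isotope_pattern(n_carbons, n_peaks):
    """Probability that exactly k of n_carbons atoms are 13C, for k = 0 .. n_peaks - 1."""
    return [
        comb(n_carbons, k) * P_C13**k * (1 - P_C13) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


# Compare a small and a large molecule (hypothetical carbon counts).
for n in (50, 200):
    pattern = carbon_isotope_pattern(n, 5)
    print(n, [round(p, 3) for p in pattern])
```

In this model the monoisotopic peak (k = 0) is the largest peak for 50 carbons, whereas for 200 carbons it drops to roughly a tenth of the total and the peak with two heavy carbons becomes the most abundant.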

See the `chemistry section <chemistry.html>`_ for further details on isotope abundances and how to compute isotope patterns.

In addition, each analyte may appear in more than one charge state and adduct
state, a singly charge analyte :chem:`[M +H]+` may be accompanied by a doubly
state, a singly charged analyte :chem:`[M +H]+` may be accompanied by a doubly
charged analyte :chem:`[M +2H]++` or a sodium adduct :chem:`[M +Na]+`. In the case of a
multiply charged peptide, the isotopic traces are spaced by ``PROTON_MASS /
multiply charged peptide, the isotopic traces are spaced by ``NEUTRON_MASS /
charge_state`` which is often close to :math:`0.5\ m/z` for doubly charged analytes,
:math:`0.33\ m/z` for triply charged analytes etc. Note: tryptic peptides often appear
at least doubly charged, while small molecules often carry a single charge but
can have adducts other than hydrogen.
either singly charged (when ionized with :term:`MALDI`), or doubly charged (when ionized with :term:`ESI`).
Higher charges are also possible, but usually connected to incomplete tryptic digestions with missed cleavages.
Small molecules in metabolomics often carry a single charge but can have adducts other than hydrogen.
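The relation between charge state and isotopic spacing can be sketched in a few lines. The constant below is the approximate :chem:`^{13}C` / :chem:`^{12}C` mass difference (about 1.00335 Da), used here as an assumed stand-in for the spacing constant named in the text; the helper names are hypothetical.

```python
# Approximate mass difference between adjacent isotopic peaks, in Da
# (assumed value: 13C minus 12C).
ISOTOPE_SPACING_DA = 1.0033548


def isotope_spacing(charge):
    """Expected m/z distance between neighbouring isotopic peaks."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return ISOTOPE_SPACING_DA / charge


def charge_from_spacing(delta_mz):
    """Estimate the charge state from an observed isotopic peak spacing."""
    return max(1, round(ISOTOPE_SPACING_DA / delta_mz))


for z in (1, 2, 3):
    print(z, round(isotope_spacing(z), 4))

print(charge_from_spacing(0.5017))  # spacing of about 0.5 m/z
```

Running the relation in reverse, as in ``charge_from_spacing``, is the basic idea behind assigning a charge to an isotopic pattern during deconvolution.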

Single Peak Example
*********************************

Let's compute the isotope distribution of the peptide ``DFPIANGER`` using the classes :py:class:`~.AASequence` and
:py:class:`~.EmpiricalFormula`. Then we use the :py:class:`~.Deisotoper` to find the monoisotopic peak:

.. code-block:: python
:linenos:

import pyopenms as oms

charge = 2
seq = oms.AASequence.fromString("DFPIANGER")
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))

## get isotopic distribution for two additional hydrogens (which carry the charge)
charge = 2
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))

# Append isotopic distribution to spectrum
s = oms.MSSpectrum()
for iso in isotopes.getContainer():
iso.setMZ(iso.getMZ() / charge)
for iso in isotopes.getContainer(): # the container contains masses, not m/z!
iso.setMZ(iso.getMZ() / charge) # ... even though it's called '.getMZ()'
s.push_back(iso)
print("Isotope", iso.getMZ(), ":", iso.getIntensity())

# deisotope with 10 ppm mass tolerance
oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)

for p in s:
print(p.getMZ(), p.getIntensity())
print("Mono peaks:", p.getMZ(), p.getIntensity())

which will print:


.. code-block:: output
:linenos:

[M+H]+ weight: 1018.495240604071
Isotope 509.75180710055 : 0.5680345296859741
Isotope 510.25348451945 : 0.3053518533706665
Isotope 510.75516193835 : 0.09806874394416809
Isotope 511.25683935725004 : 0.023309258744120598
Isotope 511.75851677615003 : 0.0044969217851758
Isotope 512.2601941950501 : 0.000738693168386817
Mono peaks: 1018.496337734329 0.5680345296859741


Note that the algorithm presented here has some heuristics built into it, such
as assuming that the isotopic peaks will decrease after the first isotopic
peak. This heuristic can be tuned by changing the parameter
``use_decreasing_model`` and ``start_intensity_check``. In this case, the
second isotopic peak is the highest in intensity and the
``start_intensity_check`` parameter needs to be set to 3.
peak. This heuristic can be tuned by setting the parameter
``use_decreasing_model`` to ``False``.
For more fine-grained control use ``start_intensity_check`` and leave ``use_decreasing_model = True`` (see :py:class:`~.Deisotoper` --> C++ documentation).
Let's look at a very heavy peptide, whose isotopic distribution is dominated by the first and second isotopic peak.

.. code-block:: python
:linenos:

charge = 4
seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER")
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))

charge = 4
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))

# Append isotopic distribution to spectrum
s = oms.MSSpectrum()
@@ -73,9 +107,9 @@ second isotopic peak is the highest in intensity and the
min_charge = 1
min_isotopes = 2
max_isotopes = 10
use_decreasing_model = True
start_intensity_check = 3
oms.Deisotoper.deisotopeAndSingleCharge(
use_decreasing_model = True # keep the decreasing-intensity model enabled
start_intensity_check = 3 # begin the intensity check at the third isotopic peak, so the first two peaks may be less intense
oms.Deisotoper.deisotopeAndSingleCharge( ## a function with all parameters exposed
s,
10,
True,
@@ -90,10 +124,26 @@ second isotopic peak is the highest in intensity and the
use_decreasing_model,
start_intensity_check,
False,
True
)
for p in s:
print(p.getMZ(), p.getIntensity())
print("Mono peaks:", p.getMZ(), p.getIntensity())

.. code-block:: output
:linenos:

[M+H]+ weight: 4016.927437824572
Isotope 1004.9878653713499 : 0.10543462634086609
Isotope 1005.2387040808 : 0.22646738588809967
Isotope 1005.48954279025 : 0.25444599986076355
Isotope 1005.7403814996999 : 0.19825772941112518
Isotope 1005.9912202091499 : 0.12000058591365814
Isotope 1006.2420589185999 : 0.05997777357697487
Isotope 1006.49289762805 : 0.025713207200169563
Isotope 1006.7437363375 : 0.009702674113214016
Mono peaks: 4016.9296320850867 0.10543462634086609

This successfully recovers the monoisotopic peak, even though it is not the most abundant peak.
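The ``Mono peaks`` values printed above are singly charged masses, not the m/z values of the original isotopic peaks. A minimal sketch of that conversion follows, assuming an approximate proton mass; the helper name ``singly_charged_mass`` is hypothetical and not a pyOpenMS function.

```python
PROTON_MASS = 1.007276467  # Da, approximate value (assumption)


def singly_charged_mass(mz, charge):
    """Convert an m/z observed at the given charge to the equivalent value at charge 1."""
    return mz * charge - (charge - 1) * PROTON_MASS


# First isotopic peaks taken from the two outputs above.
print(singly_charged_mass(509.75180710055, 2))
print(singly_charged_mass(1004.9878653713499, 4))
```

Both results agree with the reported ``Mono peaks`` values (about 1018.4963 and 4016.9296) to well within the 10 ppm tolerance used for deisotoping.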

Full Spectral De-Isotoping
**************************
@@ -107,6 +157,7 @@ state:
:linenos:

from urllib.request import urlretrieve
import matplotlib.pyplot as plt

gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
urlretrieve(gh + "/src/data/BSA1.mzML", "BSA1.mzML")
@@ -130,6 +181,7 @@ state:
use_decreasing_model,
start_intensity_check,
False,
True
)

print(e[214].size())
@@ -147,7 +199,15 @@ state:
if p.getIntensity() > 0.25 * maxvalue:
print(p.getMZ(), p.getIntensity())


unpicked_peak_data = e[214].get_peaks()
plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False)
plt.show()

picked_peak_data = s.get_peaks()
plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False)
plt.show()


which produces the following output

.. code-block:: output
@@ -159,7 +219,7 @@ which produces the following output
974.4589691256419 3215808.75

As we can see, the algorithm has reduced :math:`140` peaks to :math:`41` deisotoped peaks. It
also has identified a molecule at :math:`974.45\ m/z` as the most intense peak in the
also has identified a molecule with a singly charged mass of :math:`974.45\ Da` as the most intense peak in the
data (base peak).

Visualization
2 changes: 2 additions & 0 deletions docs/source/user_guide/chemistry.rst
@@ -5,6 +5,8 @@ OpenMS has representations for various chemical concepts including molecular
formulas, isotopes, ribonucleotide and amino acid sequences as well as common
modifications of amino acids or ribonucleotides.

For an introduction to isotope patterns, see `Charge and Isotope Deconvolution <charge_isotope_deconvolution.html>`_.

Constants
---------
