From 750a0001ef761b77bc86b7644562d708c5e11665 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Fri, 7 Feb 2025 16:45:32 +0100 Subject: [PATCH 01/11] * efficient loading using PeakFileOptions --- docs/source/user_guide/ms_data.rst | 41 +++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/docs/source/user_guide/ms_data.rst b/docs/source/user_guide/ms_data.rst index 6cae38d26..e5a6b80aa 100644 --- a/docs/source/user_guide/ms_data.rst +++ b/docs/source/user_guide/ms_data.rst @@ -651,7 +651,25 @@ mass spectra that are not :term:`MS1` spectra if s.getMSLevel() > 1: filtered.addSpectrum(s) - # filtered now only contains spectra with MS level > 2 + # 'filtered' now only contains spectra with MS level >= 2 + +Alternatively, we can choose to load only spectra of a certain level using :py:class:`~.PeakFileOptions`, which is even more efficient. + +.. code-block:: python + :linenos: + + # Create a PeakFileOptions object + options = oms.PeakFileOptions() + options.setMSLevels([2]) # Load only MS level 2 + + # Load the mzML file with the specified options + mzml = oms.MzMLFile() + mzml.setOptions(options) # Apply the options + mzml.load("test.mzML", filtered) + + # 'filtered' now only contains spectra with MS level == 2 + +# Now exp contains only MS level 2 spectra Filtering by Scan Number @@ -695,13 +713,30 @@ We can easily filter our data accordingly: filtered.addSpectrum(s) # filtered only contains only fragment spectra with peaks in range [mz_start, mz_end] + +For this simple example, you can achieve the same thing using :py:class:`~.PeakFileOptions` when loading the data: + +..
code-block:: python + :linenos: + + # Create a PeakFileOptions object + options = oms.PeakFileOptions() + options.setMSLevels([2]) # Load only MS level 2 + options.setMZRange(oms.DRange1(oms.DPosition1(mz_start),oms.DPosition1(mz_end))) + + # Load the mzML file with the specified options + mzml = oms.MzMLFile() + mzml.setOptions(options) # Apply the options + mzml.load("test.mzML", filtered) + + # 'filtered' now only contains spectra with MS level == 2, and each spectrum has peaks with m/z values between 6-12 Note that in a real-world application, we would set the ``mz_start`` and ``mz_end`` parameter to an actual area of interest, for example the area between 125 and 132 which contains quantitative ions for a :term:`TMT` experiment. -Similarly we could only retain peaks above a certain -intensity or keep only the top N peaks in each mass spectrum. +Similarly we could only retain spectra with a certain retention time or peaks with a certain intensity range. +See :py:class:`~.PeakFileOptions` for details. For more advanced filtering tasks pyOpenMS provides special algorithm classes. We will take a closer look at some of them in the next section. From db49eb9aa2dee1fe8839ea8969fda26f3f795c83 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Fri, 7 Feb 2025 16:45:54 +0100 Subject: [PATCH 02/11] fix broken links --- docs/source/user_guide/algorithms.rst | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/source/user_guide/algorithms.rst b/docs/source/user_guide/algorithms.rst index 23d50778f..95c5d28e4 100644 --- a/docs/source/user_guide/algorithms.rst +++ b/docs/source/user_guide/algorithms.rst @@ -7,13 +7,17 @@ Many signal processing algorithms follow a similar pattern in OpenMS. algorithm = NameOfTheAlgorithmClass() exp = MSExperiment() + # populate exp, for example load from file + # ... 
+ + # run the algorithm on data algorithm.filterExperiment(exp) In many cases, the processing algorithms have a set of parameters that can be -adjusted. These are accessible through :py:meth:`~.Algorithm.getParameters()` and yield a +adjusted. These are accessible through ``Algorithm.getParameters()`` and yield a :py:class:`~.Param` object (see `Parameter handling `_) which can -be manipulated. After changing parameters, one can use :py:meth:`~.Algorithm.setParameters()` to +be manipulated. After changing parameters, one can use ``Algorithm.setParameters()`` to propagate the new parameters to the algorithm: .. code-block:: output @@ -30,9 +34,7 @@ propagate the new parameters to the algorithm: Since they work on a single :py:class:`~.MSExperiment` object, little input is needed to execute a filter directly on the data. Examples of filters that follow this pattern are :py:class:`~.GaussFilter`, :py:class:`~.SavitzkyGolayFilter` as well as the spectral filters -:py:class:`~.BernNorm`, :py:class:`~.MarkerMower`, :py:class:`~.NLargest`, :py:class:`~.Normalizer`, -:py:class:`~.ParentPeakMower`, :py:class:`~.Scaler`, :py:class:`~.SpectraMerger`, :py:class:`~.SqrtMower`, -:py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`. +:py:class:`~.NLargest`, :py:class:`~.Normalizer`, :py:class:`~.SpectraMerger`, :py:class:`~.ThresholdMower`, :py:class:`~.WindowMower`. 
Using the same example file as before, we can execute a :py:class:`~.GaussFilter` on our test data as follows: From bead0e0c6339aa133c61c016c0f73b3ecaca9bfa Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Fri, 7 Feb 2025 16:46:37 +0100 Subject: [PATCH 03/11] extended Param handling tutorial (restrictions and value types) --- docs/source/user_guide/parameter_handling.rst | 100 +++++++++++++++--- 1 file changed, 83 insertions(+), 17 deletions(-) diff --git a/docs/source/user_guide/parameter_handling.rst b/docs/source/user_guide/parameter_handling.rst index fb0963790..f6aba61b6 100644 --- a/docs/source/user_guide/parameter_handling.rst +++ b/docs/source/user_guide/parameter_handling.rst @@ -3,13 +3,17 @@ Parameter Handling Parameter handling in OpenMS and pyOpenMS is usually implemented through inheritance from :py:class:`~.DefaultParamHandler` and allow access to parameters through the :py:class:`~.Param` object. This -means, the classes implement the methods ``getDefaults``, ``getParameters``, ``setParameters`` -which allows access to the default parameters, the current parameters and allows to set the -parameters. +means, the classes implement the methods ``getDefaults``, ``getParameters`` and ``setParameters``, +to access the default parameters, the current parameters and to set new parameters, respectively. +The class :py:class:`~.TheoreticalSpectrumGenerator` is just one of many classes that make use of parameter handling via +:py:class:`~.DefaultParamHandler`. -The :py:class:`~.Param` object that is returned can be manipulated through the :py:meth:`~.Param.setValue` -and :py:meth:`~.Param.getValue` methods (the ``exists`` method can be used to check for existence of a key). Using the -:py:meth:`~.Param.getDescription` method, it is possible to get a help-text for each parameter value in an +The :py:class:`~.Param` object is the central data structure here.
It can be manipulated through the :py:meth:`~.Param.setValue` +and :py:meth:`~.Param.getValue` methods. The :py:meth:`~.Param.exists` method can be used to check for existence of a key and should +always be used if a param value might be missing, since accessing a missing value via :py:meth:`~.Param.getValue` +will result in a RuntimeError exception. + +Using the :py:meth:`~.Param.getDescription` method, it is possible to get a descriptive help text for each parameter value in an interactive session without consulting the documentation. .. code-block:: python @@ -31,7 +35,7 @@ interactive session without consulting the documentation. print(p[b"param3"]) -The parameters can then be accessed as +The parameters can also be accessed as .. code-block:: pycon @@ -47,7 +51,7 @@ The parameters can then be accessed as True -The param object can be copy and merge in to other param object as +The param object can be copied and merged into another param object: .. code-block:: python :linenos: @@ -61,9 +65,7 @@ The param object can be copy and merge in to other param object as print("no data available") - new_p = oms.Param() - if p.empty() == False: # check p is not empty - new_p = p # new deep copy of p generate with name "new_p" + new_p = oms.Param(p) # copy of p via the copy constructor (plain 'new_p = p' would only alias p) # we will add 4 more keys to the new_p new_p.setValue("param2", 9.0, "This is value 9") @@ -73,17 +75,17 @@ The param object can be copy and merge in to other param object as # names "example1", "example2" , "example3" keys will added to p, but "param2" will update the value p.merge(new_p) - print(" print the key and values pairs stored in a Param object p ") + print(" print the key and values pairs stored in a Param object p ") printParamKeyAndValues(p) -In param object the keys values can be remove by key_name or prefix as +In a param object, the keys can be removed by key name or prefix: ..
code-block:: python :linenos: - # We now call the remove method with key of the entry we want to delete ("example3") + # We now call the remove method with the key of the entry we want to delete ("example3") new_p.remove("example3") - print("Key and values pairs after removing the entry with key: example3") + print("Key and value pairs after removing the entry with key: example3") printParamKeyAndValues(new_p) # We now want to delete all keys with prefix "exam" @@ -100,12 +102,13 @@ In param object the keys values can be remove by key_name or prefix as print("Keys and values after deleting all entries.") printParamKeyAndValues(new_p) # All keys of new_p deleted -For the algorithms that inherit :py:class:`~.DefaultParamHandler`, the users can list all parameters along with their descriptions by using, for instance, the following simple function. +For the algorithms that inherit from :py:class:`~.DefaultParamHandler`, you can list all parameters along with their +description by using, for instance, the following simple function. .. code-block:: python :linenos: - # print all parameters + # print all parameters with description def printParams(p): if p.size(): for i in p.keys(): @@ -126,3 +129,66 @@ For the algorithms that inherit :py:class:`~.DefaultParamHandler`, the users can The higher the value, the wider the peak and therefore the wider the gaussian. Param: b'use_ppm_tolerance' Value: false Description: If true, instead of the gaussian_width value, the ppm_tolerance is used. The gaussian is calculated in each step anew, so this is much slower. Param: b'write_log_messages' Value: false Description: true: Warn if no signal was found by the Gauss filter algorithm. + +To print a simple key-value list, you can use ``asDict()``, as shown above: + +.. 
code-block:: python + :linenos: + + gf = oms.GaussFilter() + gf.getParameters().asDict() + + +Types of Parameter Values +************************************************ + +A :py:class:`~.Param` object can hold many parameters of mixed value type. Above, we have seen floating point values, e.g. + +.. code-block:: python + :linenos: + + new_p.setValue("param2", 9.0, "This is value 9") + +Other possible values include ``int``, ``float``, ``bytes``, ``str``, ``List[int]``, ``List[float]``, ``List[bytes]`` (aka StringList). +E.g. + +.. code-block:: python + :linenos: + + p = oms.Param() + p.setValue("p_float", 4.0, "This is a float") + p.setValue("p_int", 5, "This is an integer") + p.setValue("p_string", "myvalue", "This is a string") + p.setValue("p_stringlist", [b"H:+:0.6", b"Na:+:0.2", b"K:+:0.2"], "This is a StringList") + p.setValue("p_floatlist", [1.0, 2.0, 3.0], "This is a list of floats") + p.setValue("p_intlist", [1, 2, 3], "This is a list of integers") + + +Restrictions(=Validity) of Parameter Values +******************************************************* + +For certain types of values, pyOpenMS supports restrictions, +e.g. for single strings only a restricted set of values may be allowed. +Also, for floats/ints only a restricted interval of numbers may be valid. + +Usually, these restrictions are set by the OpenMS algorithm/class which hands out the parameters. +Then, if you provide invalid values via ``setParameters``, the algorithm will throw an exception. + +In theory, you can create your own restrictions. Usually this is done when defining the algorithm in C++ and is out of scope here. + +E.g. + +.. code-block:: python + :linenos: + + gf = oms.GaussFilter() + gfp = gf.getParameters() + gfp.getValidStrings(b"use_ppm_tolerance") ## yields [b'true', b'false'] + + gfp.setValue(b"use_ppm_tolerance", "maybe") ## does not do anything ... + ## ... 
until you actually set the parameters: + gf.setParameters(gfp) ## --> throws a RuntimeError GaussFilter: Invalid string parameter value 'maybe' for parameter 'use_ppm_tolerance' given! Valid values are: 'true,false'. + + + + From 0f19d9517d0c5f9998c1617bc26ab04e487b588c Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Fri, 7 Feb 2025 16:47:14 +0100 Subject: [PATCH 04/11] fixes to Feature tutorial --- docs/source/user_guide/quantitative_data.rst | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/source/user_guide/quantitative_data.rst b/docs/source/user_guide/quantitative_data.rst index a8fcee2f6..671eced1e 100644 --- a/docs/source/user_guide/quantitative_data.rst +++ b/docs/source/user_guide/quantitative_data.rst @@ -1,7 +1,7 @@ Quantitative Data ================= -features +Features ************************** In OpenMS, information about quantitative data is stored in a so-called @@ -42,14 +42,15 @@ features can be stored in a :py:class:`~.FeatureMap` and written to disk. fm.push_back(feature) oms.FeatureXMLFile().store("test.featureXML", fm) -Visualizing the resulting map in :term:`TOPPView` allows detection of the two -features stored in the :py:class:`~.FeatureMap` with the visualization indicating charge -state, m/z, RT and other properties: +Opening the resulting feature map in :term:`TOPPView` lets you visualize the two +features (each represented by a black dot in the top right and bottom left, respectively). +Hovering over a feature displays m/z, RT and other properties: .. image:: img/feature.png -Note that in this case only two features are present, but in a typical :term:`LC-MS/MS` -experiments, thousands of features are present. +In the above example, only two features are present. In a typical :term:`LC-MS/MS` +experiment, you can expect thousands of features.
+ :term:`Feature Maps` From f82d4e3b1b66c0f4c9deff0a9e6060386d1d3409 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Fri, 7 Feb 2025 16:47:50 +0100 Subject: [PATCH 05/11] add nitpicky mode to Sphinx to detect broken links (outdated classes, typos etc) --- docs/source/conf.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/conf.py b/docs/source/conf.py index 0c8b4a5ce..10906ddd0 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -46,6 +46,10 @@ 'chemrole', ] +## warn about invalid references (e.g. invalid class names) +nitpicky = True +nitpick_ignore = [] + autosummary_generate = True autosummary_imported_members = True remove_from_toctrees = [ From 828f55f3bd18ae4b8a0199e1e9cf9c32e620d8f7 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Tue, 11 Feb 2025 15:17:29 +0100 Subject: [PATCH 06/11] extended description, usability and more example for algorithms, centroiding and param_handling --- docs/source/user_guide/algorithms.rst | 3 +++ docs/source/user_guide/centroiding.rst | 18 +++++++++++++----- docs/source/user_guide/parameter_handling.rst | 11 ++++++++++- docs/source/user_guide/smoothing.rst | 2 +- .../user_guide/spectrum_normalization.rst | 14 +++++++++++++- 5 files changed, 40 insertions(+), 8 deletions(-) diff --git a/docs/source/user_guide/algorithms.rst b/docs/source/user_guide/algorithms.rst index 95c5d28e4..703d4ea3e 100644 --- a/docs/source/user_guide/algorithms.rst +++ b/docs/source/user_guide/algorithms.rst @@ -28,7 +28,10 @@ propagate the new parameters to the algorithm: algorithm.setParameters(param) exp = MSExperiment() + # populate exp, for example load from file + # ... 
+ algorithm.filterExperiment(exp) Since they work on a single :py:class:`~.MSExperiment` object, little input is needed to diff --git a/docs/source/user_guide/centroiding.rst b/docs/source/user_guide/centroiding.rst index 8d57c1b95..89ed40588 100644 --- a/docs/source/user_guide/centroiding.rst +++ b/docs/source/user_guide/centroiding.rst @@ -33,11 +33,18 @@ Let's zoom in on an isotopic pattern in profile mode and plot it. plt.plot( profile_spectra[0].get_peaks()[0], profile_spectra[0].get_peaks()[1] ) # plot the first spectrum - + plt.show() + .. image:: img/profile_data.png -Because of the limited resolution of MS instruments m/z measurements are not of unlimited precision. -Consequently, peak shapes spreads in the m/z dimension and resemble a gaussian distribution. +Due to the limited resolution of mass spectrometry (MS) instruments, m/z measurements exhibit a certain spread +when multiple copies of a molecule are measured. Even with identical mass and charge, the copies are recorded with +slight deviations in the m/z dimension. Consequently, peak shapes in this dimension adopt a Gaussian-like distribution. +The number of copies correlates with the peak height (or rather peak volume). + +A single peptide species, e.g. "DPFINAGER" at charge 2, typically consists of various molecular +entities that differ in the number of neutrons, leading to an isotopic distribution and resulting in multiple peaks. + Using the :py:class:`~.PeakPickerHiRes` algorithm, we can convert data from profile to centroided mode. Usually, not much information is lost by storing only centroided data. Thus, many algorithms and tools assume that centroided data is provided. @@ -55,8 +62,9 @@ by storing only centroided data. Thus, many algorithms and tools assume that cen plt.stem( centroided_spectra[0].get_peaks()[0], centroided_spectra[0].get_peaks()[1] ) # plot as vertical lines - + plt.show() + .. 
image:: img/centroided_data.png After centroiding, a single m/z value for every isotopic peak is retained. By plotting the centroided data as stem plot -we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k) were present in the profile data. +we discover that (in addition to the isotopic peaks) some low intensity peaks (intensity at approx. 4k units on the y-axis) were present in the profile data. diff --git a/docs/source/user_guide/parameter_handling.rst b/docs/source/user_guide/parameter_handling.rst index f6aba61b6..c6a708dc8 100644 --- a/docs/source/user_guide/parameter_handling.rst +++ b/docs/source/user_guide/parameter_handling.rst @@ -174,6 +174,9 @@ Also, for floats/ints only a restricted interval of numbers may be valid. Usually, these restrictions are set by the OpenMS algorithm/class which hands out the parameters. Then, if you provide invalid values via ``setParameters``, the algorithm will throw an exception. +It is usually interesting to inspect the restrictions to know what methods a class supports, e.g. see below for an example +using a GaussFilter and the Normalizer. + In theory, you can create your own restrictions. Usually this is done when defining the algorithm in C++ and is out of scope here. E.g. @@ -183,12 +186,18 @@ E.g. gf = oms.GaussFilter() gfp = gf.getParameters() - gfp.getValidStrings(b"use_ppm_tolerance") ## yields [b'true', b'false'] + gfp.getValidStrings("use_ppm_tolerance") ## yields [b'true', b'false'] gfp.setValue(b"use_ppm_tolerance", "maybe") ## does not do anything ... ## ... until you actually set the parameters: gf.setParameters(gfp) ## --> throws a RuntimeError GaussFilter: Invalid string parameter value 'maybe' for parameter 'use_ppm_tolerance' given! Valid values are: 'true,false'. 
+ nor = oms.Normalizer() + norp = nor.getParameters() + norp.getValidStrings("method") ## yields [b'to_one', b'to_TIC'] + norp.setValue("method", "to_TIC") ## pick the 'to_TIC' method + nor.setParameters(norp) + # ... now run the Normalizer ... diff --git a/docs/source/user_guide/smoothing.rst b/docs/source/user_guide/smoothing.rst index ae1677404..cb5fd971e 100644 --- a/docs/source/user_guide/smoothing.rst +++ b/docs/source/user_guide/smoothing.rst @@ -28,7 +28,7 @@ further analysis We can now load our data into :term:`TOPPView` to observe the effect of the smoothing, which becomes apparent when we overlay the two files (drag onto each other) and -then zoom into a given mass range using Ctrl-G and select :math:`4030` to :math:`4045`: +then zoom into a given mass range using Ctrl-G and select `m/z` in the range :math:`4033` to :math:`4040`: .. image:: img/smoothing.png diff --git a/docs/source/user_guide/spectrum_normalization.rst b/docs/source/user_guide/spectrum_normalization.rst index 35d0aacb8..9b44e47b1 100644 --- a/docs/source/user_guide/spectrum_normalization.rst +++ b/docs/source/user_guide/spectrum_normalization.rst @@ -31,7 +31,8 @@ To begin, we need to load the mass spectrometry data. The following Python code Normalization Procedure ----------------------- -After loading the data, the next step is to apply normalization. +After loading the data, the next step is to apply normalization. We use +the :py:class:`~.Normalizer` class. .. code-block:: python :linenos: @@ -49,6 +50,17 @@ After loading the data, the next step is to apply normalization. :align: center :alt: Spectrum after normalization +To list all available normalization methods of the :py:class:`~.Normalizer`, either look into its documentation, or +query the valid values of its `method` parameter: + +.. 
code-block:: python + :linenos: + + normalizer = oms.Normalizer() + param = normalizer.getParameters() + print(param.getValidStrings("method")) # [b'to_one', b'to_TIC'] + + TIC Normalization ----------------- From bba2d77ed9d0ceebd088a8101677133c6b8b07fa Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Thu, 27 Feb 2025 14:42:01 +0100 Subject: [PATCH 07/11] additions to glossary --- docs/source/user_guide/glossary.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/docs/source/user_guide/glossary.rst b/docs/source/user_guide/glossary.rst index 2760dc1c1..a99715488 100644 --- a/docs/source/user_guide/glossary.rst +++ b/docs/source/user_guide/glossary.rst @@ -35,6 +35,8 @@ A glossary of common terms used throughout OpenMS documentation. electrospray ionization ESI Electrospray ionization (ESI) is a technique used in MS to produce ions. + ESI usually gives rise to peptides of charge 2 or higher. + ESI is usually coupled to Orbitrap and FTICR instruments. FASTA A text-based file format for representing nucleotide or amino acid sequences. @@ -81,11 +83,17 @@ A glossary of common terms used throughout OpenMS documentation. spectrometry data. More information is available in the `OpenMS API reference documentation `__. + MALDI + Matrix-assisted laser desorption/ionization (MALDI) is an ionization technique that uses a laser-energy-absorbing matrix to create ions. + MALDI usually gives rise to peptides of charge 1. + MALDI is usually employed in combination with TOF instruments. + Mass Spectrometry MS An analytical technique to measure the mass over charge (m/z) ratio of ions along with their abundance. This gives rise to a mass spectrum (with m/z on the x-axis and abundance on the y-axis). + mass spectra mass spectrum A visual or numerical representation of a measurement from an MS instrument. A spectrum contains (usually many) pairs of mass-over-charge(m/z)+intensity values.
@@ -134,11 +142,16 @@ A glossary of common terms used throughout OpenMS documentation. OpenMS API A C++ interface that allows developers to use OpenMS core library classes and methods. + Orbitrap orbitrap In MS, an ion trap mass analyzer consisting of an outer barrel-like electrode and a coaxial inner spindle-like electrode that traps ions in an orbital motion around the spindle. An ultra-high resolution MS analyzer, capable of resolving fine-isotope structure. + peak maps + peak map + A collection of mass spectra (and/or chromatograms), usually sorted by retention time. Can contain spectra of one or more MS levels (usually level 1 and 2). + peptide-spectrum match PSM A method used in proteomics to identify proteins from a complex mixture. Involves comparing the From b49bae7133a3535f22d8221b007c35ac84ff0508 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Thu, 27 Feb 2025 14:42:23 +0100 Subject: [PATCH 08/11] rework 7 tutorials --- docs/source/user_guide/adduct_detection.rst | 5 +- .../charge_isotope_deconvolution.rst | 113 +++++++++++++----- docs/source/user_guide/chemistry.rst | 2 + docs/source/user_guide/feature_detection.rst | 58 ++++----- docs/source/user_guide/map_alignment.rst | 3 +- docs/source/user_guide/parameter_handling.rst | 41 ++++++- docs/source/user_guide/spectrum_merging.rst | 16 +-- 7 files changed, 167 insertions(+), 71 deletions(-) diff --git a/docs/source/user_guide/adduct_detection.rst b/docs/source/user_guide/adduct_detection.rst index fb0e6c61d..817645dcf 100644 --- a/docs/source/user_guide/adduct_detection.rst +++ b/docs/source/user_guide/adduct_detection.rst @@ -3,9 +3,10 @@ Adduct Detection In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio. This happens by addition of protons in positive mode or loss of protons in negative mode. 
Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid. -Depending on the size and chemical compsition, multiple adducts can bind leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is much higher. +Depending on the size and chemical composition, multiple adducts can bind leading to multiple charges on the analyte. In metabolomics with smaller analytes the number of charges is typically low with one or two, whereas in proteomics the number of charges is potentially higher. + Furthermore, analytes can loose functional groups during ionization, e.g. a neutral water loss. -Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times. +Since the ionization happens after liquid chromatography, different adducts for an analyte have almost identical retention times. .. image:: img/adduct_detection.png diff --git a/docs/source/user_guide/charge_isotope_deconvolution.rst b/docs/source/user_guide/charge_isotope_deconvolution.rst index 9bf7838c3..efa3151f0 100644 --- a/docs/source/user_guide/charge_isotope_deconvolution.rst +++ b/docs/source/user_guide/charge_isotope_deconvolution.rst @@ -3,65 +3,99 @@ Charge and Isotope Deconvolution A single mass spectrum contains measurements of one or more analytes and the m/z values recorded for these analytes. Most analytes produce multiple signals -in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally -occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most -organic molecules, most analytes produce a so-called isotopic pattern with a -monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one -carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.
-Note that also other elements can contribute to the isotope pattern, see the -`chemistry section `_ for further details. +in the mass spectrometer, due to the natural abundance of heavy isotopes. +The most dominant heavy isotope in proteins is carbon :math:`13` (naturally +occurring at ca. :math:`1.1\%` frequency). Other elements such as hydrogen also have heavy isotopes, but +they contribute to a much lesser extent, since the heavy isotopes are of very low abundance, +e.g. hydrogen :math:`2` (deuterium) occurs at a frequency of only :math:`0.0156\%`. + +All analytes produce a so-called isotopic pattern, consisting of a +monoisotopic peak and a first isotopic peak (exactly one +extra neutron in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons) etc. +With higher mass, the monoisotopic peak will become diminishingly small, up to the point where it is not detectable +any more (although this is rarely the case with peptides and more problematic for whole proteins in Top-Down approaches). + +By definition, the monoisotopic peak is the peak which contains only isotopes of the most abundant type. +For peptides and proteins, the constituent atoms are C,H,N,O,P,S, where incidentally, the +most abundant isotope is also the lightest isotope. Hence, for peptides and proteins, the monoisotopic peak is always +the lightest peak in an isotopic distribution. + +See the `chemistry section `_ for further details on isotope abundances and how to compute isotope patterns. In addition, each analyte may appear in more than one charge state and adduct -state, a singly charge analyte :chem:`[M +H]+` may be accompanied by a doubly +state, a singly charged analyte :chem:`[M +H]+` may be accompanied by a doubly charged analyte :chem:`[M +2H]++` or a sodium adduct :chem:`[M +Na]+`. In the case of a
In the case of a -multiply charged peptide, the isotopic traces are spaced by ``PROTON_MASS / +multiply charged peptide, the isotopic traces are spaced by ``NEUTRON_MASS / charge_state`` which is often close to :math:`0.5\ m/z` for doubly charged analytes, :math:`0.33\ m/z` for triply charged analytes etc. Note: tryptic peptides often appear -at least doubly charged, while small molecules often carry a single charge but -can have adducts other than hydrogen. +either singly charged (when ionized with :term:`MALDI`), or doubly charged (when ionized with :term:`ESI`). +Higher charges are also possible, but usually connected to incomplete tryptic digestions with missed cleavages. +Small molecules in metabolomics often carry a single charge but can have adducts other than hydrogen. Single Peak Example ********************************* +Let's compute the isotope distribution of the peptide ``DFPIANGER`` using the classes :py:class:`~.AASequence` and +:py:class:`~.EmpiricalFormula`. Then we use the :py:class:`~.Deisotoper` to find the monoisotopic peak: + .. code-block:: python :linenos: import pyopenms as oms - charge = 2 seq = oms.AASequence.fromString("DFPIANGER") + print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1)) + + ## get isotopic distribution for two additional hydrogens (which carry the charge) + charge = 2 seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge)) isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6)) - print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1)) # Append isotopic distribution to spectrum s = oms.MSSpectrum() - for iso in isotopes.getContainer(): - iso.setMZ(iso.getMZ() / charge) + for iso in isotopes.getContainer(): # the container contains masses, not m/z! + iso.setMZ(iso.getMZ() / charge) # ... 
even though it's called '.getMZ()' s.push_back(iso) print("Isotope", iso.getMZ(), ":", iso.getIntensity()) + # deisotope with 10 ppm mass tolerance oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True) for p in s: - print(p.getMZ(), p.getIntensity()) + print("Mono peaks:", p.getMZ(), p.getIntensity()) + +which will print: + + +.. code-block:: output + :linenos: + + [M+H]+ weight: 1018.495240604071 + Isotope 509.75180710055 : 0.5680345296859741 + Isotope 510.25348451945 : 0.3053518533706665 + Isotope 510.75516193835 : 0.09806874394416809 + Isotope 511.25683935725004 : 0.023309258744120598 + Isotope 511.75851677615003 : 0.0044969217851758 + Isotope 512.2601941950501 : 0.000738693168386817 + Mono peaks: 1018.496337734329 0.5680345296859741 Note that the algorithm presented here as some heuristics built into it, such as assuming that the isotopic peaks will decrease after the first isotopic -peak. This heuristic can be tuned by changing the parameter -``use_decreasing_model`` and ``start_intensity_check``. In this case, the -second isotopic peak is the highest in intensity and the -``start_intensity_check`` parameter needs to be set to 3. +peak. This heuristic can be tuned by setting the parameter +``use_decreasing_model`` to ``False``. +For more fine-grained control use ``start_intensity_check`` and leave ``use_decreasing_model = True`` (see :py:class:`~.Deisotoper` --> C++ documentation). +Let's look at a very heavy peptide, whose isotopic distribution is dominated by the first and second isotopic peak. .. 
code-block:: python :linenos: - charge = 4 seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER") + print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1)) + + charge = 4 seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge)) isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8)) - print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1)) # Append isotopic distribution to spectrum s = oms.MSSpectrum() @@ -73,9 +107,9 @@ second isotopic peak is the highest in intensity and the min_charge = 1 min_isotopes = 2 max_isotopes = 10 - use_decreasing_model = True - start_intensity_check = 3 - oms.Deisotoper.deisotopeAndSingleCharge( + use_decreasing_model = False # ignores all intensities + start_intensity_check = 3 # here, the value does not matter, since we ignore intensities (see above) + oms.Deisotoper.deisotopeAndSingleCharge( ## a function with all parameters exposed s, 10, True, @@ -92,8 +126,23 @@ second isotopic peak is the highest in intensity and the False, ) for p in s: - print(p.getMZ(), p.getIntensity()) + print("Mono peaks:", p.getMZ(), p.getIntensity()) +.. code-block:: output + :linenos: + + [M+H]+ weight: 4016.927437824572 + Isotope 1004.9878653713499 : 0.10543462634086609 + Isotope 1005.2387040808 : 0.22646738588809967 + Isotope 1005.48954279025 : 0.25444599986076355 + Isotope 1005.7403814996999 : 0.19825772941112518 + Isotope 1005.9912202091499 : 0.12000058591365814 + Isotope 1006.2420589185999 : 0.05997777357697487 + Isotope 1006.49289762805 : 0.025713207200169563 + Isotope 1006.7437363375 : 0.009702674113214016 + Mono peaks: 4016.9296320850867 0.10543462634086609 + +This successfully recovers the monoisotopic peak, even though it is not the most abundant peak. 
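As a rough cross-check of the two output listings, the following sketch infers the charge from the spacing of the first two isotopic peaks and converts the monoisotopic m/z to a singly charged mass. It uses plain Python only; the helper names ``infer_charge`` and ``singly_charged_mass`` are hypothetical (not part of pyOpenMS), and the two constants are approximate values assumed for illustration.

```python
# Approximate constants, assumed for this illustration (not read from pyOpenMS).
ISOTOPE_SPACING = 1.0033548  # approx. mass difference between 13C and 12C in Da
PROTON_MASS = 1.007276467    # approx. proton mass in Da


def infer_charge(mz_first, mz_second):
    """Estimate the charge from the m/z distance of two adjacent isotopic peaks."""
    return round(ISOTOPE_SPACING / (mz_second - mz_first))


def singly_charged_mass(mz, charge):
    """Convert a monoisotopic m/z at the given charge to the singly charged scale."""
    return mz * charge - (charge - 1) * PROTON_MASS


# First two "Isotope" values copied from the two output listings.
for first, second in [
    (509.75180710055, 510.25348451945),
    (1004.9878653713499, 1005.2387040808),
]:
    z = infer_charge(first, second)
    print("charge:", z, "singly charged mass:", round(singly_charged_mass(first, z), 4))
```

Under these assumptions, the sketch should report charges 2 and 4 and masses that agree with the ``Mono peaks`` values in the listings to within rounding.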
Full Spectral De-Isotoping ************************** @@ -147,7 +196,15 @@ state: if p.getIntensity() > 0.25 * maxvalue: print(p.getMZ(), p.getIntensity()) - + unpicked_peak_data = e[214].get_peaks() + plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False) + plt.show() + + picked_peak_data = s.get_peaks() + plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False) + plt.show() + + which produces the following output .. code-block:: output @@ -159,7 +216,7 @@ which produces the following output 974.4589691256419 3215808.75 As we can see, the algorithm has reduced :math:`140` peaks to :math:`41` deisotoped peaks. It -also has identified a molecule at :math:`974.45\ m/z` as the most intense peak in the +also has identified a molecule with a singly charged mass of :math:`974.45\ Da` as the most intense peak in the data (base peak). Visualization diff --git a/docs/source/user_guide/chemistry.rst b/docs/source/user_guide/chemistry.rst index 46cbe2b6b..21093852a 100644 --- a/docs/source/user_guide/chemistry.rst +++ b/docs/source/user_guide/chemistry.rst @@ -5,6 +5,8 @@ OpenMS has representations for various chemical concepts including molecular formulas, isotopes, ribonucleotide and amino acid sequences as well as common modifications of amino acids or ribonucleotides. +For an introduction to isotope patterns, see `Charge and Isotope Deconvolution `_. + Constants --------- diff --git a/docs/source/user_guide/feature_detection.rst b/docs/source/user_guide/feature_detection.rst index 87208d9b6..aaec6ce78 100644 --- a/docs/source/user_guide/feature_detection.rst +++ b/docs/source/user_guide/feature_detection.rst @@ -3,37 +3,38 @@ Feature Detection One very common task in mass spectrometry is the detection of 2-dimensional patterns in m/z and time (RT) dimension from a series of :term:`MS1` scans. 
These -patterns are called ``Features`` and they exhibit a chromatographic elution +patterns are called a :term:`Feature` and they exhibit a chromatographic elution profile in the time dimension and an isotopic pattern in the m/z dimension (see -`previous section `_ for the 1-dimensional problem). +`previous section `_ for the 1-dimensional problem). + OpenMS has multiple tools that can identify these features in 2-dimensional -data, these tools are called :py:class:`~.FeatureFinder`. Currently the following +data, these tools are called ``FeatureFinder``. Currently the following FeatureFinders are available in pyOpenMS: - - :py:class:`~.FeatureFinderMultiplexAlgorithm` (e.g., :term:`SILAC`, Dimethyl labeling, (and label-free), identification free feature detection of peptides) - :py:class:`~.FeatureFinderAlgorithmPicked` (Label-free, identification free feature detection of peptides) - :py:class:`~.FeatureFinderIdentificationAlgorithm` (Label-free identification-guided feature detection of peptides) + - :py:class:`~.FeatureFinderMultiplexAlgorithm` (e.g., :term:`SILAC`, Dimethyl labeling, (and label-free), identification free feature detection of peptides) - :py:class:`~.FeatureFindingMetabo` (Label-free, identification free feature detection of metabolites) - :py:class:`~.FeatureFinderAlgorithmMetaboIdent` (Label-free, identification guided feature detection of metabolites) -All of the algorithms above are for proteomics data with the exception of :py:class:`~.FeatureFindingMetabo` and :py:class:`~.FeatureFinderMetaboIdentCompound` for metabolomics data and small molecules in general. +All of the algorithms above are for proteomics data with the exception of :py:class:`~.FeatureFindingMetabo` and :py:class:`~.FeatureFinderAlgorithmMetaboIdent` for metabolomics data and small molecules in general. 
Proteomics ****************************** -Two of the most commonly used feature finders for proteomics in OpenMS are the :py:class:`~.FeatureFinder` and :py:class:`~.FeatureFinderIdentificationAlgorithm` which both work on (high -resolution) centroided data. We can use the following code to find features in MS data: +Three of the most commonly used feature finders for proteomics in OpenMS are the :py:class:`~.FeatureFinderAlgorithmPicked`, :py:class:`~.FeatureFinderMultiplexAlgorithm` and :py:class:`~.FeatureFinderIdentificationAlgorithm` which all work on (high +resolution) centroided data (``FeatureFinderMultiplexAlgorithm`` can also work on profile data). We can use the following code to find features in MS data: .. code-block:: python from urllib.request import urlretrieve + import pyopenms as oms gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve( gh + "/src/data/FeatureFinderCentroided_1_input.mzML", "feature_test.mzML" ) - import pyopenms as oms # Prepare data loading (save memory by only # loading MS1 spectra into memory) @@ -47,32 +48,30 @@ resolution) centroided data. 
We can use the following code to find features in M fh.load("feature_test.mzML", input_map) input_map.updateRanges() - ff = oms.FeatureFinder() - ff.setLogType(oms.LogType.CMD) + ff = oms.FeatureFinderAlgorithmPicked() # Run the feature finder - name = "centroided" - features = oms.FeatureMap() - seeds = oms.FeatureMap() - params = oms.FeatureFinder().getParameters(name) - ff.run(name, input_map, features, params, seeds) + out_features = oms.FeatureMap() ## our result + seeds = oms.FeatureMap() ## optional: you can provide seeds where FF should take place -- not used here + params = ff.getParameters() ## we do not modify params for now + ff.run(input_map, out_features, params, seeds) - features.setUniqueIds() + out_features.setUniqueIds() fh = oms.FeatureXMLFile() - fh.store("output.featureXML", features) - print("Found", features.size(), "features") + fh.store("output.featureXML", out_features) + print("Found", out_features.size(), "features") With a few lines of Python, we are able to run powerful algorithms available in OpenMS. The resulting data is held in memory (a :py:class:`~.FeatureMap` object) and can be -inspected directly using the ``help(features)`` comment. It reveals that the +inspected directly using the ``help(out_features)`` command. It reveals that the object supports iteration (through the ``__iter__`` function) as well as direct access (through the ``__getitem__`` function). This means we write code that uses direct access and iteration in Python as follows: .. code-block:: python - f0 = features[0] - for f in features: + f0 = out_features[0] + for f in out_features: print(f.getRT(), f.getMZ()) @@ -82,7 +81,7 @@ inspecting ``help(f)`` or by consulting the manual. Note: the output file that we have written (``output.featureXML``) is an OpenMS-internal XML format for storing features. You can learn more about file -formats in the `Reading MS data formats `_ section. +formats in the `Reading MS data formats `_ section. 
Metabolomics - Untargeted ************************* @@ -239,9 +238,9 @@ Now we can use the following code to detect features with :py:class:`~.FeatureFi # save FeatureMap to file oms.FeatureXMLFile().store("detected_features.featureXML", fm) -Note: the output file that we have written (``output.featureXML``) is an +Note: the output file that we have written (``detected_features.featureXML``) is an OpenMS-internal XML format for storing features. You can learn more about file -formats in the `Reading MS data formats `_ section. +formats in the `Reading MS data formats `_ section. We can get a quick overview on the detected features by plotting them using the following function: @@ -249,6 +248,8 @@ We can get a quick overview on the detected features by plotting them using the :linenos: import matplotlib.pyplot as plt + import matplotlib.colors as mcolors + import itertools def plotDetectedFeatures3D(path_to_featureXML): fm = oms.FeatureMap() @@ -258,8 +259,9 @@ We can get a quick overview on the detected features by plotting them using the fig = plt.figure() ax = fig.add_subplot(111, projection="3d") - for feature in fm: - color = next(ax._get_lines.prop_cycler)["color"] + cycled_colors = itertools.cycle(['red', 'green', 'blue', 'orange', 'purple', 'yellow', 'cyan', 'magenta', 'black', 'gray']) + + for feature, color in zip(fm, cycled_colors): # chromatogram data is stored in the subordinates of the feature for i, sub in enumerate(feature.getSubordinates()): retention_times = [ @@ -268,7 +270,7 @@ We can get a quick overview on the detected features by plotting them using the intensities = [ int(y[1]) for y in sub.getConvexHulls()[0].getHullPoints() ] - mz = sub.getMetaValue("MZ") + mz = sub.getMZ() ax.plot(retention_times, intensities, zs=mz, zdir="x", color=color) if i == 0: ax.text( @@ -284,4 +286,6 @@ We can get a quick overview on the detected features by plotting them using the ax.set_zlabel("intensity (cps)") plt.show() + 
plotDetectedFeatures3D("detected_features.featureXML") + .. image:: img/ffmid_graph.png diff --git a/docs/source/user_guide/map_alignment.rst b/docs/source/user_guide/map_alignment.rst index b3d9f4426..f57144afa 100644 --- a/docs/source/user_guide/map_alignment.rst +++ b/docs/source/user_guide/map_alignment.rst @@ -1,7 +1,7 @@ Map Alignment =============== -The pyOpenMS map alignment algorithms transform different maps (peak maps, :term:`feature maps`) to a common retention time axis. +The pyOpenMS map alignment algorithms transform different maps (:term:`peak maps`, :term:`feature maps`) to a common retention time axis. .. image:: img/map_alignment_illustration.png @@ -12,7 +12,6 @@ Different map alignment algorithms are available in pyOpenMS: - :py:class:`~.MapAlignmentAlgorithmPoseClustering` - :py:class:`~.MapAlignmentAlgorithmIdentification` -- :py:class:`~.MapAlignmentAlgorithmSpectrumAlignment` - :py:class:`~.MapAlignmentAlgorithmKD` - :py:class:`~.MapAlignmentTransformer` diff --git a/docs/source/user_guide/parameter_handling.rst b/docs/source/user_guide/parameter_handling.rst index c6a708dc8..372f45983 100644 --- a/docs/source/user_guide/parameter_handling.rst +++ b/docs/source/user_guide/parameter_handling.rst @@ -2,11 +2,11 @@ Parameter Handling ================== Parameter handling in OpenMS and pyOpenMS is usually implemented through inheritance -from :py:class:`~.DefaultParamHandler` and allow access to parameters through the :py:class:`~.Param` object. This +from ``DefaultParamHandler`` and allows access to parameters through the :py:class:`~.Param` object. This means, the classes implement the methods ``getDefaults``, ``getParameters`` and ``setParameters``, to access to the default parameters, the current parameters and to set new parameters, respectively. The class :py:class:`~.TheoreticalSpectrumGenerator` is just one example of many which makes use of parameter handling via -:py:class:`~.DefaultParamHandler`. +``DefaultParamHandler``. 
The :py:class:`~.Param` object is the central data structure here. It can be manipulated through the :py:meth:`~.Param.setValue` and :py:meth:`~.Param.getValue` methods. The :py:meth:`~.Param.exists` method can be used to check for existence of a key and should @@ -51,7 +51,7 @@ The parameters can also be accessed as True -The param object can be copied and merged into other param object: +The param object can be copied and merged into another param object: .. code-block:: python :linenos: @@ -102,7 +102,7 @@ In a param object, the keys can be removed by key name or prefix: print("Keys and values after deleting all entries.") printParamKeyAndValues(new_p) # All keys of new_p deleted -For the algorithms that inherit from :py:class:`~.DefaultParamHandler`, you can list all parameters along with their +For the algorithms that inherit from ``DefaultParamHandler``, you can list all parameters along with their description by using, for instance, the following simple function. .. code-block:: python @@ -200,4 +200,37 @@ E.g. # ... now run the Normalizer ... +Unfortunately, it is not (yet) possible to retrieve the valid ranges for floats and ints, if they have been set, via the pyOpenMS API. +However, one can look at the documentation of the class in the pyOpenMS docs. There will be a link to the C++ version which contains the +restrictions (if any) of all parameters of a class. +Alternatively, you can simply write the parameters to an INI file (also called :py:class:`~.ParamXMLFile`), which is a special XML file format that OpenMS uses to store parameters. +E.g. + +.. code-block:: python + :linenos: + + pphr = oms.PeakPickerHiRes() + + px = oms.ParamXMLFile() + px.store("tmp.ini", pphr.getParameters()) ## store PeakPickerHiRes params (or any Param object you like) + + ## either look at the file in Python, or open it in an Editor of your choice + print(open('tmp.ini').read()) + +The INI file looks something like this (shortened): + +.. 
code-block:: xml + :linenos: + + + + + + ... + +Any parameter which has restrictions on its value (strings, ints and floats) will have a ``restrictions`` attribute. +In the above example, the restriction on the ``signal_to_noise`` parameter is ``restrictions="0.0:"``, i.e. only the lower bound is restricted to 0.0. The upper bound can be any value larger than 0. + + + \ No newline at end of file diff --git a/docs/source/user_guide/spectrum_merging.rst b/docs/source/user_guide/spectrum_merging.rst index c82176d7d..4cde78036 100644 --- a/docs/source/user_guide/spectrum_merging.rst +++ b/docs/source/user_guide/spectrum_merging.rst @@ -2,7 +2,7 @@ Spectra Merge Algorithm ************************* OpenMS provides spectra merging and averaging algorithms in :py:class:`~.SpectraMerger` class. Spectra merging is to merge multiple related spectra into a single one - thus, often we end up with a reduced number of spectra. -For instance, MS1 spectra within a pre-defined retention time window or MS2 spectra from the same precursor ion. On the other hand, spectra averaging averages neighbouring spectra for each spectrum. +For instance, MS1 spectra within a pre-defined retention time window or MS2 spectra from the same precursor ion. On the other hand, spectra averaging incorporates the signal from neighbouring spectra for each spectrum. Thus, the number of spectra remains the same after spectra averaging. Both merging and averaging attempt to increase the quality of spectrum by increasing its signal to noise ratio. Spectra merging and averaging are implemented in SpectraMerger in pyOpenMS, which provides two merging (block wise and precursor method - see below) and two averaging methods (gaussian and tophat - see below). @@ -98,7 +98,7 @@ Our first example merges MS1 spectra block wise. Above example clearly demonstrates the benefit of spectra merging. The upper rows show the input spectra and the bottom the merged one. 
The merged spectrum (bottom) has far more signal peaks of higher intensities than the input spectra. By default, the method ``mergeSpectraBlockWise`` of :py:class:`~.SpectraMerger` merges 5 consecutive MS1 spectra into a block. -The block size could be adjusted by using ``block_method:rt_block_size`` parameter as follow: +The block size could be adjusted by using ``block_method:rt_block_size`` parameter as follows: .. code-block:: python :linenos: @@ -157,7 +157,7 @@ The block size could be adjusted by using ``block_method:rt_block_size`` paramet :align: center :alt: Blockwise merging 10 scans vs. 5 scans -As shown in the above figure, clearer signal peaks are obtained with 10 MS1 scans being merged than 5 MS1 scans. Note that the y-axis is in log scale. But if too many scans are merged, +As shown in the above figure, clearer signal peaks are obtained with 10 MS1 scans being merged, compared to 5 MS1 scans we used before. Note that the y-axis is in log scale. But if too many scans are merged, spectra containing too different sets of molecules would be merged, yielding a poor quality spectrum. The users may want to try a few different parameters to produce spectra of optimal quality. MS2 spectra merging with precursor method @@ -195,8 +195,8 @@ Next we perform MS2 spectra merging with precursor method by using the ``mergeSp Number of merged peaks: 0/0 (nan %) of blocked spectra In the above example, no MS2 spectra have been merged because no MS2 spectra had the same precursor m/z values (subject to tolerance) within retention time window. -By default, the retention time window size is 5.0 seconds and the precursor m/z tolerance is 1e-4Th. If you opens the test.mzML file, you can see a few MS2 spectra (e.g., scan numbers 2077 and 2099) -have quite close precursor m/z values (both have precursor m/z of 432.902Th), but they are apart from each other by about 10 seconds. 
We adjust both m/z tolerance and retention time so such MS2 spectra are merged together with ``precursor_method:mz_tolerance`` and ``precursor_method:rt_tolerance`` parameters. +By default, the retention time window size is 5.0 seconds and the precursor m/z tolerance is 1e-4Th. If you open the test.mzML file, you can see a few MS2 spectra (e.g., scan numbers 2077 and 2099) +have quite close precursor m/z values (both have precursor m/z of 432.902Th), but they are apart from each other by about 10 seconds. We adjust both m/z tolerance and retention time such that MS2 spectra are merged together with ``precursor_method:mz_tolerance`` and ``precursor_method:rt_tolerance`` parameters. .. code-block:: python :linenos: @@ -292,11 +292,11 @@ Moreover, as in the above block wise merging, we can check that a merged MS2 spe Spectra averaging : gaussian and top hat methods ------------------------------------------------ -:py:class:`~.SpectraMerger` presents a method ``average`` to average peak intensities over neighbouring spectra for a given spectrum. -As mentioned above, apart from spectra merging, the number of spectra after averaging does not change since it is carried out for each individual input spectrum. +:py:class:`~.SpectraMerger` offers the method ``average`` to average peak intensities over neighbouring spectra for a given spectrum. +As mentioned above, in contrast to spectra merging, the number of spectra after averaging does not change since it is carried out for each individual input spectrum. The two averaging methods (``gaussian`` or ``tophat``) determine how neighbouring spectra are collected and how weights for the averaging are determined. The ``gaussian`` method performs weighted average over the neighbouring spectra with weights having the shape of gaussian shape (i.e., sharply decreasing from the center). -On the other hand, the ``tophat`` method, as the name implies, performs a simple averaging over the neighbouring spectra. 
Below we perform ``gaussian`` averaging method. +On the other hand, the ``tophat`` method, as the name implies, performs a simple averaging over the neighbouring spectra (all weights are identical). Below, we perform ``gaussian`` averaging method. .. code-block:: python From a51580079252e6a0816a7f7b0a7c586b1f3e7d11 Mon Sep 17 00:00:00 2001 From: "chris.bielow@fu-berlin.de" Date: Thu, 27 Feb 2025 15:24:16 +0100 Subject: [PATCH 09/11] avoid exception --- docs/source/user_guide/parameter_handling.rst | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/source/user_guide/parameter_handling.rst b/docs/source/user_guide/parameter_handling.rst index 372f45983..d34fda406 100644 --- a/docs/source/user_guide/parameter_handling.rst +++ b/docs/source/user_guide/parameter_handling.rst @@ -188,9 +188,14 @@ E.g. gfp = gf.getParameters() gfp.getValidStrings("use_ppm_tolerance") ## yields [b'true', b'false'] - gfp.setValue(b"use_ppm_tolerance", "maybe") ## does not do anything ... + gfp.setValue(b"use_ppm_tolerance", "maybe") ## is invalid but setValue does not complain ## ... until you actually set the parameters: - gf.setParameters(gfp) ## --> throws a RuntimeError GaussFilter: Invalid string parameter value 'maybe' for parameter 'use_ppm_tolerance' given! Valid values are: 'true,false'. + try: + gf.setParameters(gfp) ## --> throws a RuntimeError + except RuntimeError as e: + print(f"RuntimeError: {str(e)}") + ## prints `GaussFilter: Invalid string parameter value 'maybe' for parameter 'use_ppm_tolerance' given! 
Valid values are: 'true,false'.` + nor = oms.Normalizer() norp = nor.getParameters() From e3d1dcebaf828a28e284ac464e9175fe02d8ffad Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Tue, 4 Mar 2025 14:42:06 +0100 Subject: [PATCH 10/11] fix Python errors in old code --- docs/source/user_guide/charge_isotope_deconvolution.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/user_guide/charge_isotope_deconvolution.rst b/docs/source/user_guide/charge_isotope_deconvolution.rst index efa3151f0..c1cddfbd3 100644 --- a/docs/source/user_guide/charge_isotope_deconvolution.rst +++ b/docs/source/user_guide/charge_isotope_deconvolution.rst @@ -124,6 +124,7 @@ Let's look at a very heavy peptide, whose isotopic distribution is dominated by use_decreasing_model, start_intensity_check, False, + True ) for p in s: print("Mono peaks:", p.getMZ(), p.getIntensity()) @@ -179,6 +180,7 @@ state: use_decreasing_model, start_intensity_check, False, + True ) print(e[214].size()) From bd1fa624f2e59c5fbf853ba060b52882727a37a3 Mon Sep 17 00:00:00 2001 From: Chris Bielow Date: Tue, 4 Mar 2025 15:20:02 +0100 Subject: [PATCH 11/11] add mising import --- docs/source/user_guide/charge_isotope_deconvolution.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/user_guide/charge_isotope_deconvolution.rst b/docs/source/user_guide/charge_isotope_deconvolution.rst index c1cddfbd3..4c87394e3 100644 --- a/docs/source/user_guide/charge_isotope_deconvolution.rst +++ b/docs/source/user_guide/charge_isotope_deconvolution.rst @@ -157,6 +157,7 @@ state: :linenos: from urllib.request import urlretrieve + import matplotlib.pyplot as plt gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve(gh + "/src/data/BSA1.mzML", "BSA1.mzML")