Summary
This PR improves the realism and configurability of simulated isotope envelopes and adduct mixtures. It replaces the previous carbon-only isotope approximation with a multi-element approach based on natural isotope abundances, adds configurable adduct priors/profiles for different ionisation regimes, and introduces a small deisotoping utility that we use in tests to validate the generated patterns.
Weaknesses in the previous implementation
Previously, isotope generation used a carbon-only binomial shortcut, so it could not model common heteroatom isotope patterns (for example chlorine/bromine M+2 and sulfur fine structure). Adduct proportions were generated using placeholder heuristics and had limited negative-mode coverage. There was also no lightweight deisotoping routine available to sanity-check that generated isotope envelopes behave as expected.
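To make the chlorine limitation concrete, here is a minimal sketch comparing the M+2/M ratio a carbon-only binomial would predict with the ratio contributed by a single chlorine atom. It assumes the old shortcut was a plain binomial over the carbon count; the function names are hypothetical and the abundances are approximate textbook values, not values read from the repository.

```python
from math import comb

# Approximate natural abundances (assumed, not taken from vimms).
P_13C = 0.0107   # fraction of carbon atoms that are 13C
P_37CL = 0.2424  # fraction of chlorine atoms that are 37Cl


def carbon_only_m_plus_2(n_carbons: int) -> float:
    """M+2 intensity relative to M under a carbon-only binomial model."""
    p_m0 = (1 - P_13C) ** n_carbons
    p_m2 = comb(n_carbons, 2) * P_13C**2 * (1 - P_13C) ** (n_carbons - 2)
    return p_m2 / p_m0


def chlorine_m_plus_2(n_chlorines: int) -> float:
    """M+2 intensity relative to M from exactly one 37Cl substitution."""
    p_m0 = (1 - P_37CL) ** n_chlorines
    p_m2 = n_chlorines * P_37CL * (1 - P_37CL) ** (n_chlorines - 1)
    return p_m2 / p_m0


# Chlorobenzene, C6H5Cl: six carbons, one chlorine.
carbon_ratio = carbon_only_m_plus_2(6)
chlorine_ratio = chlorine_m_plus_2(1)
print(f"carbon-only M+2/M: {carbon_ratio:.4f}")
print(f"37Cl M+2/M:        {chlorine_ratio:.4f}")
```

Under these assumptions the carbon-only model puts well under 1% of the monoisotopic intensity at M+2, while one chlorine alone contributes roughly a third, which is the pattern the old generator could not reproduce.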
What we changed
Isotope generation
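The convolution approach used in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `vimms/Chemicals.py`: the `ISOTOPES` table, the function names, and the exact role of `mass_precision`, `min_prob` and `max_peaks` are hypothetical, the abundances are approximate, and `max_states` is not modelled.

```python
from collections import defaultdict

# Hypothetical table in the spirit of NATURAL_ISOTOPES: for each element,
# (mass_shift, abundance) pairs relative to its monoisotope (approximate values).
ISOTOPES = {
    "C": [(0.0, 0.9893), (1.00335, 0.0107)],
    "H": [(0.0, 0.999885), (1.00628, 0.000115)],
    "Cl": [(0.0, 0.7576), (1.99705, 0.2424)],
}


def convolve(a, b, mass_precision=4, min_prob=1e-8):
    """Combine two (shift, prob) lists, merging states that round to the same shift."""
    out = defaultdict(float)
    for shift_a, prob_a in a:
        for shift_b, prob_b in b:
            prob = prob_a * prob_b
            if prob >= min_prob:  # drop negligible states early
                out[round(shift_a + shift_b, mass_precision)] += prob
    return sorted(out.items())


def power(dist, n, **kw):
    """n-fold self-convolution using exponentiation by squaring."""
    result = [(0.0, 1.0)]
    while n:
        if n & 1:
            result = convolve(result, dist, **kw)
        dist = convolve(dist, dist, **kw)
        n >>= 1
    return result


def envelope(formula, max_peaks=5):
    """Isotope envelope for an {element: count} formula, monoisotope first."""
    dist = [(0.0, 1.0)]
    for element, count in formula.items():
        dist = convolve(dist, power(ISOTOPES[element], count))
    # dist is sorted by shift, so the zero-shift monoisotope is first.
    mono, rest = dist[0], dist[1:]
    rest = sorted(rest, key=lambda s: -s[1])[: max_peaks - 1]
    kept = [mono] + sorted(rest)
    total = sum(p for _, p in kept)
    return [(shift, p / total) for shift, p in kept]


env = envelope({"C": 6, "H": 5, "Cl": 1})  # chlorobenzene
for shift, prob in env:
    print(f"+{shift:.4f} Da  {prob:.4f}")
```

Exponentiation by squaring keeps the number of convolutions per element logarithmic in the atom count, which is one plausible way to bound the work for large formulae.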
Isotope generation now approximates the full-formula isotope envelope by convolving per-element natural isotope distributions (`NATURAL_ISOTOPES`).
- Per-element `(mass_shift, abundance)` distributions relative to each element’s monoisotope.
- Parameters (`mass_precision`, `min_prob`, `max_states`) to keep runtime and memory bounded.
- Filter by `total_proportion` (or `max_peaks`) and renormalise.
- The monoisotopic peak is kept as `isotopes[0]` because downstream code assumes that invariant.
Implementation: `Isotopes._get_isotope_distribution` / `_power_distribution` / `_convolve_distributions` in `vimms/Chemicals.py`.
Adduct generation
Adduct generation now samples adduct weights from a Dirichlet distribution with configurable priors and presets.
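A rough sketch of this sampling scheme, including the cutoff and scale-by-max steps this section describes, might look like the following. It uses only the standard library (a Dirichlet draw is a set of independent Gamma draws divided by their sum). The adduct names, prior values, default parameters and function name are all hypothetical, and the fallback picks the adduct with the highest prior, which is an assumption about what "most likely" means.

```python
import random

# Hypothetical adduct names and prior weights; not the values used in vimms.
PRIOR_POS = {"M+H": 0.7, "M+Na": 0.15, "M+NH4": 0.1, "M+K": 0.05}


def sample_adduct_weights(prior, concentration=50.0, cutoff=0.05, rng=None):
    """Draw Dirichlet(prior * concentration) weights, drop small ones, scale by max."""
    rng = rng or random.Random()
    # Dirichlet(alpha) == independent Gamma(alpha_i, 1) draws divided by their sum.
    gammas = {
        name: rng.gammavariate(p * concentration, 1.0) for name, p in prior.items()
    }
    total = sum(gammas.values())
    weights = {name: g / total for name, g in gammas.items()}

    # Discard adducts whose sampled proportion falls below the cutoff.
    kept = {name: w for name, w in weights.items() if w >= cutoff}
    if not kept:
        # Everything was cut: fall back to a single adduct (assumed: highest prior).
        best = max(prior, key=prior.get)
        kept = {best: 1.0}

    # Scale so the dominant adduct has weight exactly 1.0.
    top = max(kept.values())
    return {name: w / top for name, w in kept.items()}


weights = sample_adduct_weights(PRIOR_POS, rng=random.Random(0))
print(weights)
```

A larger `concentration` keeps draws close to the prior, while a smaller one lets individual chemicals deviate more strongly from it.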
- Adduct names from `ADDUCT_NAMES_POS/NEG` and prior presets from `ADDUCT_PRIOR_POS/NEG` + `ADDUCT_PROFILE_PRESETS`.
- Sample `weights ~ Dirichlet(prior * adduct_concentration)`.
- Drop weights below `adduct_proportion_cutoff`; if all weights are cut, pick the single most likely adduct.
- Divide by `max(weights)` so the dominant adduct has weight 1.0 (preserving the historical “scale by max” semantics).
Implementation: profile resolution in `Adducts.__init__` and sampling in `Adducts._get_adduct_proportions` in `vimms/Chemicals.py`.
We also expanded common negative-mode adduct coverage and removed the default `[M+NH3]+H` entry because it has the same mass shift as `M+NH4` and would otherwise duplicate signal at the same m/z.
Deisotoping
Deisotoping is implemented in `vimms/Deisotoping.py` as a small utility used in tests to validate generated isotope patterns.
- Peaks are matched within a ppm tolerance (`ppm_tolerance`).
- Uses the natural isotope table (`NATURAL_ISOTOPES`).
- Accept a cluster with at least `min_isotopes` peaks, and mark its peaks as assigned.
Implementation: `Deisotoper.deisotope` / `_guess_charge` / `_grow_cluster` in `vimms/Deisotoping.py`.
How we know it’s correct
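As a toy illustration of the round-trip check that the tests perform (recovering a monoisotopic m/z from a set of isotope peaks), here is a simplified greedy deisotoper. It is a sketch under stated assumptions, not the `Deisotoper` class: it only considers 13C spacing, picks the charge that explains the most peaks, and every function name, default value and peak below is hypothetical.

```python
C13_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def within_ppm(observed, expected, ppm_tolerance):
    """True if observed lies within ppm_tolerance of expected."""
    return abs(observed - expected) <= expected * ppm_tolerance * 1e-6


def deisotope(peaks, ppm_tolerance=10.0, max_charge=3, min_isotopes=2):
    """Greedy toy deisotoper: returns (mono_mz, charge, n_peaks) per cluster."""
    peaks = sorted(peaks)  # (mz, intensity) tuples, ascending m/z
    assigned = set()
    clusters = []
    for i, (mz, _) in enumerate(peaks):
        if i in assigned:
            continue
        best = None
        # Try each charge and keep the one that explains the most peaks.
        for charge in range(1, max_charge + 1):
            members = [i]
            expected = mz + C13_SPACING / charge
            for j in range(i + 1, len(peaks)):
                if j in assigned:
                    continue
                if within_ppm(peaks[j][0], expected, ppm_tolerance):
                    members.append(j)
                    expected += C13_SPACING / charge
            if best is None or len(members) > len(best[1]):
                best = (charge, members)
        charge, members = best
        # Only accept clusters with enough isotope peaks; singletons stay unassigned.
        if len(members) >= min_isotopes:
            assigned.update(members)
            clusters.append((mz, charge, len(members)))
    return clusters


mono = 181.07068  # illustrative singly charged monoisotopic m/z
peaks = [
    (mono, 1.0),
    (mono + C13_SPACING, 0.07),
    (mono + 2 * C13_SPACING, 0.01),
    (300.0, 0.5),  # unrelated peak with no isotope partners
]
clusters = deisotope(peaks)
print(clusters)
```

The unrelated peak at 300.0 has no partners at any tested spacing, so it is left out of the result rather than reported as a one-peak cluster.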
We added `tests/test_deisotoping.py` to cover multi-element isotope generation and deisotoping end-to-end. The tests verify that isotope envelopes are ordered and normalised, that chlorine-containing formulae produce an M+2 pattern, that the monoisotopic peak is preserved under aggressive filtering, and that the deisotoper recovers the expected monoisotopic m/z from adducted isotope peaks.
I ran the full test suite locally and it passes (`pytest`: 119 tests).
Notes
`UnknownChemical` behaviour is unchanged. `UnknownChemical` instances are typically created from ROI-picked peaks during re-simulation of an existing mzML; those peaks may already correspond to a specific isotope/adduct peak (not a full compound) or may simply be noise. If we generated extra isotopes/adducts from them we’d be inventing correlated peaks and potentially double-counting signal, so we continue to treat `UnknownChemical` as a single-peak representation.