Milo group nhoods #801
base: main
Added group_nhoods to _milo.py as a FUNCTION, not included in the class, which will change when I finalize everything. Added tests in pertpy.tests.tools.test_milo.py; the tests focus on the processing of the neighborhood adjacency matrix and show that the processing in miloR and Python is equivalent. Added pertpy.utils... for lazy rpy2 convenience utils. This is NOT meant to stay there, but I want to ask whether I can include it somewhere so that others can use it, too, or whether I should hardcode it with localconverter, etc.?
group_nhoods, annotate_cells_from_nhoods with two modes, find_nhood_group_markers with PyDESeq2 as the default and edgeR as an option. Statsmodels and limma on logcounts don't make sense if counts were normalized with a scale factor.
Improved tests so they rely only on lazily imported rpy2. Removed eBayes for the previously removed statsmodels test in find_nhood_group_markers.
formulaic_contrasts was used only for edgeR, which now uses edgeR internals. The tests for find_nhood_group_markers load the scanpy pbmc3k dataset. All R dependencies required for the tests: base, stats, limma, edgeR, scran, scuttle, SingleCellExperiment.
@MaximilianNuber I can see that it's a draft PR and there's still a lot to do, but please ensure that you adhere to our existing conventions such as Google docstrings, the same plotting API structure, the same verbosity when it comes to code comments, ... Let me know if you have any questions, please! I'll review when it's ready. Thank you!
Oh, and just FYI: The pre-release CI is allowed to fail at the moment, but the ReadTheDocs job is not. It must pass and the docs must look good.
In the old version, nhood_gr first dropped NAs across all columns of the dataframe and then took the unique values, without letting the user control their ordering. This change lets the user control the ordering of groups, and removes the problem as described, by using pd.Categorical. Reran the tests, and it works.
Updated docstrings to Google format. Added common_plot_args to milo.plot_nhood_annotation. Added a plot for plot_nhood_annotation to the ReadTheDocs documentation.
…tion" This reverts commit 5a30307.
…sphinx parsing. Also commented out get_mean_expression and plot_heatmap_with_dots_and_colorbar, they introduced errors and the plot is hard to make compatible with the scanpy API.
@MaximilianNuber please feel free to ask me if you're stuck with something. I'm happy to guide a bit if I can. This is just a friendly and supportive check-in message, NOT a stress message! Please take all the time in the world. This is not urgent.
Dear @Zethson, so far I have only a few updates:
I will try to check my local changes soon and push.
Ahh no worries. Please take your time and don't stress yourself. No need for that.
I'm happy to answer any questions that you might have. I guess that most LLMs should be able to give you a broad overview though if necessary.
Okay! Totally fine with me.
PR Checklist
- `docs` is updated

Description of changes
Neighborhood grouping/clustering by Louvain (igraph `community_multilevel`) was added, as well as `find_nhood_group_markers`, based on miloR and Dann et al. 2023 (https://doi.org/10.6084/m9.figshare.21456645).

Issue: #680
Technical details
There are 6 new user-facing functions:
- `Milo().group_nhoods(...)`
- `Milo().annotate_cells_from_nhoods(...)`
- `Milo().find_nhood_group_markers(...)`
- `Milo().get_mean_expression(...)`
- `Milo().plot_heatmap_with_dot_and_colorbar(...)`
- `Milo().plot_nhood_annotation(...)`
`milo.group_nhoods` takes the sparse neighborhood connectivity matrix already calculated by Milo, removes edges based on the `SpatialFDR` from `milo.da_nhoods` and on the number of cells shared between neighborhoods, and then performs `graph_object.community_multilevel`.
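To illustrate the edge-removal step, here is a minimal numpy sketch under stated assumptions: the neighborhood adjacency is a dense matrix of shared-cell counts, an edge is dropped when the two neighborhoods have opposite logFC signs and at least one of them passes a SpatialFDR threshold, and edges with too few shared cells are dropped. The function and its parameters (`prune_nhood_adjacency`, `da_fdr`, `min_overlap`) are hypothetical, and the exact rules in miloR's `groupNhoods.R` (and therefore in this PR) may differ.

```python
import numpy as np


def prune_nhood_adjacency(overlap, logfc, spatial_fdr, da_fdr=0.1, min_overlap=1):
    """Remove edges from a neighborhood adjacency matrix before clustering.

    overlap: symmetric n_nhoods x n_nhoods matrix of shared-cell counts.
    logfc, spatial_fdr: per-neighborhood DA results (as from milo.da_nhoods).
    The pruning rules below are assumptions made for this sketch.
    """
    adj = np.asarray(overlap, dtype=float).copy()
    logfc = np.asarray(logfc, dtype=float)
    spatial_fdr = np.asarray(spatial_fdr, dtype=float)

    # Rule 1 (assumed): drop edges between neighborhoods with opposite logFC
    # signs when at least one of the two passes the SpatialFDR threshold.
    discordant = np.outer(np.sign(logfc), np.sign(logfc)) < 0
    significant = (spatial_fdr[:, None] < da_fdr) | (spatial_fdr[None, :] < da_fdr)
    adj[discordant & significant] = 0

    # Rule 2 (assumed): drop edges supported by fewer than min_overlap shared cells.
    adj[adj < min_overlap] = 0

    # Remove self-loops so only between-neighborhood edges remain.
    np.fill_diagonal(adj, 0)
    return adj
```

The pruned matrix would then be turned into an igraph graph and clustered with `community_multilevel`, as described above; that step is left out to keep the sketch dependency-free.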
I added tests to confirm that the preparation of the neighborhood connectivities in `mdata["milo"].varp["nhood_connectivities"]` before the clustering is equivalent to the R version. The R code in `pertpy.tests.tools.test_milo.py` is taken directly from the miloR GitHub repository to test the equivalence of the R and Python versions (https://github.com/MarioniLab/miloR/blob/master/R/groupNhoods.R).

For finding neighborhood group marker genes, the neighborhood grouping can be based on `milo.group_nhoods` or specified manually for custom comparisons, as done in both miloR and Dann et al. 2023.

In both miloR and Dann et al. 2023, `findNhoodGroupMarkers` (or its Python equivalent, respectively) combines the transfer of neighborhood group labels from the neighborhood "milo" object (`mdata["milo"]`) to the single-cell object (`mdata["rna"]`) with the differential expression analysis between the specified neighborhood groups.

I deliberately separated the annotation of cells from neighborhoods, as it enables custom differential expression analyses: the neighborhood grouping info becomes a column in `mdata["rna"].obs`.
The annotation of cells from nhoods needs to solve the problem of cells being members of more than one neighborhood group. miloR solves this by assigning each cell the label of the last neighborhood group in the assignment loop. Dann et al. 2023 solved this by excluding overlapping cells. I implemented both versions, usable through the `mode` argument as `milo.annotate_cells_from_nhoods(mdata, mode="last_wins")` and `milo.annotate_cells_from_nhoods(mdata, mode="exclude_overlaps")`. In `mode="last_wins"`, the order of neighborhood groups can be set by using a pandas categorical. Then I implemented the rest of the differential expression between neighborhood groups in `milo.find_nhood_group_markers`.
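The two modes of `milo.annotate_cells_from_nhoods` described above can be sketched in plain Python, assuming neighborhood membership is available as a mapping from neighborhood to member cells and each neighborhood carries at most one group label. `annotate_cells` is a hypothetical helper, not the PR's implementation; its `group_order` list stands in for the categories of the pandas categorical.

```python
def annotate_cells(nhood_cells, nhood_group, group_order, mode="last_wins"):
    """Assign one neighborhood-group label per cell (sketch, not pertpy's code).

    nhood_cells: dict mapping neighborhood id -> iterable of member cell ids.
    nhood_group: dict mapping neighborhood id -> group label (None = ungrouped).
    group_order: group labels in assignment order; every label used in
        nhood_group must appear here for mode="last_wins".
    """
    # Collect, for every cell, the set of groups whose neighborhoods contain it.
    cell_groups = {}
    for nhood, cells in nhood_cells.items():
        group = nhood_group.get(nhood)
        if group is None:
            continue  # ungrouped neighborhoods contribute no label
        for cell in cells:
            cell_groups.setdefault(cell, set()).add(group)

    labels = {}
    for cell, groups in cell_groups.items():
        if mode == "last_wins":
            # The group that comes last in group_order overwrites earlier ones.
            labels[cell] = max(groups, key=group_order.index)
        elif mode == "exclude_overlaps":
            # Cells shared by several groups get no label (NA).
            labels[cell] = next(iter(groups)) if len(groups) == 1 else None
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return labels
```

Cells left without a label, either because no grouped neighborhood contains them or because they were excluded as overlapping, correspond to the NAs that are filtered out before pseudobulking.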
After some checks, the pseudobulk is computed using `sc.get.aggregate` based on the sample column, the neighborhood group column, and any desired covariates. As `milo.annotate_cells_from_nhoods` introduces NAs, and NAs are necessary for specifying custom neighborhood groups, any NAs are filtered out before pseudobulking.

Dann et al. 2023 use `scuttle::logNormCounts` and `scran::modelGeneVar` to filter to a specified number of highly variable genes. I implemented this, which introduces the new R dependencies `scuttle`, `scran`, and `SingleCellExperiment`, all imported through `Milo()._try_import_bioc_library`, which raises the respective error if an R package is not installed. However, the default for filtering genes is scanpy, using `scanpy.pp.highly_variable_genes` as a Python version of Dann et al. 2023. Additionally, I added `edgeR::filterByExpr` as an option for gene filtering, as edgeR might already be installed if `solver="edger"` was used in `milo.da_nhoods`.
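The pseudobulking and the edgeR-style gene filter described above can be sketched with numpy. `pseudobulk_sum` stands in for summing with `sc.get.aggregate` and drops cells whose group is `None` first; `filter_by_expr_like` is a simplified version of the idea behind `edgeR::filterByExpr`: a CPM cutoff of `min_count` scaled by the median library size, required in at least as many samples as the smallest group, plus a minimum total count (defaults follow edgeR's `min.count = 10` and `min.total.count = 15`). Both helpers are hypothetical; covariates, normalization factors, design matrices, and edgeR's large-sample adjustment are omitted.

```python
import numpy as np


def pseudobulk_sum(counts, samples, groups):
    """Sum raw counts per (sample, group) pair, skipping cells with group None.

    Stand-in for summing with sc.get.aggregate; cells without a neighborhood
    group label (NA) are dropped before aggregation.
    """
    counts = np.asarray(counts)
    samples = np.array(samples, dtype=object)
    groups = np.array(groups, dtype=object)
    keys = sorted({(s, g) for s, g in zip(samples, groups) if g is not None})
    rows = [counts[(samples == s) & (groups == g)].sum(axis=0) for s, g in keys]
    return keys, np.vstack(rows)


def filter_by_expr_like(pb, pb_groups, min_count=10, min_total_count=15):
    """Boolean gene mask following the core idea of edgeR::filterByExpr."""
    pb = np.asarray(pb, dtype=float)
    lib_size = pb.sum(axis=1)                           # per pseudobulk sample
    cpm = pb / lib_size[:, None] * 1e6                  # counts per million
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    _, group_sizes = np.unique(np.asarray(pb_groups), return_counts=True)
    min_samples = group_sizes.min()                     # smallest group size
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=0) >= min_samples
    keep_total = pb.sum(axis=0) >= min_total_count
    return keep_cpm & keep_total
```

Usage: `keys, pb = pseudobulk_sum(counts, samples, groups)` followed by `filter_by_expr_like(pb, [g for _, g in keys])` gives a gene mask for the pseudobulk matrix.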
The default for differential expression in `milo.find_nhood_group_markers` is PyDESeq2, optionally edgeR, again because edgeR is likely to be installed if `solver="edger"` was used in `milo.da_nhoods`.
The function takes the optional arguments `baseline` and `group_to_compare` for a targeted comparison of two neighborhood groups. If both arguments are `None`, a one-vs-all comparison is performed for every neighborhood group, as implemented in miloR.

If desired, I would like to add glmGamPoi at a later time, as it was used in Dann et al. 2023.
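The selection logic for `baseline` and `group_to_compare` can be sketched as a small planning step that returns the comparisons to run. The helper `plan_comparisons` and the `"rest"` placeholder are hypothetical, and raising an error when only one of the two arguments is given is an assumption; the PR text only describes the both-`None` and both-given cases.

```python
def plan_comparisons(groups, baseline=None, group_to_compare=None):
    """Return (tested_group, reference) pairs; "rest" means all other groups."""
    # Distinct non-NA group labels in order of first appearance.
    levels = list(dict.fromkeys(g for g in groups if g is not None))

    if baseline is None and group_to_compare is None:
        # One-vs-all: every group is tested against the remaining groups.
        return [(g, "rest") for g in levels]
    if baseline is None or group_to_compare is None:
        # Assumption of this sketch: partial specification is rejected.
        raise ValueError("provide both baseline and group_to_compare, or neither")
    for g in (baseline, group_to_compare):
        if g not in levels:
            raise ValueError(f"unknown neighborhood group: {g!r}")
    # Targeted comparison of two neighborhood groups.
    return [(group_to_compare, baseline)]
```

For a one-vs-all pair, the group column of the pseudobulk table would be relabeled to the tested group versus everything else before fitting PyDESeq2 or edgeR.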
The column names are the same as in the pertpy differential expression module, e.g. `p_value`, `adj_p_value`, and `log_fc`; gene names or IDs are in the `variable` column.
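As an illustration of what the `adj_p_value` column typically holds, here is a pure-Python Benjamini-Hochberg adjustment together with a row builder that uses the column names listed above. Both helpers (`benjamini_hochberg`, `to_result_rows`) are hypothetical; PyDESeq2 and edgeR report their own adjusted p-values, and BH as the adjustment method is an assumption here.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def to_result_rows(genes, log_fc, p_values):
    """Assemble per-gene result rows using the column names listed above."""
    adj = benjamini_hochberg(p_values)
    return [
        {"variable": g, "log_fc": lfc, "p_value": p, "adj_p_value": q}
        for g, lfc, p, q in zip(genes, log_fc, p_values, adj)
    ]
```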
`Milo().get_mean_expression(...)` is a simple utility for calculating the mean expression in different groups, e.g. neighborhood groups, and serves as input to `Milo().plot_heatmap_with_dot_and_colorbar`, which is inspired by plots in Dann et al. 2023 that compare the expression in different groups together with the logFC between groups from differential expression testing. Its arguments let the user adjust the subplot ratio, choose whether the dot plot is shown, and move the size legend to the right if it overlaps with the colorbar annotation.
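A minimal sketch of the per-group mean that such a heatmap would consume, assuming a dense cells-by-genes matrix and one group label per cell (`None` for unlabeled cells). `mean_expression_by_group` is a hypothetical stand-in; the actual signature and return type of `Milo().get_mean_expression` are not shown in this PR description.

```python
import numpy as np


def mean_expression_by_group(X, groups):
    """Per-group mean of each gene (rows: groups, columns: genes).

    Cells whose group is None are skipped, since they carry no label.
    """
    X = np.asarray(X, dtype=float)
    labels = np.array(groups, dtype=object)
    # Distinct labels in order of first appearance.
    levels = list(dict.fromkeys(g for g in groups if g is not None))
    means = np.vstack([X[labels == g].mean(axis=0) for g in levels])
    return levels, means
```

For sparse AnnData matrices, each slice would need to be densified or averaged with a sparse-aware mean.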
`Milo().plot_nhood_annotation(...)` takes a categorical annotation key from `mdata["milo"]` and plots `X_milo_graph` colored by that annotation; the key defaults to `nhood_annotation`. If `annotation_key=None`, it falls back to `plot_nhood_graph`.
New dependencies:
- `scran`, `scuttle`, and `SingleCellExperiment` in R, if the respective filtering method in `milo.find_nhood_group_markers` is used
- `edgeR`, if the user previously used `milo.da_nhoods(..., solver="pydeseq2")`

All dependencies are only loaded if the corresponding function or option is used, and their imports are wrapped to fail safely with an informative error, in the case of R via `Milo()._try_import_bioc_library`.
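The lazy-loading pattern for optional Python dependencies can be sketched with `importlib`: the module is imported only when the option is used, and a missing package produces an error with an installation hint. `try_import` is a hypothetical helper; the R-side counterpart in the PR is `Milo()._try_import_bioc_library`, whose internals are not reproduced here.

```python
import importlib


def try_import(module_name, install_hint):
    """Import an optional dependency only when it is actually needed.

    Re-raises a missing-module error as ImportError carrying an install hint.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this option. {install_hint}"
        ) from err
```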
`pt.data.stephenson_2021_subsampled` only contains normalized counts, so I used `scvelo.datasets.gastrulation()` for testing, as well as the default testing dataset in `test_milo.py`.

Additional context
Demonstration of new plotting functions, used on scvelo.datasets.gastrulation():