Milo group nhoods #801

MaximilianNuber · 2025-06-12T08:43:13Z

PR Checklist

Referenced issue is linked
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes

Neighborhood grouping/clustering by Louvain/igraph community_multilevel was added, as well as find_nhood_group_markers based on miloR and Dann et al. 2023 (https://doi.org/10.6084/m9.figshare.21456645).

Issue: #680

Technical details

There are 6 new user-facing functions:

Milo().group_nhoods(...)
Milo().annotate_cells_from_nhoods(...)
Milo().find_nhood_group_markers(...)
Milo().get_mean_expression(...)
Milo().plot_heatmap_with_dot_and_colorbar
Milo().plot_nhood_annotation

milo.group_nhoods takes the neighborhood connectivities sparse matrix already calculated by milo, removes edges based on milo.da_nhoods SpatialFDR and the number of overlapping cells in neighborhoods, and then performs graph_object.community_multilevel.
I added tests to confirm that the preparation of the neighborhood connectivities in mdata["milo"].varp["nhood_connectivities"] before the clustering is equivalent to the R version. The R code in pertpy.tests.tools.test_milo.py is taken directly from the miloR GitHub repository to test quivalence of the R and Python versions (https://github.com/MarioniLab/miloR/blob/master/R/groupNhoods.R).

For finding neighborhood group marker genes, the neighborhood grouping can be based on milo.group_nhoods or specified manually for custom comparisons, as done in both miloR and Dann et al. 2023.

In both miloR and Dann et al. 2023, the findNhoodGroupMarkers, or the Python equivalent respectively, combine the transfer of neighborhood group labels from the neighborhood "milo" object (mdata["milo"]) to the single-cell object (mdata["rna"]) with the differential expression analysis between the specified neighborhood groups.
I deliberately separated the annotation of cells from neighborhoods, as it enables custom differential expression analyses, the neighborhood grouping info becomes a column in mdata["rna"].obs.
The annotation of cells from nhoods needs to solve the problem of cells being members in more than one neighborhood group. miloR solves this by assigning the neighborhood group label of the last neighborhood group in the assignment loop to each cell. Dann et al. 2023 solved this by excluding overlapping cells. I implemented both versions, usable through the mode-argument as milo.annotate_cells_from_nhoods(mdata, mode = "last_wins") and milo.annotate_cells_from_nhoods(mdata, mode = "exclude_overlaps"). In mode = "last_wins", the order of neighborhood groups can be set by using a pandas categorical.

Then I implemented the rest of the differential expression between neighborhood groups in milo.find_nhood_group_markers. After some checks the pseudobulk is performed using sc.get.aggregate based on the sample column, the neighborhood groups column, and including any desired covariates. As milo.annotate_cells_from_nhoods introduces NAs, and NAs are necessary in specifying custom neighborhood groups, any NAs are filtered before pseudobulking.

Dann et al. 2023 uses scuttle::logNormCounts and scran::modelGeneVar for filtering to a specified number of highly variable genes. I implemented this, which introduces new R dependencies scuttle, scran, and SingleCellExperiment, all imported through Milo()._try_import_bioc_library for respective errors if the R package is not installed.

However, the default for filtering genes is scanpy, using scanpy.pp.highly_variable_genes as a Python version of Dann et al. 2023.
Additionally I added edger::filterByExpr as an option for gene filtering, as edgeR might already be installed if solver = "edger" was used in milo.da_nhoods.

The default for differential expression in milo.find_nhood_group_markers is PyDeseq2, optionally edgeR, again because it is likely to be installed if if solver = "edger" was used in milo.da_nhoods.
The function takes the optional arguments baseline and group_to_compare, for targeted comparison of two neighborhood groups. If both arguments are None, a one vs. all comparison is performed for every neighborhood group, as implemented in miloR.
If desired, I would like to add glmGamPoi at a later time, as it was used in Dann et al. 2023.
The column names are the as in the pertpy differential expression module, e.g. p_value, adj_p_value, log_fc, gene names or IDs are in the variable column.

Milo().get_mean_expression(...) is a simple utility, for calculating the mean expression in different groups, e.g. neighborhood groups, and serves as input to Milo().plot_heatmap_with_dot_and_colorbar, which is inspired by plots in Dann et al. 2023 for comparing the expression in different groups together with logFC between groups from differential expression testing.
Arguments let the user manipulate subplot ratio, the existence of the dotplot, and move the size legend to the right if it overlaps with colorbar annotation

Milo().plot_nhood_annotation(...) takes a categorical annotation key from mdata["milo"] and plots X_milo_graph as per the annotation key, defaults to nhood_annotation. If annotation_key = None, regresses to plot_nhood_graph.

New dependencies:

scran, scuttle and SingleCellExperiment in R if using the respective filtering method in milo.find_nhood_group_markers
edgeR, if milo.da_nhoods(..., solver = "pydeseq2") was previously used by the user.
All dependencies are only loaded if the function or option is used and is wrapped for save error, in case of R with Milo()._try_import_bioc_library
pt.data.stephenson_2021_subsampled only contains normalized counts, so I used scvelo.datasets.gastrulation. for testing, and the default testing dataset in tests_milo.py

Additional context

Demonstration of new plotting functions, used on scvelo.datasets.gastrulation():

milo.group_nhoods(mdata["milo"], )

milo.plot_nhood_annotation(mdata, 
                           annotation_key = "nhood_groups", 
                           min_size = 1,
                           legend_loc = "on data"
                          )

milo.plot_nhood_annotation(mdata, 
                           annotation_key = None, 
                           min_size = 1,
                           legend_loc = "on data"
                          )

baseline = "2"
group_to_compare = "3"

edger_res = milo.find_nhood_group_markers(mdata, de_method = "edger", filter_method = "filterByExpr",
                                         group_to_compare = group_to_compare, baseline = baseline)

varnames = (
    edger_res
        .query('log_fc >= 0.5')
        .query('adj_p_value <= 0.01')
        .sort_values("log_fc", ascending = False)
        .head(30)
        .variable.to_list()
)
mean_df = milo.get_mean_expression(mdata["rna"], "nhood_groups", var_names = varnames)
mean_df = mean_df.loc[varnames, mean_df.columns.isin([baseline, group_to_compare])]

logfc_ser = (
    edger_res
        .query('log_fc >= 0.5')
        .query('adj_p_value <= 0.01')
        .set_index("variable")
        ["log_fc"]
)
logfc_ser = (
    edger_res
        .set_index("variable")
        .loc[varnames, :]
        ["log_fc"]
)

fig = milo.plot_heatmap_with_dot_and_colorbar(mean_df, logfc_ser, 
                                       figsize = (1.5, len(logfc_ser)*0.15),
                                        dot_scale=90,
                                       panel_ratios = (0.4, 0.2, 0.1),
                                        legend_on_right=1.3
                                       )
plt.show()

Added group_nhoods to _milo.py as FUNCTION, not included in the class, which will be changed when I finalize everything. Added tests in pertpy.tests.tools.test_milo.py, where the tests focus on the processing of the neighborhood adjacency matrix and show, that the processing in miloR and Python are equivalent. Added pertpy.utils... for lazy rpy2 convenience utils. This is NOT to stay there, but I want to ask if I can include this somewhere maybe others can use it, too, or should I hardcode with localconverter, etc.?

group_nhoods, annotate_cells_from_nhoods with two modes, find_nhood_group_markers with default pydeseq2 and optionally edgeR. Statsmodels and limma for logcounts doesnt make sense if counts were normalized with a scale factor.

Improved tests so they rely only on lazily imported rpy2. removed ebayes for previously removed Statsmodels test in find_nhood_group_markers.

formulaic_contrasts was used only for edgeR, which now uses edgeR internals. The tests for find_nhood_group_markers load the scanpy pbmc3k dataset. All R dependencies required for the tests: base, stats, limma, edgeR, scran, scuttle, SingleCellExperiment.

Zethson · 2025-06-12T08:48:02Z

@MaximilianNuber I can see that it's a draft PR and there's still a lot to do but please ensure that you adhere to our existing conventions such as Google docstrings, the same plotting API structure, the same verbosity when it comes to code comments, ...

Let me know if you have any questions, please! I'll review when it's ready.

Thank you!

Zethson · 2025-06-12T08:54:16Z

Oh, and just FYI: The pre-release CI is allowed to fail at the moment but the ReadTheDocs job is not. It must pass and the docs must look good.

In the old version, nhood_gr first dropped NA by all columns in the dataframe and then took unique values, without control of ordering by the user. This change let´s the user control the ordering of groups, and removes the problem as described, by using pd.Categorical. Reran tests, and it works.

Updated doc strings to google format. Added common_plot_args to milo.plot_nhood_annotation. Added plot in ReadTheDocs for plot_nhood_annotation

…tion" This reverts commit 5a30307.

…sphinx parsing. Also commented out get_mean_expression and plot_heatmap_with_dots_and_colorbar, they introduced errors and the plot is hard to make compatible with the scanpy API.

Zethson · 2025-07-02T18:14:16Z

@MaximilianNuber please feel free to ask me if you're stuck with something. I'm happy to guide a bit if I can.

This is just a friendly and supportive check in message, NOT a stress message! Please take all the time in the world. This is not urgent.

MaximilianNuber · 2025-07-10T14:24:01Z

Dear @Zethson ,
My apologies, I currently cannot assign as much attention to this as I thought, especially since I intend to use this as part of my analysis. But once I am settled in again I will finalize it!!!

So far I have only few updates:

I still need to look more into how GitHub Actions, CI, and sphinx work, so all tests run and the documentation is beautiful. That´s my biggest issue currently.
The google doc strings are done. If not yet pushed to the repo, in my local version. I don´t want to push immediately, but after rechecking all changes I have ready.
Locally, the documentation renders, so it´s a question of making it run smoothly in the repo as well.
If you agree, I would take out the plot for the heatmap with dots for now, as it is hard for me to align this with the pertpy/scanpy plotting API. milo.plot_nhood_annotation can be a really shallow wrapper around sc.pl.embedding, so it´s easier to realize this.
I have tested the core (grouping, grouping on subsets, projecting neighborhood labels to single cells, and find_nhood_group_markers) part on my own dataset and a few more, and it biologically makes sense and aligns with miloR

I will try to check my local changes soon and push.

Zethson · 2025-07-10T14:27:18Z

Ahh no worries. Please take your time and don't stress yourself. No need for that.

I still need to look more into how GitHub Actions, CI, and sphinx work, so all tests run and the documentation is beautiful. That´s my biggest issue currently.

I'm happy to answer any questions that you might have. I guess that most LLMs should be able to give you a broad overview though if necessary.

If you agree, I would take out the plot for the heatmap with dots for now, as it is hard for me to align this with the pertpy/scanpy plotting API. milo.plot_nhood_annotation can be a really shallow wrapper around sc.pl.embedding, so it´s easier to realize this.

Okay! Totally fine with me.

MaximilianNuber added 5 commits June 4, 2025 12:19

Added nhood grouping functionality

f08088d

group_nhoods, annotate_cells_from_nhoods with two modes, find_nhood_group_markers with default pydeseq2 and optionally edgeR. Statsmodels and limma for logcounts doesnt make sense if counts were normalized with a scale factor.

changed Literal edgeR to edger to adhere to Milo().da_nhoods

a687a3d

Tests imporvement and removed ebayes

d0230ce

Improved tests so they rely only on lazily imported rpy2. removed ebayes for previously removed Statsmodels test in find_nhood_group_markers.

MaximilianNuber added 5 commits June 12, 2025 14:31

Updated doc strings and common_plot_args in plot_nhood_annotation

5a30307

Updated doc strings to google format. Added common_plot_args to milo.plot_nhood_annotation. Added plot in ReadTheDocs for plot_nhood_annotation

Revert "Updated doc strings and common_plot_args in plot_nhood_annota…

4789255

…tion" This reverts commit 5a30307.

doc_strings update, plot_nhood_annotation common_doc_params

c8bfd0a

Updated :class:pandas.DataFrame in doc strings and |logFC| for …

976e903

…sphinx parsing. Also commented out get_mean_expression and plot_heatmap_with_dots_and_colorbar, they introduced errors and the plot is hard to make compatible with the scanpy API.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Milo group nhoods #801

Milo group nhoods #801

MaximilianNuber commented Jun 12, 2025

Uh oh!

Zethson commented Jun 12, 2025

Uh oh!

Zethson commented Jun 12, 2025

Uh oh!

Zethson commented Jul 2, 2025

Uh oh!

MaximilianNuber commented Jul 10, 2025

Uh oh!

Zethson commented Jul 10, 2025

Uh oh!

Uh oh!

Milo group nhoods #801

Are you sure you want to change the base?

Milo group nhoods #801

Conversation

MaximilianNuber commented Jun 12, 2025

Uh oh!

Zethson commented Jun 12, 2025

Uh oh!

Zethson commented Jun 12, 2025

Uh oh!

Zethson commented Jul 2, 2025

Uh oh!

MaximilianNuber commented Jul 10, 2025

Uh oh!

Zethson commented Jul 10, 2025

Uh oh!

Uh oh!