Skip to content

Milo group nhoods #801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

MaximilianNuber
Copy link

PR Checklist

  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

Neighborhood grouping/clustering by Louvain/igraph community_multilevel was added, as well as find_nhood_group_markers based on miloR and Dann et al. 2023 (https://doi.org/10.6084/m9.figshare.21456645).

Issue: #680

Technical details

There are 6 new user-facing functions:

  1. Milo().group_nhoods(...)
  2. Milo().annotate_cells_from_nhoods(...)
  3. Milo().find_nhood_group_markers(...)
  4. Milo().get_mean_expression(...)
  5. Milo().plot_heatmap_with_dot_and_colorbar
  6. Milo().plot_nhood_annotation

milo.group_nhoods takes the neighborhood connectivities sparse matrix already calculated by milo, removes edges based on milo.da_nhoods SpatialFDR and the number of overlapping cells in neighborhoods, and then performs graph_object.community_multilevel.
I added tests to confirm that the preparation of the neighborhood connectivities in mdata["milo"].varp["nhood_connectivities"] before the clustering is equivalent to the R version. The R code in pertpy.tests.tools.test_milo.py is taken directly from the miloR GitHub repository to test quivalence of the R and Python versions (https://github.com/MarioniLab/miloR/blob/master/R/groupNhoods.R).

For finding neighborhood group marker genes, the neighborhood grouping can be based on milo.group_nhoods or specified manually for custom comparisons, as done in both miloR and Dann et al. 2023.

In both miloR and Dann et al. 2023, the findNhoodGroupMarkers, or the Python equivalent respectively, combine the transfer of neighborhood group labels from the neighborhood "milo" object (mdata["milo"]) to the single-cell object (mdata["rna"]) with the differential expression analysis between the specified neighborhood groups.
I deliberately separated the annotation of cells from neighborhoods, as it enables custom differential expression analyses, the neighborhood grouping info becomes a column in mdata["rna"].obs.
The annotation of cells from nhoods needs to solve the problem of cells being members in more than one neighborhood group. miloR solves this by assigning the neighborhood group label of the last neighborhood group in the assignment loop to each cell. Dann et al. 2023 solved this by excluding overlapping cells. I implemented both versions, usable through the mode-argument as milo.annotate_cells_from_nhoods(mdata, mode = "last_wins") and milo.annotate_cells_from_nhoods(mdata, mode = "exclude_overlaps"). In mode = "last_wins", the order of neighborhood groups can be set by using a pandas categorical.

Then I implemented the rest of the differential expression between neighborhood groups in milo.find_nhood_group_markers. After some checks the pseudobulk is performed using sc.get.aggregate based on the sample column, the neighborhood groups column, and including any desired covariates. As milo.annotate_cells_from_nhoods introduces NAs, and NAs are necessary in specifying custom neighborhood groups, any NAs are filtered before pseudobulking.

Dann et al. 2023 uses scuttle::logNormCounts and scran::modelGeneVar for filtering to a specified number of highly variable genes. I implemented this, which introduces new R dependencies scuttle, scran, and SingleCellExperiment, all imported through Milo()._try_import_bioc_library for respective errors if the R package is not installed.

However, the default for filtering genes is scanpy, using scanpy.pp.highly_variable_genes as a Python version of Dann et al. 2023.
Additionally I added edger::filterByExpr as an option for gene filtering, as edgeR might already be installed if solver = "edger" was used in milo.da_nhoods.

The default for differential expression in milo.find_nhood_group_markers is PyDeseq2, optionally edgeR, again because it is likely to be installed if if solver = "edger" was used in milo.da_nhoods.
The function takes the optional arguments baseline and group_to_compare, for targeted comparison of two neighborhood groups. If both arguments are None, a one vs. all comparison is performed for every neighborhood group, as implemented in miloR.
If desired, I would like to add glmGamPoi at a later time, as it was used in Dann et al. 2023.
The column names are the as in the pertpy differential expression module, e.g. p_value, adj_p_value, log_fc, gene names or IDs are in the variable column.

Milo().get_mean_expression(...) is a simple utility, for calculating the mean expression in different groups, e.g. neighborhood groups, and serves as input to Milo().plot_heatmap_with_dot_and_colorbar, which is inspired by plots in Dann et al. 2023 for comparing the expression in different groups together with logFC between groups from differential expression testing.
Arguments let the user manipulate subplot ratio, the existence of the dotplot, and move the size legend to the right if it overlaps with colorbar annotation

Milo().plot_nhood_annotation(...) takes a categorical annotation key from mdata["milo"] and plots X_milo_graph as per the annotation key, defaults to nhood_annotation. If annotation_key = None, regresses to plot_nhood_graph.

New dependencies:

  • scran, scuttle and SingleCellExperiment in R if using the respective filtering method in milo.find_nhood_group_markers
  • edgeR, if milo.da_nhoods(..., solver = "pydeseq2") was previously used by the user.
    All dependencies are only loaded if the function or option is used and is wrapped for save error, in case of R with Milo()._try_import_bioc_library
    pt.data.stephenson_2021_subsampled only contains normalized counts, so I used scvelo.datasets.gastrulation. for testing, and the default testing dataset in tests_milo.py

Additional context

Demonstration of new plotting functions, used on scvelo.datasets.gastrulation():

milo.group_nhoods(mdata["milo"], )

milo.plot_nhood_annotation(mdata, 
                           annotation_key = "nhood_groups", 
                           min_size = 1,
                           legend_loc = "on data"
                          )
image
milo.plot_nhood_annotation(mdata, 
                           annotation_key = None, 
                           min_size = 1,
                           legend_loc = "on data"
                          )
image
baseline = "2"
group_to_compare = "3"

edger_res = milo.find_nhood_group_markers(mdata, de_method = "edger", filter_method = "filterByExpr",
                                         group_to_compare = group_to_compare, baseline = baseline)

varnames = (
    edger_res
        .query('log_fc >= 0.5')
        .query('adj_p_value <= 0.01')
        .sort_values("log_fc", ascending = False)
        .head(30)
        .variable.to_list()
)
mean_df = milo.get_mean_expression(mdata["rna"], "nhood_groups", var_names = varnames)
mean_df = mean_df.loc[varnames, mean_df.columns.isin([baseline, group_to_compare])]

logfc_ser = (
    edger_res
        .query('log_fc >= 0.5')
        .query('adj_p_value <= 0.01')
        .set_index("variable")
        ["log_fc"]
)
logfc_ser = (
    edger_res
        .set_index("variable")
        .loc[varnames, :]
        ["log_fc"]
)

fig = milo.plot_heatmap_with_dot_and_colorbar(mean_df, logfc_ser, 
                                       figsize = (1.5, len(logfc_ser)*0.15),
                                        dot_scale=90,
                                       panel_ratios = (0.4, 0.2, 0.1),
                                        legend_on_right=1.3
                                       )
plt.show()
image

Added group_nhoods to _milo.py as FUNCTION, not included in the class, which will be changed when I finalize everything.
Added tests in pertpy.tests.tools.test_milo.py, where the tests focus on the processing of the neighborhood adjacency matrix and show, that the processing in miloR and Python are equivalent.

Added pertpy.utils... for lazy rpy2 convenience utils. This is NOT to stay there, but I want to ask if I can include this somewhere maybe others can use it, too, or should I hardcode with localconverter, etc.?
group_nhoods, annotate_cells_from_nhoods with two modes, find_nhood_group_markers with default pydeseq2 and optionally edgeR. Statsmodels and limma for logcounts doesnt make sense if counts were normalized with a scale factor.
Improved tests so they rely only on lazily imported rpy2.
removed ebayes for previously removed Statsmodels test in find_nhood_group_markers.
formulaic_contrasts was used only for edgeR, which now uses edgeR internals. The tests for find_nhood_group_markers load the scanpy pbmc3k dataset. All R dependencies required for the tests:
base, stats, limma, edgeR, scran, scuttle, SingleCellExperiment.
@Zethson
Copy link
Member

Zethson commented Jun 12, 2025

@MaximilianNuber I can see that it's a draft PR and there's still a lot to do but please ensure that you adhere to our existing conventions such as Google docstrings, the same plotting API structure, the same verbosity when it comes to code comments, ...

Let me know if you have any questions, please! I'll review when it's ready.

Thank you!

@Zethson
Copy link
Member

Zethson commented Jun 12, 2025

Oh, and just FYI: The pre-release CI is allowed to fail at the moment but the ReadTheDocs job is not. It must pass and the docs must look good.

In the old version, nhood_gr first dropped NA by all columns in the dataframe and then took unique values, without control of ordering by the user.
This change let´s the user control the ordering of groups, and removes the problem as described, by using pd.Categorical.
Reran tests, and it works.
Updated doc strings to google format. Added common_plot_args to milo.plot_nhood_annotation. Added plot in ReadTheDocs for plot_nhood_annotation
…sphinx parsing.

Also commented out get_mean_expression and plot_heatmap_with_dots_and_colorbar, they introduced errors and the plot is hard to make compatible with the scanpy API.
@Zethson
Copy link
Member

Zethson commented Jul 2, 2025

@MaximilianNuber please feel free to ask me if you're stuck with something. I'm happy to guide a bit if I can.

This is just a friendly and supportive check in message, NOT a stress message! Please take all the time in the world. This is not urgent.

@MaximilianNuber
Copy link
Author

Dear @Zethson ,
My apologies, I currently cannot assign as much attention to this as I thought, especially since I intend to use this as part of my analysis. But once I am settled in again I will finalize it!!!

So far I have only few updates:

  • I still need to look more into how GitHub Actions, CI, and sphinx work, so all tests run and the documentation is beautiful. That´s my biggest issue currently.
  • The google doc strings are done. If not yet pushed to the repo, in my local version. I don´t want to push immediately, but after rechecking all changes I have ready.
  • Locally, the documentation renders, so it´s a question of making it run smoothly in the repo as well.
  • If you agree, I would take out the plot for the heatmap with dots for now, as it is hard for me to align this with the pertpy/scanpy plotting API. milo.plot_nhood_annotation can be a really shallow wrapper around sc.pl.embedding, so it´s easier to realize this.
  • I have tested the core (grouping, grouping on subsets, projecting neighborhood labels to single cells, and find_nhood_group_markers) part on my own dataset and a few more, and it biologically makes sense and aligns with miloR

I will try to check my local changes soon and push.

@Zethson
Copy link
Member

Zethson commented Jul 10, 2025

Ahh no worries. Please take your time and don't stress yourself. No need for that.

I still need to look more into how GitHub Actions, CI, and sphinx work, so all tests run and the documentation is beautiful. That´s my biggest issue currently.

I'm happy to answer any questions that you might have. I guess that most LLMs should be able to give you a broad overview though if necessary.

If you agree, I would take out the plot for the heatmap with dots for now, as it is hard for me to align this with the pertpy/scanpy plotting API. milo.plot_nhood_annotation can be a really shallow wrapper around sc.pl.embedding, so it´s easier to realize this.

Okay! Totally fine with me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants