A little while ago I started on PR #967, which introduces a function that allows users to subset their entire SpatialData objects by certain criteria. The larger goal is to emulate this Scanpy notebook and to make Squidpy basically the biologist-friendly interface to SpatialData. Subsetting your object will be the first step in that journey.
For this, there are several considerations:
- a given `SpatialData` object can contain 0-n `AnnData` objects
- these `AnnData` objects can annotate 0-n other objects, e.g. segmentation masks, shapes (like for Visium), ROIs or even points
- a given subsetting step on the `AnnData` object needs to find all instances that are annotated by these soon-to-be-gone observations in all other elements and deal with them accordingly:
  - segmentation masks -> set to 0 (background)
    - potentially: remove transcript locations falling into these segmentation masks
  - shapes -> remove
  - points -> remove
  - etc.
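The per-element handling listed above could be sketched roughly as follows. This is only an illustration on plain NumPy arrays, not the SpatialData API: the helper names (`blank_labels`, `keep_rows`) and the assumption that every shape and point carries the instance id of the observation it belongs to are hypothetical.

```python
import numpy as np


def blank_labels(labels: np.ndarray, removed_ids: np.ndarray) -> np.ndarray:
    """Return a copy of a label image with the removed instance ids set to 0 (background)."""
    return np.where(np.isin(labels, removed_ids), 0, labels)


def keep_rows(instance_ids: np.ndarray, removed_ids: np.ndarray) -> np.ndarray:
    """Boolean mask selecting the rows (shapes or points) whose instance id survives."""
    return ~np.isin(instance_ids, removed_ids)


# Toy segmentation mask with cells 1, 2 and 3 (0 is background).
labels = np.array([
    [0, 1, 1, 0],
    [2, 2, 0, 3],
    [2, 0, 3, 3],
])
# Assumption: one shape per cell, and each point stores the id of the cell
# it falls into (0 = not inside any cell).
shape_ids = np.array([1, 2, 3])
point_cell_ids = np.array([1, 2, 2, 3, 0])

# Instance ids of the observations that were dropped from the table.
removed = np.array([2])

new_labels = blank_labels(labels, removed)                        # masks -> 0
shapes_kept = shape_ids[keep_rows(shape_ids, removed)]            # shapes -> remove
points_kept = point_cell_ids[keep_rows(point_cell_ids, removed)]  # points -> remove
```

In a real implementation the instance ids would have to be resolved per annotated element from the table's annotation metadata, which is exactly the kind of helper that might already exist upstream.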
However, there are additional constraints and open questions that are important for the implementation.
- We can e.g. store segmentation masks as DataTrees with different scales. Is it faster to subset the original resolution and then regenerate the tree, or to subset all scales individually?
- How do we handle inplace `True` vs `False`? Returning a copy can easily mean doubling a 500 GB object.
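To make the two questions above concrete, here is a small sketch that models a multiscale mask as a plain list of NumPy arrays instead of a DataTree. It assumes the coarser scales are produced by nearest-neighbour striding, which may not match how the real scales are generated, and all function names are hypothetical. It shows that, under that assumption, both strategies give the same result, and what an in-place variant looks like. It says nothing about which one is faster.

```python
import numpy as np


def downsample(labels: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour downsampling by striding; copies so that scales do not share memory."""
    return labels[::factor, ::factor].copy()


def build_pyramid(labels: np.ndarray, n_scales: int) -> list:
    """Full-resolution image followed by successively downsampled scales."""
    pyramid = [labels]
    for _ in range(n_scales - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def blank_inplace(labels: np.ndarray, removed_ids: np.ndarray) -> None:
    """Set removed ids to 0 in place; only a temporary boolean mask is allocated."""
    labels[np.isin(labels, removed_ids)] = 0


rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=(8, 8))  # toy label image with ids 0..4
removed = np.array([2, 4])

# Strategy A: subset every scale individually.
pyramid_a = build_pyramid(labels.copy(), n_scales=3)
for level in pyramid_a:
    blank_inplace(level, removed)

# Strategy B: subset the original resolution, then regenerate the tree.
base = labels.copy()
blank_inplace(base, removed)
pyramid_b = build_pyramid(base, n_scales=3)

same = all(np.array_equal(a, b) for a, b in zip(pyramid_a, pyramid_b))
```

With an interpolating or mode-based downsampler the two strategies are not guaranteed to agree, so regenerating from the subset full-resolution image would be the safer default there. For `inplace=False` on very large objects, a lazy or chunk-wise copy would be needed to avoid materialising a second full array.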
Other edge cases might only really show up once we get there.
Generally, the goal should be to identify relevant subfunctions and push these upstream to SpatialData. Some might already exist there and just need to be found (realistically by asking @LucaMarconato, there are quite a few functions only he really knows about); others might need to be written and pushed upstream. Ideally Squidpy then chains together these functions into something with good UX.