Add CiLISI as new metric component #57

Open: wants to merge 5 commits into base: main
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,12 @@

* Added `metrics/kbet_pg` and `metrics/kbet_pg_label` components (PR #52).

* Added `metrics/cilisi` component (PR #57).
  - CiLISI measures batch mixing in a cell type-aware manner by computing iLISI within each cell type and normalizing
    the scores between 0 and 1. Unlike iLISI, CiLISI preserves sensitivity to biological variance and avoids favoring
    overcorrected integrations in which cell type signal has been removed.
    We propose this metric as a replacement for iLISI.

## Minor changes

* Un-pin the scPRINT version and update parameters (PR #51)
51 changes: 51 additions & 0 deletions src/metrics/cilisi/config.vsh.yaml
@@ -0,0 +1,51 @@
__merge__: ../../api/comp_metric.yaml
name: cilisi
info:
  metrics:
    - name: cilisi
      label: CiLISI
      summary: Cell type-aware version of iLISI (integration Local Inverse Simpson's Index).
        iLISI is computed separately for each cell type or cluster, normalized between 0 and 1, and averaged across all cells (global mean).
        By default, CiLISI is calculated only for groups with at least 10 cells and 2 distinct batch labels (configurable).
      description: |
        CiLISI measures batch mixing in a cell type-aware manner by computing iLISI within each cell type and normalizing
        the scores between 0 and 1. Unlike iLISI, CiLISI preserves sensitivity to biological variance and avoids favoring
        overcorrected integrations in which cell type signal has been removed.
      references:
        doi: 10.1038/s41467-024-45240-z
      links:
        documentation: https://github.com/carmonalab/scIntegrationMetrics
        repository: https://github.com/carmonalab/scIntegrationMetrics
      min: 0
      max: 1
      maximize: true

    - name: cilisi_means
      label: CiLISI_means
      summary: As CiLISI, but returns the mean of the per-group CiLISI means instead of a global average over all cells.
      description: |
        CiLISI measures batch mixing in a cell type-aware manner by computing iLISI within each cell type and normalizing
        the scores between 0 and 1. Unlike iLISI, CiLISI preserves sensitivity to biological variance and avoids favoring
        overcorrected integrations in which cell type signal has been removed.
      references:
        doi: 10.1038/s41467-024-45240-z
      links:
        documentation: https://github.com/carmonalab/scIntegrationMetrics
        repository: https://github.com/carmonalab/scIntegrationMetrics
      min: 0
      max: 1
      maximize: true
resources:
  - type: r_script
    path: script.R
engines:
  - type: docker
    image: openproblems/base_r:1.0.0
    setup:
      - type: r
        github: https://github.com/carmonalab/[email protected]
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [midtime,midmem,midcpu]
48 changes: 48 additions & 0 deletions src/metrics/cilisi/script.R
@@ -0,0 +1,48 @@
library(anndata)
library(scIntegrationMetrics)

## VIASH START
par <- list(
  input_integrated = "resources_test/task_batch_integration/cxg_immune_cell_atlas/integrated_processed.h5ad",
  input_solution = "resources_test/task_batch_integration/cxg_immune_cell_atlas/solution.h5ad",
  output = "output.h5ad"
)
meta <- list(
  name = "cilisi"
)
## VIASH END

cat("Reading input files\n")
adata <- anndata::read_h5ad(par[["input_integrated"]])
solution <- anndata::read_h5ad(par[["input_solution"]])
embeddings <- adata$obsm[["X_emb"]]
metadata <- solution$obs

cat("Compute CiLISI metrics...\n")
lisisplit <-
  scIntegrationMetrics::compute_lisi_splitBy(
    X = embeddings,
    meta_data = metadata,
    label_colnames = "batch",
    perplexity = 30,
    split_by_colname = "cell_type",
    normalize = TRUE,
    min.cells.split = 10,
    min.vars.label = 2)
# Global CiLISI: mean over all cells in all retained cell types
cilisi <- mean(unlist(lisisplit))
# CiLISI_means: mean of the per-cell-type means
cilisi_means <- mean(sapply(lisisplit, function(x) mean(x[, 1])))

cat("Write output AnnData to file\n")
output <- anndata::AnnData(
  shape = c(1, 2),
  uns = list(
    dataset_id = adata$uns[["dataset_id"]],
    normalization_id = adata$uns[["normalization_id"]],
    method_id = adata$uns[["method_id"]],
    metric_ids = c("cilisi", "cilisi_means"),
    metric_values = list(cilisi, cilisi_means)
  )
)
output$write_h5ad(par[["output"]], compression = "gzip")
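
Note on the two reported values: a minimal sketch of how `cilisi` and `cilisi_means` can differ when cell types have unequal sizes. The toy list below is hypothetical and stands in for the result of `compute_lisi_splitBy`; its shape (one data frame per cell type, with a single `batch` column of normalized scores) is an assumption based on how `script.R` indexes the result, and the cell type names and scores are made up.

```r
# Hypothetical stand-in for the per-cell-type result (not real data).
lisisplit <- list(
  T_cell = data.frame(batch = c(0.9, 0.8, 0.7, 0.6)),  # larger, well-mixed group
  B_cell = data.frame(batch = c(0.2, 0.4))             # smaller, poorly mixed group
)

# Global mean: every cell counts equally, so larger groups weigh more.
cilisi <- mean(unlist(lisisplit))

# Mean of per-group means: every cell type counts equally.
cilisi_means <- mean(sapply(lisisplit, function(x) mean(x[, 1])))

cat(sprintf("cilisi = %.3f, cilisi_means = %.3f\n", cilisi, cilisi_means))
```

In this toy case the global mean is higher than the mean of means because the larger group is the better-mixed one; with balanced group sizes the two values coincide.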