SCRATCH-BatchCorrection_Evaluation aligns multi-sample scRNA-seq datasets and quantitatively evaluates batch removal vs biological signal retention. It supports multiple integration strategies (Harmony, Seurat v5 RPCA/CCA, FastMNN) and generates standardized before/after visualizations and metrics (e.g., LISI, kBET, silhouette, graph connectivity). The module is implemented as three QMD notebooks and orchestrated via Nextflow for scalable, reproducible runs on Docker or Singularity. It can be used standalone or as part of the SCRATCH ecosystem.
Nextflow ≥ 21.04.0 Java ≥ 8 Docker or Singularity Git
Seurat (v5), SeuratObject, harmony, SeuratWrappers, batchelor, Matrix, data.table, ggplot2, patchwork, uwot, RANN, FNN, reticulate (kBET/LISI implementations are provided within the container; no extra setup needed.)
git clone https://github.com/WangLab-ComputationalBiology/SCRATCH-Batchcorrection_Evaluation.git
cd SCRATCH-Batchcorrection_Evaluation
main.nf — entrypoint
subworkflows/local/SCRATCH_BC.nf — scatter/gather & method fan-out
modules/local/*.nf — QMD execution wrappers
nextflow.config — default resources/containers
Parallelization occurs over methods and/or parameter grids when requested.
nextflow run main.nf -profile docker
--input_seurat_object annotation_object.RDS
--input_integration_method all
--project_name MyBC
--input_target_variables batch
-resume
nextflow run main.nf -profile singularity
--input_seurat_object annotation_object.RDS
--input_integration_method all
--project_name MyBC
--input_target_variables batch
-resume
prep: load object → join layers → PCA → choose batch_var
integrate: run selected correction(s) → store reductions (harmony, integrated.rpca, integrated.cca, mnn)
evaluate: same graph/UMAP recipe on raw PCA and corrected → compute/compare metrics → export report
--project_name Label for outputs --work_directory Output root (default: ./output) --seed Reproducibility (default: 1234) --n_threads Threads within R chunks
Parameter Description --input_seurat_object annotation_object.RDS
Parameter Values Notes --input_integration_method all, harmony, rpca, cca, mnn (defualt: all)
- Annotated Seurat object [with a valid batch_var in meta.data, normalized data (SCTransform or log-norm); raw counts present for reference]
All under work_directory:
SCRATCH-BatchCorrection_output/data/CCA|RPCA|MNN|Harmony/ <project_name>_prepped.rds <project_name>integrated.rds
SCRATCH-BatchCorrection_output/figures/CCA|RPCA|MNN|Harmony/ UMAP_before_vs_after_.png UMAP_by_batch_.png UMAP_by_celltype_.png Mixing_violin_.png
SCRATCH-BatchCorrection_output/reports/ BC_summary_<project_name>.html
Metric table columns (example):
method,batch_var,k,npcs,Resolution, iLISI_mean, iLISI_sd, kBET_reject_rate, ASW_batch, ASW_label, GraphConn, PCVar_retained
See inline comments in subworkflows/local/SCRATCH_BC.nf and QMD notebooks for method-specific parameters and advanced usage (grid sweeps, custom dims, SCT vs log-norm).
Issues and PRs welcome! Please open an issue to discuss substantial changes.
GNU General Public License v3.0. See LICENSE.