This repository holds the benchmarking code and data tables for the manuscript "Scalable high-performance single cell data analysis with BPCells." The BPCells package itself lives at github.com/bnprks/BPCells and the package documentation is here.
Data tables from benchmarking live in the results/data_tables
folder, mostly in tsv format. File paths mirror the structure of the benchmarking folders themselves (see "Figure <-> experiment mapping" below).
R scripts for plotting are in results/plots
.
See results/README.md
for details on results file contents and plotting scripts.
Benchmark code lives under the folders atac-timing
, cellxgene
, compression
, datasets
, and rna-timing
(see respective README.md files for details). Each individual benchmarking experiment has a single sub-folder. To get started reviewing a particular benchmark, look at the gen_tasks.py
file and the commands it prints to a tasks.txt
file (or just skip to the worker scripts based on naming conventions).
Benchmarks are run using an ad hoc system using arrayjob/run.py
, config_vars.sh
, and per-experiment gen_tasks.py
files within benchmarking subfolders. See arrayjob/README.md
for details on re-running benchmarks. Please note that some benchmarks can take a very large amount of compute time to run all replicates and tools. You may want to modify gen_tasks.py
to reduce the number of datasets, replicates, or tools used for certain benchmarks.
The benchmarking singularity container was converted from a docker image. See docker/README.md
for details on software versions and where to download the container image.
Figure | Experiment path |
---|---|
Fig 1b-d, Fig S1a-c | rna-timing/pca-benchmark |
Fig 1e-g, Fig S1 e+f | atac-timing/peak-tile-timing |
Fig S1d | rna-timing/marker-genes |
Fig 2b-d | compression/rna-1M-cell |
Fig 2e+j, Fig S3a | compression/in-memory-compression |
Fig 2g-i | compression/fragments-read-write |
Fig S2 | compression/bitwidth-stats |
Fig S3b | rna-timing/matrix-transpose |
Fig S3c | atac-timing/merge-fragments |
Fig 3a+b, Fig S4a | cellxgene/01_subset_unique_cells |
Fig 3c+d, Fig S4 b,e-g | cellxgene/02_matrix_slicing |
Fig 3e, Fig S4 c+d | cellxgene/03_mean_variance |
Fig 3f-h | cellxgene/04_pca |