Skip to content

Sequencing saturation changed after adding --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts #2665

@Liripo

Description

@Liripo

I extracted 1,000 reads for testing and found that after adding the --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts parameter, the results of yesUMIs and yessubWLmatch_UniqueFeature were not consistent.

This is my command:

STAR --runMode alignReads \
  --genomeDir /path/to/reference--readFilesIn R2.fastq.gz R1.fastq.gz \
  --outSAMtype BAM SortedByCoordinate --soloUMIdedup 1MM_CR \
  --clipAdapterType CellRanger4 --soloFeatures GeneFull_Ex50pAS \
  --soloType CB_UMI_Simple --soloCBstart 1 --soloUMIstart 17 --soloCBlen 16 --soloUMIlen 12 \
  --soloCBwhitelist 3M-february-2018.txt --outFileNamePrefix ./test/ \
  --readFilesCommand zcat --outSAMattributes CR UR CB UB GX GN \
  --soloCellReadStats Standard --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts

Then I wrote a Python script to calculate sequencing saturation from the BAM file, and found that the results were identical to those obtained without the --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts parameter.

import oxbow as ox
import polars as pl

def cal_sa(bam_file):
    df = (
        ox.from_bam(
            bam_file,
            fields=["qname"],
            tag_defs=[("CB", "Z"), ("UB", "Z"), ("GX", "Z")],
        )
        .pl(lazy=True)
        .select(pl.col("qname"), pl.col("tags").struct.unnest())
        .filter(pl.col("GX").ne("-") & pl.col("CB").ne("-") & pl.col("UB").ne("-"))
    ).collect()

    yessubWLmatch_UniqueFeature = df["qname"].unique().len()

    yesUMIs = df.select(["CB", "UB", "GX"]).unique().height
    sa = 1 - yesUMIs / yessubWLmatch_UniqueFeature
    return sa,yesUMIs,yessubWLmatch_UniqueFeature
print(cal_sa("Aligned.sortedByCoord.out.bam"))

Question

Is this the expected behavior, or could it indicate a bug in how the saturation is computed internally?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions