-
Notifications
You must be signed in to change notification settings - Fork 547
Open
Description
I extracted 1,000 reads for testing and found that after adding the --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts parameter, the results of yesUMIs and yessubWLmatch_UniqueFeature were not consistent.
This is my command:
STAR --runMode alignReads \
--genomeDir /path/to/reference--readFilesIn R2.fastq.gz R1.fastq.gz \
--outSAMtype BAM SortedByCoordinate --soloUMIdedup 1MM_CR \
--clipAdapterType CellRanger4 --soloFeatures GeneFull_Ex50pAS \
--soloType CB_UMI_Simple --soloCBstart 1 --soloUMIstart 17 --soloCBlen 16 --soloUMIlen 12 \
--soloCBwhitelist 3M-february-2018.txt --outFileNamePrefix ./test/ \
--readFilesCommand zcat --outSAMattributes CR UR CB UB GX GN \
--soloCellReadStats Standard --soloCBmatchWLtype 1MM_multi_Nbase_pseudocountsThen I wrote a Python script to calculate sequencing saturation from the BAM file, and found that the results were identical to those obtained without the --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts parameter.
import oxbow as ox
import polars as pl
def cal_sa(bam_file):
df = (
ox.from_bam(
bam_file,
fields=["qname"],
tag_defs=[("CB", "Z"), ("UB", "Z"), ("GX", "Z")],
)
.pl(lazy=True)
.select(pl.col("qname"), pl.col("tags").struct.unnest())
.filter(pl.col("GX").ne("-") & pl.col("CB").ne("-") & pl.col("UB").ne("-"))
).collect()
yessubWLmatch_UniqueFeature = df["qname"].unique().len()
yesUMIs = df.select(["CB", "UB", "GX"]).unique().height
sa = 1 - yesUMIs / yessubWLmatch_UniqueFeature
return sa,yesUMIs,yessubWLmatch_UniqueFeature
print(cal_sa("Aligned.sortedByCoord.out.bam"))Question
Is this the expected behavior, or could it indicate a bug in how the saturation is computed internally?
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels