Sequencing saturation changed after adding --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts

I extracted 1,000 reads for testing and found that after adding the --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts parameter, the results of yesUMIs and yessubWLmatch_UniqueFeature were not consistent.

This is my command:
```bash
STAR --runMode alignReads \
  --genomeDir /path/to/reference--readFilesIn R2.fastq.gz R1.fastq.gz \
  --outSAMtype BAM SortedByCoordinate --soloUMIdedup 1MM_CR \
  --clipAdapterType CellRanger4 --soloFeatures GeneFull_Ex50pAS \
  --soloType CB_UMI_Simple --soloCBstart 1 --soloUMIstart 17 --soloCBlen 16 --soloUMIlen 12 \
  --soloCBwhitelist 3M-february-2018.txt --outFileNamePrefix ./test/ \
  --readFilesCommand zcat --outSAMattributes CR UR CB UB GX GN \
  --soloCellReadStats Standard --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts
```
Then I wrote a Python script to calculate sequencing saturation from the BAM file, and found that the results were identical to those obtained without the `--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts` parameter.
```python
import oxbow as ox
import polars as pl

def cal_sa(bam_file):
    df = (
        ox.from_bam(
            bam_file,
            fields=["qname"],
            tag_defs=[("CB", "Z"), ("UB", "Z"), ("GX", "Z")],
        )
        .pl(lazy=True)
        .select(pl.col("qname"), pl.col("tags").struct.unnest())
        .filter(pl.col("GX").ne("-") & pl.col("CB").ne("-") & pl.col("UB").ne("-"))
    ).collect()

    yessubWLmatch_UniqueFeature = df["qname"].unique().len()

    yesUMIs = df.select(["CB", "UB", "GX"]).unique().height
    sa = 1 - yesUMIs / yessubWLmatch_UniqueFeature
    return sa,yesUMIs,yessubWLmatch_UniqueFeature
print(cal_sa("Aligned.sortedByCoord.out.bam"))
```
## Question
Is this the expected behavior, or could it indicate a bug in how the saturation is computed internally?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sequencing saturation changed after adding --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts #2665

Question

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Sequencing saturation changed after adding --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts #2665

Description

Question

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions