Description
Hello everyone,
with this I'm referring to issue #556. I'd like to call Duplex Consensus reads using fgbio as well. Unfortunately, I'm facing the exact same issue as mentioned above.
The samples I'm using are paired end sequenced and tagged with 6bp UMI and 11bp stem resulting in a cigar like 6M11S+T 6M11S+T.
Afterwards, I'm extracting UMIs with fgbio-2.0.2 ExtractUmisFromBam, followed by GroupReadsByUmi and CallDuplexConsensusReads
java -jar fgbio-2.0.2.jar ExtractUmisFromBam \ -i /path_unmapped.bam \ -o /path_unmapped_umi_extracted.bam \ -r 6M11S+T 6M11S+T \ --molecular-index-tags=ZA ZB \ --single-tag=RX
java -jar fgbio-2.0.2.jar GroupReadsByUmi \ --input=/path_umi_extracted_aligned_merged_filtered.bam --output=/path_umi_grouped.bam \ --strategy=paired --edits=1
java -jar fgbio-2.0.2.jar CallDuplexConsensusReads \ --input=/path_umi_grouped.bam --output=/path_consensus_unmapped.bam \ --error-rate-post-umi 30 --error-rate-pre-umi 45 \ --min-input-base-quality=20
This leads to the following outputs (examplary Read-ID extracted)
original fastq read 1:
@M01950:199:000000000-CN4MR:1:1106:27160:16985 1:N:0:2 CACAGGAGTAGCTCAAGAGCTCGTCTGGTAACTATCCATATATGAATAGCCCATTATGGACTCGGGTGAGCTGGTATAGGGGTCTGGGTACTCAGACTTGATGGCCCGGCTAGGAAAGTGGCCATATGTTTGGTAACCTTGCAGGCTGCCG + CCCCCCFCFFFFGGGGGGGGGGGGGHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGHGHGHHHHGHHHHHHGGGHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHHHHHHHHGHHFHHHHHHHGFFHHHHHHHHHHGHHGHG
original fastq read 2:
@M01950:199:000000000-CN4MR:1:1106:27160:16985 2:N:0:2 TTGAACTCAGTAGCTCACAAAACATCCACTCTGCCTCCAAAGGCCTACCTCTGAACCATGCTGCCTTGCCTCCTACAGACTATGACAGAAGTCCCTTTGTAACATCCCCCATTAGCATGACAATGCCCCCTCACGGCAGCCTGCAAGGTT + CCCCCFFFFFFFGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHGHHHGFGHHHHHHHHGGGGHHHHHGHHHHHHGGHHGGGGHHHGGGGGHHHHHGHGHH
bam-file reads after UMI extraction:
M01950:199:000000000-CN4MR:1:1106:8824:16985+CACAGGTTGAAC 163 chr1 200048348 60 133M = 200048464 250 CAAAACATCCACTCTGCCTCCAAAGGCCTACCTCTGAACCATGCTGCCTTGCCTCCTACAGACTATGACAGAAGTCCCTTTGTAACATCCCCCATTAGCATGACAATGCCCCCTCACGGCAGCCTGCAAGGTT GGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHGHHHGFGHHHHHHHHGGGGHHHHHGHHHHHHGGHHGGGGHHHGGGGGHHHHHGHGHH ZA:Z:CACAGG ZB:Z:TTGAAC MC:Z:134M MD:Z:133 RG:Z:A NM:i:0 MQ:i:60 UQ:i:0 AS:i:133RX:Z:CACAGG-TTGAAC
M01950:199:000000000-CN4MR:1:1106:8824:16985+CACAGGTTGAAC 83 chr1 200048464 60 134M = 200048348 -250 CGGCAGCCTGCAAGGTTACCAAACATATGGCCACTTTCCTAGCCGGGCCATCAAGTCTGAGTACCCAGACCCCTATACCAGCTCACCCGAGTCCATAATGGGCTATTCATATATGGATAGTTACCAGACGAGCT GHGHHGHHHHHHHHHHFFGHHHHHHHFHHGHHHHHHHHHGGGGGHHHHHHHHHHHHHHHHHHHHHHHGGGHHHHHHGHHHHGHGHGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHGGGGGGGG ZA:Z:CACAGG ZB:Z:TTGAAC MC:Z:133M MD:Z:134 RG:Z:A NM:i:0 MQ:i:60 UQ:i:0 AS:i:134RX:Z:CACAGG-TTGAAC
grouped reads:
M01950:199:000000000-CN4MR:1:1106:8824:16985+CACAGGTTGAAC 83 chr1 200048464 60 134M = 200048348 -250 CGGCAGCCTGCAAGGTTACCAAACATATGGCCACTTTCCTAGCCGGGCCATCAAGTCTGAGTACCCAGACCCCTATACCAGCTCACCCGAGTCCATAATGGGCTATTCATATATGGATAGTTACCAGACGAGCT GHGHHGHHHHHHHHHHFFGHHHHHHHFHHGHHHHHHHHHGGGGGHHHHHHHHHHHHHHHHHHHHHHHGGGHHHHHHGHHHHGHGHGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHGGGGGGGG ZA:Z:CACAGG ZB:Z:TTGAAC MC:Z:133M MD:Z:134 RG:Z:A MI:Z:6968/B NM:i:0 MQ:i:60 UQ:i:0 AS:i:134 RX:Z:CACAGG-TTGAAC
M01950:199:000000000-CN4MR:1:1106:8824:16985+CACAGGTTGAAC 163 chr1 200048348 60 133M = 200048464 250 CAAAACATCCACTCTGCCTCCAAAGGCCTACCTCTGAACCATGCTGCCTTGCCTCCTACAGACTATGACAGAAGTCCCTTTGTAACATCCCCCATTAGCATGACAATGCCCCCTCACGGCAGCCTGCAAGGTT GGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHGHHHGFGHHHHHHHHGGGGHHHHHGHHHHHHGGHHGGGGHHHGGGGGHHHHHGHGHH ZA:Z:CACAGG ZB:Z:TTGAAC MC:Z:134M MD:Z:133 RG:Z:A MI:Z:6968/B NM:i:0 MQ:i:60 UQ:i:0 AS:i:133 RX:Z:CACAGG-TTGAAC
For this exemplary MI-tag, I don't get any /A reads at all. This is the case for 99% of the reads, where there are either only /A or only /B, but not both at the same time. I think therefore I'm not able to call Duplex Consensus reads. After going through the entire data, I'm still not able to figure out where the issue is located.
I would appreciate any help solving this!
Thank you in advance
Adrian