I have a .fastq file with 225358 sequences. When I remove duplicates with a Python script, 209438 sequences remain. This corresponds to a duplication rate of 7.06%, which is also what fastp reports. However, fastqc reports ~78% duplicate reads. Does fastqc use strict string equality, or does it take other things into account when determining the duplication rate?
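For reference, here is a minimal sketch of what I mean by strict string equality. It is not my exact script: the function names are made up for illustration, and it assumes a plain, uncompressed FASTQ file with exactly four lines per record.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record holds the sequence
                yield line.strip()


def exact_duplication_rate(sequences):
    """Fraction of reads removed when only one copy of each identical string is kept."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = len(counts)
    return (total - unique) / total
```

With my numbers this is (225358 - 209438) / 225358, which is about 7.06%.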