Description
Should we be using hard-clipping to adjust the mate's start and end in our consensus calling tools? I see two places where we do (or will do) so:
- GroupReadsByUmi
- In #1026, "Do not trim reads when both ends are clipped in consensus calling" (see this discussion with @clintval)
My understanding is that we adjust the start and end for soft-clipping because those bases are still present in the read and could be aligned. They may actually be aligned in the mate, which can happen when the insert is short. But why do we also adjust for hard-clipping? Those bases have been removed from the record. Perhaps it matters when only one read in a pair is hard-clipped at one end of the molecule? Or is there another reason?
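To make the question concrete, here is a minimal sketch of computing an unclipped start and end from a 1-based alignment start and a CIGAR string, with a switch for whether hard clips are counted. The helper names (`unclipped_start`, `unclipped_end`, `include_hard`) are hypothetical and illustrative only. This is not fgbio's or htsjdk's API, though it is similar in spirit to htsjdk's `getUnclippedStart`/`getUnclippedEnd`.

```python
import re

# One CIGAR element: a length followed by an operator (SAM spec operators).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that consume reference bases.
REF_CONSUMING = set("MDN=X")
CLIPS = ("S", "H")


def parse_cigar(cigar):
    """Return a list of (length, operator) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def leading_clip(ops, counted):
    """Sum the leading clip operators that are in `counted`.

    Walks past every leading S/H (hard clips sit outside soft clips), but
    only adds the lengths of operators we were asked to count.
    """
    total = 0
    for length, op in ops:
        if op not in CLIPS:
            break
        if op in counted:
            total += length
    return total


def aligned_end(pos, ops):
    """1-based inclusive end of the aligned portion of the read."""
    ref_len = sum(length for length, op in ops if op in REF_CONSUMING)
    return pos + ref_len - 1


def unclipped_start(pos, cigar, include_hard=True):
    ops = parse_cigar(cigar)
    counted = {"S", "H"} if include_hard else {"S"}
    return pos - leading_clip(ops, counted)


def unclipped_end(pos, cigar, include_hard=True):
    ops = parse_cigar(cigar)
    counted = {"S", "H"} if include_hard else {"S"}
    return aligned_end(pos, ops) + leading_clip(list(reversed(ops)), counted)


# A read aligned at position 100 with clipping on both ends.
print(unclipped_start(100, "5H3S90M2S4H"))                      # 92
print(unclipped_start(100, "5H3S90M2S4H", include_hard=False))  # 97
print(unclipped_end(100, "5H3S90M2S4H"))                        # 195
print(unclipped_end(100, "5H3S90M2S4H", include_hard=False))    # 191
```

The 5 bp gap between 92 and 97 is the crux of the question: counting hard clips shifts the computed coordinates even though none of those bases remain in the record to be aligned.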
Note: there are other places we adjust based on hard-clipping as well:
And of course, there are a number of other places that use the unclipped start and unclipped end for the current record. I think examining those is worthwhile, but we should focus on the consensus calling tools first.