Why do we make use of hard-clipping when calculating the mate's start and end? #1030

Should we be using hard-clipping when computing the mate's start and end in our consensus calling tools? I see two places where we do (or will do):

1. [GroupReadsByUmi](https://github.com/fulcrumgenomics/fgbio/blob/f93fdfbb427da5f5d60304de5761b19ea8209b33/src/main/scala/com/fulcrumgenomics/umi/GroupReadsByUmi.scala#L101)
2. In #1026 (see [this discussion with @clintval](https://github.com/fulcrumgenomics/fgbio/pull/1026#discussion_r1962091807))

My understanding is that we adjust the start and end for soft-clipping because those bases _could_ be aligned, and may actually be aligned in the mate, which can happen with short inserts. But why do we _also adjust for hard-clipping_? Those bases have been removed from the read. Perhaps it covers the case where only one read of a pair is hard-clipped at one end of the molecule? Or is there some other reason?
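To make the question concrete, here is a minimal Scala sketch of computing a mate's unclipped start and end from its alignment start and CIGAR string (as carried in the `MC` tag), with a switch for whether hard clips are counted. The helper names (`parseCigar`, `referenceLength`, `clipLength`, `unclippedStart`, `unclippedEnd`) and the `includeHardClips` flag are hypothetical and are not fgbio's or htsjdk's API. The CIGAR is parsed with a regex only so the sketch stands alone.

```scala
// Hypothetical helper: split a SAM CIGAR string (e.g. the value of a mate's MC tag)
// into (length, operator) pairs.
def parseCigar(cigar: String): List[(Int, Char)] =
  """(\d+)([MIDNSHP=X])""".r
    .findAllMatchIn(cigar)
    .map(m => (m.group(1).toInt, m.group(2).charAt(0)))
    .toList

// Reference bases consumed by the alignment (M, D, N, =, X); clips consume none.
def referenceLength(ops: List[(Int, Char)]): Int =
  ops.collect { case (len, op) if Set('M', 'D', 'N', '=', 'X').contains(op) => len }.sum

// Total length of the clipping run at the front of `ops`. Soft clips always count;
// hard clips count only when includeHardClips is true.
def clipLength(ops: List[(Int, Char)], includeHardClips: Boolean): Int =
  ops
    .takeWhile { case (_, op) => op == 'S' || op == 'H' }
    .collect {
      case (len, 'S')                     => len
      case (len, 'H') if includeHardClips => len
    }
    .sum

// 1-based position of the first base, extended backwards over leading clips.
def unclippedStart(pos: Int, cigar: String, includeHardClips: Boolean): Int =
  pos - clipLength(parseCigar(cigar), includeHardClips)

// 1-based inclusive position of the last base, extended forwards over trailing clips.
def unclippedEnd(pos: Int, cigar: String, includeHardClips: Boolean): Int = {
  val ops = parseCigar(cigar)
  pos + referenceLength(ops) - 1 + clipLength(ops.reverse, includeHardClips)
}

// A mate aligned at position 100 with MC:Z:5H10S80M3S2H.
val (matePos, mateCigar) = (100, "5H10S80M3S2H")
println(unclippedStart(matePos, mateCigar, includeHardClips = true))  // 85
println(unclippedStart(matePos, mateCigar, includeHardClips = false)) // 90
println(unclippedEnd(matePos, mateCigar, includeHardClips = true))    // 184
println(unclippedEnd(matePos, mateCigar, includeHardClips = false))   // 182
```

Counting only soft clips moves the start by the 5 hard-clipped bases and the end by the 2 hard-clipped bases; that difference is exactly what is in question.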

Note: there are other places where we adjust for hard-clipping as well:

1. https://github.com/fulcrumgenomics/fgbio/blob/f93fdfbb427da5f5d60304de5761b19ea8209b33/src/main/scala/com/fulcrumgenomics/util/AmpliconDetector.scala#L177C15-L181
2. https://github.com/fulcrumgenomics/fgbio/blob/f93fdfbb427da5f5d60304de5761b19ea8209b33/src/main/scala/com/fulcrumgenomics/bam/api/SamOrder.scala#L183

And of course, there are a number of other places that use the [unclipped start](https://github.com/search?q=repo%3Afulcrumgenomics%2Ffgbio%20unclippedStart&type=code) and [unclipped end](https://github.com/search?q=repo%3Afulcrumgenomics%2Ffgbio+unclippedEnd&type=code) for the current record.  I think examining those is worthwhile, but we should focus on the consensus calling tools first.
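The "hard clipping on only one read" hypothesis from the opening paragraph can be illustrated with a small sketch. It assumes, as a simplification, that reads from the same molecule are compared by their unclipped 5' coordinate (unclipped start on the forward strand, unclipped end on the reverse strand). The `Read` class and `fivePrime` function are hypothetical illustrations, not the code in `GroupReadsByUmi`, `AmpliconDetector`, or `SamOrder`. Whether differently clipped copies of the same read actually occur in practice is part of the open question.

```scala
// Hypothetical minimal read: 1-based alignment start, CIGAR string, and strand.
final case class Read(pos: Int, cigar: String, negativeStrand: Boolean)

// Split a CIGAR string into (length, operator) pairs.
def parseCigar(cigar: String): List[(Int, Char)] =
  """(\d+)([MIDNSHP=X])""".r
    .findAllMatchIn(cigar)
    .map(m => (m.group(1).toInt, m.group(2).charAt(0)))
    .toList

// Reference bases consumed by the alignment (M, D, N, =, X).
def referenceLength(ops: List[(Int, Char)]): Int =
  ops.collect { case (len, op) if Set('M', 'D', 'N', '=', 'X').contains(op) => len }.sum

// Length of the leading clipping run; hard clips count only if includeHardClips is true.
def clipLength(ops: List[(Int, Char)], includeHardClips: Boolean): Int =
  ops
    .takeWhile { case (_, op) => op == 'S' || op == 'H' }
    .collect {
      case (len, 'S')                     => len
      case (len, 'H') if includeHardClips => len
    }
    .sum

// Unclipped 5' coordinate: the unclipped start on the forward strand,
// the unclipped end on the reverse strand.
def fivePrime(r: Read, includeHardClips: Boolean): Int = {
  val ops = parseCigar(r.cigar)
  if (r.negativeStrand) r.pos + referenceLength(ops) - 1 + clipLength(ops.reverse, includeHardClips)
  else r.pos - clipLength(ops, includeHardClips)
}

// Two forward-strand copies of the same molecule: the first 5 bases are
// soft-clipped in one copy and hard-clipped in the other.
val softClipped = Read(pos = 105, cigar = "5S95M", negativeStrand = false)
val hardClipped = Read(pos = 105, cigar = "5H95M", negativeStrand = false)

for (include <- Seq(true, false)) {
  val (a, b) = (fivePrime(softClipped, include), fivePrime(hardClipped, include))
  println(s"includeHardClips=$include: $a vs $b, same key = ${a == b}")
}
// includeHardClips=true: 100 vs 100, same key = true
// includeHardClips=false: 100 vs 105, same key = false
```

Under this assumption, counting hard clips keeps the two copies at the same 5' coordinate, while counting only soft clips separates them by the 5 hard-clipped bases.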