@daviesrob
Member

cram_next_slice() with a region set tries to skip over containers with contents that don't overlap the region of interest. For files with a mixture of read lengths, it's possible that some decode jobs have been queued before a later container is skipped. This makes calling cram_seek() hazardous as it drops any in-flight jobs before updating the file position. Instead, call hseek() directly so already-queued decode jobs are retained.

Fixes samtools/samtools#2285 (CRAM region queries miss reads when using additional threads)

The test creates a CRAM file with records in lots of containers,
many of which should be skipped when making a region query that
expects to return the first and last two records in the file.
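
The hazard described above can be illustrated with a small toy model. This is only a sketch in Python, not htslib code: `ToyReader`, `seek_and_flush`, `seek_only`, the container layout, and the region are all hypothetical, and decoding is reduced to collecting queued containers at the end. `seek_and_flush` stands in for a seek that drops in-flight jobs before moving the file position, and `seek_only` for a plain seek that leaves the queue untouched.

```python
from collections import deque


def overlaps(span, region):
    """True if two closed intervals (start, end) intersect."""
    return span[0] <= region[1] and span[1] >= region[0]


def container_span(container):
    """Smallest interval covering every record in a container."""
    return (min(s for s, _ in container), max(e for _, e in container))


class ToyReader:
    """Hypothetical stand-in for a threaded container reader (not htslib)."""

    def __init__(self, containers):
        self.containers = containers  # list of lists of (start, end) records
        self.pos = 0                  # index of the next container to read
        self.queued = deque()         # decode jobs dispatched, not yet collected

    def seek_and_flush(self, index):
        # Models a seek that discards in-flight jobs before repositioning.
        self.queued.clear()
        self.pos = index

    def seek_only(self, index):
        # Models a plain seek: the position moves, queued jobs are retained.
        self.pos = index

    def query(self, region, seek):
        n = len(self.containers)
        while self.pos < n:
            container = self.containers[self.pos]
            if overlaps(container_span(container), region):
                self.queued.append(container)  # dispatch a decode job
                self.pos += 1
            else:
                # Skip ahead to the next container that overlaps the region.
                nxt = self.pos + 1
                while nxt < n and not overlaps(
                        container_span(self.containers[nxt]), region):
                    nxt += 1
                seek(nxt)
        # Collect results from whatever jobs survived.
        out = []
        while self.queued:
            out.extend(r for r in self.queued.popleft() if overlaps(r, region))
        return out


def make_containers():
    first = [(1, 1000), (2, 1000)]      # long reads that reach the region
    middle = [[(p, p + 10)] for p in range(10, 900, 100)]  # short reads, skipped
    last = [(990, 1000), (995, 1000)]   # short reads inside the region
    return [first, *middle, last]


def run(flush):
    reader = ToyReader(make_containers())
    seek = reader.seek_and_flush if flush else reader.seek_only
    return reader.query((985, 1000), seek)


flushed = run(flush=True)
kept = run(flush=False)
print("flushing seek:", flushed)
print("plain seek:   ", kept)
```

In this model the flushing seek loses the two long reads from the first container, because their decode job is still queued when the middle containers are skipped, while the plain seek returns all four records.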