Skip to content

Possible changes to fastq input #52

@johnlees

Description

@johnlees

Two possible additions

An equivalent to the -m option in ska1:

Finally, the base call for the middle base in the split kmer is filtered to remove any bases where the minor alleles are found in more than 20% of the kmers. This is a strategy often used in read mapping to account for sequencing error, and can be adjusted using the -m option.

We can filter on ambiguous, but not frequency of ambig.

Early stop/fast mode

This could probably made to go much faster by:

  1. Stopping reading the input after n reads (set by e.g. 15x coverage)
  2. For 'online' or ref-based analyses, only including k-mers in the ref and turning off filters (may need minimiser based bins to improve caching).

For now, the first of these could be a useful addition

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions