extract.readpairs.sh

Simon Crameri edited this page Apr 1, 2022 · 2 revisions

Description

Extracts read pairs from .fastq.gz files separately for each target region. A pair is extracted for a region if at least one of its two reads mapped to that region. Uses samtools (Li et al. 2009) and UNIX mkfifo.
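The named-pipe pattern mentioned above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function name `filter_fastq_by_name` and the file `names.txt` are hypothetical, and a small awk filter stands in for extract-reads-from-fastq.pl, whose interface is not documented on this page. It assumes standard four-line FASTQ records and a pre-made list of read names, one per line.

```shell
#!/bin/sh
# Sketch only: keep FASTQ records whose read name appears in a list of names.
# In a real run the names would come from the BAM file, for example via
#   samtools view -q 10 <sample.bam> <region> | cut -f1 | sort -u
# (assumption: the BAM is sorted and indexed). Here the list is given as a file.

# filter_fastq_by_name NAMES FASTQ_GZ OUT_FASTQ   (hypothetical helper)
filter_fastq_by_name() {
  names=$1; fastq_gz=$2; out=$3
  tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/fifo.XXXXXX") || return 1
  fifo=$tmpdir/reads.fifo
  mkfifo "$fifo" || return 1

  # Writer: decompress into the named pipe in the background,
  # so no uncompressed copy of the reads is written to disk.
  gzip -dc "$fastq_gz" > "$fifo" &

  # Reader: load the names first, then stream the FASTQ from the pipe.
  awk '
    NR == FNR { keep[$1] = 1; next }     # first file: read names to keep
    FNR % 4 == 1 {                       # header line of a 4-line record
      name = substr($1, 2)               # drop the leading "@"
      sub(/\/[12]$/, "", name)           # drop a trailing /1 or /2 if present
      printing = (name in keep)
    }
    printing                             # print all 4 lines of kept records
  ' "$names" "$fifo" > "$out"

  wait                                   # let the background gzip finish
  rm -rf "$tmpdir"
}
```

Calling `filter_fastq_by_name names.txt sample_R1.fastq.gz region_R1.fastq` once per read file keeps both mates of a pair, as long as the list contains the name of every pair with at least one mapped read.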

Usage

extract.readpairs.sh -s <sample file> -l <locus file> -d <directory> -m <directory> [-o <directory>] [-Q <integer>] [-b <string>] [-t <integer>]

Dependencies

extract-reads-from-fastq.pl

Arguments

# Required
-s                  sample file
-l                  locus file
-d                  absolute path to folder with quality-filtered reads
-m                  absolute path to folder with mapping dirs

# Optional [DEFAULT]
-o  [seq-extracted] output directory (created if it does not exist)
-Q  [10]            minimum mapping quality, as used for mapping using run.bwamem.sh
-b  [see details]   regex-path to BAM file. Use SAMPLE as wildcard.
-t  [2]             number of threads used

Details

It is highly recommended to run this script on local scratch storage, because it writes a large number of files.
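One way to follow this recommendation is to run the command inside a temporary scratch directory and copy the results back afterwards. The sketch below is only an illustration under assumptions: `run_on_scratch` is a hypothetical helper, and `$TMPDIR` is assumed to point to node-local scratch, which depends on the cluster setup.

```shell
#!/bin/sh
# Sketch only: run a command in a fresh scratch directory, then copy
# everything it wrote there into a destination directory.

# run_on_scratch DEST COMMAND [ARGS...]   (hypothetical helper)
run_on_scratch() {
  dest=$1; shift
  # Assumption: TMPDIR points to local scratch; falls back to /tmp.
  scratch=$(mktemp -d "${TMPDIR:-/tmp}/readpairs.XXXXXX") || return 1

  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$scratch" && "$@" )
  status=$?

  # Copy results back, then clean up the scratch directory.
  mkdir -p "$dest" && cp -R "$scratch"/. "$dest"/
  rm -rf "$scratch"
  return "$status"
}

# Example call (all input paths must be absolute, because the command
# runs from inside the scratch directory):
#   run_on_scratch "$PWD/results" extract.readpairs.sh \
#       -s "$PWD/samples.txt" -l "$PWD/loci.txt" \
#       -d /abs/path/to/trimmed -m /abs/path/to/mapped
```

With the default `-o seq-extracted`, the output would then end up in `results/seq-extracted`.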

-b    The literal string SAMPLE can be part of the path and will be replaced by the actual sample name using regex.
      DEFAULT: <${mapdir}/SAMPLE/SAMPLE.bwa-mem.sorted.Q10.nodup.bam>
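The SAMPLE substitution can be checked before a long run with a small helper. This is a sketch under assumptions, not code taken from the script: `resolve_bam` and `check_bams` are hypothetical names, the substitution is done with sed, and the sample file is assumed to contain one sample name per line.

```shell
#!/bin/sh
# Sketch only: expand a -b style pattern for each sample and report
# BAM files that do not exist.

# resolve_bam PATTERN SAMPLE: print PATTERN with every SAMPLE replaced.
# Assumption: sample names contain no "|", "&" or "\" characters.
resolve_bam() {
  printf '%s\n' "$1" | sed "s|SAMPLE|$2|g"
}

# check_bams PATTERN SAMPLE_FILE: print one line per missing BAM file.
check_bams() {
  while IFS= read -r sample || [ -n "$sample" ]; do
    [ -n "$sample" ] || continue          # skip empty lines
    bam=$(resolve_bam "$1" "$sample")
    [ -f "$bam" ] || printf 'missing BAM for %s: %s\n' "$sample" "$bam"
  done < "$2"
}

# Example call:
#   check_bams "/abs/path/to/mapped/SAMPLE/SAMPLE.bwa-mem.sorted.Q10.nodup.bam" samples.txt
```

If `check_bams` prints nothing, every sample in the file resolves to an existing BAM file for the given pattern.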

Value

An output subdirectory is created for each sample, with extracted reads in .fastq files.

Examples

extract.readpairs.sh -s samples.txt -l loci.txt -d NovaSeq-run1_trimmed -m NovaSeq-run1_mapped \
                     -b NovaSeq-run1_mapped/SAMPLE/SAMPLE.bwa-mem.sorted.Q10.nodup.bam -Q 10 -t 20

Authors

Simon Crameri (ETHZ) and Stefan Zoller (GDC)

References

  • Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.
