get.coverage.stats.sh

Simon Crameri edited this page Apr 1, 2022 · 4 revisions

Description

Compute mapping and coverage statistics for a batch of samples in parallel, and write the results to two large tables.

Usage

get.coverage.stats.sh -Q <positive integer> -d <directory> -s <file> -t <integer>

Dependencies

get.coverage.stats.R
samtools
bedtools

Arguments

# Required
-Q                Minimum mapping quality used when running run.bwamem.sh.

# Optional [DEFAULT]
-d  [pwd]         Path to directory with mapping results (directory with sample subdirectories containing mapped reads).
-s  [samples.txt] File with sample names (without header or '>').
-t  [2]           Number of samples to process in parallel.
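The options and defaults above could be handled with `getopts`, roughly as in the sketch below. This is an illustration only and not taken from the script itself: the function name `parse_args` and the variable names `dir`, `samples` and `threads` are assumptions, and the real script may parse and validate its arguments differently.

```shell
# Hypothetical sketch: parse -Q, -d, -s and -t with the documented defaults.
parse_args() {
  OPTIND=1                  # reset so the function can be called repeatedly
  Q=""                      # required, no default
  dir="$(pwd)"              # -d default: current working directory
  samples="samples.txt"     # -s default
  threads=2                 # -t default
  while getopts ":Q:d:s:t:" opt; do
    case "$opt" in
      Q) Q="$OPTARG" ;;
      d) dir="$OPTARG" ;;
      s) samples="$OPTARG" ;;
      t) threads="$OPTARG" ;;
      *) echo "unknown or incomplete option: -$OPTARG" >&2; return 1 ;;
    esac
  done
  # -Q is required and must be a positive integer
  case "$Q" in
    ''|*[!0-9]*) echo "-Q must be a positive integer" >&2; return 1 ;;
  esac
  [ "$Q" -gt 0 ] || { echo "-Q must be a positive integer" >&2; return 1; }
}

parse_args -s samples.txt -Q 20 -t 20 &&
  echo "Q=$Q dir=$dir samples=$samples threads=$threads"
```

Omitting `-Q` makes the sketch return a non-zero status, which mirrors the fact that `-Q` is the only required argument.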

Details

The output files are named such that separate results can be generated for each quality threshold ${Q}, if multiple quality thresholds were used for mapping.

Value

The script writes three files in each sample subdirectory, and collects the results of all samples in two combined files.

# in each sample subdirectory
- ${in}/${name}/${name}.bwa-mem.sorted.Q${Q}.nodup.mapstats.txt   Per-sample mapping statistics, extracted from `.flagstats`.
- ${in}/${name}/${name}.bwa-mem.sorted.Q${Q}.nodup.coverage.txt   Per-sample coverage statistics, produced by `get.coverage.stats.R`.
- ${in}/${name}/${name}.bwa-mem.sorted.Q${Q}.nodup.covtab.txt     Tabular summary with the number of loci above coverage thresholds.

# in the output directory
- ${in}/mapping.stats.Q${Q}.txt                                   File with all per-sample mapping statistics combined.
- ${in}/coverage.stats.Q${Q}.txt                                  File with all per-sample coverage statistics combined.
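Because the file names follow a fixed pattern, the expected outputs for a given threshold can be listed and checked after a run. The sketch below is one way to do that. The helper names `expected_outputs` and `check_outputs` are hypothetical and not part of the script, and `dir` stands in for `${in}`.

```shell
# Hypothetical helper: print the output paths documented above for one
# mapping directory, one quality threshold and any number of sample names.
expected_outputs() {
  dir="$1"; Q="$2"; shift 2
  for name in "$@"; do
    prefix="${dir}/${name}/${name}.bwa-mem.sorted.Q${Q}.nodup"
    printf '%s\n' "${prefix}.mapstats.txt" "${prefix}.coverage.txt" "${prefix}.covtab.txt"
  done
  printf '%s\n' "${dir}/mapping.stats.Q${Q}.txt" "${dir}/coverage.stats.Q${Q}.txt"
}

# Report every expected file that is missing or empty.
check_outputs() {
  expected_outputs "$@" | while IFS= read -r f; do
    [ -s "$f" ] || echo "missing or empty: $f"
  done
}
```

For example, `check_outputs . 20 $(cat samples.txt)` would check the results for `-Q 20` in the current directory, assuming the sample names contain no whitespace.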

Examples

get.coverage.stats.sh -s samples.txt -Q 20 -t 20

Authors

Simon Crameri (ETHZ) and Stefan Zoller (GDC)

References

  • GNU parallel
