filter.visual.coverages.R
Visualize and filter regions by coverage statistics using ggplot2 (Wickham 2016) box-and-violin plots and heatmaps.
filter.visual.coverages.R mapfile.txt coverage.stats.txt reference.fasta <min.pregion=0.3> <min.ptaxa=0.3> <min.len=500> <min.cov=10> <max.cov=1000> <min.ratio=0.5> <min.frac=0.9>
sfile|CHR path to samples file. Header and tab-separation expected.
Sample IDs must be in the first column. Group IDs can be specified in the
second column (if not specified, all samples are assumed to constitute one group).
The group ID is used to apply region filtering criteria 4-9 within all considered
groups, to determine regions passing the filtering criteria in all groups.
Samples that do not belong to any specified group (second column empty or 'NA')
will be displayed in summary plots but will not be considered during region filtering.
Additional columns are ignored.
stats|CHR path to alignment stats. Header and tab-separation expected.
Sample IDs must be in the first column. Alignment stats must be in the following
columns as defined in lines 114-122 of this script. Only alignment stats
of samples in <sfile> will be read (warns or stops if there is a mismatch).
refseqs|CHR path to region reference sequences. Fasta format expected. Used to correlate alignment
stats with reference sequence lengths and GC content. Only regions in <stats>
will be considered (warns or stops if there is a mismatch).
min.pregion|NUM minimum fraction of regions recovered in a sample (i.e., sample has at least 1
mapped read in <min.pregion>*nregions regions) [DEFAULT: 0.3]
min.ptaxa|NUM minimum fraction of samples recovered in a region (i.e., region has at least 1
mapped read in <min.ptaxa>*nsamples samples) [DEFAULT: 0.3]
min.len|NUM minimum length in .bam [DEFAULT: 500]
min.cov|NUM minimum coverage in .bam [DEFAULT: 10]
max.cov|NUM maximum coverage in .bam [DEFAULT: 1000]
min.ratio|NUM minimum alignment fraction [DEFAULT: 0.5]
min.frac|NUM minimum fraction of samples conforming to the absolute filtering criteria 5-8
(i.e., regions must meet criteria 5-8 in (100*<min.frac>)% of considered samples,
separately for each considered group) [DEFAULT: 0.9]
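The thresholds above can be illustrated with a minimal R sketch. It is not taken from filter.visual.coverages.R: the function name filter_regions, the toy data, and the column names (sample, group, region, len, cov, ratio) are hypothetical. It also assumes that the absolute criteria are min.len, min.cov, max.cov and min.ratio, and that the recovery fractions are compared with >=; the script itself may number or combine the criteria differently.

```r
# Hypothetical sketch, not the script's actual implementation.

# Part 1: recovery filters on a toy matrix of mapped read counts
# (rows = samples, columns = regions).
reads <- matrix(c(5, 0, 3,
                  0, 0, 2,
                  4, 1, 7),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"), c("r1", "r2", "r3")))
recovered  <- reads >= 1                   # at least 1 mapped read
samples.ok <- rowMeans(recovered) >= 0.5   # min.pregion set to 0.5 here
regions.ok <- colMeans(recovered) >= 0.5   # min.ptaxa set to 0.5 here

# Part 2: absolute criteria applied per group, then combined with min.frac.
# Toy long-format table with one row per sample x region (assumed layout).
stats <- data.frame(
  sample = rep(c("s1", "s2", "s3", "s4"), each = 2),
  group  = rep(c("A", "A", "B", "B"), each = 2),
  region = rep(c("r1", "r2"), times = 4),
  len    = c(600, 300, 650, 700, 620, 550, 610, 200),
  cov    = c(20, 5, 30, 15, 25, 12, 40, 3),
  ratio  = c(0.8, 0.4, 0.9, 0.7, 0.85, 0.6, 0.9, 0.2),
  stringsAsFactors = FALSE
)

filter_regions <- function(stats, min.len = 500, min.cov = 10, max.cov = 1000,
                           min.ratio = 0.5, min.frac = 0.9) {
  # Samples without a group (NA or empty) are not used for filtering
  stats <- stats[!is.na(stats$group) & stats$group != "", ]
  # TRUE where a sample meets all absolute thresholds for a region
  stats$ok <- stats$len >= min.len & stats$cov >= min.cov &
    stats$cov <= max.cov & stats$ratio >= min.ratio
  # Fraction of conforming samples per region and group
  frac <- aggregate(ok ~ region + group, data = stats, FUN = mean)
  # A region passes within a group if that fraction reaches min.frac
  frac$pass <- frac$ok >= min.frac
  # Keep only regions that pass in every group
  all.pass <- vapply(split(frac$pass, frac$region), all, logical(1))
  names(all.pass)[all.pass]
}

kept <- filter_regions(stats)
print(kept)
```

In this toy data, only half of the samples in each group meet the absolute thresholds for region r2, so r2 is kept only if min.frac is lowered to 0.5 or below.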
- Heatmaps are produced for ALL, PASSED and FAILED regions.
- loci_kept-q10-200-8-1000-0.2-0.15.txt with regions that passed filters.
- loci_rm-q10-200-8-1000-0.2-0.15.txt with regions that failed filters.
- coverage_stats-q10-500-10-1000-0.5-0.1.log with filtering log.
- coverage_stats-q10-500-10-1000-0.5-0.1.pdf with visualizations.
Simon Crameri (ETHZ)
- H. Wickham 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
CaptureAl v0.1 Documentation