combine.contigs.parallel.sh
Combine non-overlapping contigs of the same sample and locus combination, based on exonerate alignment statistics.
combine.contigs.parallel.sh -s <file> -d <directory> -e <directory> -a <positive integer> \
-c <positive numeric> -t <positive integer>
combine.contigs.R
# Required
-s Path to .txt file with sample names (without header or '>').
-d Path to directory with assembly results, containing a subdirectory for each sample.
-e Path to directory with exonerate results, containing a subdirectory for each sample.
# Optional [DEFAULT]
-a [80] Minimum target alignment length (bp).
-c [2] Minimum normalized alignment score (i.e., raw score divided by target alignment length).
-t [4] Number of samples processed in parallel.
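The two filters can be illustrated with a minimal sketch. It assumes exonerate's `sugar:` line layout (query id, query start, query end, query strand, target id, target start, target end, target strand, raw score) and assumes that the target alignment length is the absolute difference between target end and target start. The function name `filter_sugar` is hypothetical and is not part of CaptureAl; `combine.contigs.R` may compute these values differently.

```shell
# Hypothetical helper (not part of CaptureAl): keep sugar lines that pass
# the -a (minimum target alignment length) and -c (minimum normalized score)
# thresholds. Assumed sugar layout:
#   sugar: qid qstart qend qstrand tid tstart tend tstrand score
filter_sugar() {
  local min_len="${1:-80}" min_norm="${2:-2}"
  awk -v a="$min_len" -v c="$min_norm" '
    $1 == "sugar:" {
      len = $8 - $7                 # target alignment length (bp), assumed
      if (len < 0) len = -len       # reverse-strand hits have tend < tstart
      if (len == 0) next            # avoid division by zero
      norm = $10 / len              # raw score divided by alignment length
      if (len >= a && norm >= c) print $2, len, norm
    }'
}
```

For example, `filter_sugar 80 2 < SAMPLE.LOCUS.exonerate` would print the contig name, target alignment length, and normalized score of every alignment that passes the default thresholds.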
This is a wrapper script around combine.contigs.R, which it runs for multiple samples in parallel.
That internal script reads three key files for each sample and locus combination:
- the assembled contigs (`consensus_contigs.fasta`)
- the best-matching contig (`SAMPLE.LOCUS.bestScore.fasta`)
- the exonerate alignment statistics (`SAMPLE.LOCUS.exonerate`) with the alignment sugar, which is required to combine non-overlapping contigs.
The paths to these files should exist if the previous CaptureAl pipeline steps were carried out for all indicated samples, and are therefore set internally:
- cpath Path to assembled contigs.
[DEFAULT: "${d}/SAMPLE.dipspades/extracted_reads_SAMPLE.fastq.LOCUS.ids.spades/dipspades/consensus_contigs.fasta"]
- fpath Path to best-scoring contig.
[DEFAULT: "${e}/SAMPLE/SAMPLE.LOCUS.bestScore.fasta"]
- epath Path to exonerate alignment statistics, expects a file ending in `.exonerate`
[DEFAULT: "${e}/SAMPLE/SAMPLE.LOCUS.exonerate"]
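Because these paths are set internally, it can help to confirm that they exist before starting a run. The following is a minimal sketch that builds the three default paths for one sample and locus and reports any missing file. The function name `check_inputs` and its argument order are hypothetical; the path templates follow the defaults listed above, with `d` and `e` standing for the `-d` and `-e` arguments.

```shell
# Hypothetical helper (not part of CaptureAl): check that the three input
# files for one sample/locus pair exist under the default path layout.
# Usage: check_inputs ASSEMBLY_DIR EXONERATE_DIR SAMPLE LOCUS
check_inputs() {
  local d="$1" e="$2" sample="$3" locus="$4"
  local missing=0 p
  local cpath="${d}/${sample}.dipspades/extracted_reads_${sample}.fastq.${locus}.ids.spades/dipspades/consensus_contigs.fasta"
  local fpath="${e}/${sample}/${sample}.${locus}.bestScore.fasta"
  local epath="${e}/${sample}/${sample}.${locus}.exonerate"
  for p in "$cpath" "$fpath" "$epath"; do
    if [ ! -f "$p" ]; then
      echo "missing: $p" >&2      # report each absent file on stderr
      missing=1
    fi
  done
  return "$missing"               # 0 if all three files exist, 1 otherwise
}
```

A call such as `check_inputs NovaSeq-run1_assembly NovaSeq-run1_exonerate SAMPLE LOCUS` returns a non-zero status if any of the three files is absent.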
Using the alignment statistics, the supplied contigs, and the alignment quality filters (-a and -c), it combines non-overlapping contigs that likely represent fragments of the same locus.
The output is a FASTA file with the combined contig, which replaces the existing FASTA file containing only the single best-matching contig.
The existing *.bestScore.fasta files are overwritten, and the new files are used for downstream analyses (alignment).
combine.contigs.parallel.sh -s samples.txt -d NovaSeq-run1_assembly -e NovaSeq-run1_exonerate \
-a 80 -c 2 -t 20
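The effect of `-t` can be pictured as running one job per line of the sample file, with at most that many jobs active at once. The sketch below uses `xargs -P` (available in GNU and BSD xargs) for this. It is an assumption about the general approach, not the wrapper's actual implementation, and the function name `run_per_sample` is hypothetical.

```shell
# Hypothetical sketch (not the actual wrapper): run one command per sample,
# with at most THREADS commands running at the same time.
# Usage: run_per_sample SAMPLES_FILE THREADS COMMAND [ARGS...]
run_per_sample() {
  local samples="$1" threads="$2"
  shift 2
  # The file holds one sample name per line; xargs appends each name
  # as the last argument of COMMAND and keeps up to $threads jobs running.
  xargs -n 1 -P "$threads" "$@" < "$samples"
}
```

For instance, `run_per_sample samples.txt 20 echo processing` prints one `processing <sample>` line per sample. The order of the lines is not guaranteed, since the jobs run concurrently.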
CaptureAl v0.1 Documentation