combine.contigs.parallel.sh

Simon Crameri edited this page Apr 2, 2022 · 2 revisions

Description

Combine non-overlapping contigs of the same sample and locus combination, based on exonerate alignment statistics.

Usage

combine.contigs.parallel.sh -s <file> -d <directory> -e <directory> -a <positive integer> \
                            -c <positive numeric> -t <positive integer>

Dependencies

combine.contigs.R

Arguments

# Required
-s          Path to .txt file with sample names (without header or '>').
-d          Path to directory with assembly results, containing a subdirectory for each sample.
-e          Path to directory with exonerate results, containing a subdirectory for each sample.

# Optional [DEFAULT]
-a  [80]    Minimum target alignment length (bp).
-c   [2]    Minimum normalized alignment score (i.e., raw score divided by target alignment length).
-t   [4]    Number of samples processed in parallel.
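
As a rough illustration of what the -t option controls, the sketch below fans out one job per sample name, running at most a fixed number at once. It is not the wrapper's actual code: `process_sample` and `run_parallel` are hypothetical names, and the use of `xargs -P` is an assumption about one possible way to parallelize.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the per-sample work.
# The real wrapper applies combine.contigs.R here; its arguments are not shown in this page.
process_sample() {
  echo "processing $1"
}
export -f process_sample   # make the function visible to the child bash processes

# Hypothetical helper: $1 = samples file (one name per line), $2 = number of parallel jobs.
run_parallel() {
  local samples="$1" threads="${2:-4}"
  # xargs -P runs up to $threads jobs at once; -I{} passes one sample name per job.
  xargs -P "$threads" -I{} bash -c 'process_sample "$1"' _ {} < "$samples"
}
```

Calling `run_parallel samples.txt 4` would then print one line per sample, in no guaranteed order.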

Details

This is a wrapper script around combine.contigs.R that processes multiple samples in parallel. The R script reads three key files for each sample and locus combination:

  1. the assembled contigs (consensus_contigs.fasta)
  2. the best-matching contig (SAMPLE.LOCUS.bestScore.fasta)
  3. the exonerate alignment statistics (SAMPLE.LOCUS.exonerate) with the alignment sugar, which is required to combine non-overlapping contigs.

The paths to these files should exist if the previous CaptureAl pipeline steps were carried out for all indicated samples, and are therefore set internally:

- cpath   Path to assembled contigs. 
          [DEFAULT: "${d}/SAMPLE.dipspades/extracted_reads_SAMPLE.fastq.LOCUS.ids.spades/dipspades/consensus_contigs.fasta"]
- fpath   Path to best-scoring contig.
          [DEFAULT: "${e}/SAMPLE/SAMPLE.LOCUS.bestScore.fasta"]
- epath   Path to exonerate alignment statistics; expects a file ending in `.exonerate`.
          [DEFAULT: "${e}/SAMPLE/SAMPLE.LOCUS.exonerate"]
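
The default path templates can be expanded for a single sample and locus as sketched below. This is only an illustration: `build_paths` is a hypothetical helper, the sample and locus names are made up, and it assumes that SAMPLE and LOCUS are substituted literally and that the `.exonerate` file sits in the same per-sample subdirectory as the `.bestScore.fasta` file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand the default path templates for one sample/locus pair.
# $1 = assembly directory (-d), $2 = exonerate directory (-e), $3 = sample, $4 = locus.
build_paths() {
  local d="$1" e="$2" sample="$3" locus="$4"
  cpath="${d}/${sample}.dipspades/extracted_reads_${sample}.fastq.${locus}.ids.spades/dipspades/consensus_contigs.fasta"
  fpath="${e}/${sample}/${sample}.${locus}.bestScore.fasta"
  epath="${e}/${sample}/${sample}.${locus}.exonerate"
}

# Example with made-up sample and locus names.
build_paths NovaSeq-run1_assembly NovaSeq-run1_exonerate SampleA locus001
printf '%s\n' "$cpath" "$fpath" "$epath"
```

Checking these three paths with `[ -f "$cpath" ]` and so on before a run is a quick way to confirm that the earlier pipeline steps completed for a given sample.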

The R script uses the alignment statistics, the assembled contigs, and the alignment quality filters (-a and -c) to combine non-overlapping contigs that likely represent fragments of the same locus.
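
The two quality filters can be illustrated with a small sketch. It is not taken from combine.contigs.R: `passes_filters` is a hypothetical helper, and treating both thresholds as inclusive (`>=`) is an assumption based on the word "minimum" in the argument descriptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an alignment passes both filters.
# $1 = raw alignment score, $2 = target alignment length (bp),
# $3 = minimum target alignment length (-a), $4 = minimum normalized score (-c).
passes_filters() {
  # Normalized score = raw score / target alignment length.
  # The length check comes first, so a zero length never reaches the division.
  awk -v s="$1" -v l="$2" -v a="$3" -v c="$4" \
    'BEGIN { exit !(l >= a && s / l >= c) }'
}

passes_filters 250 100 80 2 && echo "keep"   # length 100 >= 80 and 250/100 = 2.5 >= 2
passes_filters 150 100 80 2 || echo "drop"   # 150/100 = 1.5 < 2
passes_filters 200 60 80 2  || echo "drop"   # length 60 < 80
```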

Value

A FASTA file with the combined contig, which replaces the existing FASTA file with the single best-matching contig.

The existing *.bestScore.fasta files are overwritten in place, and the new files are used for downstream analyses (alignment).
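
Because the *.bestScore.fasta files are overwritten, it may be worth copying them aside before a run. The sketch below is one possible way to do that and is not part of CaptureAl: `backup_best_scores` and the backup directory are hypothetical, and it assumes the per-sample subdirectory layout described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy all *.bestScore.fasta files to a backup directory,
# keeping the per-sample subdirectory layout.
# $1 = exonerate directory (-e), $2 = backup directory (should lie outside $1).
backup_best_scores() {
  local e="$1" backup="$2"
  mkdir -p "$backup"
  # List files relative to $e so the same relative path can be reused in the backup.
  (cd "$e" && find . -name '*.bestScore.fasta' -print0) |
    while IFS= read -r -d '' f; do
      mkdir -p "$backup/$(dirname "$f")"
      cp -p "$e/$f" "$backup/$f"
    done
}
```

For example, `backup_best_scores NovaSeq-run1_exonerate NovaSeq-run1_exonerate_backup` would be run once before combine.contigs.parallel.sh.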

Examples

combine.contigs.parallel.sh -s samples.txt -d NovaSeq-run1_assembly -e NovaSeq-run1_exonerate \
                            -a 80 -c 2 -t 20