This package contains Perl scripts that perform frequently needed operations on FASTA format files, such as adjusting sequence lines to a uniform length, reverse complementing sequences, and identifying entries with identical sequences.
The executable files are located in the bin folder.
A tool to reverse complement the entries of FASTA format files.
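As an illustration of what reverse complementing involves, here is a minimal Python sketch. It is not the tool's Perl implementation; the function name `reverse_complement` and the choice to leave unknown characters (such as gaps) unchanged are assumptions made for this example.

```python
# Complement table for the standard bases and the IUPAC ambiguity codes.
# Characters that are not listed (e.g. "-") are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`.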
A tool to get sequence length information for FASTA files.
A tool to filter entries from a multi FASTA file based on entry names using regular expression(s).
(Replaces fasta_extract and fasta_remove)
A tool to sort entries from a multi FASTA file based on entry names using regular expression(s).
A tool to measure sequence similarity of aligned sequences in multi FASTA format files.
A tool to measure sequence variability of aligned sequences in multi FASTA format files.
A tool to extract a part of the sequences from FASTA files.
A tool to shift circular FASTA sequences using a reference FASTA file or a position.
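The idea of shifting a circular sequence can be sketched in Python as follows. This is only an illustration under stated assumptions: positions are taken to be 1-based, the reference is searched on the forward strand only, and the function names `shift_circular` and `shift_to_reference` are hypothetical. The actual tool may behave differently.

```python
from typing import Optional


def shift_circular(seq: str, start: int) -> str:
    """Rotate a circular sequence so that 1-based position `start` comes first."""
    if not seq:
        return seq
    offset = (start - 1) % len(seq)
    return seq[offset:] + seq[:offset]


def shift_to_reference(seq: str, ref: str) -> Optional[str]:
    """Rotate `seq` so that it starts with `ref`; return None if `ref` is absent.

    The sequence is doubled before searching so that a match spanning the
    end and the start of the circular sequence is also found.
    """
    if not ref or len(ref) > len(seq):
        return None
    index = (seq + seq).find(ref)
    if index == -1:
        return None
    return shift_circular(seq, index + 1)
```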
A tool to report exact matches of entries from a reference FASTA file within a target FASTA file.
Example:
fasta_find gene.fas chr.fas
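Exact matching of one sequence inside another can be sketched in Python as below. This is an illustration only, not the tool's implementation: the function name `find_exact`, the 1-based coordinates, and the forward-strand-only search are assumptions, and the output format of fasta_find is not reproduced here.

```python
from typing import List


def find_exact(query: str, target: str) -> List[int]:
    """Return the 1-based start positions of every exact match of `query` in `target`.

    Overlapping matches are reported as well.
    """
    positions: List[int] = []
    if not query:
        return positions
    start = target.find(query)
    while start != -1:
        positions.append(start + 1)
        # Continue one base further so that overlapping matches are found.
        start = target.find(query, start + 1)
    return positions
```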
A tool to remove duplicate sequences from FASTA format files and print the groups of identical entries to STDERR.
Example:
fasta_unique input.fas >unique.fas 2>unique.tab
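Grouping entries by identical sequence can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: the names `parse_fasta` and `group_identical` are hypothetical, sequences are compared as exact strings, and the layout of the table written to STDERR is not reproduced.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def group_identical(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the names of all entries that carry it."""
    groups: Dict[str, List[str]] = {}
    for name, seq in records:
        groups.setdefault(seq, []).append(name)
    return groups
```

Keeping one entry per key of the returned dictionary gives a duplicate-free set, while the lists of names record which entries were identical.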
A tool to reverse the name replacement done during duplicate removal by fasta_unique, using the FASTA file and the names table that fasta_unique produced.
Example:
fasta_deunique -i unique.fas -tab unique.tab >deunique.fas
A tool to format FASTA files to uniform column width (60).
fasta_pretty [OPTIONS] [FILE]...
-
-h | --help
Print the help message; ignore other arguments.
STDIN and/or FASTA files. The extension of the files is irrelevant.
The output is in FASTA format, with sequence lines wrapped at 60 characters.
The program prints to STDOUT.
This can be captured in a file by using the > or >> operator.
Format a single file (input.fas) and save it to a file (output.fas).
fasta_pretty input.fas >output.fas
cat input.fas | fasta_pretty >output.fas
cat input.fas | fasta_pretty - >output.fas
Format and concatenate three FASTA files from the current directory
(input1.fas, input2.fas and input3.fas) and save it to a file (output.fas).
fasta_pretty input1.fas input2.fas input3.fas >output.fas
fasta_pretty input*.fas >output.fas
cat input2.fas | fasta_pretty input1.fas - input3.fas >output.fas
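The uniform line width that fasta_pretty produces can be illustrated with a short Python sketch. It is not the tool's Perl implementation; the function names `wrap_sequence` and `format_entry` are hypothetical, and the sequence is assumed to be already joined into a single string.

```python
from typing import List


def wrap_sequence(seq: str, width: int = 60) -> List[str]:
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def format_entry(name: str, seq: str, width: int = 60) -> str:
    """Return one FASTA entry with its sequence wrapped at `width` columns."""
    return "\n".join([">" + name] + wrap_sequence(seq, width))
```

A sequence of 130 bases is thus written as two lines of 60 characters followed by one line of 10.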
A tool to format FASTA files to remove gap character states and format to uniform column width (60).
fasta_dealign [OPTIONS] [FILE]...
-
-h | --help
Print the help message; ignore other arguments.
STDIN and/or FASTA files. The extension of the files is irrelevant.
The output is in FASTA format, with sequence lines wrapped at 60 characters.
The program prints to STDOUT.
This can be captured in a file by using the > or >> operator.
Format a single file (input.fas) and save it to a file
(output.fas).
fasta_dealign input.fas >output.fas
cat input.fas | fasta_dealign >output.fas
cat input.fas | fasta_dealign - >output.fas
Format and concatenate three FASTA files from the current directory
(input1.fas, input2.fas and input3.fas) and save it to a file
(output.fas).
fasta_dealign input1.fas input2.fas input3.fas >output.fas
fasta_dealign input*.fas >output.fas
cat input2.fas | fasta_dealign input1.fas - input3.fas >output.fas
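Removing gap characters before rewrapping, as fasta_dealign does, can be sketched in Python as below. This is an illustration under assumptions: the gap character is taken to be "-" by default, and the names `dealign` and `dealign_entry` are hypothetical rather than part of the tool.

```python
def dealign(seq: str, gap_chars: str = "-") -> str:
    """Remove every gap character from an aligned sequence."""
    # Mapping a character to None in str.translate deletes it.
    return seq.translate({ord(char): None for char in gap_chars})


def dealign_entry(name: str, seq: str, width: int = 60, gap_chars: str = "-") -> str:
    """Return one FASTA entry without gaps, wrapped at `width` columns."""
    ungapped = dealign(seq, gap_chars)
    lines = [ungapped[i:i + width] for i in range(0, len(ungapped), width)]
    return "\n".join([">" + name] + lines)
```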
A tool to calculate assembly statistics for FASTA files.
It calculates the following statistics:
- number of contigs
- total size (bp)
- N50 (bp)
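- L50: smallest number of contigs (taken from longest to shortest) whose lengths sum to at least half of the total size; the shortest of these contigs has length N50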
- mean contig size (bp)
- longest contig (bp)
- third quartile (bp)
- median (bp)
- first quartile (bp)
- shortest contig (bp)
- number of Ns
- number of gaps (/N+/): number of N-stretches in the sequences
- number of other IUPACs: IUPAC bases are nucleotide ambiguity codes (YRWSKMDVHB)
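Several of the statistics listed above can be sketched in Python as follows. This is a minimal illustration using the common definitions of N50 and L50, not the tool's implementation; the function names are hypothetical, and quartiles are left out because their exact definition varies between implementations.

```python
import re
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Add contigs from longest to shortest until half of the total is reached.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs given")


def count_n(seq: str) -> int:
    """Number of N characters in the sequence."""
    return seq.upper().count("N")


def count_gaps(seq: str) -> int:
    """Number of N-stretches, i.e. matches of /N+/."""
    return len(re.findall(r"N+", seq.upper()))


def count_other_iupac(seq: str) -> int:
    """Number of ambiguity codes other than N (YRWSKMDVHB)."""
    upper = seq.upper()
    return sum(upper.count(code) for code in "YRWSKMDVHB")
```

For contigs of 2, 3, 4, 5 and 6 bp the total size is 20 bp; the two longest contigs (6 + 5 = 11 bp) already cover half of it, so N50 is 5 bp and L50 is 2.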
A tool to display sequence alignments in either pairwise or a multiple alignment fashion.
Usage:
fasta_display_alignment [-h | --help] [-w=<int> | --width=<int>] [-p | --pairwise] [-b | --block] [FASTA file | -]
Description:
A tool to display sequence alignments in either pairwise or a multiple alignment fashion.
Options:
-h | --help
Print the help message; ignore other arguments.
-w=<int> | --width=<int>
Set the width of the sequence that will be displayed per line to <int>.
-p | --pairwise
Display alignments in pairs, similar to exonerate output.
The sequence IDs and some alignment statistics are printed at the start.
-b | --block
Display alignment in 'block format'. Consensus positions are shown with a '*'.
At the start the sequence IDs are printed and at the end some alignment statistics.
Each alignment chunk starts with the range displayed and ends with a line with position info.
This is the default option.
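The '*' marking of consensus positions in block format can be illustrated with a short Python sketch. It rests on an assumption that is not stated in the help text: a position is treated as consensus when all sequences carry the same non-gap character (compared case-insensitively). The function name `consensus_line` is hypothetical, and the tool's actual rule may differ.

```python
from typing import Sequence


def consensus_line(aligned: Sequence[str]) -> str:
    """Return a line with '*' under columns where all sequences agree."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    # zip(*aligned) walks through the alignment column by column.
    for column in zip(*aligned):
        first = column[0].upper()
        identical = all(char.upper() == first for char in column)
        marks.append("*" if identical and first != "-" else " ")
    return "".join(marks)
```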