This package contains Perl programs/scripts that perform frequently needed operations on FASTA format files.

b-brankovics/fasta_tools

FASTA-tools

This package contains Perl programs/scripts that perform frequently needed operations on FASTA format files, such as adjusting the line length to a uniform length, reverse complementing sequences, and identifying entries with identical sequences.

The executable files are located in the bin folder.


Programs:

A tool to reverse complement the entries of FASTA format files.
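
As an illustration of the operation itself, here is a minimal Python sketch of reverse complementing a nucleotide sequence. It is not the package's Perl code: the function name `reverse_complement` and the handling of IUPAC ambiguity codes and lower case are assumptions made for this example.

```python
# Illustrative sketch only; not taken from the Perl tool.
# Translation table for nucleotides and IUPAC ambiguity codes (assumed scope).
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string.

    Characters missing from the table (for example the gap '-') pass through unchanged.
    """
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```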

A tool to get sequence length information for FASTA files.

A tool to filter entries from a multi FASTA file based on entry names using regular expression(s). (Replaces fasta_extract and fasta_remove)

A tool to sort entries from a multi FASTA file based on entry names using regular expression(s).

A tool to measure sequence similarity of aligned sequences in multi FASTA format files.

A tool to measure sequence variability of aligned sequences in multi FASTA format files.

A tool to extract a part of the sequences from FASTA files.

A tool to shift circular FASTA sequences using a reference FASTA file or a position.

A tool to report exact sequence matches of the entries of a reference FASTA file in a target FASTA file.

Example:

fasta_find gene.fas chr.fas

A tool to remove duplicate sequences from FASTA format files and print the groups of identical entries to STDERR.

Example:

fasta_unique input.fas >unique.fas 2>unique.tab
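
The idea behind duplicate removal can be sketched in Python as follows. This is a hedged illustration, not the tool's implementation: the function name `group_identical`, exact (case-sensitive) string comparison, and keeping the first name of each group are all assumptions, and the layout of the table the real tool writes to STDERR is not reproduced here.

```python
# Illustrative sketch only; the real tool's comparison rules and table format may differ.
def group_identical(records):
    """Group (name, sequence) pairs by identical sequence.

    Returns (unique, groups): `unique` keeps the first entry seen for each
    distinct sequence, and `groups` lists all names sharing that sequence.
    """
    groups = {}  # sequence -> names, in order of first appearance
    for name, seq in records:
        groups.setdefault(seq, []).append(name)
    unique = [(names[0], seq) for seq, names in groups.items()]
    return unique, list(groups.values())


records = [("a", "ACGT"), ("b", "GGGG"), ("c", "ACGT")]
unique, groups = group_identical(records)
print(unique)
print(groups)
```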

A tool to restore the entries that were collapsed during duplicate removal by fasta_unique, using the FASTA file and the names table that fasta_unique produced.

Example:

fasta_deunique -i unique.fas -tab unique.tab >deunique.fas

fasta_pretty

A tool to format FASTA files to a uniform column width (60).

Synopsis

fasta_pretty [OPTIONS] [FILE]...

Options

  • -h | --help

    Print the help message; ignore other arguments.

Input

STDIN and/or FASTA files. The extension of the files is irrelevant.

Output

The output is in FASTA format, with the sequence lines wrapped to a length of 60. The program prints to STDOUT; the output can be captured in a file by using the > or >> operator.

Examples

Format a single file (input.fas) and save it to a file (output.fas). The three commands are equivalent.

fasta_pretty input.fas >output.fas
cat input.fas | fasta_pretty >output.fas
cat input.fas | fasta_pretty - >output.fas

Format and concatenate three FASTA files from the current directory (input1.fas, input2.fas and input3.fas) and save the result to a file (output.fas).

fasta_pretty input1.fas input2.fas input3.fas >output.fas
fasta_pretty input*.fas >output.fas
cat input2.fas | fasta_pretty input1.fas - input3.fas >output.fas
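
For readers who want to see what the reformatting amounts to, the following Python sketch parses FASTA lines and re-wraps each sequence at 60 columns. It is an assumption-laden illustration rather than the Perl program: `parse_fasta` and `format_fasta` are hypothetical names, and blank lines are simply skipped.

```python
# Illustrative sketch only; not the package's Perl code.
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def format_fasta(records, width=60):
    """Return FASTA text with every sequence wrapped to `width` columns."""
    out = []
    for name, seq in records:
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


print(format_fasta(parse_fasta([">x", "A" * 50, "C" * 30])), end="")
```

Concatenating several inputs, as in the examples above, corresponds to feeding the lines of all files through the same parser one after another.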

fasta_dealign

A tool to remove gap characters from FASTA files and format the sequences to a uniform column width (60).

Synopsis

fasta_dealign [OPTIONS] [FILE]...

Options

  • -h | --help

    Print the help message; ignore other arguments.

Input

STDIN and/or FASTA files. The extension of the files is irrelevant.

Output

The output is in FASTA format, with the sequence lines wrapped to a length of 60. The program prints to STDOUT; the output can be captured in a file by using the > or >> operator.

Examples

Format a single file (input.fas) and save it to a file (output.fas). The three commands are equivalent.

fasta_dealign input.fas >output.fas
cat input.fas | fasta_dealign >output.fas
cat input.fas | fasta_dealign - >output.fas

Format and concatenate three FASTA files from the current directory (input1.fas, input2.fas and input3.fas) and save the result to a file (output.fas).

fasta_dealign input1.fas input2.fas input3.fas >output.fas
fasta_dealign input*.fas >output.fas
cat input2.fas | fasta_dealign input1.fas - input3.fas >output.fas
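
The dealignment step can be pictured with this short Python sketch, which strips gap characters from an aligned sequence and then wraps it at 60 columns. It is only a sketch under stated assumptions: the names `dealign` and `wrap` are hypothetical, and treating only "-" as a gap character is an assumption, since the set of gap characters the Perl tool recognises is not documented here.

```python
# Illustrative sketch only; the gap alphabet ("-") is an assumption.
def dealign(seq, gap_chars="-"):
    """Remove every gap character from an aligned sequence."""
    return seq.translate({ord(c): None for c in gap_chars})


def wrap(seq, width=60):
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


aligned = "AC--GT-A"
print(dealign(aligned))
print(wrap(dealign("A" * 40 + "-" * 10 + "C" * 40)))
```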

A tool to calculate assembly statistics for FASTA files.

It calculates the following statistics:

  • number of contigs
  • total size (bp)
  • N50 (bp)
  • L50: smallest number of contigs whose summed length reaches at least half of the total size; the shortest of these contigs has length N50
  • mean contig size (bp)
  • longest contig (bp)
  • third quartile (bp)
  • median (bp)
  • first quartile (bp)
  • shortest contig (bp)
  • number of Ns
  • number of gaps (/N+/): number of N-stretches in the sequences
  • number of other IUPACs: IUPAC bases are nucleotide ambiguity codes (YRWSKMDVHB)
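
To make the N50, L50 and N-related statistics concrete, here is a small Python sketch using the common definitions. It is not the tool's code: the function names are hypothetical, the tie-breaking and rounding of the Perl program may differ, and the N counts below are case-sensitive by assumption.

```python
import re


# Illustrative sketch only; uses the usual definitions of N50 and L50.
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    Contigs are taken from longest to shortest until their summed length
    reaches at least half of the total size. N50 is the length of the last
    contig taken and L50 is how many contigs were taken.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs given")


def count_ns(seq):
    """Number of N characters (upper case only, by assumption)."""
    return seq.count("N")


def count_gaps(seq):
    """Number of N-stretches, i.e. matches of the pattern /N+/."""
    return len(re.findall(r"N+", seq))


print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(count_ns("ACNNNGTNAC"), count_gaps("ACNNNGTNAC"))
```

With contig lengths 2 to 10 the total size is 54, so the three longest contigs (10 + 9 + 8 = 27) already reach half of it, which gives N50 = 8 and L50 = 3.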

A tool to display sequence alignments in either pairwise or a multiple alignment fashion.

Usage:
        fasta_display_alignment [-h | --help] [-w=<int> | --width=<int>] [-p | --pairwise] [FASTA file | -]

Description:
        A tool to display sequence alignments in either pairwise or a multiple alignment fashion.

Options:
        -h | --help
                Print the help message; ignore other arguments.
        -w=<int> | --width=<int>
                Set the width of the sequence that will be displayed per line to <int>.
        -p | --pairwise
                Display alignments in pairs similarly to an exonerate output.
                At the start the sequence IDs are printed and some alignment statistics.
        -b | --block
                Display alignment in 'block format'. Consensus positions are shown with a '*'.
                At the start the sequence IDs are printed and at the end some alignment statistics.
                Each alignment chunk starts with the range displayed and ends with a line with position info.
                This is the default option.
