mTags workflow with Snakemake

The main goal of this pipeline is to accurately assign taxonomy to meta-omics datasets by extracting fragments (tags) belonging to the 18S-V4 region and classifying them against the eukaryotesV4 database.

This is an improved implementation of the scripts available in this repository. Here's a schematic view of the pipeline:

(pipeline schematic figure)

How to run this workflow

Step 1: clone this repository

Run this command:

git clone https://github.com/aleixop/mtags_snakemake.git

Step 2: install required software

The required software is common in bioinformatics analyses, so your cluster may already have it all installed. If not, it can be installed either manually or through Conda.

Manual installation

The required software for this pipeline is the following:

Conda installation

First of all, if you don't have mamba installed, follow the steps explained here to do so.

Alternatively, you can install mamba like this:

conda activate base
conda install -n base -c conda-forge mamba

Then activate the conda base environment, enter the mtags_snakemake directory and install all required software into an isolated Conda environment named mtags_snakemake via:

conda activate base
cd mtags_snakemake
mamba env create --name mtags_snakemake --file environment.yaml

Step 3: prepare your input files

If you want to test the pipeline, this repository contains test files in data/input/. You can remove them before processing your own samples.

You have to do two things before running this pipeline. First, put your quality-filtered and adapter-trimmed paired-end fastq files in the data/input/ directory. They should be gzipped and follow this naming structure:

<sample1>_R1.fastq.gz
<sample1>_R2.fastq.gz
<sample2>_R1.fastq.gz
<sample2>_R2.fastq.gz
...
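The naming convention above can be sanity-checked with a short shell loop (a convenience sketch, not part of the pipeline) that verifies every R1 file has a matching R2 mate:

```shell
# Report any R1 file in data/input/ that lacks a matching R2 file.
for r1 in data/input/*_R1.fastq.gz; do
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    [ -f "$r2" ] || echo "Missing mate for $r1"
done
```

If the loop prints nothing, all pairs are complete.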

And second, open the samples.txt file, remove its contents (the samples listed there are for testing only) and write your own sample names, one per line. The file should have the following structure:

sample1
sample2
sample3
...
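If your files already follow the naming scheme from the previous step, samples.txt can be generated automatically rather than written by hand (a convenience sketch; check the result before running the pipeline):

```shell
# Derive one sample name per line from the R1 files in data/input/.
for f in data/input/*_R1.fastq.gz; do
    basename "$f" _R1.fastq.gz
done > samples.txt
```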

Step 4: run the pipeline

With manual installation

Load all the required software, or make sure the paths to it are exported, and run this command from the root of the project (where the Snakefile is located). Set the number of threads you want to use with --cores:

snakemake --cores <threads>

With conda installation

Activate the environment you created in Step 2:

conda activate mtags_snakemake

Then run this command from the root of the project (where the Snakefile is located). Set the number of threads you want to use with --cores:

snakemake --cores <threads>

Running the pipeline in SLURM

If you are running the pipeline on an HPC with SLURM, you can find an example script to run this pipeline here.
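The linked script is the reference. Purely as an illustration, a minimal SLURM batch script (all directives and resource values below are placeholders you must adapt to your cluster, not the repository's actual script) could look like:

```shell
#!/bin/bash
#SBATCH --job-name=mtags_snakemake
#SBATCH --cpus-per-task=16
#SBATCH --time=24:00:00
#SBATCH --mem=32G

# Illustrative job script: make conda available in the batch shell,
# activate the environment from Step 2, then run the workflow with
# as many cores as SLURM allocated to the job.
eval "$(conda shell.bash hook)"
conda activate mtags_snakemake
snakemake --cores "$SLURM_CPUS_PER_TASK"
```

Submitted with sbatch from the root of the project, this runs the same snakemake command as above inside the allocated job.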
