mTags workflow with Snakemake

The main goal of this pipeline is to accurately assign taxonomy to meta-omics datasets by extracting fragments (tags) belonging to the 18S-V4 region and classifying them with the eukaryotesV4 database.

This is an improved implementation of the scripts available in this repository. A schematic view of the pipeline is provided in pipeline.png.

How to run this workflow

Step 1: clone this repository

Run this command:

git clone https://github.com/aleixop/mtags_snakemake.git

Step 2: install required software

The required software is common in bioinformatic analyses, so your cluster may already have all of it installed. If not, it can be installed either manually or through conda.

Manual installation

The required software for this pipeline is the following:

Conda installation

First of all, if you don't have mamba installed, follow the steps explained here to do so.

Alternatively, you can install mamba like this:

conda activate base
conda install -n base -c conda-forge mamba

Then, activate the conda base environment, enter the mtags_snakemake directory and install all required software into an isolated Conda environment with the name mtags_snakemake via:

conda activate base
cd mtags_snakemake
mamba env create --name mtags_snakemake --file environment.yaml

Step 3: prepare your input files

If you want to test the pipeline, this repository includes test files in data/input/. You can remove them before processing your own samples.

You have to do two things before running this pipeline. First, put your quality-filtered and adapter-trimmed paired-end fastq files in the data/input/ directory. These files should be gzipped and follow this naming structure:

<sample1>_R1.fastq.gz
<sample1>_R2.fastq.gz
<sample2>_R1.fastq.gz
<sample2>_R2.fastq.gz
...
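Because every sample needs both an R1 and an R2 file, it can help to check for unpaired files before going further. The function below is a minimal sketch, not part of this repository: check_pairs is a hypothetical name, and it only assumes the naming structure above. It prints any file whose mate is missing and returns a non-zero status if it finds one.

```shell
# Hypothetical helper (not part of mtags_snakemake): reports every
# <sample>_R1/_R2.fastq.gz file in the input directory whose mate is missing.
# Usage: check_pairs [input_dir]   (input_dir defaults to data/input)
# The body runs in a subshell ( ... ), so its variables do not leak.
check_pairs() (
  dir="${1:-data/input}"
  status=0
  for f in "$dir"/*_R1.fastq.gz; do
    [ -e "$f" ] || continue                 # glob matched no files
    mate="${f%_R1.fastq.gz}_R2.fastq.gz"    # expected R2 partner
    if [ ! -f "$mate" ]; then
      echo "missing mate: $mate" >&2
      status=1
    fi
  done
  for f in "$dir"/*_R2.fastq.gz; do
    [ -e "$f" ] || continue
    mate="${f%_R2.fastq.gz}_R1.fastq.gz"    # expected R1 partner
    if [ ! -f "$mate" ]; then
      echo "missing mate: $mate" >&2
      status=1
    fi
  done
  exit "$status"
)
```

For example, `check_pairs data/input && echo "all files are paired"` prints the confirmation only when no mate is missing.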

Second, open the samples.txt file, remove its contents (the samples listed there are for testing only) and write your own sample names, one per line. The file should have the following structure:

sample1
sample2
sample3
...
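Since the sample names are the file names without the _R1.fastq.gz suffix, samples.txt can also be generated instead of typed by hand. This is a sketch under stated assumptions: list_samples is a hypothetical helper, not part of the repository, and it assumes every sample has an R1 file in data/input/ following the naming structure above.

```shell
# Hypothetical helper (not part of mtags_snakemake): prints one sample name
# per line, taken from the <sample>_R1.fastq.gz files in the input directory.
# Usage: list_samples [input_dir] > samples.txt
list_samples() (
  dir="${1:-data/input}"
  for f in "$dir"/*_R1.fastq.gz; do
    [ -e "$f" ] || continue          # glob matched no files
    basename "$f" _R1.fastq.gz       # strip the directory and the suffix
  done
)
```

Running `list_samples data/input > samples.txt` overwrites the file, which also removes the test sample names. Review the result before running the pipeline.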

Step 4: run the pipeline

With manual installation

Load all the required software (or make sure its executables are on your PATH) and run this command from the root of the project (where the Snakefile is located). Set the number of threads to use with --cores:

snakemake --cores <threads>

With conda installation

Activate the environment you created in Step 2:

conda activate mtags_snakemake

Then run this command from the root of the project (where the Snakefile is located). Set the number of threads to use with --cores:

snakemake --cores <threads>
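Whichever installation you used, a quick pre-flight check can catch the most common setup mistakes before Snakemake starts. The function below is a minimal sketch, not part of the repository: preflight is a hypothetical name, and it only assumes the layout described in Step 3 (samples.txt and the Snakefile in the project root, fastq files in data/input/).

```shell
# Hypothetical pre-flight check (not part of mtags_snakemake). Run it from the
# project root: it verifies that the Snakefile and samples.txt exist and that
# every sample listed in samples.txt has both of its fastq files.
# Usage: preflight [input_dir]   (input_dir defaults to data/input)
preflight() (
  dir="${1:-data/input}"
  [ -f Snakefile ] || { echo "Snakefile not found: run from the project root" >&2; exit 1; }
  [ -f samples.txt ] || { echo "samples.txt not found" >&2; exit 1; }
  status=0
  # '|| [ -n "$sample" ]' also handles a last line without a trailing newline
  while IFS= read -r sample || [ -n "$sample" ]; do
    [ -n "$sample" ] || continue       # skip blank lines
    for read_end in R1 R2; do
      fq="$dir/${sample}_${read_end}.fastq.gz"
      if [ ! -f "$fq" ]; then
        echo "missing input: $fq" >&2
        status=1
      fi
    done
  done < samples.txt
  exit "$status"
)
```

If it succeeds, launch the pipeline as shown above, for example with `preflight && snakemake --cores <threads>`. Snakemake's `-n` (`--dry-run`) flag can additionally list the jobs it would run without executing them.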

Running the pipeline in SLURM

If you are running the pipeline on an HPC cluster with SLURM, you can find an example script to run this pipeline here.
