The main goal of this pipeline is to accurately assign taxonomy to meta-omics datasets by extracting fragments (tags) belonging to the 18S-V4 region and classifying them with the eukaryotesV4 database.
This is an improved implementation of the scripts available in this repository. Here's a schematic view of the pipeline:
Run this command:
git clone https://github.com/aleixop/mtags_snakemake.git
The required software is common in bioinformatic analyses, so your cluster may already have all of it installed. If not, it can be installed either manually or through conda.
The required software for this pipeline is the following:
First of all, if you don't have mamba installed, follow the steps explained here to do so.
Alternatively, you can install mamba like this:
conda activate base
conda install -n base -c conda-forge mamba
Then, activate the conda base environment, enter the mtags_snakemake directory and install all required software into an isolated Conda environment with the name mtags_snakemake via:
conda activate base
cd mtags_snakemake
mamba env create --name mtags_snakemake --file environment.yaml
In case you want to test the pipeline, this repository contains files for testing in data/input/. You can remove them before processing your own samples.
You have to do two things before running this pipeline. First, put your quality-filtered and adapter-trimmed paired-end fastq files in the data/input/ directory. These should be gzipped and follow this naming structure:
<sample1>_R1.fastq.gz
<sample1>_R2.fastq.gz
<sample2>_R1.fastq.gz
<sample2>_R2.fastq.gz
...
Second, open the samples.txt file, remove its contents (the samples listed there are for testing only) and write your own sample names, one per line. The file should have the following structure:
sample1
sample2
sample3
...
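If you have many samples, writing samples.txt by hand can be error-prone. The following is a minimal sketch, not part of the repository, of how the sample names could be derived from the files in data/input/. The function name list_samples is hypothetical, and the sketch assumes the exact _R1.fastq.gz / _R2.fastq.gz naming shown above. It prints a name only when both mates are present and warns about any sample that lacks its R2 file.

```shell
# Hypothetical helper (not part of the repository): print one sample name per
# line for every complete R1/R2 pair found in a directory.
# Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
list_samples() {
    dir="${1:-data/input}"
    status=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        # An unmatched glob is left as a literal pattern, so skip non-files.
        [ -e "$r1" ] || continue
        base="${r1%_R1.fastq.gz}"
        if [ -e "${base}_R2.fastq.gz" ]; then
            # Strip the directory part to leave only the sample name.
            echo "${base##*/}"
        else
            echo "warning: no R2 file for ${base##*/}" >&2
            status=1
        fi
    done
    # Non-zero exit status signals that at least one sample was incomplete.
    return "$status"
}
```

After defining the function in your shell, running list_samples data/input > samples.txt from the root of the project would overwrite samples.txt with the detected names. Check any warnings before starting the pipeline.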
Load all the required software, or make sure that its paths are exported, and run the following command from the root of the project (where the Snakefile is located). Set the number of threads with --cores:
snakemake --cores <threads>
Activate the environment you created in Step 2:
conda activate mtags_snakemake
Then run the following command from the root of the project (where the Snakefile is located). Set the number of threads with --cores:
snakemake --cores <threads>
If you are running the pipeline on an HPC with SLURM, you can find an example script to run this pipeline here.
