diff --git a/topics/assembly/tutorials/post-assembly-quality-control/tutorial.md b/topics/assembly/tutorials/post-assembly-quality-control/tutorial.md
Lines changed: 86 additions & 69 deletions
@@ -4,15 +4,11 @@ layout: tutorial_hands_on
 title: Post Assembly Quality Control
 zenodo_link: ''
 questions:
-- Which biological questions are addressed by the tutorial?
-- Which bioinformatics techniques are important to know for this type of data?
+- What combination of tools can be used to check the quality of an initial assembly?
+- How can the quality and the completeness of an assembly be evaluated?
 objectives:
-- The learning objectives are the goals of the tutorial
-- They will be informed by your audience and will communicate to them and to yourself
-  what you should focus on during the course
-- They are single sentences describing what a learner should be able to do once they
-  have completed the tutorial
-- You can use Bloom's Taxonomy to write effective learning objectives
+- Apply the post-assembly QC workflow using the necessary tools
+- Evaluate the quality of the assembly
 time_estimation: 3H
 key_points:
 - The take-home messages
@@ -28,6 +24,18 @@ contributors:
 
 <!-- This is a comment. -->
 
+Quality control is an important part of genome assembly. Because errors can arise in
+many different ways, there are also many different tools to identify and remove
+potential problems. The difficulty is to choose between them and to know when it is time
+to move on, because time and resources play a big role in genome assembly.
+
+In this tutorial you will learn how to use the tools of the post-assembly quality control
+workflow. It is a post-assembly pipeline from ERGA that aims to deliver high-quality
+assemblies within a reasonable amount of time and resources.
+
+
+
+
 General introduction about the topic and then an introduction of the
 tutorial (the questions and the objectives). It is nice also to have a
 scheme to sum up the pipeline used during the tutorial. The idea is to
@@ -80,7 +88,7 @@ depending on the specifics of your tutorial.
 
 have fun!
 
-## Get data
+# Get data
 
 > <hands-on-title> Data Upload </hands-on-title>
 >
@@ -111,7 +119,13 @@ have fun!
 >
 {: .hands_on}
 
-# Title of the section usually corresponding to a big step in the analysis
+# Assembly decontamination
+
+DNA extracted from an organism always also contains DNA from other organisms.
+This is why most assemblies need to go through a decontamination process to remove
+the non-target reads/contigs for a higher-quality end product.
+
+
 
 It comes first a description of the step: some background and some theory.
 Some image can be added there to support the theory explanation:
@@ -131,25 +145,19 @@ The idea is to keep the theory description before quite simple to focus more on
 A big step can have several subsections or sub steps:
 
 
-## Sub-step with **HISAT2**
+## Sub-step with **BlobToolKit**
 
-> <hands-on-title> Task description </hands-on-title>
+BlobToolKit is used here as a decontamination tool. The first step is to create a new dataset.
+To do so, the tool takes some inputs and creates the so-called BlobDir data structure as an output.
+
+> <hands-on-title> Creating the BlobDir dataset </hands-on-title>
 >
-> 1. {% tool [HISAT2](toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) %} with the following parameters:
->    - *"Source for the reference genome"*: `Use a genome from history`
->        - {% icon param-file %} *"Select the reference genome"*: `output` (Input dataset)
->    - *"Is this a single or paired library"*: `Paired-end Dataset Collection`
->        - {% icon param-collection %} *"Paired Collection"*: `output` (Input dataset collection)
->        - *"Paired-end options"*: `Use default values`
->    - In *"Advanced Options"*:
->        - *"Input options"*: `Use default values`
->        - *"Alignment options"*: `Use default values`
->        - *"Scoring options"*: `Use default values`
->        - *"Spliced alignment options"*: `Use default values`
->        - *"Reporting options"*: `Use default values`
->        - *"Output options"*: `Use default values`
->        - *"SAM options"*: `Use default values`
->        - *"Other options"*: `Use default values`
+> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
+>    - *"Select mode"*: `Create a BlobToolKit dataset`
+>        - {% icon param-file %} *"Genome assembly file"*: `output` (Input dataset)
+>        - {% icon param-file %} *"Metadata file"*: `output` (Input dataset)
+>        - *"NCBI taxonomy ID"*: `{'id': 2, 'output_name': 'output'}`
+>        - {% icon param-file %} *"NCBI taxdump directory"*: `output` (Input dataset)
 >
 >    ***TODO***: *Check parameter descriptions*
 >
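
As a rough illustration of what removing non-target contigs amounts to once contaminants have been identified, here is a minimal Python sketch. It is not part of BlobToolKit or of the workflow: the function names, the example contig IDs, and the idea of passing a ready-made set of contaminant IDs are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_contaminants(text, contaminant_ids):
    """Return the (header, sequence) records whose contig ID is not flagged.

    The contig ID is taken to be the first whitespace-separated token of the
    header. `contaminant_ids` is a hypothetical, user-supplied set of IDs.
    """
    kept = []
    for header, seq in parse_fasta(text):
        tokens = header.split()
        contig_id = tokens[0] if tokens else ""
        if contig_id not in contaminant_ids:
            kept.append((header, seq))
    return kept


# Hypothetical example data: three contigs, one flagged as contamination.
example = """>contig_1 length=8
ACGTACGT
>contig_2 length=4
GGCC
>contig_3 length=6
TTAA
GG
"""
clean = drop_contaminants(example, {"contig_2"})
```

In practice the list of contaminant contigs would come from inspecting the BlobToolKit results, not from a hand-written set.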
@@ -178,15 +186,25 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **gfastats**
+
+## Sub-step with **HISAT2**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [gfastats](toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.2.0+galaxy0) %} with the following parameters:
->    - {% icon param-file %} *"Input file"*: `output` (Input dataset)
->    - *"Specify target sequences"*: `Disabled`
->    - *"Tool mode"*: `Summary statistics generation`
->        - *"Report mode"*: `Genome assembly statistics (--nstar-report)`
+> 1. {% tool [HISAT2](toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) %} with the following parameters:
+>    - *"Source for the reference genome"*: `Use a genome from history`
+>        - {% icon param-file %} *"Select the reference genome"*: `output` (Input dataset)
+>    - *"Is this a single or paired library"*: `Single-end`
+>        - {% icon param-collection %} *"FASTA/Q file"*: `output` (Input dataset collection)
+>    - In *"Advanced Options"*:
+>        - *"Input options"*: `Use default values`
+>        - *"Alignment options"*: `Use default values`
+>        - *"Scoring options"*: `Use default values`
+>        - *"Spliced alignment options"*: `Use default values`
+>        - *"Reporting options"*: `Use default values`
+>        - *"Output options"*: `Use default values`
+>        - *"SAM options"*: `Use default values`
+>        - *"Other options"*: `Use default values`
 >
 >    ***TODO***: *Check parameter descriptions*
 >
@@ -258,11 +276,12 @@ A big step can have several subsections or sub steps:
 > <hands-on-title> Task description </hands-on-title>
 >
 > 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
->    - *"Select mode"*: `Create a BlobToolKit dataset`
->        - {% icon param-file %} *"Genome assembly file"*: `output` (Input dataset)
->        - {% icon param-file %} *"Metadata file"*: `output` (Input dataset)
->        - *"NCBI taxonomy ID"*: `{'id': 2, 'output_name': 'output'}`
->        - {% icon param-file %} *"NCBI taxdump directory"*: `output` (Input dataset)
+>    - *"Select mode"*: `Add data to a BlobToolKit dataset`
+>        - {% icon param-file %} *"Blobdir.tgz file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
+>        - {% icon param-file %} *"BUSCO full table file"*: `busco_table` (output of **Busco** {% icon tool %})
+>        - *"BLAST/Diamond hits"*: `Disabled`
+>        - {% icon param-file %} *"BAM/SAM/CRAM read alignment file"*: `output_alignments` (output of **HISAT2** {% icon tool %})
+>        - *"Genetic text file"*: `Disabled`
 >
 >    ***TODO***: *Check parameter descriptions*
 >
@@ -291,15 +310,13 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **Meryl**
+## Sub-step with **BlobToolKit**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
->    - *"Operation type selector"*: `Count operations`
->        - {% icon param-file %} *"Input sequences"*: `output` (Input dataset)
->        - *"K-mer size selector"*: `Estimate the best k-mer size`
->            - *"Genome size"*: `{'id': 4, 'output_name': 'output'}`
+> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
+>    - *"Select mode"*: `Generate plots`
+>        - {% icon param-file %} *"Blobdir file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
 >
 >    ***TODO***: *Check parameter descriptions*
 >
@@ -328,17 +345,15 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **BlobToolKit**
+## Sub-step with **Meryl**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
->    - *"Select mode"*: `Add data to a BlobToolKit dataset`
->        - {% icon param-file %} *"Blobdir.tgz file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
->        - {% icon param-file %} *"BUSCO full table file"*: `busco_table` (output of **Busco** {% icon tool %})
->        - *"BLAST/Diamond hits"*: `Disabled`
->        - {% icon param-file %} *"BAM/SAM/CRAM read alignment file"*: `output_alignments` (output of **HISAT2** {% icon tool %})
->        - *"Genetic text file"*: `Disabled`
+> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
+>    - *"Operation type selector"*: `Count operations`
+>        - {% icon param-file %} *"Input sequences"*: `output` (Input dataset)
+>        - *"K-mer size selector"*: `Estimate the best k-mer size`
+>            - *"Genome size"*: `{'id': 4, 'output_name': 'output'}`
 >
 >    ***TODO***: *Check parameter descriptions*
 >
@@ -367,13 +382,13 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **BlobToolKit**
+## Sub-step with **Meryl**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
->    - *"Select mode"*: `Generate plots`
->        - {% icon param-file %} *"Blobdir file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
+> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
+>    - *"Operation type selector"*: `Generate histogram dataset`
+>        - {% icon param-file %} *"Input meryldb"*: `read_db` (output of **Meryl** {% icon tool %})
 >
 >    ***TODO***: *Check parameter descriptions*
 >
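
The two Meryl steps above first count k-mers in the reads and then turn those counts into a histogram. The Python sketch below illustrates that idea on toy data. It is a simplified assumption-laden example, not Meryl's implementation: it counts forward-strand k-mers only, skips k-mers containing `N`, and the function names are made up for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every forward-strand k-mer of length k across all reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers with ambiguous bases
                counts[kmer] += 1
    return counts


def multiplicity_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    histogram = Counter(counts.values())
    return dict(sorted(histogram.items()))


# Toy reads, chosen only to keep the example small.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, k=3)
histogram = multiplicity_histogram(kmer_counts)
```

A histogram of this shape (multiplicity against number of distinct k-mers) is the kind of input that the GenomeScope step takes.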
@@ -402,15 +417,12 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **Merqury**
+## Sub-step with **GenomeScope**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [Merqury](toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy2) %} with the following parameters:
->    - *"Evaluation mode"*: `Default mode`
->        - {% icon param-file %} *"K-mer counts database"*: `read_db` (output of **Meryl** {% icon tool %})
->        - *"Number of assemblies"*: `One assembly (pseudo-haplotype or mixed-haplotype)`
->            - {% icon param-file %} *"Genome assembly"*: `output` (Input dataset)
+> 1. {% tool [GenomeScope](toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2) %} with the following parameters:
+>    - {% icon param-file %} *"Input histogram file"*: `read_db_hist` (output of **Meryl** {% icon tool %})
 >
 >    ***TODO***: *Check parameter descriptions*
 >
@@ -439,13 +451,15 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **Meryl**
+## Sub-step with **Merqury**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
->    - *"Operation type selector"*: `Generate histogram dataset`
->        - {% icon param-file %} *"Input meryldb"*: `read_db` (output of **Meryl** {% icon tool %})
+> 1. {% tool [Merqury](toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy2) %} with the following parameters:
+>    - *"Evaluation mode"*: `Default mode`
+>        - {% icon param-file %} *"K-mer counts database"*: `read_db` (output of **Meryl** {% icon tool %})
+>        - *"Number of assemblies"*: `One assembly (pseudo-haplotype or mixed-haplotype)`
+>            - {% icon param-file %} *"Genome assembly"*: `output` (Input dataset)
 >
 >    ***TODO***: *Check parameter descriptions*
 >
@@ -474,12 +488,15 @@ A big step can have several subsections or sub steps:
 >
 {: .question}
 
-## Sub-step with **GenomeScope**
+## Sub-step with **gfastats**
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [GenomeScope](toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2) %} with the following parameters:
->    - {% icon param-file %} *"Input histogram file"*: `read_db_hist` (output of **Meryl** {% icon tool %})
+> 1. {% tool [gfastats](toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.2.0+galaxy0) %} with the following parameters:
+>    - {% icon param-file %} *"Input file"*: `output` (Input dataset)
+>    - *"Specify target sequences"*: `Disabled`
+>    - *"Tool mode"*: `Summary statistics generation`
+>        - *"Report mode"*: `Genome assembly statistics (--nstar-report)`
 >
 >    ***TODO***: *Check parameter descriptions*
 >
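
Assembly summary statistics typically include N50-style contiguity metrics. As a reminder of how N50 is defined, here is a minimal Python sketch. It is independent of gfastats; the function name and the example lengths are assumptions for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths; the total is 32, so half of it is 16.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Here the two longest contigs (8 + 8) already reach half of the total length, so the N50 is 8.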