Commit a78709d

authored
Merge pull request #2 from GitFab93/post-assembly-quality-control
Post assembly quality control new workflow so tutorial adjustment
2 parents 2b42e5d + 49a138a commit a78709d

2 files changed

+87
-70
lines changed

topics/assembly/tutorials/post-assembly-quality-control/tutorial.md

Lines changed: 86 additions & 69 deletions
Original file line numberDiff line numberDiff line change
@@ -4,15 +4,11 @@ layout: tutorial_hands_on
44
title: Post Assembly Quality Control
55
zenodo_link: ''
66
questions:
7-
- Which biological questions are addressed by the tutorial?
8-
- Which bioinformatics techniques are important to know for this type of data?
7+
- What combination of tools can be used to control the quality of an initial assembly?
8+
- How can the quality and completeness of the assemblies be evaluated?
99
objectives:
10-
- The learning objectives are the goals of the tutorial
11-
- They will be informed by your audience and will communicate to them and to yourself
12-
what you should focus on during the course
13-
- They are single sentences describing what a learner should be able to do once they
14-
have completed the tutorial
15-
- You can use Bloom's Taxonomy to write effective learning objectives
10+
- Apply the post-assembly QC workflow using the necessary tools
11+
- Evaluate the quality of the resulting assembly
1612
time_estimation: 3H
1713
key_points:
1814
- The take-home messages
@@ -28,6 +24,18 @@ contributors:
2824

2925
<!-- This is a comment. -->
3026

27+
Quality control is an important part of genome assembly. Because errors can arise in many
28+
different ways, there are also many different tools to identify and remove potential
29+
problems. The difficulty is choosing between them and knowing when it is time to move on,
30+
because time and compute resources play a big role in genome assembly.
31+
32+
In this tutorial you will learn how to use the tools of the post-assembly quality control
33+
workflow. It is a post-assembly pipeline from ERGA designed to produce high-quality assemblies
34+
within a reasonable amount of time and resources.
35+
36+
37+
38+
3139
General introduction about the topic and then an introduction of the
3240
tutorial (the questions and the objectives). It is nice also to have a
3341
scheme to sum up the pipeline used during the tutorial. The idea is to
@@ -80,7 +88,7 @@ depending on the specifics of your tutorial.
8088

8189
have fun!
8290

83-
## Get data
91+
# Get data
8492

8593
> <hands-on-title> Data Upload </hands-on-title>
8694
>
@@ -111,7 +119,13 @@ have fun!
111119
>
112120
{: .hands_on}
113121
114-
# Title of the section usually corresponding to a big step in the analysis
122+
# Assembly decontamination
123+
124+
DNA extracted from an organism always contains some DNA from other organisms as well.
125+
This is why most assemblies need to go through a decontamination process that removes
126+
the non-target reads/contigs and yields a higher-quality end product.
127+
128+
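The decontamination idea described above can be sketched in a few lines of Python. This is only an illustration, not the tutorial's workflow: the function name `filter_contigs`, the toy contig names, and the taxon labels are all hypothetical, and it assumes that each contig has already been given a taxonomic label by some upstream tool.

```python
from typing import Dict, Set


def filter_contigs(
    contigs: Dict[str, str],
    assignments: Dict[str, str],
    target_taxa: Set[str],
    keep_unassigned: bool = True,
) -> Dict[str, str]:
    """Return only the contigs whose assigned taxon is in target_taxa.

    contigs      maps contig name -> sequence
    assignments  maps contig name -> taxon label (hypothetical upstream result)
    Contigs without any label are kept or dropped according to keep_unassigned.
    """
    kept = {}
    for name, seq in contigs.items():
        taxon = assignments.get(name)
        if taxon is None:
            if keep_unassigned:
                kept[name] = seq
        elif taxon in target_taxa:
            kept[name] = seq
    return kept


# Toy data, invented for illustration only.
contigs = {"ctg1": "ACGTACGT", "ctg2": "GGGCCC", "ctg3": "ATATAT"}
assignments = {"ctg1": "Chordata", "ctg2": "Proteobacteria"}

clean = filter_contigs(contigs, assignments, {"Chordata"})
print(sorted(clean))  # ctg2 is dropped as non-target; ctg3 has no label and is kept
```

Whether unlabelled contigs should be kept is a judgment call; the `keep_unassigned` switch makes that choice explicit rather than hiding it.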
115129
116130
It comes first a description of the step: some background and some theory.
117131
Some image can be added there to support the theory explanation:
@@ -131,25 +145,19 @@ The idea is to keep the theory description before quite simple to focus more on
131145
A big step can have several subsections or sub steps:
132146
133147
134-
## Sub-step with **HISAT2**
148+
## Sub-step with **BlobToolKit**
135149
136-
> <hands-on-title> Task description </hands-on-title>
150+
BlobToolKit is a decontamination tool. The first step is to create a new dataset.
151+
To do so, the tool takes several inputs and creates the so-called BlobDir data structure as output.
152+
153+
> <hands-on-title> Creating the BlobDir dataset </hands-on-title>
137154
>
138-
> 1. {% tool [HISAT2](toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) %} with the following parameters:
139-
> - *"Source for the reference genome"*: `Use a genome from history`
140-
> - {% icon param-file %} *"Select the reference genome"*: `output` (Input dataset)
141-
> - *"Is this a single or paired library"*: `Paired-end Dataset Collection`
142-
> - {% icon param-collection %} *"Paired Collection"*: `output` (Input dataset collection)
143-
> - *"Paired-end options"*: `Use default values`
144-
> - In *"Advanced Options"*:
145-
> - *"Input options"*: `Use default values`
146-
> - *"Alignment options"*: `Use default values`
147-
> - *"Scoring options"*: `Use default values`
148-
> - *"Spliced alignment options"*: `Use default values`
149-
> - *"Reporting options"*: `Use default values`
150-
> - *"Output options"*: `Use default values`
151-
> - *"SAM options"*: `Use default values`
152-
> - *"Other options"*: `Use default values`
155+
> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
156+
> - *"Select mode"*: `Create a BlobToolKit dataset`
157+
> - {% icon param-file %} *"Genome assembly file"*: `output` (Input dataset)
158+
> - {% icon param-file %} *"Metadata file"*: `output` (Input dataset)
159+
> - *"NCBI taxonomy ID"*: `{'id': 2, 'output_name': 'output'}`
160+
> - {% icon param-file %} *"NCBI taxdump directory"*: `output` (Input dataset)
153161
>
154162
> ***TODO***: *Check parameter descriptions*
155163
>
@@ -178,15 +186,25 @@ A big step can have several subsections or sub steps:
178186
>
179187
{: .question}
180188
181-
## Sub-step with **gfastats**
189+
190+
## Sub-step with **HISAT2**
182191
183192
> <hands-on-title> Task description </hands-on-title>
184193
>
185-
> 1. {% tool [gfastats](toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.2.0+galaxy0) %} with the following parameters:
186-
> - {% icon param-file %} *"Input file"*: `output` (Input dataset)
187-
> - *"Specify target sequences"*: `Disabled`
188-
> - *"Tool mode"*: `Summary statistics generation`
189-
> - *"Report mode"*: `Genome assembly statistics (--nstar-report)`
194+
> 1. {% tool [HISAT2](toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) %} with the following parameters:
195+
> - *"Source for the reference genome"*: `Use a genome from history`
196+
> - {% icon param-file %} *"Select the reference genome"*: `output` (Input dataset)
197+
> - *"Is this a single or paired library"*: `Single-end`
198+
> - {% icon param-collection %} *"FASTA/Q file"*: `output` (Input dataset collection)
199+
> - In *"Advanced Options"*:
200+
> - *"Input options"*: `Use default values`
201+
> - *"Alignment options"*: `Use default values`
202+
> - *"Scoring options"*: `Use default values`
203+
> - *"Spliced alignment options"*: `Use default values`
204+
> - *"Reporting options"*: `Use default values`
205+
> - *"Output options"*: `Use default values`
206+
> - *"SAM options"*: `Use default values`
207+
> - *"Other options"*: `Use default values`
190208
>
191209
> ***TODO***: *Check parameter descriptions*
192210
>
@@ -258,11 +276,12 @@ A big step can have several subsections or sub steps:
258276
> <hands-on-title> Task description </hands-on-title>
259277
>
260278
> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
261-
> - *"Select mode"*: `Create a BlobToolKit dataset`
262-
> - {% icon param-file %} *"Genome assembly file"*: `output` (Input dataset)
263-
> - {% icon param-file %} *"Metadata file"*: `output` (Input dataset)
264-
> - *"NCBI taxonomy ID"*: `{'id': 2, 'output_name': 'output'}`
265-
> - {% icon param-file %} *"NCBI taxdump directory"*: `output` (Input dataset)
279+
> - *"Select mode"*: `Add data to a BlobToolKit dataset`
280+
> - {% icon param-file %} *"Blobdir.tgz file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
281+
> - {% icon param-file %} *"BUSCO full table file"*: `busco_table` (output of **Busco** {% icon tool %})
282+
> - *"BLAST/Diamond hits"*: `Disabled`
283+
> - {% icon param-file %} *"BAM/SAM/CRAM read alignment file"*: `output_alignments` (output of **HISAT2** {% icon tool %})
284+
> - *"Genetic text file"*: `Disabled`
266285
>
267286
> ***TODO***: *Check parameter descriptions*
268287
>
@@ -291,15 +310,13 @@ A big step can have several subsections or sub steps:
291310
>
292311
{: .question}
293312
294-
## Sub-step with **Meryl**
313+
## Sub-step with **BlobToolKit**
295314
296315
> <hands-on-title> Task description </hands-on-title>
297316
>
298-
> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
299-
> - *"Operation type selector"*: `Count operations`
300-
> - {% icon param-file %} *"Input sequences"*: `output` (Input dataset)
301-
> - *"K-mer size selector"*: `Estimate the best k-mer size`
302-
> - *"Genome size"*: `{'id': 4, 'output_name': 'output'}`
317+
> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
318+
> - *"Select mode"*: `Generate plots`
319+
> - {% icon param-file %} *"Blobdir file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
303320
>
304321
> ***TODO***: *Check parameter descriptions*
305322
>
@@ -328,17 +345,15 @@ A big step can have several subsections or sub steps:
328345
>
329346
{: .question}
330347
331-
## Sub-step with **BlobToolKit**
348+
## Sub-step with **Meryl**
332349
333350
> <hands-on-title> Task description </hands-on-title>
334351
>
335-
> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
336-
> - *"Select mode"*: `Add data to a BlobToolKit dataset`
337-
> - {% icon param-file %} *"Blobdir.tgz file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
338-
> - {% icon param-file %} *"BUSCO full table file"*: `busco_table` (output of **Busco** {% icon tool %})
339-
> - *"BLAST/Diamond hits"*: `Disabled`
340-
> - {% icon param-file %} *"BAM/SAM/CRAM read alignment file"*: `output_alignments` (output of **HISAT2** {% icon tool %})
341-
> - *"Genetic text file"*: `Disabled`
352+
> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
353+
> - *"Operation type selector"*: `Count operations`
354+
> - {% icon param-file %} *"Input sequences"*: `output` (Input dataset)
355+
> - *"K-mer size selector"*: `Estimate the best k-mer size`
356+
> - *"Genome size"*: `{'id': 4, 'output_name': 'output'}`
342357
>
343358
> ***TODO***: *Check parameter descriptions*
344359
>
@@ -367,13 +382,13 @@ A big step can have several subsections or sub steps:
367382
>
368383
{: .question}
369384
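As a rough illustration of what the Meryl count step above produces, the following sketch counts k-mers in a handful of reads and turns the counts into a histogram of the kind GenomeScope consumes. It is a toy, in-memory stand-in and not Meryl's implementation; the function names are hypothetical, and it assumes canonical counting (a k-mer and its reverse complement are counted together).

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers over all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def histogram(counts: Counter) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Toy reads, invented for illustration only.
reads = ["ACGTA", "ACGTT"]
counts = count_kmers(reads, k=3)
hist = histogram(counts)
for multiplicity in sorted(hist):
    print(multiplicity, hist[multiplicity])
```

Real read sets are far too large for a Python `Counter`, which is why dedicated k-mer counters exist; the sketch only shows the shape of the data.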
370-
## Sub-step with **BlobToolKit**
385+
## Sub-step with **Meryl**
371386
372387
> <hands-on-title> Task description </hands-on-title>
373388
>
374-
> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
375-
> - *"Select mode"*: `Generate plots`
376-
> - {% icon param-file %} *"Blobdir file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
389+
> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
390+
> - *"Operation type selector"*: `Generate histogram dataset`
391+
> - {% icon param-file %} *"Input meryldb"*: `read_db` (output of **Meryl** {% icon tool %})
377392
>
378393
> ***TODO***: *Check parameter descriptions*
379394
>
@@ -402,15 +417,12 @@ A big step can have several subsections or sub steps:
402417
>
403418
{: .question}
404419
405-
## Sub-step with **Merqury**
420+
## Sub-step with **GenomeScope**
406421
407422
> <hands-on-title> Task description </hands-on-title>
408423
>
409-
> 1. {% tool [Merqury](toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy2) %} with the following parameters:
410-
> - *"Evaluation mode"*: `Default mode`
411-
> - {% icon param-file %} *"K-mer counts database"*: `read_db` (output of **Meryl** {% icon tool %})
412-
> - *"Number of assemblies"*: `One assembly (pseudo-haplotype or mixed-haplotype)`
413-
> - {% icon param-file %} *"Genome assembly"*: `output` (Input dataset)
424+
> 1. {% tool [GenomeScope](toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2) %} with the following parameters:
425+
> - {% icon param-file %} *"Input histogram file"*: `read_db_hist` (output of **Meryl** {% icon tool %})
414426
>
415427
> ***TODO***: *Check parameter descriptions*
416428
>
@@ -439,13 +451,15 @@ A big step can have several subsections or sub steps:
439451
>
440452
{: .question}
441453
442-
## Sub-step with **Meryl**
454+
## Sub-step with **Merqury**
443455
444456
> <hands-on-title> Task description </hands-on-title>
445457
>
446-
> 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
447-
> - *"Operation type selector"*: `Generate histogram dataset`
448-
> - {% icon param-file %} *"Input meryldb"*: `read_db` (output of **Meryl** {% icon tool %})
458+
> 1. {% tool [Merqury](toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy2) %} with the following parameters:
459+
> - *"Evaluation mode"*: `Default mode`
460+
> - {% icon param-file %} *"K-mer counts database"*: `read_db` (output of **Meryl** {% icon tool %})
461+
> - *"Number of assemblies"*: `One assembly (pseudo-haplotype or mixed-haplotype)`
462+
> - {% icon param-file %} *"Genome assembly"*: `output` (Input dataset)
449463
>
450464
> ***TODO***: *Check parameter descriptions*
451465
>
@@ -474,12 +488,15 @@ A big step can have several subsections or sub steps:
474488
>
475489
{: .question}
476490
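The consensus quality value (QV) that Merqury reports for the step above can be approximated from two k-mer counts. The sketch below follows the formula as described in the Merqury publication, as far as it is recalled here; the function name and argument names are hypothetical, and the result should be treated as an illustration rather than a replacement for the tool's output.

```python
import math


def kmer_based_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Estimate a Phred-scaled consensus QV from k-mer counts.

    asm_only_kmers   k-mers found in the assembly but not in the read set
    total_asm_kmers  all k-mers found in the assembly
    k                k-mer size used for counting
    """
    if total_asm_kmers <= 0 or k <= 0:
        raise ValueError("total_asm_kmers and k must be positive")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: error rate estimated as zero
    # Probability that a single base is correct, assuming independent errors.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)


# Hypothetical counts, for illustration only.
print(round(kmer_based_qv(asm_only_kmers=1_000, total_asm_kmers=1_000_000, k=21), 1))
```

Fewer assembly-only k-mers give a higher QV, which is why a read k-mer database from the same sample is needed as the reference point.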
477-
## Sub-step with **GenomeScope**
491+
## Sub-step with **gfastats**
478492
479493
> <hands-on-title> Task description </hands-on-title>
480494
>
481-
> 1. {% tool [GenomeScope](toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2) %} with the following parameters:
482-
> - {% icon param-file %} *"Input histogram file"*: `read_db_hist` (output of **Meryl** {% icon tool %})
495+
> 1. {% tool [gfastats](toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.2.0+galaxy0) %} with the following parameters:
496+
> - {% icon param-file %} *"Input file"*: `output` (Input dataset)
497+
> - *"Specify target sequences"*: `Disabled`
498+
> - *"Tool mode"*: `Summary statistics generation`
499+
> - *"Report mode"*: `Genome assembly statistics (--nstar-report)`
483500
>
484501
> ***TODO***: *Check parameter descriptions*
485502
>

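The gfastats step in the diff above is run in its `--nstar-report` mode, which reports N* statistics such as N50. A minimal sketch of how such a statistic is computed from a list of contig lengths is given below; the function name `nx` and the toy lengths are hypothetical, and gfastats itself may differ in details such as how gaps are treated.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50) -> int:
    """Return the Nx value: the length of the shortest contig in the smallest
    set of longest contigs that together cover at least x percent of the assembly."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reached for x > 100


# Toy contig lengths, invented for illustration only.
contig_lengths = [80, 70, 50, 40, 30, 20]
print(nx(contig_lengths, 50))  # 70
print(nx(contig_lengths, 90))  # 30
```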