@@ -4,15 +4,11 @@ layout: tutorial_hands_on
title: Post Assembly Quality Control
zenodo_link: ''
questions:
- - Which biological questions are addressed by the tutorial?
- - Which bioinformatics techniques are important to know for this type of data?
+ - Which combination of tools can control the quality of an initial assembly?
+ - How can the quality and completeness of the assemblies be evaluated?
objectives:
- - The learning objectives are the goals of the tutorial
- - They will be informed by your audience and will communicate to them and to yourself
- what you should focus on during the course
- - They are single sentences describing what a learner should be able to do once they
- have completed the tutorial
- - You can use Bloom's Taxonomy to write effective learning objectives
+ - Apply the post-assembly QC workflow using the necessary tools
+ - Evaluate the quality of the assembly
time_estimation: 3H
key_points:
- The take-home messages
@@ -28,6 +24,18 @@ contributors:
<!-- This is a comment. -->

+ An important part of genome assembly is quality control. Because errors can arise in many
+ different ways, there are also many different tools to identify and remove
+ potential problems. The difficulty is to choose between them and to know when it is time
+ to move on. This matters because time and resources play a big role in genome assembly.
+
+ In this tutorial you will learn how to use the tools of the post-assembly quality control
+ workflow. It is a post-assembly pipeline from ERGA that aims to ensure high-quality assemblies
+ within an appropriate amount of time and resources.
+
+
+
+
General introduction about the topic and then an introduction of the
tutorial (the questions and the objectives). It is nice also to have a
scheme to sum up the pipeline used during the tutorial. The idea is to
@@ -80,7 +88,7 @@ depending on the specifics of your tutorial.
have fun!

- ## Get data
+ # Get data

> <hands-on-title> Data Upload </hands-on-title>
>
@@ -111,7 +119,13 @@ have fun!
>
{: .hands_on}

- # Title of the section usually corresponding to a big step in the analysis
+ # Assembly decontamination
+
+ DNA extracted from an organism always also contains DNA from other organisms.
+ This is why most assemblies need to go through a decontamination process to remove
+ the non-target reads/contigs for a higher-quality end product.
+
+

It comes first a description of the step: some background and some theory.
Some image can be added there to support the theory explanation:
@@ -131,25 +145,19 @@ The idea is to keep the theory description before quite simple to focus more on
A big step can have several subsections or sub steps:


- ## Sub-step with **HISAT2**
+ ## Sub-step with **BlobToolKit**

- > <hands-on-title> Task description </hands-on-title>
+ BlobToolKit is a decontamination tool. The first step is to create a new dataset.
+ To do so, the tool takes several inputs and creates the so-called BlobDir data structure as output.
+
+ > <hands-on-title> Creating the BlobDir dataset </hands-on-title>
>
- > 1. {% tool [HISAT2](toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) %} with the following parameters:
- > - *"Source for the reference genome"*: `Use a genome from history`
- > - {% icon param-file %} *"Select the reference genome"*: `output` (Input dataset)
- > - *"Is this a single or paired library"*: `Paired-end Dataset Collection`
- > - {% icon param-collection %} *"Paired Collection"*: `output` (Input dataset collection)
- > - *"Paired-end options"*: `Use default values`
- > - In *"Advanced Options"*:
- > - *"Input options"*: `Use default values`
- > - *"Alignment options"*: `Use default values`
- > - *"Scoring options"*: `Use default values`
- > - *"Spliced alignment options"*: `Use default values`
- > - *"Reporting options"*: `Use default values`
- > - *"Output options"*: `Use default values`
- > - *"SAM options"*: `Use default values`
- > - *"Other options"*: `Use default values`
+ > 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
+ > - *"Select mode"*: `Create a BlobToolKit dataset`
+ > - {% icon param-file %} *"Genome assembly file"*: `output` (Input dataset)
+ > - {% icon param-file %} *"Metadata file"*: `output` (Input dataset)
+ > - *"NCBI taxonomy ID"*: `{'id': 2, 'output_name': 'output'}`
+ > - {% icon param-file %} *"NCBI taxdump directory"*: `output` (Input dataset)
>
> ***TODO***: *Check parameter descriptions*
>
@@ -178,15 +186,25 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **gfastats**
+
+ ## Sub-step with **HISAT2**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [gfastats](toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.2.0+galaxy0) %} with the following parameters:
- > - {% icon param-file %} *"Input file"*: `output` (Input dataset)
- > - *"Specify target sequences"*: `Disabled`
- > - *"Tool mode"*: `Summary statistics generation`
- > - *"Report mode"*: `Genome assembly statistics (--nstar-report)`
+ > 1. {% tool [HISAT2](toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1) %} with the following parameters:
+ > - *"Source for the reference genome"*: `Use a genome from history`
+ > - {% icon param-file %} *"Select the reference genome"*: `output` (Input dataset)
+ > - *"Is this a single or paired library"*: `Single-end`
+ > - {% icon param-collection %} *"FASTA/Q file"*: `output` (Input dataset collection)
+ > - In *"Advanced Options"*:
+ > - *"Input options"*: `Use default values`
+ > - *"Alignment options"*: `Use default values`
+ > - *"Scoring options"*: `Use default values`
+ > - *"Spliced alignment options"*: `Use default values`
+ > - *"Reporting options"*: `Use default values`
+ > - *"Output options"*: `Use default values`
+ > - *"SAM options"*: `Use default values`
+ > - *"Other options"*: `Use default values`
>
> ***TODO***: *Check parameter descriptions*
>
@@ -258,11 +276,12 @@ A big step can have several subsections or sub steps:
> <hands-on-title> Task description </hands-on-title>
>
> 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
- > - *"Select mode"*: `Create a BlobToolKit dataset`
- > - {% icon param-file %} *"Genome assembly file"*: `output` (Input dataset)
- > - {% icon param-file %} *"Metadata file"*: `output` (Input dataset)
- > - *"NCBI taxonomy ID"*: `{'id': 2, 'output_name': 'output'}`
- > - {% icon param-file %} *"NCBI taxdump directory"*: `output` (Input dataset)
+ > - *"Select mode"*: `Add data to a BlobToolKit dataset`
+ > - {% icon param-file %} *"Blobdir.tgz file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
+ > - {% icon param-file %} *"BUSCO full table file"*: `busco_table` (output of **Busco** {% icon tool %})
+ > - *"BLAST/Diamond hits"*: `Disabled`
+ > - {% icon param-file %} *"BAM/SAM/CRAM read alignment file"*: `output_alignments` (output of **HISAT2** {% icon tool %})
+ > - *"Genetic text file"*: `Disabled`
>
> ***TODO***: *Check parameter descriptions*
>
@@ -291,15 +310,13 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **Meryl**
+ ## Sub-step with **BlobToolKit**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
- > - *"Operation type selector"*: `Count operations`
- > - {% icon param-file %} *"Input sequences"*: `output` (Input dataset)
- > - *"K-mer size selector"*: `Estimate the best k-mer size`
- > - *"Genome size"*: `{'id': 4, 'output_name': 'output'}`
+ > 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
+ > - *"Select mode"*: `Generate plots`
+ > - {% icon param-file %} *"Blobdir file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
>
> ***TODO***: *Check parameter descriptions*
>
@@ -328,17 +345,15 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **BlobToolKit**
+ ## Sub-step with **Meryl**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
- > - *"Select mode"*: `Add data to a BlobToolKit dataset`
- > - {% icon param-file %} *"Blobdir.tgz file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
- > - {% icon param-file %} *"BUSCO full table file"*: `busco_table` (output of **Busco** {% icon tool %})
- > - *"BLAST/Diamond hits"*: `Disabled`
- > - {% icon param-file %} *"BAM/SAM/CRAM read alignment file"*: `output_alignments` (output of **HISAT2** {% icon tool %})
- > - *"Genetic text file"*: `Disabled`
+ > 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
+ > - *"Operation type selector"*: `Count operations`
+ > - {% icon param-file %} *"Input sequences"*: `output` (Input dataset)
+ > - *"K-mer size selector"*: `Estimate the best k-mer size`
+ > - *"Genome size"*: `{'id': 4, 'output_name': 'output'}`
>
> ***TODO***: *Check parameter descriptions*
>
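
To make the `Count operations` step in the hunk above more concrete, here is a minimal Python sketch of k-mer counting. It is not Meryl's implementation: the function names `canonical` and `count_kmers` are hypothetical, and collapsing each k-mer with its reverse complement (canonical counting) as well as skipping k-mers with ambiguous bases are assumptions of this sketch.

```python
from collections import Counter

# Translation table used to build the reverse complement of a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over an iterable of read sequences (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption of this sketch: k-mers containing N or other symbols are skipped.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers(["ACGTAC"], 3).items()):
        print(kmer, n)
```

The resulting mapping from k-mer to count plays the role of the `read_db` output that later steps of the workflow consume.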
@@ -367,13 +382,13 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **BlobToolKit**
+ ## Sub-step with **Meryl**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [BlobToolKit](toolshed.g2.bx.psu.edu/repos/bgruening/blobtoolkit/blobtoolkit/3.4.0+galaxy0) %} with the following parameters:
- > - *"Select mode"*: `Generate plots`
- > - {% icon param-file %} *"Blobdir file"*: `blobdir` (output of **BlobToolKit** {% icon tool %})
+ > 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
+ > - *"Operation type selector"*: `Generate histogram dataset`
+ > - {% icon param-file %} *"Input meryldb"*: `read_db` (output of **Meryl** {% icon tool %})
>
> ***TODO***: *Check parameter descriptions*
>
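
The `Generate histogram dataset` mode in the hunk above summarises a k-mer database as a multiplicity histogram. The following Python sketch illustrates the idea under stated assumptions: `kmer_histogram` and `format_histogram` are hypothetical names, the input is a plain dictionary of k-mer counts rather than a real meryldb, and the two-column tab-separated layout is an assumption about the file format rather than a guarantee.

```python
from collections import Counter


def kmer_histogram(kmer_counts):
    """Return sorted (multiplicity, number of distinct k-mers) pairs."""
    histogram = Counter(kmer_counts.values())
    return sorted(histogram.items())


def format_histogram(histogram):
    """Render the histogram as tab-separated lines (assumed layout)."""
    return "\n".join(f"{multiplicity}\t{n_kmers}" for multiplicity, n_kmers in histogram)


if __name__ == "__main__":
    # Toy counts standing in for a k-mer database built from reads.
    example_counts = {"ACG": 2, "GTA": 2, "TTT": 1, "CCC": 5}
    print(format_histogram(kmer_histogram(example_counts)))
```

A histogram of this shape is what the GenomeScope step takes as its `read_db_hist` input.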
@@ -402,15 +417,12 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **Merqury**
+ ## Sub-step with **GenomeScope**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [Merqury](toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy2) %} with the following parameters:
- > - *"Evaluation mode"*: `Default mode`
- > - {% icon param-file %} *"K-mer counts database"*: `read_db` (output of **Meryl** {% icon tool %})
- > - *"Number of assemblies"*: `One assembly (pseudo-haplotype or mixed-haplotype)`
- > - {% icon param-file %} *"Genome assembly"*: `output` (Input dataset)
+ > 1. {% tool [GenomeScope](toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2) %} with the following parameters:
+ > - {% icon param-file %} *"Input histogram file"*: `read_db_hist` (output of **Meryl** {% icon tool %})
>
> ***TODO***: *Check parameter descriptions*
>
@@ -439,13 +451,15 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **Meryl**
+ ## Sub-step with **Merqury**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} with the following parameters:
- > - *"Operation type selector"*: `Generate histogram dataset`
- > - {% icon param-file %} *"Input meryldb"*: `read_db` (output of **Meryl** {% icon tool %})
+ > 1. {% tool [Merqury](toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy2) %} with the following parameters:
+ > - *"Evaluation mode"*: `Default mode`
+ > - {% icon param-file %} *"K-mer counts database"*: `read_db` (output of **Meryl** {% icon tool %})
+ > - *"Number of assemblies"*: `One assembly (pseudo-haplotype or mixed-haplotype)`
+ > - {% icon param-file %} *"Genome assembly"*: `output` (Input dataset)
>
> ***TODO***: *Check parameter descriptions*
>
@@ -474,12 +488,15 @@ A big step can have several subsections or sub steps:
>
{: .question}

- ## Sub-step with **GenomeScope**
+ ## Sub-step with **gfastats**

> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [GenomeScope](toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy2) %} with the following parameters:
- > - {% icon param-file %} *"Input histogram file"*: `read_db_hist` (output of **Meryl** {% icon tool %})
+ > 1. {% tool [gfastats](toolshed.g2.bx.psu.edu/repos/bgruening/gfastats/gfastats/1.2.0+galaxy0) %} with the following parameters:
+ > - {% icon param-file %} *"Input file"*: `output` (Input dataset)
+ > - *"Specify target sequences"*: `Disabled`
+ > - *"Tool mode"*: `Summary statistics generation`
+ > - *"Report mode"*: `Genome assembly statistics (--nstar-report)`
>
> ***TODO***: *Check parameter descriptions*
>
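
The `Summary statistics generation` mode in the hunk above reports assembly statistics. As a rough illustration of one widely used statistic of this kind, the Python sketch below computes N50 from a list of sequence lengths. It does not reproduce gfastats or its report format; `n50` and `summarize` are hypothetical helpers, and the choice of fields in the summary is an assumption.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues for odd totals.
        if 2 * running >= total:
            return length


def summarize(lengths):
    """Collect a few basic assembly statistics (illustrative selection only)."""
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths),
        "N50": n50(lengths),
    }


if __name__ == "__main__":
    print(summarize([2, 3, 4, 5, 6]))
```

A larger N50 for the same total length indicates a more contiguous assembly, which is why it is commonly compared before and after post-assembly processing.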