diff --git a/tour/beerDEcoded.yaml b/tour/beerDEcoded.yaml new file mode 100644 index 00000000..ff688d7d --- /dev/null +++ b/tour/beerDEcoded.yaml @@ -0,0 +1,194 @@ +id: beerDEcoded +name: BeerDEcoded - StreetScienceCommunity +description: Workflow finds yeast strains contained in a sequenced beer sample. +title_default: BeerDEcoded +steps: + - title: [INFO] DNA in beer + content: >- + Beer contains DNA that comes from its ingredients and some hundred microbes. There are 1,000+ yeasts used for brewing and 200+ hop varieties, each with its own DNA that contributes to the distinct properties of a beer.
+ Thanks to sequencing it is now easier to find the different genomes contained in a beer sample and investigate their characteristics. The sequencing of the full genome of 157 brewing yeast strains was, for example, recently reported (Gallone B, Steensels J, Prahl T, et al. 2016).
+ backdrop: true + - title: [INFO] The effect of the microbiome on beer + content: >- + Based on the identification of strains present in beer with desired characteristics, controlled experiments in which the microbial composition of the brew is altered could allow us to investigate whether the presence of specific microorganisms affects flavour. The origin of each yeast species can be investigated, i.e. whether it comes with the ingredients or from the environment at the production site. Furthermore, plant DNA, such as that of malt and hop varieties, can be found in beer samples, and the bacterial diversity can be mapped. + backdrop: true + - title: [INFO] How to study the microbiome of beer + content: >- + To study the microbiome of beer you need to find out what is inside the beer. This can be done by extracting the DNA of the organisms (such as yeast) inside the beer. The next step is to 'read' this DNA, which is achieved by sequencing it.
With these sequences in hand, we can run the data analysis, which is what the rest of this tour does. + backdrop: true + - title: [INFO] Workflow of BeerDEcoded process + content: >- + The beerDeCoded process consists of 3 consecutive steps. The first step is DNA extraction from beer. Then, this DNA can be sequenced. That means that we can obtain the sequence of nucleotides for this specific DNA. Finally, we have to analyze the resulting data to find out which organisms this DNA comes from.
workflow of beerDeCoded process + - title: [ACTION] Create a new history + element: '#history-new-button' + intro: >- + Let's start by creating a new history.
Click on the button 'Create + new history'
+ position: left + preclick: + - '#center-panel' + - title: [ACTION] Rename the history + element: '#current-history-panel > div.controls' + intro: Change the name of the new history to 'BeerDEcoded'. + position: left + - title: [INFO] Description of FASTQ format + content: We need to upload data. We will start with a fastq file.
FASTQ format is a text-based format for storing both a biological sequence and its corresponding quality scores. Both the sequence letter and the quality score are encoded with a single ASCII character for brevity.
A FASTQ file normally uses four lines per sequence.
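As a minimal sketch of the four-line layout described above, the following Python snippet parses FASTQ text and decodes the quality characters. The example record is made up for illustration, and the Phred+33 offset is an assumption that holds for most current sequencing data but not for every historical FASTQ variant.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per group of four non-empty lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    for start in range(0, len(lines), 4):
        chunk = lines[start:start + 4]
        if len(chunk) != 4:
            raise ValueError("incomplete FASTQ record")
        header, sequence, separator, quality = chunk
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one score as a single ASCII symbol.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Hypothetical record: identifier, bases, separator, qualities.
EXAMPLE = "@read1\nGATTACA\n+\nIIIIII#\n"
records = list(parse_fastq(EXAMPLE))
print(records[0])
```

The last base in this record has a much lower score than the others, which is the kind of information the quality line exists to carry.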

You can find more info in the Wikipedia article. + backdrop: true + - title: [ACTION] Open Galaxy Upload Manager + element: .upload-button + intro: >- + We will import fastq into the history we just created.

Click + 'Next' and the tour will open Galaxy Upload Manager and take you to + the Upload screen. + position: right + postclick: + - .upload-button + - title: [ACTION] Uploading the input data + intro: Upload the data by clicking the 'Paste/Fetch Data' button. + - title: [ACTION] Choose how to upload the input data + content: >- + Load the data into your history by providing the links or by choosing your + fastq file from your computer.
Click on 'Start' to upload the + data into your Galaxy history. + - title: [ACTION] Close the window once the files are uploaded + content: >- + The upload may take a while.

Hit the Close button when you + see that the files are uploaded into your history. + - title: [INFO] Uploading the input data complete! + content: 'Now that your data is ready, let''s analyze it!
' + backdrop: true + - title: [INFO] Key step in metagenomic data analysis + content: >- + One of the key steps in metagenomic data analysis is to identify the taxon to which each individual read belongs. Taxonomic classification tools use microbial genome databases to identify the origin of each sequence.
+ backdrop: true + - title: [INFO] Process of taxonomic classification with Kraken2 + content: >- + To perform the taxonomic classification we will use Kraken2. This tool uses k-mers (the read’s subsequences of length k) to assign a taxonomic label to the sequence (if possible).
The taxonomic label is assigned by matching the k-mer content of the query sequence against the k-mer content of the reference genome sequences. The result is the most likely taxonomic label for the query sequence. If its k-mer content does not resemble any genomic sequence in the database used, Kraken2 does not assign a taxonomic label.
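The idea of matching k-mers against a reference database can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Kraken2's actual algorithm: it uses a plain majority vote over exact k-mer hits, whereas Kraken2 works with a compact index and lowest common ancestor (LCA) assignments. The reference sequences, taxon names and helper functions are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def kmers(sequence: str, k: int) -> List[str]:
    """Return every overlapping subsequence of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_database(references: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each reference k-mer to the taxon it came from (toy index)."""
    database: Dict[str, str] = {}
    for taxon, genome in references.items():
        for kmer in kmers(genome, k):
            database[kmer] = taxon  # a shared k-mer keeps the last taxon seen
    return database


def classify(read: str, database: Dict[str, str], k: int) -> Optional[str]:
    """Return the taxon with the most k-mer hits, or None if nothing matches."""
    hits = Counter(database[kmer] for kmer in kmers(read, k) if kmer in database)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Hypothetical reference sequences, far shorter than real genomes.
TOY_REFERENCES = {"taxon_A": "ACGTACGTGG", "taxon_B": "TTTTGGGGCC"}
TOY_DB = build_database(TOY_REFERENCES, k=4)

print(classify("ACGTACG", TOY_DB, k=4))  # taxon_A
print(classify("AAAAAAA", TOY_DB, k=4))  # None: no k-mer is in the database
```

The second call shows the unclassified case: a read whose k-mers are absent from the database receives no label.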
+ Kraken2 simplified process image + backdrop: true + - title: [ACTION] Go to the tool menu + element: .toolMenuContainer + intro: Available tools appear here in the tool menu. + position: right + - title: [ACTION] Search for 'Kraken2' tool + element: '#__BVID__106' + content: >- + You can use 'tool search' to locate tools.

Search for + 'Kraken2' and select it.

Tools may take a couple of moments + to load. + placement: right + - title: [ACTION] Open 'Kraken2' tool + element: .toolMenuContainer + intro: >- + Open 'Kraken2' tool

Click 'Next' to continue our + tour. + position: right + - title: [INFO] Description of Kraken2 parameters + element: 'div[tour_id="use_names"]' + intro: >- + Have a look at the parameters of the 'Kraken2' tool.

+ Additional info:
Parameter ‘Single or paired reads’
Single-end reads are fragments sequenced from one side only. With paired-end sequencing, the fragments are sequenced from both sides. This approach results in two reads per fragment, with the first read in forward orientation and the second read in reverse-complement orientation. This technique gives us more information about each DNA fragment than single-end sequencing does.
Parameter ‘Confidence’
A confidence score of 0.0 means that a non-restrictive taxonomic assignment is desired. This value can be increased if a more restrictive taxonomic assignment is desired. For example, a confidence score of 0.1 means that at least 10% of the k-mers should match entries in the database.
Parameter ‘Select a Kraken2 database’
We need to identify the taxon to which the individual reads belong. To identify the origin of each sequence, taxonomic classification tools use microbial genome databases. For this tutorial, we will use the fungi2019-03 database.
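To make the 'Confidence' parameter described above more concrete, here is a minimal Python sketch of a threshold check. It is a simplified reading of the description in this tour: the function names are hypothetical, and the sketch looks at one label in isolation, whereas Kraken 2 itself (according to its manual) counts k-mers across the clade rooted at a label and moves the label up the taxonomy until the threshold is met.

```python
def confidence_score(label_kmers: int, total_kmers: int, ambiguous_kmers: int = 0) -> float:
    """Fraction of usable k-mers that support the label.

    K-mers containing an ambiguous nucleotide are excluded from the
    denominator because they cannot be looked up in the database.
    """
    usable = total_kmers - ambiguous_kmers
    if usable <= 0:
        return 0.0
    return label_kmers / usable


def keep_label(label_kmers: int, total_kmers: int, threshold: float,
               ambiguous_kmers: int = 0) -> bool:
    """Accept the label only if its score reaches the confidence threshold."""
    return confidence_score(label_kmers, total_kmers, ambiguous_kmers) >= threshold


# With a threshold of 0.0 any label is kept, however weak the support.
print(keep_label(label_kmers=1, total_kmers=50, threshold=0.0))
# With a threshold of 0.1, 2 supporting k-mers out of 50 are not enough.
print(keep_label(label_kmers=2, total_kmers=50, threshold=0.1))
# 10 supporting k-mers out of 50 clear the same threshold.
print(keep_label(label_kmers=10, total_kmers=50, threshold=0.1))
```

This is why the tour uses 0.0: with a small beer sample, a non-restrictive setting keeps as many assignments as possible.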

+ placement: right + - title: [ACTION] Choose parameters and execute Kraken2 + intro: >- + Please select the following parameters:

'Single or paired + reads': Single
'Input sequences': Uploaded dataset
+ 'Print scientific names instead of just taxids': Yes
+ 'Confidence': 0.0
'In “Create Report”':
+     'Print a report with aggregrate counts/clade to + file': Yes
    'Format report output like + Kraken 1’s kraken-mpa-report': Yes
'Select a Kraken2 + database': fungi2019-03

Click 'Next' and the tour will + 'Execute' the Kraken2 tool for you.
+ postclick: + - '#execute' + - title: [ACTION] View the output of Kraken2 + element: '#current-history-panel > div.controls' + intro: >- + Congratulations, you have created two files: a Classification + file and a Report file.

They will remain stored in your history.

+ Click the 'eye' icon to view the data once the history item turns + green.

+ position: left + - title: [INFO] Description of the Kraken2 output + element: '#current-history-panel > div.controls' + intro: >- + Additional info (from Kraken Manual):
+ In the classification file, each sequence (or sequence pair, in the case of paired reads) classified by Kraken 2 results in a single line of output. Kraken 2's output lines contain five tab-delimited fields; from left to right, they are: +
  1. "C"/"U": a one letter code indicating that the sequence was either classified or unclassified.
  2. The sequence ID, obtained from the FASTA/FASTQ header.
  3. The taxonomy ID Kraken 2 used to label the sequence; this is 0 if the sequence is unclassified.
  4. The length of the sequence in bp. In the case of paired read data, this will be a string containing the lengths of the two sequences in bp, separated by a pipe character, e.g. "98|94".
  5. A space-delimited list indicating the LCA mapping of each k-mer in the sequence(s). For example, "562:13 561:4 A:31 0:1 562:3" would indicate that:
    • the first 13 k-mers mapped to taxonomy ID #562
    • the next 4 k-mers mapped to taxonomy ID #561
    • the next 31 k-mers contained an ambiguous nucleotide
    • the next k-mer was not in the database
    • the last 3 k-mers mapped to taxonomy ID #562
    Note that paired read data will contain a "|:|" token in this list to indicate the end of one read and the beginning of another. When Kraken 2 is run against a protein database (see [Translated Search]), the LCA hitlist will contain the results of querying all six frames of each sequence. Reading frame data is separated by a "-:-" token.
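The five fields listed above can be read with a short Python sketch such as the one below. The function and class names are hypothetical, and the example line is constructed from the hit list quoted above rather than taken from real output. The taxon and length fields are kept as strings because, depending on the options chosen, they may hold a scientific name or a "98|94" style pair.

```python
from typing import List, NamedTuple, Tuple


class Classification(NamedTuple):
    classified: bool
    read_id: str
    taxon: str
    length: str
    lca_hits: List[Tuple[str, int]]


def parse_classification_line(line: str) -> Classification:
    """Split one tab-delimited output line into its five fields."""
    status, read_id, taxon, length, hitlist = line.rstrip("\n").split("\t")
    hits: List[Tuple[str, int]] = []
    for token in hitlist.split():
        if token in ("|:|", "-:-"):
            continue  # separators between mates or reading frames
        key, count = token.rsplit(":", 1)
        hits.append((key, int(count)))
    return Classification(status == "C", read_id, taxon, length, hits)


# Constructed example using the hit list from the description above.
EXAMPLE_LINE = "C\tread1\t562\t98\t562:13 561:4 A:31 0:1 562:3"
parsed = parse_classification_line(EXAMPLE_LINE)
print(parsed.classified, parsed.taxon)
print(parsed.lca_hits)
```

Keys in the hit list are taxonomy IDs, "A" for k-mers with an ambiguous nucleotide, or "0" for k-mers that are not in the database.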
+ Kraken 2's standard sample report file format is tab-delimited with one line per taxon. The fields of the output, from left-to-right, are as follows:
  1. Percentage of fragments covered by the clade rooted at this taxon
  2. Number of fragments covered by the clade rooted at this taxon
  3. Number of fragments assigned directly to this taxon
  4. A rank code, indicating (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. Taxa that are not at any of these 10 ranks have a rank code that is formed by using the rank code of the closest ancestor rank with a number indicating the distance from that rank. E.g., "G2" is a rank code indicating a taxon is between genus and species and the grandparent taxon is at the genus rank.
  5. NCBI taxonomic ID number
  6. Indented scientific name
The scientific names are indented using space, according to the tree structure specified by the taxonomy.
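For the standard six-column report described above, a minimal Python sketch of a line parser could look like this. The names are hypothetical and the example line is invented for illustration. The sketch only counts leading spaces in the name field and makes no assumption about how many spaces correspond to one taxonomy level.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float
    clade_fragments: int
    direct_fragments: int
    rank_code: str
    taxid: int
    name: str
    indent: int


def parse_report_line(line: str) -> ReportRow:
    """Split one tab-delimited report line into its six fields."""
    percent, clade, direct, rank, taxid, name = line.rstrip("\n").split("\t")
    stripped = name.lstrip(" ")
    return ReportRow(
        percent=float(percent),
        clade_fragments=int(clade),
        direct_fragments=int(direct),
        rank_code=rank,
        taxid=int(taxid),
        name=stripped,
        indent=len(name) - len(stripped),  # leading spaces encode tree depth
    )


# Invented example line, not real output.
EXAMPLE_LINE = " 85.00\t850\t10\tG\t4930\t      Saccharomyces"
row = parse_report_line(EXAMPLE_LINE)
print(row)
```

Comparing `clade_fragments` with `direct_fragments` shows how many reads were assigned below a taxon rather than to the taxon itself.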
+ position: left + - title: [INFO] The next step for visualization + content: >- + Once we have assigned the corresponding taxa to the sequence, the next + step is to properly visualize the data, for which we will use the Krona + pie chart tool (Ondov + et al. 2011).
+ backdrop: true + - title: [INFO] Adjust dataset format + content: >- + Sometimes the output format of a tool needs to be changed before the next tool can read the data. Galaxy offers several "manipulation" tools for this.

The file created by Kraken2 is tabular, with one column and one line per taxon. Within each line, the information is separated by the | symbol. To make it more readable and usable by the next tools, we need a tab-delimited format with one line per taxon. The fields of the output, from left to right, should be as follows: 1) Number of fragments assigned directly to this taxon; 2) A rank code, indicating (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies.

Here we need to adjust the format of the data output from Kraken2.
+ backdrop: true + - title: [ACTION] Select 'Reverse' tool + element: '#__BVID__106' + intro: >- + Search and select the 'Reverse' tool. + position: right + - title: [ACTION] Choose parameters and execute 'Reverse' tool + intro: >- + You have selected the Reverse tool.
Ensure that for 'Input tabular + dataset' you have selected the report output file from + Kraken2.

'Execute' the Reverse tool when you are ready. + position: right + - title: [ACTION] Select 'Replace Text' tool + element: '#__BVID__106' + intro: Search and select the 'Replace Text' tool. + position: right + - title: [ACTION] Choose parameters and execute 'Replace Text' tool + intro: >- + You have selected the Replace Text tool.

Please select the + following parameters:

'File to process': Output dataset + 'out_file' from 'Reverse' tool
'In “Replacements”':
+     'Replacement 1':
+         'Find pattern': + \|
        'Replace + with:': \t
    'Replacement 2':
+         'Find pattern': + [a-z]__
        'Replace + with:': Empty
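The effect of the two replacements listed above can be previewed with a small Python sketch. Python's `re` module stands in here for the regex engine of the Replace Text tool, and the sample line is a hypothetical example of a reversed report line (count first, then the |-separated lineage).

```python
import re


def apply_replacements(line: str) -> str:
    """Apply 'Replacement 1' and 'Replacement 2' to one line."""
    line = re.sub(r"\|", "\t", line)     # Replacement 1: | becomes a tab
    line = re.sub(r"[a-z]__", "", line)  # Replacement 2: drop rank prefixes
    return line


# Hypothetical line as it might look after the 'Reverse' step.
SAMPLE = "42\td__Eukaryota|k__Fungi|p__Ascomycota"
print(apply_replacements(SAMPLE).split("\t"))
# ['42', 'Eukaryota', 'Fungi', 'Ascomycota']
```

Each taxonomic level ends up in its own column after the count, which is the tabular layout the Krona pie chart step reads.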

'Execute' the Replace Text tool when you are + ready. + position: right + - title: [INFO] Visualization of the taxonomic classification with Krona + content: >- + Krona allows hierarchical data to be explored with zooming, + multi-layered pie charts. With this tool, we can easily visualize the + composition of the bacterial communities and compare how the populations + of microorganisms are modified according to the conditions of the + environment. + backdrop: true + - title: [ACTION] Select 'Krona pie chart' tool + element: '#__BVID__106' + intro: Search and select the 'Krona pie chart' tool. + position: right + - title: [ACTION] Choose parameters and execute 'Krona pie chart' tool + intro: >- + You have selected the 'Krona pie chart' tool.

Please select the + following parameters:

'What is the type of your input + data': Tabular
'Input file': Output dataset 'out_file' from + 'Replace Text' tool
'Provide a name for the basal rank': + Root
'Combine data from multiple datasets?': No


+ 'Execute' the 'Krona pie chart' tool when you are ready. + position: right + - title: [ACTION] View the 'Krona pie chart' output + element: '#current-history-panel > div.controls' + intro: >- + Let’s take a look at the result by clicking the eye icon. Using the search bar + we can check whether certain taxa are present. + position: left + preclick: + - '#center-panel' + - title: [INFO] The end of the tour BeerDEcoded + intro: >- + You have reached the end of the tour.

Thank you for going through + our tutorial. + backdrop: true diff --git a/tour/images/kraken2_simlified.jpg b/tour/images/kraken2_simlified.jpg new file mode 100644 index 00000000..31ad6467 Binary files /dev/null and b/tour/images/kraken2_simlified.jpg differ diff --git a/tour/images/readme b/tour/images/readme new file mode 100644 index 00000000..cc3a9807 --- /dev/null +++ b/tour/images/readme @@ -0,0 +1 @@ + This folder is created for images for BeerDeCoded Tour diff --git a/tour/images/workflow-of-beerDeCoded.png b/tour/images/workflow-of-beerDeCoded.png new file mode 100644 index 00000000..4f1eac6a Binary files /dev/null and b/tour/images/workflow-of-beerDeCoded.png differ