`topics/introduction/tutorials/galaxy-intro-strands/tutorial.md`

Lines changed: 14 additions & 13 deletions
@@ -159,7 +159,7 @@ The Table Browser has a daunting number of options. Fortunately, they are all se

**track** has a bewildering list of options. UCSC suggests `GENCODE v41`. A web search leads us to the [GENCODE web site](https://www.gencodegenes.org/) which prominently states:

> <warning-title>ALL GENCODE is different from GENCODE</warning-title>
> **Warning** The Table Browser only provides the most recent release of GENCODE, which is updated several times per year. ALL GENCODE does not contain the same data as GENCODE, so select the GENCODE track even if its version number does not match.
{: .warning}

> The goal of the GENCODE project is to identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence...

@@ -317,7 +317,7 @@ Here's how we'll answer this question:

It turns out that all of these steps are easy in Galaxy!

## Split the genes into forward and reverse datasets

How might we do this? Column 6 contains the strand information. Can we split the genes into two datasets based on the value of Column 6? How? Let's take a look at our available tools. And *whoa! There are over 40 toolboxes, and several hundred tools.* How are we going to find a tool that can do the split?
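
Outside Galaxy, the idea behind this split can be sketched in a few lines of Python. This is a minimal illustration, not what any Galaxy tool does internally; it assumes a tab-separated BED file with the strand (`+` or `-`) in column 6, and `split_by_strand` and the gene records are made up for the example.

```python
def split_by_strand(bed_text):
    """Split BED data lines into forward (+) and reverse (-) lists using column 6.

    Lines whose strand is anything else (for example '.') go into `other`.
    Blank lines and '#', 'track', or 'browser' header lines are skipped.
    """
    forward, reverse, other = [], [], []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Column 6 is index 5 in Python's 0-based indexing; treat a missing column as '.'.
        strand = fields[5] if len(fields) > 5 else "."
        if strand == "+":
            forward.append(line)
        elif strand == "-":
            reverse.append(line)
        else:
            other.append(line)
    return forward, reverse, other


# Three made-up genes in BED6 format: chrom, start, end, name, score, strand.
genes = "\n".join([
    "chr22\t100\t500\tgeneA\t0\t+",
    "chr22\t400\t900\tgeneB\t0\t-",
    "chr22\t2000\t2500\tgeneC\t0\t+",
])
forward, reverse, other = split_by_strand(genes)
print(len(forward), len(reverse), len(other))  # 2 1 0
```

Keeping a third `other` list, instead of silently dropping lines, makes it easy to spot rows that are neither `+` nor `-`.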

@@ -396,7 +396,7 @@ Your history should now have (at least) 3 datasets in it, with names like:

The number of genes in the `forward` plus `reverse` datasets should be the same as in the `Genes chr22` dataset. If they aren't, can you figure out why?
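
One way to investigate a mismatch is to count the values found in column 6, and separately count lines that are not genes at all, such as a header. The Python sketch below assumes a tab-separated BED layout with the strand in column 6; `strand_counts` and the example lines are hypothetical.

```python
from collections import Counter


def strand_counts(bed_lines):
    """Count lines by their column-6 (strand) value.

    Blank and header lines ('#', 'track', 'browser') are counted separately,
    because they take up a line in the file without being a gene.
    """
    counts = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            counts["header/blank"] += 1
            continue
        fields = line.split("\t")
        counts[fields[5] if len(fields) > 5 else "missing"] += 1
    return counts


# Made-up example: one header line and three genes, one without a +/- strand.
lines = [
    "track name=genes",
    "chr22\t100\t500\tgeneA\t0\t+",
    "chr22\t400\t900\tgeneB\t0\t-",
    "chr22\t2000\t2500\tgeneC\t0\t.",
]
counts = strand_counts(lines)
genes_total = sum(n for key, n in counts.items() if key != "header/blank")
# forward + reverse equals the total only when every gene is '+' or '-'.
print(counts["+"] + counts["-"], genes_total)  # 2 3
```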

## Check for overlaps

Genes are an example of a *genomic interval*.

@@ -424,7 +424,7 @@ Of the tools in the **Operate on Genomic Intervals** toolbox, **Join** and parti

> - {% icon param-files %} *"of"*: `Genes, forward strand` (the first dataset)
> - {% icon param-files %} *"that intersect"*: `Genes, reverse strand` (the second dataset)
> - *"for at least"*: `1`
>
> This will return genes that overlap by as little as a single position.
> - *Click* **Run Tool**.
>
@@ -436,7 +436,7 @@ Of the tools in the **Operate on Genomic Intervals** toolbox, **Join** and parti
{: .hands_on}
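
The overlap rule configured above (*"for at least"* `1`) can be sketched in Python to make it explicit. This is not Galaxy's implementation: it assumes BED-style half-open coordinates (0-based start, end not included), represents each gene as a hypothetical `(chrom, start, end, name)` tuple, and uses a simple nested loop instead of an efficient interval index.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open [start, end) intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intersecting(query, targets, min_overlap=1):
    """Return the query intervals that share at least `min_overlap` bases
    with any target interval on the same chromosome.

    Naive O(n*m) scan: fine for illustration, slow for genome-scale data.
    """
    hits = []
    for q_chrom, q_start, q_end, q_name in query:
        for t_chrom, t_start, t_end, _t_name in targets:
            if q_chrom == t_chrom and overlap_length(q_start, q_end, t_start, t_end) >= min_overlap:
                hits.append((q_chrom, q_start, q_end, q_name))
                break  # one overlapping target is enough to keep this query
    return hits


# Made-up genes: geneA (+) and geneB (-) share bases 400-499; geneC overlaps nothing.
forward = [("chr22", 100, 500, "geneA"), ("chr22", 2000, 2500, "geneC")]
reverse = [("chr22", 400, 900, "geneB")]
print(intersecting(forward, reverse))  # [('chr22', 100, 500, 'geneA')]
print(intersecting(reverse, forward))  # [('chr22', 400, 900, 'geneB')]
```

Running the check in both directions gives the overlapping genes on each strand, analogous to the `Overlapping forward` and `Overlapping reverse` datasets.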

# Results and final steps

At this point, we *could* say that we have answered our question. Using dataset previews in the history panel, we can compare the number of genes in the `Overlapping forward` and `Overlapping reverse` datasets with the number of genes in the full `Genes chr22` dataset, and *conclude that overlapping genes on opposite strands are actually pretty common.*
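
The comparison is simple arithmetic. Because each overlapping gene lands in exactly one of the two `Overlapping` datasets (depending on its strand), the two counts can be added without double counting. A minimal Python sketch, where `overlapping_share` is hypothetical and the counts are placeholders to replace with the numbers from your own dataset previews, not real results:

```python
def overlapping_share(n_overlap_forward, n_overlap_reverse, n_genes_total):
    """Fraction of all genes that overlap at least one gene on the opposite strand."""
    if n_genes_total <= 0:
        raise ValueError("n_genes_total must be positive")
    return (n_overlap_forward + n_overlap_reverse) / n_genes_total


# Placeholder counts, not results from chr22.
share = overlapping_share(3, 2, 10)
print(share)  # 0.5
```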

@@ -445,7 +445,7 @@ However, before we rush off to publish our conclusions, let's

1. Get both the forward and reverse overlapping genes into a single dataset (one link will look better in our publication), and
2. *Visualize* our new dataset, just to double-check our results.

## Combine forward and reverse overlapping genes into one dataset

What tool can we use to combine the two datasets into one? Try *searching* for `combine`, `join`, or `stack` in the tool search box. You'll find lots of tools, but none of them do what we want. *Sometimes you just have to look through the toolboxes manually to find what you need.* Where should we look? It's probably not **Get Data** or **Send Data**, but it could easily be in any of the next four toolboxes: **Lift-Over, Collection Operations, Text Manipulation, or Datamash**.

@@ -459,18 +459,18 @@ It turns out that **Lift-Over** and **Collection Operations** are not what we wa

> - Click on {% icon param-repeat %} *"Insert Dataset"*
>
> This adds a second dataset pull-down menu to the form.
>
> - In *"1: Dataset"*
> - {% icon param-files %} *"Select"*: `Overlapping forward genes` as the second dataset.
> 4. *Click* **Run Tool**
> 5. *Rename* the resulting dataset something informative, like `Overlapping genes`
{: .hands_on}

Once the concatenate operation is finished, preview the dataset in your history panel. Does it have the expected number of genes in it? If not, see if you can figure out what happened.
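
One generic way a line count can go wrong when text files are stacked by hand is a missing final newline: the last line of the first file gets glued onto the first line of the second. The Python sketch below uses a hypothetical `concatenate_text` helper; it is not how Galaxy implements concatenation, and not necessarily what you will see here, but it shows the effect and the usual guard against it.

```python
def concatenate_text(*texts):
    """Join the contents of several text files in order, adding a newline to any
    non-empty file that lacks one so that lines from different files never merge."""
    parts = []
    for text in texts:
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    return "".join(parts)


# Made-up single-gene files; the first one lacks a trailing newline.
reverse_text = "chr22\t400\t900\tgeneB\t0\t-"
forward_text = "chr22\t100\t500\tgeneA\t0\t+\n"

naive = reverse_text + forward_text
safe = concatenate_text(reverse_text, forward_text)
# Expected gene count: 1 + 1 = 2.
print(len(naive.splitlines()), len(safe.splitlines()))  # 1 2
```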

## Visualize the overlapping genes

Galaxy supports several visualization options for many different dataset types, including BED. Whenever you preview a dataset in the history panel, Galaxy provides links to these visualizations. For BED files (the format we have), options include **IGB, IGV,** and **UCSC main.** IGB and IGV are widely used desktop applications, and eventually you may want to install one or both of them. For now, let's visualize the data at UCSC, using the UCSC *Genome* Browser.

@@ -479,7 +479,8 @@ Galaxy knows about several visualization options for lots of different dataset t

> 1. Click on your `Overlapping genes` dataset in the history panel to show its preview.
> 2. Expand the dataset, if it isn't expanded already, so that you can see the dataset metadata and additional actions like Visualize.
> 3. Click on the {% icon galaxy-barchart %} (**Visualize**) icon.
> 4. Click on the **display at UCSC (main)** link that appears in the blue box at the top of the screen.
>
> This will open a new window showing UCSC's Genome Browser, with our dataset shown right at the top. UCSC figures out that our first overlapping gene is ~11 million bases into chromosome 22 and takes us there.

Follow the [Create a reusable workflow from a history]({% link topics/galaxy-interface/tutorials/history-to-workflow/tutorial.md %}) tutorial to learn how to do this, *and then come back here to run your newly created workflow with the exon data.*

# Rerun analysis with exon data

We want to run the same analysis, but this time only look for overlaps that happen in *exons*, the parts of genes that remain in the mature RNA after splicing. Before we start looking at exons, let's start a new history, one that contains only the genes file we got from UCSC. We could go back to UCSC and refetch the file, but there is an easier way.

@@ -567,7 +568,7 @@ We want to run the same analysis, but this time only look for overlaps that happ

> 6. The history name is a link. *Click* on it.
{: .hands_on}

## Get the exon data

Your new history now appears in the history panel with the copied *genes* dataset. What we need is *exons*. How can we get the exon information? There are two relatively easy ways to get this information, one of which will be very familiar.

@@ -588,7 +589,7 @@ If you got the data from UCSC it will look something like this:

Your history should now have two datasets: one describing entire genes, and one describing just the exons.

## Rerun the analysis, this time on exons

When you did the *History to Workflow* tutorial, you created a new workflow that was then added to your list of defined workflows.