Commit 5b091b3

Merge pull request #4152 from shiltemann/ucsc-ss
[Intro to Galaxy and Genomics] update ucsc display link screenshot
2 parents 472b518 + fce0a89 commit 5b091b3

1 file changed, +14 -13 lines changed
  • topics/introduction/tutorials/galaxy-intro-strands

topics/introduction/tutorials/galaxy-intro-strands/tutorial.md

Lines changed: 14 additions & 13 deletions
@@ -159,7 +159,7 @@ The Table Browser has a daunting number of options. Fortunately, they are all se
 **track** has a bewildering list of options. UCSC suggests `GENCODE v41`. A web search leads us to the [GENCODE web site](https://www.gencodegenes.org/) which prominently states:
 
 > <warning-title>ALL GENCODE is different from GENCODE</warning-title>
-> **Warning** The Table browser only provides the most recent release of GENCODE which is updated several times per year. ALL GENCODE does not contain the same data as GENCODE and you should select the GENCODE track even if the version number is wrong.
+> **Warning** The Table browser only provides the most recent release of GENCODE which is updated several times per year. ALL GENCODE does not contain the same data as GENCODE and you should select the GENCODE track even if the version number is wrong.
 {: .warning}
 
 >The goal of the GENCODE project is to identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence...
@@ -317,7 +317,7 @@ Here's how we'll answer this question:
 
 It turns out that all of these steps are easy in Galaxy!
 
-### Split the genes into forward and reverse datasets
+## Split the genes into forward and reverse datasets
 
 How might we do this? Column 6 contains the strand information. Can we split genes into two datasets based on the value of Column 6. How? Lets take a look at our available tools. And *whoa! There are over 40 toolboxes, and several hundred tools.* How are we going to find a tool that can do the split?
 
@@ -396,7 +396,7 @@ Your history should now have (at least) 3 datasets in it, with names like:
 
 The number of genes in the `forward` plus `reverse` datasets should be the same as in the `Genes chr22` dataset. If they aren't can you figure out why?
 
-### Check for overlaps
+## Check for overlaps
 
 Genes are an example of a *genomic interval*.
 
@@ -424,7 +424,7 @@ Of the tools in the **Operate on Genomic Intervals** toolbox, **Join** and parti
 > - {% icon param-files %}*"of"*: `Genes, forward strand` (the first dataset)
 > - {% icon param-files %} *"that intersect"* : `Genes, reverse strand` (the second dataset)
 > - *"for at least"*: `1`
->
+>
 > This will return genes with even just one position overlapping.
 > - *Click* **Run Tool**.
 >
@@ -436,7 +436,7 @@ Of the tools in the **Operate on Genomic Intervals** toolbox, **Join** and parti
 {: .hands_on}
 
 
-## Results and final steps.
+# Results and final steps.
 
 At this point we *could* say that we have answered our question. Using dataset previews in the history panel, we can compare the number of genes in the `Overlapping forward` and `Overlapping reverse` datasets with the number of genes in the full `Genes chr22` dataset, and *conclude that overlapping genes on opposite strands are actually pretty common.*
 
@@ -445,7 +445,7 @@ However, before we rush off to publish our conclusions, let's
 1. Get both the forward and reverse overlapping genes into a single dataset (one link will look better in our publication), and
 2. *Visualize* our new dataset, just to double-check our results.
 
-### Combine forward and reverse overlapping genes into one dataset.
+## Combine forward and reverse overlapping genes into one dataset.
 
 What tool can we use to combine the two datasets into one? Try *searching* for `combine` or `join` or `stack` in the tool search box. You'll find lots of tools, but none of them do what we want to do. *Some times you just have to manually look through toolboxes to find what you need.* Where should we look? It's probably not **Get Data** or **Send Data**, but it could easily be in any of the next 4 toolboxes: **Lift-Over, Collection Operations, Text Manipulation, or Datamash**.
 
@@ -459,18 +459,18 @@ It turns out that **Lift-Over** and **Collection Operations** are not what we wa
 > - {% icon param-files %} *"Concatenate Dataset"*: `Overlapping reverse genes`.
 > - *"Dataset*"
 > - Click on {% icon param-repeat %} *"Insert Dataset"*
->
+>
 > This adds a second dataset pull-down menu to the form.
 >
-> - In *"1: Dataset"*
+> - In *"1: Dataset"*
 > - {% icon param-files %} *"Select"*: `Overlapping forward genes` as the second dataset.
 > 4. *Click* **Run Tool**
 > 5. *Rename* the resulting dataset something informative like `Overlapping genes`
 {: .hands_on}
 
 Once the concatenate operation is finished, preview the dataset in your history panel. Does it have the expected number of genes in it? If not, see if you can figure out what happened.
 
-### Visualize the overlapping genes
+## Visualize the overlapping genes
 
 Galaxy knows about several visualization options for lots of different dataset types, including BED. Whenever you preview a dataset in the history panel, Galaxy provides links to these visualizations. For BED files (which is the format we have), options include **IGB, IGV,** and **UCSC main.** IGB and IGV are widely used desktop applications and eventually you may want to install one or both of them. For now, let's visualize the data at UCSC, using the UCSC *Genome* Browser.
 
@@ -479,7 +479,8 @@ Galaxy knows about several visualization options for lots of different dataset t
 > 1. Click on your `Overlapping genes` dataset in your history panel. This will show the dataset preview in the history panel.
 > 2. Click to expand the dataset, if it isn't already, so that you can see the dataset metadata and additional actions like Visualize.
 > 3. Click on the {% icon galaxy-barchart %} (**Visualize**) icon
-> 3. Click on the **display at UCSC main** link.
+> 4. Click on the **display at UCSC (main)** link that appears in the blue box at the top of the screen.
+> ![visualisation options are shown in Galaxy's middle panel]({% link topics/introduction/images/101_displayucsc.png %})
 >
 > This will launch a new window, showing UCSC's Genome Browser with our dataset shown right at the top. UCSC figures out that our first overlapping gene is ~11 million bases into chromosome 22, and it has landed us there.
 >
@@ -548,7 +549,7 @@ Let's refine our question slightly
 
 Run the [Create a reusable workflow from a history]({% link topics/galaxy-interface/tutorials/history-to-workflow/tutorial.md %}) tutorial for how to do this, *and then come back here to run your newly created workflow with the exon data.*
 
-## Rerun analysis with exon data
+# Rerun analysis with exon data
 
 We want to run the same analysis, but this time only look for overlaps that happen in *exons*, the parts of genes that produce stuff our body uses. Before we start looking at exons, let's start a new history, one that contains only the genes file we got from UCSC. We could go back to UCSC and refetch the file, but there is an easier way.
 
@@ -567,7 +568,7 @@ We want to run the same analysis, but this time only look for overlaps that happ
 > 6. The history name is a link. *Click* on it.
 {: .hands_on}
 
-### Get the exon data
+## Get the exon data
 
 And your new history appears in the history panel with the copied *genes* dataset. What we need is *exons.* How can we get the exon information? There are two relatively easy ways to get this information, one of which will be very familiar.
 
@@ -588,7 +589,7 @@ If you got the data from UCSC it will look something like this:
 
 Your history should now have two datasets: one describing entire genes, and one describing just the exons.
 
-### Rerun the analysis, this time on exons.
+## Rerun the analysis, this time on exons.
 
 When you did the *History to Workflow* tutorial you created a new workflow that was then added to your list of defined workflows.
 