Commit 49251d6

update index.md
1 parent f2c21f3 commit 49251d6

2 files changed

+44
-31
lines changed

README.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -6,21 +6,21 @@
66

77
As shown above, this pipeline contains three stages:
88

9-
#### 1. Data preprocessing ####
9+
#### 1. Preprocessing ####
1010

1111
In this stage, the pipeline intakes raw inputs of variable formats (e.g., FASTQ, BAM/SAM or FASTA), and generates the **standard-in-format**, **clean-in-sequence** FASTQ files that can be directly used for quantification analysis.
1212

1313
#### 2. Quantification
1414

1515
In this stage, the pipeline generates quantification measurements at both the gene and transcript levels. It supports three well-established, widely used quantifiers:
1616

17-
- [**<u>Salmon</u>**](https://salmon.readthedocs.io/en/latest/salmon.html): an **alignment-free quantifier** with **wicked-fast speed** and **comparable accuracy**.
17+
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): an **alignment-free quantifier** with **wicked-fast speed** and **comparable accuracy**.
1818

19-
- [**<u>RSEM</u>**](https://github.com/bli25/RSEM_tutorial): an **alignment-based quantifier** with **high accuracy**. It has been used as **gold standard** in many benchmarking studies.
19+
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): an **alignment-based quantifier** with **high accuracy**. It has been used as the **gold standard** in many benchmarking studies.
2020

21-
- [**<u>STAR</u>**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): an **alignment-based quantifier** featuring **spliced transcript alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
21+
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): an **alignment-based quantifier** featuring **spliced transcript alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
2222

23-
#### 3. Summary
23+
#### 3. Summarization
2424

2525
In this stage, the pipeline generates **an HTML report** of the quantification analysis for each sample, including alignment statistics, correlation analysis, gene body coverage visualizations, and more. When multiple samples are provided, it can also produce a universal report summarizing the statistics of all samples, as well as a master gene expression matrix that can be used directly for **NetBID** analysis.
2626

index.md

Lines changed: 39 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,18 @@ nav_order: 1
55
permalink: /index
66
---
77

8+
### Scope of this pipeline
9+
10+
---
11+
812
Bulk RNA sequencing (RNA-Seq) is a highly sensitive and accurate tool for measuring expression across the transcriptome. In addition to transcriptome quantification, RNA-Seq also allows researchers to detect new splicing junctions (e.g. TopHat/TopHat2-regtools), novel transcripts (e.g. Cufflinks), gene fusions (e.g. STAR-Fusion, Arriba), single nucleotide variants (e.g. STAR-GATK), and other features. **<u>This pipeline is for transcriptome quantification only</u>.**
913

14+
### The core question
15+
16+
---
17+
18+
**How can we ensure the accuracy of bulk RNA-seq quantification?** In this pipeline, we address this by **applying multiple quantification methods for cross-validation**, thereby increasing the reliability of the results.
19+
1020
The current bulk RNA-Seq quantification methods can be grouped into two categories, **alignment-based** and **alignment-free**, as summarized in the table below.
1121

1222
| | Alignment-based methods | Alignment-free methods |
@@ -17,39 +27,42 @@ The current bulk RNA-Seq quantification methods can be grouped into two categori
1727
| Accuracy | High | Slightly lower or equal |
1828
| Speed | Slow, a few hours for a typical run | Super-fast, a few minutes for a typical run |
1929

20-
<u>**To ensure the accuracy of quantification, we employ one signature method from each of these two categories for cross-validation**:</u> **1)** **RSEM**, the most highly cited alignment-based method, which shows the highest accuracy in most benchmarks; **2)** **Salmon**, a wicked-fast and highly accurate alignment-free method that was recently further enhanced by integrating selective alignment and decoy sequences. We also introduce **3) STAR**, another alignment-based method recommended by GDC, as an optional method.
30+
In this pipeline, we provide three quantification methods covering both categories:
31+
32+
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): a **wicked-fast** and **highly accurate** alignment-free method that was recently further enhanced by integrating **selective alignment** and **decoy sequences**
33+
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): the most highly cited alignment-based method which shows the highest accuracy in most benchmarks
34+
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): another **alignment-based** quantifier featuring **spliced transcript alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline)
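
One simple way to cross-validate two quantifiers is to correlate their per-gene estimates. The sketch below is illustrative only and is not part of the pipeline: the function name `pearson_tsv` is hypothetical, and it assumes you have already merged the two quantifiers' outputs into a tab-separated table with a header line and three columns (gene ID, Salmon estimate, RSEM estimate).

```bash
# Pearson correlation of columns 2 and 3 of a tab-separated table on stdin.
# Assumed layout (hypothetical): gene ID, Salmon value, RSEM value; header skipped.
# Prints r with four decimals, or "NA" when r is undefined.
pearson_tsv() {
    awk -F'\t' '
        NR > 1 {
            n++
            sx += $2; sy += $3
            sxx += $2 * $2; syy += $3 * $3; sxy += $2 * $3
        }
        END {
            d = (n * sxx - sx * sx) * (n * syy - sy * sy)
            if (n < 2 || d <= 0) { print "NA"; exit }
            printf "%.4f\n", (n * sxy - sx * sy) / sqrt(d)
        }'
}
```

For example, `pearson_tsv < merged_tpm.tsv` (where `merged_tpm.tsv` is a placeholder file name) prints a single coefficient. A value well below 1 would suggest the two methods disagree for that sample and that the inputs deserve a closer look.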
35+
36+
### Overview
37+
38+
---
2139

22-
Below is an overview of the pipelines, which contains three sections: 1) **Preprocessing**: to prepare the standard inputs for quantification analysis; 2) **Quantification**: to **quantify** the gene and transcript expression in both alignment-based and alignment-free methods; 3) **Summarization**: to compile the **expression matrices** at both gene and transcript levels, and generate the **quality control report**.
40+
![Picture](docs/figures/overview.png)
2341

24-
![Picture1](/Users/qpan/Desktop/Picture1.png)
42+
#### Three stages
2543

26-
To serve better, we have:
44+
- **Preprocessing**: to prepare the standard inputs for quantification analysis
45+
- **Quantification**: to **quantify** the gene and transcript expression using both alignment-free and alignment-based methods
46+
- **Summarization**: to compile the **expression matrices** at both gene and transcript levels, and generate the **quality control report**
2747

28-
* Compiled all tools required into one single conda environment, which can be easily launched by:
48+
### Two species
2949

30-
```bash
31-
module load conda3/202210
32-
conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023
33-
```
50+
In this pipeline, we have pre-generated the index libraries for two species, **human** and **mouse**, as listed below. For other species, you will need to generate the index libraries yourself by following the Pipeline Setup tutorial.
3451

35-
* Updated all the index libraries to the latest version
52+
| Genome | GENCODE release | Release date | Path |
53+
| ------ | --------------- | ------------ | ------------------------------------------------------------ |
54+
| hg38 | v48 | 05.2025 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg38/gencode.release48 |
55+
| hg19 | v48lift37 | 05.2025 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg19/gencode.release48 |
56+
| mm39 | vM37 | 05.2025 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm39/gencode.releaseM37 |
57+
| mm10 | vM25 | 04.2020 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm10/gencode.releaseM25 |
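
If you script against these libraries, a small lookup keeps the paths in one place. This is a sketch, not part of the pipeline: the function name `reference_dir` and the variable `REF_ROOT` are hypothetical, and the paths are copied from the table above.

```bash
# Common prefix of the reference directories listed in the table.
REF_ROOT=/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references

# Print the index library directory for a genome build; fail on unknown builds.
reference_dir() {
    case "$1" in
        hg38) echo "$REF_ROOT/hg38/gencode.release48" ;;
        hg19) echo "$REF_ROOT/hg19/gencode.release48" ;;
        mm39) echo "$REF_ROOT/mm39/gencode.releaseM37" ;;
        mm10) echo "$REF_ROOT/mm10/gencode.releaseM25" ;;
        *)    echo "unsupported genome: $1" >&2; return 1 ;;
    esac
}
```

Calling `reference_dir mm39` prints the mm39 directory, while an unlisted build returns a non-zero status so a calling script can stop early.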
3658

37-
| Genome | GENCODE | Path |
38-
| ------ | ------- | ------------------------------------------------------------ |
39-
| hg38 | v43 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg38/gencode.release43 |
40-
| hg19 | v43 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg19/gencode.release43 |
41-
| mm39 | vM32 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm39/gencode.releaseM32 |
42-
| mm10 | vM25 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm10/gencode.releaseM25 |
59+
### One environment
4360

44-
* Prepared the test cases of multi-format inputs
61+
We have compiled all tools required by this pipeline into a single conda environment. You can easily set it up by following our tutorial.
4562

46-
| Cases | Library Type | File Format | Path |
47-
| ------- | ------------ | --------------------- | ------------------------------------------------------------ |
48-
| Sample1 | Paired-end | FASTQ | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample1 |
49-
| Sample2 | Single-end | FASTQ | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample2 |
50-
| Sample3 | Paired-end | FASTQ, multiple lanes | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample3 |
51-
| Sample4 | Single-end | FASTQ, multiple lanes | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample4 |
52-
| Sample5 | Paired-end | BAM | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample5 |
53-
| Sample6 | Single-end | BAM | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample6 |
63+
St. Jude HPC users can launch it with:
5464

55-
For training purposes, we will go through the pipelines in a step-by-step way with real cases as listed above.
65+
```bash
66+
module load conda3/202402
67+
conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025
68+
```
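
After activating the environment, you may want to confirm that the quantifiers are actually on your `PATH`. The helper below is a minimal sketch: `missing_tools` is a hypothetical name, and the executable names in the example call (`salmon`, `rsem-calculate-expression`, `STAR`) are assumptions about how the tools are exposed in the environment.

```bash
# Print each named executable that cannot be found on PATH, one per line.
# Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call; the executable names are assumptions, adjust as needed.
missing_tools salmon rsem-calculate-expression STAR \
    || echo "Some tools are missing; is the conda environment active?" >&2
```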
