README.md (5 additions, 5 deletions)
@@ -6,21 +6,21 @@
As shown above, this pipeline contains three stages:
-#### 1. Data preprocessing ####
+#### 1. Preprocessing ####
In this stage, the pipeline takes in raw inputs of various formats (e.g., FASTQ, BAM/SAM, or FASTA) and generates **standard-in-format**, **clean-in-sequence** FASTQ files that can be used directly for quantification analysis.
#### 2. Quantification
In this stage, the pipeline generates quantification measurements at both the gene and transcript levels. It supports three well-established, widely used quantifiers:
-- [**<u>Salmon</u>**](https://salmon.readthedocs.io/en/latest/salmon.html): an **alignment-free quantifier** with **wicked-fast speed** and **comparable accuracy**.
+- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): an **alignment-free quantifier** with **wicked-fast speed** and **comparable accuracy**.
-- [**<u>RSEM</u>**](https://github.com/bli25/RSEM_tutorial): an **alignment-based quantifier** with **high accuracy**. It has been used as the **gold standard** in many benchmarking studies.
+- [**RSEM**](https://github.com/bli25/RSEM_tutorial): an **alignment-based quantifier** with **high accuracy**. It has been used as the **gold standard** in many benchmarking studies.
-- [**<u>STAR</u>**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): an **alignment-based quantifier** featuring **spliced transcripts alignment**. It is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
+- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): an **alignment-based quantifier** featuring **spliced transcripts alignment**. It is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
-#### 3. Summary
+#### 3. Summarization
In this stage, the pipeline generates **an HTML report** of the quantification analysis for each sample, including alignment statistics, correlation analysis, gene body coverage visualizations, and more. When multiple samples are provided, it can also produce a combined report summarizing statistics across all samples, as well as a master gene expression matrix that can be used directly for **NetBID** analysis.
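The Summarization stage described above merges per-sample results into a master expression matrix. Below is a minimal sketch of that merge step, not the pipeline's own code. It assumes Salmon's tab-separated `quant.sf` output (columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) and works at the transcript level; gene-level aggregation, which would need a transcript-to-gene map, is left out. The function names, sample names, and zero-fill rule are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_quant_sf(text: str) -> Dict[str, float]:
    """Parse the contents of a Salmon quant.sf file into {transcript: TPM}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def build_tpm_matrix(
    samples: Dict[str, str],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Merge per-sample quant.sf contents into a transcript x sample TPM matrix.

    Returns (row names, column names, rows). A transcript absent from a
    sample is filled with 0.0, which is an assumption made for this sketch.
    """
    per_sample = {name: read_quant_sf(text) for name, text in samples.items()}
    columns = sorted(per_sample)
    rows = sorted({tx for tpm in per_sample.values() for tx in tpm})
    matrix = [[per_sample[s].get(tx, 0.0) for s in columns] for tx in rows]
    return rows, columns, matrix


if __name__ == "__main__":
    # Made-up quant.sf contents for two hypothetical samples.
    header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    sample_a = header + "tx1\t1000\t850.0\t12.5\t100\ntx2\t2000\t1850.0\t3.0\t50\n"
    sample_b = header + "tx1\t1000\t850.0\t8.0\t70\ntx3\t500\t350.0\t1.5\t5\n"
    rows, cols, matrix = build_tpm_matrix({"sampleA": sample_a, "sampleB": sample_b})
    print("\t".join(["Name"] + cols))
    for name, values in zip(rows, matrix):
        print("\t".join([name] + [str(v) for v in values]))
```

Sorting rows and columns keeps the matrix layout deterministic regardless of the order in which samples are supplied.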
index.md (39 additions, 26 deletions)
@@ -5,8 +5,18 @@ nav_order: 1
permalink: /index
---
+### Scope of this pipeline
+
+---
+
Bulk RNA sequencing (RNA-Seq) is a highly sensitive and accurate tool for measuring expression across the transcriptome. In addition to transcriptome quantification, RNA-Seq also allows researchers to detect new splice junctions (e.g., TopHat/TopHat2-regtools), novel transcripts (e.g., Cufflinks), gene fusions (e.g., STAR-Fusion, Arriba), single nucleotide variants (e.g., STAR-GATK), and other features. **<u>This pipeline is for transcriptome quantification only</u>.**
+### The core question
+
+---
+
+**How can we ensure the accuracy of bulk RNA-seq quantification?** In this pipeline, we address this by **applying multiple quantification methods for cross-validation**, thereby increasing the reliability of the results.
+
The current bulk RNA-Seq quantification methods can be grouped into two categories, **alignment-based** and **alignment-free**, as summarized in the table below.
@@ -17,39 +27,42 @@ The current bulk RNA-Seq quantification methods can be grouped into two categori
| Accuracy | High | Slightly lower or equal |
| Speed | Slow, a few hours for a typical run | Super-fast, a few minutes for a typical run |
-<u>**To ensure the accuracy of quantification, we employ one signature method from each of these two categories for cross-validation**:</u> **1) RSEM**, the most highly cited alignment-based method, which shows the highest accuracy in most benchmarks; **2) Salmon**, a wicked-fast and highly accurate alignment-free method that was recently enhanced further by integrating selective alignment and decoy sequences. We also introduce **3) STAR**, another alignment-based method recommended by GDC, as an optional method.
+In this pipeline, we provide three quantification methods covering both categories:
+
+- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): a **wicked-fast** and **highly accurate** alignment-free method, recently enhanced further by integrating **selective alignment** and **decoy sequences**
+- [**RSEM**](https://github.com/bli25/RSEM_tutorial): the most highly cited alignment-based method, which shows the highest accuracy in most benchmarks
+- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): another **alignment-based** quantifier featuring **spliced transcripts alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline)
+
+### Overview
+
+---
-Below is an overview of the pipeline, which contains three sections: 1) **Preprocessing**: to prepare the standard inputs for quantification analysis; 2) **Quantification**: to **quantify** gene and transcript expression using both alignment-based and alignment-free methods; 3) **Summarization**: to compile the **expression matrices** at both gene and transcript levels, and generate the **quality control report**.
+
-
+#### Three stages
-To serve users better, we have:
+- **Preprocessing**: to prepare the standard inputs for quantification analysis
+- **Quantification**: to **quantify** gene and transcript expression using both alignment-free and alignment-based methods
+- **Summarization**: to compile the **expression matrices** at both gene and transcript levels, and generate the **quality control report**.
-* Compiled all required tools into a single conda environment, which can be easily launched by:
In this pipeline, we have pre-generated the index libraries for two species, **human** and **mouse**, as listed below. For other species, you will need to generate the index libraries yourself by following the Pipeline Setup tutorial.
-* Updated all the index libraries to the latest version
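The core question in this commit is answered by cross-validating quantifiers. As an illustration only, the sketch below compares two quantifiers' estimates for the same sample: it computes the Pearson correlation of log2(TPM + 1) values over shared genes and lists genes whose estimates differ by more than a cutoff of 1 log2 unit (a two-fold difference). The dictionaries stand in for parsed Salmon and RSEM output; the function names, the log transform, and the cutoff are assumptions, not the pipeline's documented criteria.

```python
import math
from typing import Dict, List, Tuple


def log2_tpm(tpm: float) -> float:
    """log2(TPM + 1), a common transform that compresses the range of expression values."""
    return math.log2(tpm + 1.0)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def cross_validate(
    method_a: Dict[str, float],
    method_b: Dict[str, float],
    max_log2_diff: float = 1.0,  # hypothetical cutoff: a two-fold difference
) -> Tuple[float, List[str]]:
    """Correlate two quantifiers on shared genes and list discordant genes."""
    shared = sorted(set(method_a) & set(method_b))
    a = [log2_tpm(method_a[g]) for g in shared]
    b = [log2_tpm(method_b[g]) for g in shared]
    r = pearson(a, b)
    discordant = [g for g, x, y in zip(shared, a, b) if abs(x - y) > max_log2_diff]
    return r, discordant


if __name__ == "__main__":
    # Made-up TPM values standing in for Salmon and RSEM gene-level output.
    salmon = {"g1": 0.0, "g2": 1.0, "g3": 3.0, "g4": 7.0}
    rsem = {"g1": 0.0, "g2": 1.0, "g3": 3.0, "g4": 31.0}
    r, discordant = cross_validate(salmon, rsem)
    print(f"Pearson r = {r:.3f}; discordant genes: {discordant}")
```

A high overall correlation can still hide individual genes on which the methods disagree, which is why the sketch reports both numbers.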
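The Preprocessing stage accepts FASTQ, BAM/SAM, or FASTA input and produces FASTQ. A minimal sketch of the first decision such a stage has to make, routing each input by file name, is shown below. The extension table, the `.gz` handling, and the function names are hypothetical and are not taken from the pipeline's code.

```python
from pathlib import PurePath
from typing import Tuple

# Hypothetical extension table; the pipeline's real input detection is not shown here.
INPUT_FORMATS = {
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
    ".fasta": "FASTA",
    ".fa": "FASTA",
    ".fna": "FASTA",
    ".bam": "BAM",
    ".sam": "SAM",
}


def detect_format(path: str) -> Tuple[str, bool]:
    """Return (format, is_gzipped) for an input file, based only on its name."""
    suffixes = [s.lower() for s in PurePath(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in INPUT_FORMATS:
        raise ValueError(f"unrecognized input format: {path}")
    return INPUT_FORMATS[suffixes[-1]], gzipped


def needs_conversion(path: str) -> bool:
    """Inputs that are not already FASTQ would have to be converted first."""
    fmt, _ = detect_format(path)
    return fmt != "FASTQ"


if __name__ == "__main__":
    for path in ["sample_R1.fastq.gz", "aligned.bam", "contigs.fa"]:
        fmt, gzipped = detect_format(path)
        action = "convert to FASTQ" if needs_conversion(path) else "already FASTQ"
        print(f"{path}: {fmt} (gzipped={gzipped}) -> {action}")
```

Extension-based routing is the cheapest check; a more defensive version could also inspect the first bytes of each file, which this sketch does not do.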