update index.md

QingfeiPan · QingfeiPan · commit 49251d61447d · 2025-08-23T22:19:10.000-05:00
diff --git a/README.md b/README.md
@@ -6,21 +6,21 @@
 
 As shown above, this pipeline contains three stages:
 
-#### 1. Data preprocessing ####
+#### 1. Preprocessing ####
 
 In this stage, the pipeline takes raw inputs in various formats (e.g., FASTQ, BAM/SAM or FASTA) and generates **standard-in-format**, **clean-in-sequence** FASTQ files that can be directly used for quantification analysis.
 
 #### 2. Quantification
 
 In this stage, the pipeline generates quantification measurements at both the gene and transcript levels. It supports three well-established and widely used quantifiers:
 
-- [**<u>Salmon</u>**](https://salmon.readthedocs.io/en/latest/salmon.html): an **alignment-free quantifier** with **wicked-fast speed** and **comarable accuracy**.
+- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): an **alignment-free quantifier** with **wicked-fast speed** and **comparable accuracy**.
 
-- [**<u>RSEM</u>**](https://github.com/bli25/RSEM_tutorial): an **alignment-based quantifier** with **high accuracy**. It has been used as **gold standard** in many benchmarking studies.
+- [**RSEM**](https://github.com/bli25/RSEM_tutorial): an **alignment-based quantifier** with **high accuracy**. It has been used as the **gold standard** in many benchmarking studies.
 
-- [**<u>STAR</u>**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): an **alignment-based quantifier** featured by **spliced transcripts alignment. This is the tool used by [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
+- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): an **alignment-based quantifier** featuring **spliced transcripts alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
 
-#### 3. Summary
+#### 3. Summarization
 
 In this stage, the pipeline generates **an HTML report** of the quantification analysis for each sample, including alignment statistics, correlation analysis, gene body coverage visualizations, and more. When multiple samples are provided, it can also produce a universal report summarizing statistics of all samples, as well as a master gene expression matrix that can be directly used for **NetBID** analysis.
 
diff --git a/index.md b/index.md
@@ -5,8 +5,18 @@ nav_order: 1
 permalink: /index
 ---
 
+### Scope of this pipeline
+
+---
+
 Bulk RNA sequencing (RNA-Seq) is a highly sensitive and accurate tool for measuring expression across the transcriptome. In addition to transcriptome quantification, RNA-Seq also allows researchers to detect new splicing junctions (e.g. TopHat/TopHat2-regtools), novel transcripts (e.g. Cufflinks), gene fusions (e.g. STAR-Fusion, Arriba), single nucleotide variants (e.g. STAR-GATK), and other features. **<u>This pipeline is for transcriptome quantification only</u>.**
 
+### The core question
+
+---
+
+**How can we ensure the accuracy of bulk RNA-seq quantification?** In this pipeline, we address this by **applying multiple quantification methods for cross-validation**, thereby increasing the reliability of the results.
+
 The current bulk RNA-Seq quantification methods can be grouped into two categories, **alignment-based** and **alignment-free**, as summarized in the table below. 
 
 |            | Alignment-based methods                                      | Alignment-free methods                                       |
@@ -17,39 +27,42 @@ The current bulk RNA-Seq quantification methods can be grouped into two categori
 | Accuracy   | High                                                         | Slightly lower or equal                                      |
 | Speed      | Slow, a few hours for a typical run                          | Super-fast, a few minutes for a typical run                  |
 
-<u>**To ensure the accuracy of quantification, we employ one signature method from each of these two categories for cross-validation**:</u> **1)** **RSEM**, the most highly cited alignment-based method which shows the highest accuracy in most benchmarks; **2)** **Salmon**, one wicked-fast and highly-accurate alignment-free method which is recently further enhanced by integrating selective alignment and decoy sequences. We also introduce **3) STAR**, another alignment-based method recomended by GDC, as an optional method.
+In this pipeline, we provide three quantification methods covering both categories:
+
+- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): a **wicked-fast** and **highly accurate** alignment-free method that was recently enhanced by integrating **selective alignment** and **decoy sequences**
+- [**RSEM**](https://github.com/bli25/RSEM_tutorial): the most highly cited alignment-based method, which shows the highest accuracy in most benchmarks
+- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): another **alignment-based** quantifier featuring **spliced transcripts alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline)
+
+### Overview
+
+---
 
-Below is an overview of the pipelines, which contains three sections: 1) **Preprocessing**: to prepare the standard inputs for quantification analysis; 2) **Quantification**: to **quantify** the gene and transcript expression in both alignment-based and alignment-free methods; 3) **Summarization**: to compile the **expression matrices** at both gene and transcript levels, and generate the **quanlity control report**.
+![Picture](docs/figures/overview.png)
 
-![Picture1](/Users/qpan/Desktop/Picture1.png)
+#### Three stages
 
-To serve better, we have:
+- **Preprocessing**: to prepare the standard inputs for quantification analysis
+- **Quantification**: to **quantify** the gene and transcript expression using both alignment-free and alignment-based methods
+- **Summarization**: to compile the **expression matrices** at both gene and transcript levels, and generate the **quality control report**
 
-* Complied all tools required into one single conda environment, which can be easily launched by:
+### Two species
 
-  ```bash
-  module load conda3/202210
-  conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023
-  ```
+In this pipeline, we have pre-generated the index libraries for two species, **human** and **mouse**, as listed below. For other species, you will need to generate the index libraries yourself by following the Pipeline Setup tutorial.
 
-* Updated all the index libraries to the latest version
+| Genome | GENCODE release | Release date | Path                                                         |
+| ------ | --------------- | ------------ | ------------------------------------------------------------ |
+| hg38   | v48             | 05.2025      | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg38/gencode.release48 |
+| hg19   | v48lift37       | 05.2025      | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg19/gencode.release48 |
+| mm39   | vM37            | 05.2025      | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm39/gencode.releaseM37 |
+| mm10   | vM25            | 04.2020      | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm10/gencode.releaseM25 |
 
-  | Genome | GENCODE | Path                                                         |
-  | ------ | ------- | ------------------------------------------------------------ |
-  | hg38   | v43     | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg38/gencode.release43 |
-  | hg19   | v43     | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/hg19/gencode.release43 |
-  | mm39   | vM32    | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm39/gencode.releaseM32 |
-  | mm10   | vM25    | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/yulab_databases/references/mm10/gencode.releaseM25 |
+### One environment 
 
-* Prepared the test cases of multi-format inputs
+We have compiled all tools required by this pipeline into a single conda environment. You can easily set it up by following our tutorial.
 
-  | Cases   | Library Type | File Format           | Path                                                         |
-  | ------- | ------------ | --------------------- | ------------------------------------------------------------ |
-  | Sample1 | Paired-end   | FASTQ                 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample1 |
-  | Sample2 | Single-end   | FASTQ                 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample2 |
-  | Sample3 | Paired-end   | FASTQ, multiple lanes | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample3 |
-  | Sample4 | Single-end   | FASTQ, multiple lanes | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample4 |
-  | Sample5 | Paired-end   | BAM                   | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample5 |
-  | Sample6 | Single-end   | BAM                   | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2023/git_repo/testdata/sample6 |
+For St. Jude HPC users, it can be launched with:
 
-For traning purpose, we will go through the pipelines in a step-by-step way with real cases as listed above. 
+```bash
+module load conda3/202402
+conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025
+```
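
Note on the cross-validation claim in the new index.md text: the patch says multiple quantifiers are applied for cross-validation but shows no concrete check. A minimal sketch of one possible check follows. It assumes Salmon's `quant.sf` (TPM in column 4) and RSEM's `*.isoforms.results` (TPM in column 6) are tab-separated, each with a header line, and use the same transcript IDs in column 1. The function name `tpm_correlation` and the choice of Pearson correlation on log2(TPM + 1) are illustrative assumptions, not something the pipeline is confirmed to do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: Pearson correlation of log2(TPM + 1) between
# Salmon and RSEM transcript-level estimates for one sample.
#   $1: Salmon quant.sf          (ID in column 1, TPM in column 4)
#   $2: RSEM *.isoforms.results  (ID in column 1, TPM in column 6)
tpm_correlation() {
  local salmon_sf="$1" rsem_iso="$2"
  awk -F'\t' '
    FNR == 1  { next }                                # skip the header line of each file
    NR == FNR { s[$1] = log($4 + 1) / log(2); next }  # first file: store Salmon log2(TPM + 1)
    ($1 in s) {                                       # second file: transcripts present in both
      x = s[$1]; y = log($6 + 1) / log(2)
      n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
    }
    END {
      if (n < 2) { print "fewer than two shared transcripts" > "/dev/stderr"; exit 1 }
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (den == 0) { print "zero variance in one input" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", (n * sxy - sx * sy) / den
    }
  ' "$salmon_sf" "$rsem_iso"
}
```

A call such as `tpm_correlation salmon_out/quant.sf rsem_out/sample1.isoforms.results` (both paths hypothetical) prints a single coefficient. If Salmon was indexed without trimming the GENCODE FASTA headers, the IDs in the two files may not match and would need to be normalized first.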