- Project Overview
- Pipeline Overview
- Repository Structure
- Installation
- Usage
- Features
- Outputs
- Coding Demonstration
- Authors
Given RNA-seq gene expression quantification files (TSV format), this project:
- Computes pairwise gene–gene Pearson correlations across samples
- Constructs a weighted, undirected gene co-expression network
- Detects gene modules using the Louvain community detection algorithm
- Computes network topology statistics (e.g., degree distribution and clustering behavior)
- Compares real networks to Fixed-m random graph models and, when two datasets are provided, performs cross-dataset comparisons
An R Shiny interface is included that allows users to run the Go analysis pipeline and interactively explore the resulting networks. Users can switch between datasets, view Louvain communities ranked by density, filter edges by correlation sign (positive/negative), customize colors, toggle gene labels, and export network visualizations.
Top-down pipeline structure:
- Input: RNA-seq gene expression TSV files
- Preprocessing: filtering and organization of expression data
- Correlation: computation of Pearson gene–gene correlations
- Graph Construction: weighted, undirected co-expression network
- Community Detection: Louvain modularity optimization
- Network Analysis: topology statistics and statistical comparisons
- Visualization: interactive exploration using R Shiny
- main.go
  Orchestrates the full analysis pipeline: loads datasets, builds networks, runs Louvain community detection, computes statistics, performs comparisons, and writes output CSV files.
- functions.go
  Contains core helper functions for data parsing, preprocessing, correlation computation, graph construction, community analysis, and statistical testing.
- io.go
  File input/output operations for reading TSV files and writing CSV outputs.
- datatypes.go
  Defines data structures used throughout the pipeline.
- louvain/
  Implementation of the Louvain community detection algorithm used by the pipeline.
- GeneExpressionData/
  Directory for input datasets: GeneExpressionData/Dataset1/ and GeneExpressionData/Dataset2/
- ShinyApp/
  R Shiny interface (app.R) and CSV output files generated by the Go pipeline.
- Go 1.24+ (tested with Go 1.24.5)
- R 4.2+
install.packages(c(
  "shiny",
  "visNetwork",
  "colourpicker",
  "shinycssloaders",
  "here"
))
Place RNA-seq gene expression quantification TSV files into:
- GeneExpressionData/Dataset1/ (required)
- GeneExpressionData/Dataset2/ (optional, for cross-dataset comparison)
Each TSV file should have:
- First column: gene identifiers
- Subsequent columns: one column per sample, headed by the sample name and holding unstranded TPM (Transcripts Per Million) values
- Tab-separated format
- The pipeline specifically parses the unstranded TPM values from the expression files
Each dataset directory may contain one or more TSV files corresponding to samples from the same condition (e.g., cancer type).
- Make sure you have Go 1.24 or higher installed and Go modules enabled.
- Clone this repository:
  git clone https://github.com/efranken-25/02-601_Project_Fall2025.git
  cd 02-601_Project_Fall2025
- Install any Go dependencies (if not already in go.mod):
  go get ./...
- Build the Go executable:
  go build -o 02-601_Project_Fall2025
- Run the main Go program:
  - For one dataset:
    ./02-601_Project_Fall2025 1
  - For two datasets:
    Note: Running with two datasets is the default behavior.
    ./02-601_Project_Fall2025 2
    # or simply:
    ./02-601_Project_Fall2025
- The Go program prints progress updates and summary statistics to the terminal and writes output CSV files to the ShinyApp/ directory.
Note: The Go executable must be built before launching the Shiny app, as the app calls the compiled Go program directly.
- Open the R script in R or RStudio.
- Make sure you have the required packages installed:
  install.packages(c("shiny", "visNetwork", "colourpicker", "shinycssloaders", "here"))
- Set your working directory to the project folder (so that the Shiny app can access the CSVs):
  setwd("/path/to/02-601_Project_Fall2025")
- Launch the Shiny app:
  shiny::runApp("ShinyApp", launch.browser = TRUE)
  Note: If your R working directory is not set to the project root, you may alternatively provide the full path:
  shiny::runApp("/path/to/02-601_Project_Fall2025/ShinyApp", launch.browser = TRUE)
- The app will open in your web browser.
  Click "Run Full Analysis in Go" to execute the Go pipeline.
  Progress updates and analysis summaries will print to the R console.
Once the analysis completes, you can:
- Switch between datasets
- Select Louvain communities
- Filter edges by correlation sign
- Customize visualization settings
- Export network images
The Shiny app allows users to:
- Run the Go pipeline
- Switch between datasets
- Explore Louvain communities ranked by density
- Filter edges by correlation sign
- Customize visualization options
- Toggle labels
- Export network images
After running the Go analysis pipeline, the following outputs are generated:
The Go pipeline writes CSV files to the ShinyApp/ directory, including:
- Node tables: gene identifiers with assigned Louvain community labels
- Edge tables: weighted, undirected edges with Pearson correlation values
- Community statistics: per-module size, density, and related structural metrics
These CSV files are used by the R Shiny interface for visualization and interaction.
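To make the per-module metrics more concrete, the sketch below shows one way module size, intra-module edge count, and density could be derived from a node-to-community assignment and an edge list. The `Edge` and `ModuleStats` types and the `moduleStats` function are hypothetical and are not taken from the project's source. Density is computed here with the usual undirected-graph formula 2m / (n(n-1)), which is an assumption about how the pipeline defines it.

```go
package main

import "fmt"

// Edge is a hypothetical weighted, undirected edge between two genes.
type Edge struct {
	Source, Target string
	Weight         float64
}

// ModuleStats summarizes one community (module) of the network.
type ModuleStats struct {
	Nodes   int     // number of genes in the module
	Edges   int     // number of edges with both endpoints in the module
	Density float64 // 2*Edges / (Nodes*(Nodes-1)); 0 for single-node modules
}

// moduleStats counts nodes and intra-module edges per community and
// derives each module's density. Edges that cross module boundaries or
// reference unknown genes are ignored.
func moduleStats(community map[string]int, edges []Edge) map[int]ModuleStats {
	stats := make(map[int]ModuleStats)
	for _, c := range community {
		s := stats[c]
		s.Nodes++
		stats[c] = s
	}
	for _, e := range edges {
		cs, okS := community[e.Source]
		ct, okT := community[e.Target]
		if okS && okT && cs == ct {
			s := stats[cs]
			s.Edges++
			stats[cs] = s
		}
	}
	for c, s := range stats {
		if s.Nodes > 1 {
			s.Density = 2 * float64(s.Edges) / float64(s.Nodes*(s.Nodes-1))
		}
		stats[c] = s
	}
	return stats
}

func main() {
	// Hypothetical assignment of five genes to two modules.
	community := map[string]int{"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
	edges := []Edge{
		{Source: "A", Target: "B", Weight: 0.9},
		{Source: "B", Target: "C", Weight: 0.8},
		{Source: "C", Target: "D", Weight: -0.7}, // crosses modules, not counted
		{Source: "D", Target: "E", Weight: 0.95},
	}

	stats := moduleStats(community, edges)
	for _, id := range []int{0, 1} {
		s := stats[id]
		fmt.Printf("module %d: nodes=%d edges=%d density=%.3f\n", id, s.Nodes, s.Edges, s.Density)
	}
}
```

Sorting modules by the resulting `Density` field would give a ranking like the density-ordered community list the Shiny app presents.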
The Go pipeline also prints human-readable summaries during execution.
These summaries appear:
- in the terminal when Go is run directly, or
- in the R console when Go is executed through the Shiny app.
Printed summaries include:
- Graph properties: number of nodes and edges, mean degree, edge density, and proportions of positive vs. negative edges
- Module analysis: number of modules, module size statistics (min / median / mean ± SD / max), and largest module details
- Module structure: per-module tables reporting node counts, edge counts, and densities
- Network measures: Louvain modularity scores and global clustering coefficients (mean ± standard deviation)
- Statistical comparisons (KS tests):
  - Real vs. Fixed-m random graph degree distributions (per dataset)
  - Cross-dataset degree distribution comparisons (when two datasets are provided)
  - Cross-dataset clustering coefficient distribution comparisons (when two datasets are provided)
  - Corresponding D-statistics, p-values, and significance interpretations
A short recorded coding demonstration showing how to run the pipeline and explore results is available here:
https://drive.google.com/file/d/1mlOepfbSLY5wKjs8FrVG2FCqNKTcczVb/view?usp=sharing
- Beth Vazquez Smith - @bvazquezsmith
- Noemi Banda - @b-noemi
- Emma Franken - @efranken-25