Differentially Veloce Genes (DVG)

This repository contains code to identify and visualize Differentially Veloce Genes (DVGs): genes whose RNA velocity values show a statistically significant difference in distribution relative to a reference distribution. The pipeline implements statistical testing, multiple-hypothesis correction, and interactive clustermap visualization in Python.

Author & Date

Rachid Ounit, Ph.D.

01/17/2025

Introduction

RNA velocity analysis provides insights into cellular dynamics by estimating the future state of individual cells. Building on this concept, Differentially Veloce Genes (DVGs) are defined as genes that display statistically significant differences in their velocity distributions across experimental conditions or regions. This pipeline is tailored to:

- Identify DVGs using differential testing (Welch’s t-test)
- Adjust for multiple comparisons (using the Benjamini-Hochberg procedure)
- Generate interactive clustermaps with dendrograms for in-depth exploration

Definition of DVG

A Differentially Veloce Gene (DVG) is a gene that meets the following criteria:

- Exhibits a differential distribution of RNA velocity values between groups or conditions.
- Achieves statistical significance as determined by a hypothesis test (Welch’s t-test).
- Remains significant after correcting for multiple comparisons with an adjusted p-value threshold.
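To make the second criterion concrete, here is a minimal sketch of the Welch statistic for a single gene, written with the standard library only. It is not the repository's code: the function name `welch_t` and the sample values are hypothetical. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with its p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean, from the sample variance.
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    # Unlike Student's t-test, the variances are not pooled.
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical velocity values for one gene in two condition groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

Because the variances are not pooled, the test remains usable when the two groups differ in size or spread, which is common when comparing cells across conditions.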

Code Overview

The code is organized into functions that:

- Extract and process RNA velocity data.
- Perform differential analysis per region.
- Correct for multiple comparisons.
- Generate interactive visualizations of the resulting DVGs.

Methods Description

`get_df_dv_per_region`

get_df_dv_per_region(
  name,
  ds,
  regions,
  adata,
  debug,
  p_lim
)

Purpose: Iterates over given regions and compiles DVG results from each region into a single DataFrame.
Parameters:
- name: Dataset identifier.
- ds: Index of the condition group.
- regions: List of region names to analyze.
- adata: Annotated data matrix containing RNA velocity layers.
- debug: Boolean flag to enable/disable debug output.
- p_lim: p-value threshold for significance.
Returns: Combined DataFrame with region-specific DVGs.

`get_diff_velo_genes`

get_diff_velo_genes(
  name,
  ds,
  region,
  adata,
  debug=False,
  p_lim=1e-9,
  plot=True,
  save_plot_path='difference_distribution.png'
)

Purpose: Identifies DVGs within a specific region using Welch's t-test.
Details:
- Extracts the RNA velocity layer from adata.
- Filters data based on region and condition labels.
- Compares two pre-defined condition groups.
- Computes a test statistic and p-value for each gene.
- Applies a vector normalization (using exponential means) to quantify differences.
- Adjusts p-values using the Benjamini-Hochberg method.
- Optionally plots the distribution of the difference metric.
Returns: DataFrame of significant DVGs (genes passing the adjusted p-value cutoff).
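
The adjustment and cutoff steps can be illustrated with a small standard-library sketch of the Benjamini-Hochberg procedure. This is an assumption-laden illustration rather than the repository's implementation: the helper name, gene names, p-values, cutoff, and the use of `<=` for the comparison are all hypothetical. With statsmodels installed, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` returns the same adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.005}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

p_lim = 0.03  # illustrative cutoff only; the signature above defaults to 1e-9
significant = sorted(g for g, p in adj.items() if p <= p_lim)
print(adj, significant)
```

The running minimum enforces monotonicity: a gene with a smaller raw p-value never ends up with a larger adjusted p-value than a gene ranked after it.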

`generate_interactive_clustermap_with_dendrogram`

generate_interactive_clustermap_with_dendrogram(
  ds,
  df,
  sub_folder_name,
  value_col='adjusted_p_value',
  gene_col='gene',
  region_col='region',
  output_dir='output',
  cmap='Viridis',
  width=1200,
  height=800
)

Purpose: Creates an interactive clustermap with dendrograms to visualize DVG significance across regions.
Procedure:
1. Pivots the input DataFrame to form a gene-by-region matrix.
2. Replaces very small/NaN values and applies a -log10 transformation to the data.
3. Generates row and column dendrograms using hierarchical clustering.
4. Reorders the matrix based on the dendrogram ordering.
5. Constructs a composite figure with interactive subplots (dendrograms and heatmap) using Plotly.
6. Saves the interactive plot as an HTML file.
Returns: Nothing; the function displays the interactive figure and saves it as an HTML file.
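
Steps 1 and 2 of the procedure can be sketched without any third-party package. The sketch below is illustrative only: the function name, the floor of `1e-300`, and the choice to treat missing or NaN entries as p = 1 (which maps to 0 after the transform) are assumptions, not values taken from the repository.

```python
import math


def neg_log10_matrix(rows, genes, regions, floor=1e-300, missing=1.0):
    """Pivot (gene, region, adjusted p) rows and apply -log10."""
    lookup = {(gene, region): p for gene, region, p in rows}
    matrix = []
    for gene in genes:
        line = []
        for region in regions:
            p = lookup.get((gene, region), missing)
            if math.isnan(p):
                # Assumption: an undefined p-value counts as not significant.
                p = missing
            # Clamp tiny values so that -log10 stays finite.
            line.append(-math.log10(max(p, floor)))
        matrix.append(line)
    return matrix


# Hypothetical adjusted p-values in long format.
rows = [
    ("geneA", "R1", 1e-12),
    ("geneA", "R2", 0.0),           # would be infinite without the floor
    ("geneB", "R1", float("nan")),  # geneB has no entry at all for R2
]
scores = neg_log10_matrix(rows, ["geneA", "geneB"], ["R1", "R2"])
print(scores)
```

After this transform, larger values mean stronger significance, so the heatmap color scale and the clustering distances both emphasize the most significant gene-region pairs.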

`get_adata`

get_adata(
  idx
)

Purpose: Loads one annotated data matrix, selected by idx from the pre-defined lists of file paths and dataset names.
Details:
- Accesses a list of file paths and dataset names.
- Reads the corresponding .h5ad file.
Returns: A tuple containing the dataset name and the AnnData object.

`run_analysis`

run_analysis(
  plim_lowest,
  plim_name
)

Purpose: Automates the analysis across multiple datasets.
Procedure:
- Iterates through pre-defined datasets.
- Extracts regions (excluding some that are not of interest).
- Compiles DVG results using get_df_dv_per_region.
- Generates interactive clustermaps with dendrograms.
Parameters:
- plim_lowest: The p-value threshold used for significance.
- plim_name: Subdirectory name for the output.
Execution: This function is invoked at the end of the script to run the complete pipeline.

Execution

To run the analysis, simply execute the script. Ensure that:

- The AnnData .h5ad files are correctly referenced in the FILE_ADATA list.
- The required packages (numpy, pandas, plotly, scipy, statsmodels, etc.) are installed.

python <script_name>.py
