simkin-bioinformatics/splicewiz

SpliceWiz Pipeline

A Snakemake pipeline for alternative splicing analysis using SpliceWiz. The pipeline aligns RNA-seq reads with STAR, processes BAM files with SpliceWiz, performs differential splicing analysis with edgeR, and generates coverage plots for significant splicing events.

Overview

FASTQs → STAR alignment → SpliceWiz BAM processing → collate data → differential analysis → coverage plots

An optional sub-pipeline (filter_fastqs/) can pre-filter FASTQs to reads mapping to a genomic region of interest before running the main pipeline.

Requirements

  • conda or mamba
  • ~16 GB RAM (STAR genome indexing)
  • Genome FASTA and GTF annotation files

Installation

1. Create the conda environment:

mamba env create -f environment.yml
conda activate splicewiz

2. Install R packages:

From a shell with the environment activated (not from inside an R session), run:

Rscript install_splicewiz.R

This installs SpliceWiz (from GitHub), DoubleExpSeq, DESeq2, limma, edgeR, GO.db, fgsea, and Rsubread.

Configuration

Edit config.yaml before running:

genome_fasta: /path/to/genome.fa.gz     # genome FASTA (gzipped or plain)
genome_gtf:   /path/to/annotation.gtf   # gene annotation GTF
output_folder: output                   # where all results are written
cores: 8                                # threads for STAR and SpliceWiz

fastq_folder: /path/to/fastqs           # directory containing paired FASTQ files
is_paired: True
fastq_suffix: .fq.gz                    # suffix after the sample name + _1/_2

samples:                                # list of sample base names
  - sample_A_rep1
  - sample_A_rep2
  - sample_B_rep1
  - sample_B_rep2

deltaPSI_cutoff: 0.10                   # minimum |deltaPSI| to report
search_term: gene_of_interest           # string to filter EventName in results

FASTQ files must be named {sample}_1{fastq_suffix} and {sample}_2{fastq_suffix}. With the settings above, that is {sample}_1.fq.gz and {sample}_2.fq.gz.
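
A missing or misnamed FASTQ is easier to catch before starting a run. The following is a minimal sketch, not part of the pipeline: it assumes the config has already been loaded into a plain dict with the keys shown above, and the helper names expected_fastqs and missing_fastqs are hypothetical.

```python
from pathlib import Path


def expected_fastqs(config):
    """List the paired FASTQ paths implied by the naming convention."""
    folder = Path(config["fastq_folder"])
    suffix = config["fastq_suffix"]
    # Each sample contributes two files: {sample}_1{suffix} and {sample}_2{suffix}.
    return [
        folder / f"{sample}_{mate}{suffix}"
        for sample in config["samples"]
        for mate in (1, 2)
    ]


def missing_fastqs(config):
    """Return the expected FASTQ paths that do not exist on disk."""
    return [path for path in expected_fastqs(config) if not path.is_file()]
```

Calling missing_fastqs(config) and printing the result before the dry run gives a quick list of files to rename or locate.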

Running the pipeline

# dry run
snakemake -s snakemake.smk -n

# run locally
snakemake -s snakemake.smk --cores 8

# run on a cluster (SLURM example)
snakemake -s snakemake.smk --executor slurm --jobs 20

Output

All output is written under output_folder/:

output_folder/
├── {genome}_reference/
│   ├── resource/           # copied genome (.2bit) and GTF
│   ├── SpliceWiz.ref.gz    # SpliceWiz splice reference
│   └── STAR/               # STAR genome index
└── {sample_folder}_{genome}_alignment/
    ├── aligned_bams/       # raw STAR BAMs
    ├── processed_bams/     # SpliceWiz-processed .cov files
    ├── nxtse/              # collated SpliceWiz experiment (filteredIntrons.Rds)
    ├── *_resEdgR.csv       # differential splicing results
    ├── *_coverage_*.pdf    # per-sample coverage plots
    └── *_coverage_*_group.pdf  # grouped (condition-level) coverage plots
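
The deltaPSI_cutoff and search_term settings can also be applied by hand when re-inspecting a results CSV. The sketch below illustrates that kind of filter and is not the pipeline's own code: EventName is the column named in the config comments, while the deltaPSI column name and the filter_events helper are assumptions that may need adjusting to the actual header.

```python
import csv


def filter_events(csv_path, search_term, delta_psi_cutoff, delta_col="deltaPSI"):
    """Keep rows whose EventName contains search_term and |deltaPSI| >= cutoff.

    delta_col is an assumed column name; change it to match the real CSV header.
    """
    kept = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                delta = float(row[delta_col])
            except (TypeError, ValueError):
                continue  # skip rows with a missing or non-numeric value such as "NA"
            name = row.get("EventName") or ""
            if search_term in name and abs(delta) >= delta_psi_cutoff:
                kept.append(row)
    return kept
```

A call such as filter_events(path, "gene_of_interest", 0.10) returns the matching rows as dictionaries keyed by column name. If the assumed column is absent, the function raises a KeyError rather than silently returning nothing.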

Optional: filter FASTQs by genomic region

filter_fastqs/ contains a separate Snakemake pipeline that retains only reads overlapping a BED file of regions. This is useful for focusing on a locus of interest (e.g. a transposable element) before running the full pipeline.

Paths are hardcoded at the top of filter_fastqs/filter_fastqs.smk. Edit them before running:

aligned_bams_folder = "/path/to/aligned_bams"
fastq_folder        = "/path/to/fastqs"
bed_file            = "/path/to/regions.bed"

Then run the sub-pipeline:

snakemake -s filter_fastqs/filter_fastqs.smk --cores 8

The filtered FASTQs are written to {fastq_folder}_filtered/ and can then be used as input to the main pipeline.
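
To make "reads overlapping a BED file of regions" concrete, the sketch below shows the overlap test on BED's 0-based, half-open coordinates. It only illustrates the criterion. How filter_fastqs.smk actually performs the filtering is not described here, and the parse_bed and overlaps helpers are hypothetical.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlaps(read, regions):
    """Return True if a read's (chrom, start, end) interval overlaps any region.

    Two half-open intervals [a, b) and [c, d) overlap when a < d and c < b.
    """
    chrom, start, end = read
    return any(
        chrom == r_chrom and start < r_end and r_start < end
        for r_chrom, r_start, r_end in regions
    )
```

With a region chr1:100-200, a read spanning 199-250 overlaps by one base, while a read starting at 200 does not, because the region's end coordinate is exclusive.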

Interactive exploration

To launch the SpliceWiz Shiny GUI in demo mode:

Rscript graphical_interface.R

Pipeline steps

Snakemake rule  Script                          Description
getResources    scripts/get_resources.R         Copy genome and GTF into reference directory
SpliceRef       scripts/buildSpliceReference.R  Build SpliceWiz splice reference
StarRef         scripts/buildStarReference.R    Build STAR genome index
StarAlignment   scripts/starAlignment.R         Align paired-end FASTQs with STAR
ProcessBams     scripts/processBams.R           Process BAMs with SpliceWiz
CollateData     scripts/collateData.R           Collate processed BAMs into an SE object
CoveragePlots   scripts/coveragePlots.R         Differential analysis (edgeR) + coverage plots
