A Snakemake pipeline for alternative splicing analysis using SpliceWiz. The pipeline aligns RNA-seq reads with STAR, processes BAM files with SpliceWiz, performs differential splicing analysis with edgeR, and generates coverage plots for significant splicing events.
FASTQs → STAR alignment → SpliceWiz BAM processing → collate data → differential analysis → coverage plots
An optional sub-pipeline (filter_fastqs/) can pre-filter FASTQs to reads mapping to a genomic region of interest before running the main pipeline.
- conda or mamba
- ~16 GB RAM (STAR genome indexing)
- Genome FASTA and GTF annotation files
1. Create the conda environment:

   ```bash
   mamba env create -f environment.yml
   conda activate splicewiz
   ```

2. Install the R packages. From a shell inside the activated environment, run:

   ```bash
   Rscript install_splicewiz.R
   ```

   This installs SpliceWiz (from GitHub), DoubleExpSeq, DESeq2, limma, edgeR, GO.db, fgsea, and Rsubread.
Edit `config.yaml` before running:

```yaml
genome_fasta: /path/to/genome.fa.gz   # genome FASTA (gzipped or plain)
genome_gtf: /path/to/annotation.gtf   # gene annotation GTF
output_folder: output                 # where all results are written
cores: 8                              # threads for STAR and SpliceWiz
fastq_folder: /path/to/fastqs         # directory containing paired FASTQ files
is_paired: True
fastq_suffix: .fq.gz                  # suffix after the sample name + _1/_2
samples:                              # list of sample base names
  - sample_A_rep1
  - sample_A_rep2
  - sample_B_rep1
  - sample_B_rep2
deltaPSI_cutoff: 0.10                 # minimum |deltaPSI| to report
search_term: gene_of_interest         # string to filter EventName in results
```

FASTQ files must be named `{sample}_1{fastq_suffix}` and `{sample}_2{fastq_suffix}`; with the suffix above, that is `{sample}_1.fq.gz` / `{sample}_2.fq.gz`.
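Because the pipeline locates inputs purely from this naming rule, a missing or misnamed file only surfaces when Snakemake builds its job graph. The sketch below is a hypothetical pre-flight helper, not part of the pipeline: it assumes `config.yaml` has already been parsed into a plain dict with the `fastq_folder`, `fastq_suffix`, and `samples` keys shown above, and lists every expected FASTQ that is absent.

```python
from pathlib import Path


def expected_fastqs(config: dict) -> dict[str, tuple[Path, Path]]:
    """Map each sample to its (read 1, read 2) FASTQ paths.

    Mirrors the documented rule {sample}_1{fastq_suffix} / {sample}_2{fastq_suffix}.
    """
    folder = Path(config["fastq_folder"])
    suffix = config["fastq_suffix"]
    return {
        sample: (folder / f"{sample}_1{suffix}", folder / f"{sample}_2{suffix}")
        for sample in config["samples"]
    }


def missing_fastqs(config: dict) -> list[Path]:
    """Return every expected FASTQ that does not exist as a regular file."""
    return [
        path
        for pair in expected_fastqs(config).values()
        for path in pair
        if not path.is_file()
    ]


if __name__ == "__main__":
    # Hypothetical stand-in for the parsed config.yaml (e.g. via a YAML parser).
    config = {
        "fastq_folder": "/path/to/fastqs",
        "fastq_suffix": ".fq.gz",
        "samples": ["sample_A_rep1", "sample_A_rep2"],
    }
    for sample, (r1, r2) in expected_fastqs(config).items():
        print(sample, r1, r2)
    print("missing:", missing_fastqs(config))
```

Running this before `snakemake -n` gives a direct list of offending paths rather than a rule-resolution error.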
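Run the pipeline:

```bash
# dry run: list the jobs without executing them
snakemake -s snakemake.smk -n

# run locally
snakemake -s snakemake.smk --cores 8

# run on a cluster (SLURM example; --executor requires Snakemake >= 8
# and the snakemake-executor-plugin-slurm package)
snakemake -s snakemake.smk --executor slurm --jobs 20
```

All output is written under `output_folder/`: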
```text
output_folder/
├── {genome}_reference/
│   ├── resource/                # copied genome (.2bit) and GTF
│   ├── SpliceWiz.ref.gz         # SpliceWiz splice reference
│   └── STAR/                    # STAR genome index
└── {sample_folder}_{genome}_alignment/
    ├── aligned_bams/            # raw STAR BAMs
    ├── processed_bams/          # SpliceWiz-processed .cov files
    ├── nxtse/                   # collated SpliceWiz experiment (filteredIntrons.Rds)
    ├── *_resEdgR.csv            # differential splicing results
    ├── *_coverage_*.pdf         # per-sample coverage plots
    └── *_coverage_*_group.pdf   # grouped (condition-level) coverage plots
```
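The `deltaPSI_cutoff` and `search_term` settings amount to a simple row filter on the differential results. The hypothetical helper below re-applies that filter to a `*_resEdgR.csv` file for ad-hoc inspection. It assumes the CSV has columns named `EventName` and `deltaPSI`, which may not match the real file, and it is not the code the pipeline itself runs.

```python
import csv
import io
from typing import TextIO


def filter_events(
    handle: TextIO, deltapsi_cutoff: float, search_term: str
) -> list[dict[str, str]]:
    """Keep rows with |deltaPSI| >= cutoff whose EventName contains search_term.

    Assumes 'EventName' and 'deltaPSI' columns (hypothetical layout).
    """
    kept = []
    for row in csv.DictReader(handle):
        try:
            dpsi = float(row["deltaPSI"])
        except ValueError:
            continue  # skip NA / empty values, as written by R's write.csv
        if abs(dpsi) >= deltapsi_cutoff and search_term in row["EventName"]:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = io.StringIO(
        "EventName,deltaPSI\n"
        "gene_of_interest_event1,0.25\n"
        "gene_of_interest_event2,-0.05\n"
        "other_gene_event1,0.40\n"
    )
    for row in filter_events(demo, 0.10, "gene_of_interest"):
        print(row["EventName"], row["deltaPSI"])
```

Using `abs()` keeps events with a large change in either direction, matching the `|deltaPSI|` wording in the config comment.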
filter_fastqs/ contains a separate Snakemake pipeline that retains only reads overlapping a BED file of regions. This is useful for focusing on a locus of interest (e.g. a transposable element) before running the full pipeline.
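Deciding whether a read overlaps a region comes down to an interval test against the BED coordinates. How `filter_fastqs.smk` does this is not documented here, so the following is a hypothetical sketch. It parses the first three BED columns (0-based, half-open) and checks an alignment span given in the same convention, which is the convention pysam uses for `reference_start`/`reference_end`.

```python
from collections import defaultdict


def load_bed(lines) -> dict[str, list[tuple[int, int]]]:
    """Parse BED lines into {chrom: sorted [(start, end), ...]} (0-based, half-open)."""
    regions: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return dict(regions)


def overlaps(regions: dict[str, list[tuple[int, int]]], chrom: str, start: int, end: int) -> bool:
    """True if the half-open span [start, end) shares at least one base with a region."""
    # Two half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    return any(s < end and start < e for s, e in regions.get(chrom, []))


if __name__ == "__main__":
    bed = load_bed(["chr1\t100\t200\n", "chr2\t0\t50\n"])
    print(overlaps(bed, "chr1", 150, 160))  # inside the region
    print(overlaps(bed, "chr1", 200, 250))  # starts exactly at the exclusive end
```

A linear scan is fine for a handful of loci; with thousands of regions an interval tree or a sorted sweep would be the usual choice.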
Paths are hardcoded at the top of `filter_fastqs/filter_fastqs.smk`; edit them before running:

```python
aligned_bams_folder = "/path/to/aligned_bams"
fastq_folder = "/path/to/fastqs"
bed_file = "/path/to/regions.bed"
```

Then run:

```bash
snakemake -s filter_fastqs/filter_fastqs.smk --cores 8
```

The filtered FASTQs are written to `{fastq_folder}_filtered/` and can then be used as input to the main pipeline by pointing `fastq_folder` in `config.yaml` at that directory.
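Since the sub-pipeline takes both aligned BAMs and the original FASTQs, one plausible approach is to collect the names of reads whose alignments overlap the BED regions and then copy only those FASTQ records. The sketch below covers that second step only. `read_id`, `filter_fastq`, and `filtered_dir` are hypothetical names, and the code assumes gzipped FASTQs with unwrapped four-line records.

```python
import gzip
import tempfile
from pathlib import Path


def read_id(header: str) -> str:
    """Strip the leading '@', any comment after whitespace, and a /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_fastq(src: Path, dst: Path, keep: set[str]) -> int:
    """Copy only the records whose read ID is in `keep`; return the number written."""
    written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # header, seq, '+', qual
            if not record[0]:
                break  # end of file
            if read_id(record[0]) in keep:
                fout.writelines(record)
                written += 1
    return written


def filtered_dir(fastq_folder: str) -> Path:
    """Mirror the documented {fastq_folder}_filtered/ output location."""
    folder = Path(fastq_folder)
    return folder.with_name(folder.name + "_filtered")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "demo_1.fq.gz"
        with gzip.open(src, "wt") as fh:
            fh.write("@keep/1\nACGT\n+\nIIII\n@drop/1\nTTTT\n+\nIIII\n")
        n = filter_fastq(src, Path(tmp) / "demo_1.filtered.fq.gz", {"keep"})
        print(f"kept {n} record(s)")
```

Stripping the `/1` and `/2` suffixes lets a single set of names filter both mate files, so the two outputs stay in sync as pairs.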
To launch the SpliceWiz Shiny GUI in demo mode:
```bash
Rscript graphical_interface.R
```

| Snakemake rule | Script | Description |
|---|---|---|
| `getResources` | `scripts/get_resources.R` | Copy genome and GTF into reference directory |
| `SpliceRef` | `scripts/buildSpliceReference.R` | Build SpliceWiz splice reference |
| `StarRef` | `scripts/buildStarReference.R` | Build STAR genome index |
| `StarAlignment` | `scripts/starAlignment.R` | Align paired-end FASTQs with STAR |
| `ProcessBams` | `scripts/processBams.R` | Process BAMs with SpliceWiz |
| `CollateData` | `scripts/collateData.R` | Collate processed BAMs into an SE object |
| `CoveragePlots` | `scripts/coveragePlots.R` | Differential analysis (edgeR) + coverage plots |
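Snakemake decides the order of these rules by matching each rule's inputs to other rules' outputs and building a dependency graph. The sketch below illustrates that resolution with Python's standard `graphlib`. The dependency map is an assumption inferred from the workflow overview (FASTQs → STAR → SpliceWiz → collate → differential analysis and plots), not read from `snakemake.smk`.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: rule -> rules whose outputs it consumes.
# Inferred from the README's workflow overview; check snakemake.smk for the real inputs.
DEPENDENCIES: dict[str, set[str]] = {
    "getResources": set(),
    "SpliceRef": {"getResources"},
    "StarRef": {"getResources"},
    "StarAlignment": {"StarRef"},
    "ProcessBams": {"StarAlignment", "SpliceRef"},
    "CollateData": {"ProcessBams", "SpliceRef"},
    "CoveragePlots": {"CollateData"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for rule in execution_order(DEPENDENCIES):
        print(rule)
```

Independent branches such as `SpliceRef` and `StarRef` have no fixed relative order, which is exactly where Snakemake can run jobs in parallel.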