x1han/AIDRS

AIDRS: AI-Aided Isoform Discovery for direct RNA-Seq

AIDRS (AI-Aided Isoform Discovery for direct RNA-Seq) is a data-driven framework for full-length RNA isoform reconstruction and quantification from Oxford Nanopore Technologies (ONT) direct RNA sequencing data. Inspired by ISAtools, AIDRS adds protein-coding potential prediction and translation site identification.

Designed with annotation flexibility and biological fidelity in mind, AIDRS supports isoform identification with high precision and recall, and accurately resolves splice junctions and transcript boundaries directly from read evidence.

If reference annotations are available, AIDRS incorporates conserved, low-abundance isoforms through guided filtering and rescue steps, further enhancing transcriptome completeness.

AIDRS incorporates TranslationAI for protein-coding potential prediction and Puffin for enhanced transcription start site (TSS) prediction.

🧬 AIDRS Enhanced Features

  • Protein Coding Potential Prediction: Deep learning models assess each transcript's coding ability and identify its start and stop codons.
  • Enhanced TSS/TES Prediction: Improved transcription start site identification with the Puffin model.
  • PolyA Length Assessment: To evaluate transcript polyA enrichment, AIDRS requires BAM files that carry polyA length tags. These are produced by basecalling with dorado and aligning the resulting FASTQ files.
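
Before running AIDRS, it can help to confirm that the polyA tags survived alignment. The sketch below is an illustration only and is not part of AIDRS. It counts SAM records that carry an integer pt tag, which is where dorado writes its polyA tail estimate; whether AIDRS reads exactly this tag is an assumption, and the function name count_polya_tagged is hypothetical. The input is SAM text, for example the lines printed by samtools view.

```python
def count_polya_tagged(sam_lines, tag="pt"):
    """Return (tagged, total) over SAM alignment records.

    `tag` defaults to "pt", the tag dorado uses for polyA length.
    That AIDRS relies on this exact tag is an assumption.
    """
    prefix = f"{tag}:i:"
    tagged = total = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Optional tags start at the 12th column of a SAM record.
        if any(field.startswith(prefix) for field in fields[11:]):
            tagged += 1
    return tagged, total


# Toy records for illustration; not real AIDRS or dorado output.
sample = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tpt:i:87",
    "read2\t0\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*\tNM:i:0",
]
tagged, total = count_polya_tagged(sample)
print(f"{tagged}/{total} reads carry a polyA length tag")
```

If most reads lack the tag, the usual cause is that read tags were dropped when converting to FASTQ or during alignment.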

📦 Installation

AIDRS requires Python 3.11 and several dependencies including samtools, PyTorch, and TensorFlow.

We recommend using Conda to manage dependencies and environments.

✅ Create AIDRS environment (recommended)

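# Clone the AIDRS repository
git clone https://github.com/x1han/AIDRS.git
cd AIDRS
# Create and activate the Conda environment defined in environment.yml
conda env create -f environment.yml
conda activate aidrs
# Install AIDRS into the active environment in editable mode
pip install -e .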
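
After installation, a quick check of the environment can catch missing dependencies early. The following is a minimal sketch, not an AIDRS utility: the function name check_environment is hypothetical, and it only tests whether the dependencies named above are visible, assuming the usual import names torch and tensorflow.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report whether the dependencies named in the README are visible."""
    return {
        "python_3_11": sys.version_info[:2] == (3, 11),
        # samtools is a command-line tool, so look for it on PATH.
        "samtools": shutil.which("samtools") is not None,
        # Assumed import names for PyTorch and TensorFlow.
        "torch": importlib.util.find_spec("torch") is not None,
        "tensorflow": importlib.util.find_spec("tensorflow") is not None,
    }


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated aidrs environment; any MISSING entry points to a dependency that Conda did not install or that is not on PATH.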

🔧 Quick Start

The example below runs AIDRS on aligned reads from the bundled toy data, passing a reference genome (-r), an aligned BAM file (-b), a reference annotation (-g), and an output folder (-o):

aidrs \
    -r test/toy_data/test_genome.fasta \
    -b test/toy_data/test_aligned.bam \
    -g test/toy_data/test_annotation.gtf \
    -o OUTPUT_FOLDER
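
When AIDRS is launched from a script, the same call can be assembled in Python. This is a sketch under stated assumptions: it uses only the four flags shown above, treats -g as optional (the README describes annotations as used "if available", but this is not confirmed), and the helper names build_aidrs_command and missing_inputs are hypothetical.

```python
import shlex
from pathlib import Path


def build_aidrs_command(genome, bam, output, annotation=None):
    """Build the argument list for the aidrs call shown in Quick Start."""
    cmd = ["aidrs", "-r", str(genome), "-b", str(bam)]
    if annotation is not None:  # assumption: -g may be omitted
        cmd += ["-g", str(annotation)]
    cmd += ["-o", str(output)]
    return cmd


def missing_inputs(*paths):
    """Return the input paths that do not exist as files."""
    return [str(p) for p in paths if not Path(p).is_file()]


inputs = (
    "test/toy_data/test_genome.fasta",
    "test/toy_data/test_aligned.bam",
    "test/toy_data/test_annotation.gtf",
)
cmd = build_aidrs_command(inputs[0], inputs[1], "OUTPUT_FOLDER", annotation=inputs[2])
print(shlex.join(cmd))

missing = missing_inputs(*inputs)
if missing:
    print("Missing input files:", ", ".join(missing))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Checking the inputs first gives a clearer message than a failure partway through a run.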

🧪 Verify installation by running the toy example

Upon successful execution, the output files appear in the folder passed to -o (aidrs_output by default).


📁 Output Files

Main Output

  • aidrs.transcript_model.gtf: Final filtered transcript models (including known and novel isoforms).
  • aidrs.transcript.assessment.tsv: Transcript model assessment statistics.
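
A quick way to inspect the transcript models is to count distinct transcripts per gene. The sketch below assumes the GTF follows the standard nine-column layout with gene_id and transcript_id attributes; the exact attributes AIDRS writes are not documented here, the function name transcripts_per_gene is hypothetical, and the sample rows are toy data rather than real AIDRS output.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Count distinct transcript_id values for each gene_id."""
    genes = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            # A set removes duplicates from exon rows of the same transcript.
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(ids) for gene, ids in genes.items()}


# Toy rows for illustration only.
sample = [
    "# toy GTF",
    'chr1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
    'chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
    'chr1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "G1.2";',
    'chr2\ttoy\ttranscript\t50\t400\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";',
]
print(transcripts_per_gene(sample))
```

To apply it to a real run, pass an open file handle for aidrs.transcript_model.gtf in place of the sample list.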

Optional Output (with --keep_temp)

  • *flnc.ssc: Read-level SSC file with alignment details.
