AIDRS (AI-Aided Isoform Discovery for direct RNA-Seq) is a data-driven framework for full-length RNA isoform reconstruction and quantification from Oxford Nanopore Technologies direct RNA sequencing data. Inspired by ISAtools, AIDRS extends its functionality with protein-coding potential prediction and translation site identification.
Designed with annotation flexibility and biological fidelity in mind, AIDRS supports isoform identification with high precision and recall, and accurately resolves splice junctions and transcript boundaries directly from read evidence.
If reference annotations are available, AIDRS incorporates conserved, low-abundance isoforms through guided filtering and rescue steps, further enhancing transcriptome completeness.
AIDRS incorporates TranslationAI for protein coding potential prediction and Puffin for enhanced TSS prediction.
- Protein Coding Potential Prediction: Uses deep learning models to assess transcript coding ability and to precisely identify start and stop codons
- Enhanced TSS/TES Prediction: Improved transcription start site identification with the Puffin model
- PolyA Length Assessment: To evaluate transcript polyA enrichment, AIDRS requires BAM files that carry polyA length tags, generated by aligning FASTQ files produced by dorado basecalling
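Because the polyA assessment depends on tags being present in the BAM, it can help to confirm they survived alignment before running AIDRS. The following is a minimal sketch, not part of AIDRS: it assumes the tail length is stored in the integer tag `pt` (the tag dorado uses for its poly(A) estimate; whether AIDRS reads exactly this tag is an assumption) and that records arrive as SAM text lines. The function names `polya_length` and `tagged_fraction` are hypothetical.

```python
from typing import Iterable, Optional

# Assumption: dorado stores its poly(A) estimate in the integer tag "pt";
# confirm which tag AIDRS actually reads before relying on this check.
POLYA_TAG = "pt"


def polya_length(sam_line: str, tag: str = POLYA_TAG) -> Optional[int]:
    """Return the integer value of `tag` from one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional TAG:TYPE:VALUE fields start at column 12 of a SAM record.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
            return int(parts[2])
    return None


def tagged_fraction(sam_lines: Iterable[str], tag: str = POLYA_TAG) -> float:
    """Fraction of alignment records (header lines skipped) that carry `tag`."""
    total = tagged = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        if polya_length(line, tag) is not None:
            tagged += 1
    return tagged / total if total else 0.0
```

The lines can come from any source of SAM text, for example the output of `samtools view` on the aligned BAM piped into a script that calls `tagged_fraction(sys.stdin)`. A fraction near zero would suggest the tags were dropped somewhere between basecalling and alignment.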
AIDRS requires Python 3.11 and several dependencies including samtools, PyTorch, and TensorFlow.
We recommend using Conda to manage dependencies and environments.
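Once the environment from the steps below has been created and activated, a quick check along these lines could confirm that the stated requirements are visible. This is an illustrative sketch only: `check_requirements` is a hypothetical helper, and it assumes the usual import names `torch` and `tensorflow` for PyTorch and TensorFlow.

```python
import importlib.util
import shutil
import sys


def check_requirements() -> dict:
    """Report which of the stated requirements are visible from this interpreter."""
    return {
        # The README states Python 3.11 as the required version.
        "python_3_11": sys.version_info[:2] == (3, 11),
        # samtools is an external binary, so look for it on PATH.
        "samtools": shutil.which("samtools") is not None,
        # find_spec returns None when a top-level package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        "tensorflow": importlib.util.find_spec("tensorflow") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The check only tests for presence, not for specific package versions; `environment.yml` remains the authoritative list of dependencies.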
# Clone AIDRS repository
git clone https://github.com/x1han/AIDRS.git
cd AIDRS
conda env create -f environment.yml
conda activate aidrs
pip install -e .
To run AIDRS with aligned reads, the minimum required parameters are:
Example with toy data:
aidrs \
-r test/toy_data/test_genome.fasta \
-b test/toy_data/test_aligned.bam \
-g test/toy_data/test_annotation.gtf \
-o OUTPUT_FOLDER
Upon successful execution, output files will appear in the folder given by -o (aidrs_output by default).
aidrs.transcript_model.gtf: Final filtered transcript models (including known and novel isoforms).
aidrs.transcript.assessment.tsv: Transcript model assessment statistics.
*flnc.ssc: Read-level SSC file with alignment details.
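As a quick sanity check on the transcript model output, the GTF can be summarized by feature type. The sketch below relies only on the standard nine-column, tab-separated GTF layout; which feature types AIDRS actually writes is not assumed, and `count_gtf_features` is a hypothetical helper rather than part of AIDRS.

```python
from collections import Counter
from typing import Iterable


def count_gtf_features(gtf_lines: Iterable[str]) -> Counter:
    """Count records per feature type (column 3) in GTF-formatted lines."""
    counts: Counter = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column GTF record
        counts[fields[2]] += 1
    return counts
```

For example, opening `aidrs_output/aidrs.transcript_model.gtf` and passing the file handle to `count_gtf_features` returns a `Counter` keyed by feature type, which makes an empty or truncated output easy to spot.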