EichlerLab/asap

ASAP

Autism Susceptibility Analysis Pipeline with a focus on Structural Variants (SVs). This repository documents the tasks involved in this project, which may be executed either sequentially or asynchronously. The approach used for rare variant or pathogenic candidate discovery in this study can be applied broadly to families affected by any rare disease.

System Requirements

Hardware requirements: any x86_64 processor and at least 128 GB of memory. Some steps can process samples in parallel, while the steps that handle all samples together scale logarithmically with sample size.

Software requirements: the code depends mainly on the Python 3 scientific stack and has been tested on Ubuntu 22.04.
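
As a quick pre-flight check, the stated minimums can be tested before launching any step. The sketch below is illustrative only and is not part of the repository: the function names are hypothetical, "128GB" is read as decimal gigabytes (an assumption), and memory detection relies on POSIX `sysconf`, so it is intended for Linux hosts such as the tested Ubuntu system.

```python
import os
import platform

# Assumption: "128GB" is interpreted as decimal gigabytes (128 * 10^9 bytes).
MIN_MEMORY_BYTES = 128 * 10**9


def meets_requirements(machine: str, memory_bytes: int) -> bool:
    """Return True if the architecture is x86_64 and memory meets the minimum."""
    return machine.lower() in {"x86_64", "amd64"} and memory_bytes >= MIN_MEMORY_BYTES


def detect_memory_bytes() -> int:
    """Total physical memory in bytes, read through POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    ok = meets_requirements(platform.machine(), detect_memory_bytes())
    print("requirements met" if ok else "requirements NOT met")
```

Using the decimal threshold is deliberate: a host sold as "128 GB" usually reports slightly less than 128 GiB of usable memory, and a binary threshold would reject it.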

Sample

Sample origin/cohort

This study comprised 189 individuals (51 families) from the SSC, SAGE, and Rett-like cohorts, and the methodology is applicable to families with any rare disease.

Count  Sex (proband-sibling)  Family type
12     F-F                    quad
16     F-M                    quad
3      M-F                    quad
5      M-M                    quad
13     F                      trio
2      M                      trio
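
The table can be cross-checked against the stated totals of 51 families and 189 individuals. The short sketch below does the arithmetic, assuming a quad is two parents plus proband and sibling (4 members) and a trio is two parents plus proband (3 members); the variable and function names are illustrative.

```python
# Rows copied from the table above: (count, proband-sibling sex, family type).
ROWS = [
    (12, "F-F", "quad"),
    (16, "F-M", "quad"),
    (3, "M-F", "quad"),
    (5, "M-M", "quad"),
    (13, "F", "trio"),
    (2, "M", "trio"),
]

# Assumption: quad = 2 parents + 2 children, trio = 2 parents + 1 child.
MEMBERS_PER_FAMILY = {"quad": 4, "trio": 3}


def cohort_totals(rows):
    """Return (number of families, number of individuals) for the table rows."""
    families = sum(count for count, _, _ in rows)
    individuals = sum(count * MEMBERS_PER_FAMILY[ftype] for count, _, ftype in rows)
    return families, individuals


print(cohort_totals(ROWS))  # (51, 189)
```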

The sample manifest is available in the supplementary data of the publication.

QC

back-reference-qc (Kraken2)
  • Use this pipeline to check for non-human contamination in reads.
    • Minimal requirement: FASTQ
  • Use this tool/pipeline to assess inter-sample contamination.
    • Minimal requirement: FASTQ
  • Use this tool/pipeline to assess both non-human contamination and inter-sample contamination.
    • Minimal requirement: BAM
  • Use this tool/pipeline to assess inter-sample contamination, as well as ancestry and relatedness.
    • Minimal requirement: BAM
  • Use this tool/pipeline to assess genome assembly quality.
    • Minimal requirement: FASTQ plus Illumina reads from the same sample

Genome assembly

This step produces FASTA files.

  • Use this pipeline/tool to assemble sample genomes. Trio-phased assembly requires parental Illumina data as input.

  • Version used for all our samples: hifiasm 0.16.1 with HiFi data only.

  • Use this pipeline to correct partially phased sex chromosomes in autism family fathers.
  • Use this pipeline to build contiguous sex chromosomes.

Genome alignment

This step produces aligned BAM files.

  • Use this pipeline to align HiFi FASTQ files to the reference genome.

Variant calling

  • Use this tool to call SVs with alignment (pbmm2 output).
  • Use this tool to call SVs with alignment (pbmm2 output).

SV merging

These steps are performed in sequential order using bcftools and Truvari.

1. Intra-sample merge.

bcftools merge --threads {threads} --merge none --force-samples -O z -o {output.vcf.gz} {input.vcf1.gz} {input.vcf2.gz} {input.vcf3.gz}
truvari collapse -i {input.vcf.gz} -c {output.removed.vcf.gz} --sizemin 0 --sizemax 1000000 -k maxqual --gt het --intra --pctseq 0.90 --pctsize 0.90 --refdist 500 | bcftools sort --max-mem 8G -O z -o {output.collapsed.vcf.gz}
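
To give a feel for what the `--pctsize 0.90` and `--refdist 500` thresholds mean, here is a deliberately simplified sketch. It is not Truvari's implementation: it compares start positions only, ignores SV type and the sequence similarity governed by `--pctseq`, and all function names are hypothetical. Size similarity is taken as the shorter length divided by the longer one.

```python
def size_similarity(len_a: int, len_b: int) -> float:
    """Ratio of the shorter SV length to the longer one, in [0, 1]."""
    if len_a == 0 or len_b == 0:
        return 1.0 if len_a == len_b else 0.0
    return min(len_a, len_b) / max(len_a, len_b)


def could_collapse(pos_a: int, len_a: int, pos_b: int, len_b: int,
                   pctsize: float = 0.90, refdist: int = 500) -> bool:
    """Simplified test: calls are close on the reference and similar in size.

    Assumption: only start positions are compared; a real matcher also
    checks end positions, SV type, and sequence similarity.
    """
    close_enough = abs(pos_a - pos_b) <= refdist
    similar_size = size_similarity(len_a, len_b) >= pctsize
    return close_enough and similar_size


# Two insertions 200 bp apart with lengths 100 and 95: similarity 0.95.
print(could_collapse(1000, 100, 1200, 95))   # True
# Same positions, lengths 100 and 80: similarity 0.80, below 0.90.
print(could_collapse(1000, 100, 1200, 80))   # False
```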

2. Inter-sample merge.

bcftools merge --threads {threads} --merge none --force-samples --file-list {input.vcflist} -O z | bcftools norm --threads 15 --do-not-normalize --multiallelics -any --output-type z -o {output.mergevcf.gz}
truvari collapse --input {input.mergevcf.gz} --collapsed-output {output.removed_vcf.gz} --sizemin 0 --sizemax 1000000 --pctseq 0.90 --pctsize 0.90 --keep common --gt all | bcftools sort --max-mem {resources}G --output-type z > {output.collapsed_vcf.gz}
python rareSVpool.py {input.collapsed_sv}
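
The `--file-list` argument of `bcftools merge` expects a plain text file with one VCF path per line. A minimal sketch for producing such a file is shown here; the helper name and the example paths are hypothetical, and sorting the paths is a design choice (not something the pipeline is known to do) that keeps the sample column order reproducible between runs.

```python
from pathlib import Path
import tempfile


def write_vcf_list(vcf_paths, list_path):
    """Write one VCF path per line, sorted, and return the written paths."""
    paths = sorted(str(p) for p in vcf_paths)
    Path(list_path).write_text("\n".join(paths) + "\n")
    return paths


if __name__ == "__main__":
    # Hypothetical per-sample collapsed VCFs from the intra-sample merge.
    example = ["calls/sampleB.collapsed.vcf.gz", "calls/sampleA.collapsed.vcf.gz"]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "vcflist.txt"
        write_vcf_list(example, out)
        print(out.read_text(), end="")
```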

4. De novo validation

  • Initial caller support using Truvari
  • Callable region evaluation using BoostSV
  • Genotyping support using kanpig
  • Rare TR expansions/contractions using TRGT
  • Multiple sequence alignment (MSA) using MAFFT
  • Read-based support validation using subseq (see the accompanying notes)
  • Manual inspection using IGV
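
Before the validation steps listed above, a de novo candidate is conventionally defined at the genotype level: the child carries the alternate allele while both parents are confidently homozygous reference. The sketch below illustrates only that generic first-pass definition on VCF-style genotype strings. It is not the authors' validation code, and the function names are hypothetical.

```python
def _alleles(gt: str):
    """Split a VCF genotype string such as '0/1' or '1|0' into alleles."""
    return gt.replace("|", "/").split("/")


def has_alt(gt: str) -> bool:
    """True if at least one called allele is non-reference."""
    return any(a not in {"0", "."} for a in _alleles(gt))


def is_hom_ref(gt: str) -> bool:
    """True only if every allele is called and equals the reference."""
    return all(a == "0" for a in _alleles(gt))


def is_de_novo_candidate(child: str, father: str, mother: str) -> bool:
    """Child carries the variant; both parents are called homozygous reference.

    A parent with a missing genotype ('./.') does not qualify, since absence
    of the variant in that parent is then unconfirmed.
    """
    return has_alt(child) and is_hom_ref(father) and is_hom_ref(mother)


print(is_de_novo_candidate("0/1", "0/0", "0/0"))  # True
print(is_de_novo_candidate("0/1", "0/1", "0/0"))  # False (inherited)
print(is_de_novo_candidate("0/1", "./.", "0/0"))  # False (father uncalled)
```

Candidates passing such a filter still need the orthogonal evidence described in the list, because genotype errors in either parent produce false de novo calls.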

Annotation (GRCh38)

This step produces methylation BED files and corresponding bigWig files.

Citation

For citation, please refer to our paper at: https://www.medrxiv.org/content/10.1101/2025.07.21.25331932v1
