Autism Susceptibility Analysis Pipeline with a focus on Structural Variants (SVs). This repository documents the tasks involved in this project, which may be executed either sequentially or asynchronously. The approach used for rare variant or pathogenic candidate discovery in this study can be applied broadly to families affected by any rare disease.
Hardware requirements: any x86_64 processor and at least 128 GB of memory. Some steps can process samples in parallel, while the steps that handle all samples together scale logarithmically with sample size.

Software requirements: the code depends mainly on the Python3 scientific stack and has been tested on Ubuntu 22.04.
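The requirements above can be checked before launching a run. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that "128GB" means decimal gigabytes and that the host exposes POSIX `sysconf` values (as Ubuntu does).

```python
import os
import platform

# Assumption: "128GB" is read as 128 * 10^9 bytes (decimal gigabytes).
MIN_MEMORY_BYTES = 128 * 10**9


def meets_requirements(machine, memory_bytes):
    """Return True if the architecture is x86_64 and memory is sufficient."""
    is_x86_64 = machine.lower() in {"x86_64", "amd64"}
    return is_x86_64 and memory_bytes >= MIN_MEMORY_BYTES


def physical_memory_bytes():
    """Total physical memory reported by POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    ok = meets_requirements(platform.machine(), physical_memory_bytes())
    print("requirements met" if ok else "requirements NOT met")
```

The check is split into a pure function and a system probe so that the threshold logic can be reused on a cluster where memory is reported by a scheduler rather than by the local host.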
- Sample
- QC
- Genome assembly
- Genome alignment
- Variant calling
- SV merging
- Annotation
- Methylation
- Housekeeping
- Citation
This study comprised 189 individuals from 51 families in the SSC, SAGE, and Rett-like cohorts. The families break down as follows:
| Count | Sex (proband-sibling) | Family type |
|---|---|---|
| 12 | F-F | quad |
| 16 | F-M | quad |
| 3 | M-F | quad |
| 5 | M-M | quad |
| 13 | F | trio |
| 2 | M | trio |
The sample manifest is available in the supplementary data of the publication.
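As a quick consistency check, the family counts in the table can be tallied against the reported totals. This sketch assumes the usual definitions of a quad (two parents, proband, and one sibling) and a trio (two parents and proband); the variable and function names are illustrative only.

```python
# Individuals per family type (assumed standard quad/trio definitions).
FAMILY_SIZE = {"quad": 4, "trio": 3}

# (count, sex of proband[-sibling], family type), copied from the table.
COHORT = [
    (12, "F-F", "quad"),
    (16, "F-M", "quad"),
    (3, "M-F", "quad"),
    (5, "M-M", "quad"),
    (13, "F", "trio"),
    (2, "M", "trio"),
]


def cohort_totals(rows):
    """Return (number of families, number of individuals)."""
    families = sum(count for count, _, _ in rows)
    individuals = sum(count * FAMILY_SIZE[kind] for count, _, kind in rows)
    return families, individuals


families, individuals = cohort_totals(COHORT)
print(families, individuals)  # 51 189
```

The 36 quads contribute 144 individuals and the 15 trios contribute 45, which matches the 189 individuals in 51 families stated above.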
back-reference-qc (Kraken2)
- Use this pipeline to check for non-human contamination in reads.
- Minimal requirement: FASTQ
- Use this tool/pipeline to assess inter-sample contamination.
- Minimal requirement: FASTQ
- Use this tool/pipeline to assess both non-human contamination and inter-sample contamination.
- Minimal requirement: BAM
- Use this tool/pipeline to assess inter-sample contamination, as well as ancestry and relatedness.
- Minimal requirement: BAM
- Use this tool/pipeline to assess genome assembly quality.
- Minimal requirement: FASTQ and Illumina data from the same sample
- Use this to verify sex per cell or sample.
- Minimal requirement: BAM
This step produces FASTA files.
- Use this pipeline/tool to assemble sample genomes. Trio-phased assembly requires parental Illumina data as input.
- Version used for all our samples: hifiasm 0.16.1 with HiFi data only.
- Use this pipeline to correct partially phased sex chromosomes in autism family fathers.
- Use this pipeline to build contiguous sex chromosomes.
This step produces aligned BAM files.
- Use this pipeline to align HiFi FASTQ files to the reference genome.
- Use this tool to call SVs with assemblies. (instructions)
- Use this tool to call SVs with alignment (pbmm2 output).
- Use this tool to call SVs with alignment (pbmm2 output).
These steps are performed using Truvari in sequential order.
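The collapse commands in this section use `--pctsize 0.90` and `--refdist 500`. As a rough illustration of what those two thresholds mean, the sketch below treats two calls as collapse candidates when they share chromosome and type, lie within 500 bp of each other at both breakpoints, and have a size ratio of at least 0.90. This is a simplified stand-in and not Truvari's implementation: the `SV` class and function names are hypothetical, and the sequence similarity test (`--pctseq`) is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # start position on the reference
    end: int     # end position on the reference
    svtype: str  # e.g. "DEL" or "INS"
    svlen: int   # variant length in bp (sign is ignored)


def size_similarity(a, b):
    """Ratio of the smaller to the larger variant length, in [0, 1]."""
    small, large = sorted((abs(a.svlen), abs(b.svlen)))
    return small / large if large else 1.0


def is_collapse_candidate(a, b, pctsize=0.90, refdist=500):
    """Simplified size and distance test; sequence similarity is not checked."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.start - b.start) > refdist or abs(a.end - b.end) > refdist:
        return False
    return size_similarity(a, b) >= pctsize


a = SV("chr1", 10_000, 10_300, "DEL", 300)
b = SV("chr1", 10_050, 10_330, "DEL", 280)  # size ratio 280/300, about 0.93
c = SV("chr1", 10_050, 10_250, "DEL", 200)  # size ratio 200/300, about 0.67
print(is_collapse_candidate(a, b), is_collapse_candidate(a, c))  # True False
```

Under these assumptions, calls of the same deletion from different callers that differ by a few tens of base pairs are grouped, while a nearby deletion of a clearly different size is kept separate.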
bcftools merge --threads {threads} --merge none --force-samples -O z -o {output.vcf.gz} {input.vcf1.gz} {input.vcf2.gz} {input.vcf3.gz}

truvari collapse -i {input.vcf.gz} -c {output.removed.vcf.gz} --sizemin 0 --sizemax 1000000 -k maxqual --gt het --intra --pctseq 0.90 --pctsize 0.90 --refdist 500 | bcftools sort --max-mem 8G -O z -o {output.collapsed.vcf.gz}

bcftools merge --threads {threads} --merge none --force-samples --file-list {input.vcflist} -O z | bcftools norm --threads 15 --do-not-normalize --multiallelics -any --output-type z -o {output.mergevcf.gz}

truvari collapse --input {input.mergevcf.gz} --collapsed-output {output.removed_vcf.gz} --sizemin 0 --sizemax 1000000 --pctseq 0.90 --pctsize 0.90 --keep common --gt all | bcftools sort --max-mem {resources}G --output-type z > {output.collapsed_vcf.gz}

python rareSVpool.py {input.collapsed_sv}

- Initial caller support using Truvari
- Callable region evaluation using BoostSV
- Genotyping support using kanpig
- Rare TR expansions/contractions using TRGT
- Multiple sequence alignment (MSA) using MAFFT
- Read-based support validation using subseq (see the repository notes)
- Manual inspection using IGV
- Gene and location annotation using AnnotSV, then simplified with sim_annotSV.py
- CADD score using CADD-SV
- Regulatory annotation using REG data and comREG
- Combine all annotations from comREG
This step produces methylation BED files and the corresponding bigWig files.
For citation, please refer to our paper at: https://www.medrxiv.org/content/10.1101/2025.07.21.25331932v1