AlleleSeq2

Generate personal genomes, STAR indices and other helper files:

Makefile options (can be specified in `makePersonalGenome.mk` or as command-line arguments)

Dependencies, system parameters/paths:

VCF2DIPLOID_DIR: vcf2diploid (available from http://alleleseq.gersteinlab.org/tools.html)
PL: path to AlleleSeq2
LIFTOVER: UCSC liftOver tool
BEDTOOLS_intersectBed: Bedtools intersectBed
SAMTOOLS: Samtools
STAR: STAR aligner

Other options:

N_THREADS: number or threads (for STAR genomeGenerate)
REFGENOME_VERSION: reference genome version, 'GRCh37' or 'GRCh38'
REFGENOME: path to the reference genome .fasta file
FILE_PATH_VCF: path to VCF
VCF_SAMPLE_ID: sample name in VCF
FILE_PATH_BAM: path to WGS bam
OUTPUT_DIR: output folder

Example:

make -f makePersonalGenome.mk \
        N_THREADS=8 \
        VCF_SAMPLE_ID=sge_Aug_encodev2_2_local \
        REFGENOME_VERSION=GRCh38 \
        OUTPUT_DIR=pgenome_ENC-003 \
        FILE_PATH_BAM=ENCFF200XCY.bam \
        FILE_PATH_VCF=enc003.spliced.scrubbed.vcf

(1) Calling AS hetSNVs from a single sample

Makefile options (can be specified in `PIPELINE.mk` or as command-line arguments):

Dependencies:

python2

scipy
numpy
pandas

R

VGAM
ggplot2

Dependencies, system parameters/paths:

PL: path to AlleleSeq2
SAMTOOLS: samtools
PICARD: Broad picard tools
STAR: STAR aligner FASTQC: FastQC quality control tool
CUTADAPT: Cutadapt to remove adapter sequences (ATAC-seq samples)

Other options:

READS_R1: path to input .fastq file (R1)
READS_R2: path to input .fastq file (R2, if PE sequencing)
PGENOME_DIR: path to personal genome folder
VCF_SAMPLE_ID: sample name in VCF
ALIGNMENT_MODE: 'ASE' for RNA-seq, 'ASB' for ChIP-seq' and 'ASCA' for ATAC-seq
RM_DUPLICATE_READS: 'on' to remove duplicate reads with picard tools
STAR_readFilesCommand: --readFilesIn option in STAR
REFGENOME_VERSION: reference genome version, 'GRCh37' or 'GRCh38'
Annotation_diploid: path to diploid GENCODE gene annotation (for 'ASE')
FDR_CUTOFF: FDR threshold
Cntthresh_tot: threshold for the total number of reads mapped to hetSNV
Cntthresh_min: threshold for the minimal number of reads mapped to each allele

Example:

make -f PIPELINE.mk \
        PGENOME_DIR=pgenome_ENC-003 \
        REFGENOME_VERSION=GRCh38 \
        NTHR=8 \
        READS_R1=ENCFF337ZBN.fastq.gz \
        READS_R2=ENCFF481IQE.fastq.gz \
        STAR_readFilesCommand=zcat \
        ALIGNMENT_MODE=ASE \
        RM_DUPLICATE_READS=on \
        FDR_CUTOFF=0.10 \
	VCF_SAMPLE_ID=sge_Aug_encodev2_2_local

The main output file containing AS hetSNVs is
ENCFF337ZBN_ENCFF481IQE_interestingHets.FDR-0.10.betabinom.chrs1-22.6-tot_0-min_cnt.tsv

(2) Pool replicates/tissues

Makefile options (can be specified in `PIPELINE_aggregated_counts.mk` or as command-line arguments)

Dependencies, system parameters/paths:

PL: path to AlleleSeq2
PGENOME_DIR: path to personal genome folder
VCF_SAMPLE_ID: sample name in VCF

Other options:

INPUT_UNIQ_READS_PILEUP_FILES: list of .mpileup files with uniquely mapped reads to aggregate generated in (1)
INPUT_MMAP_READS_PILEUP_FILES: list of .mpileup files with multi-mapping reads to aggregate generated in (1)
PREFIX: prefix for output file names
Cntthresh_tot: threshold for the total number of reads mapped to hetSNV
Cntthresh_min: threshold for the minimal number of reads mapped to each allele
FDR_CUTOFF: FDR threshold

Example:

make -f PIPELINE_aggregated_counts.mk \
        PGENOME_DIR=pgenome_ENC-003 \
        INPUT_UNIQ_READS_PILEUP_FILES="../ENCSR238ZZD_ENCFF719MSG_1_ENCFF120MML_2_1_1/ENCFF719MSG_ENCFF120MML_hap1_uniqreads.mpileup    ../ENCSR238ZZD_ENCFF719MSG_1_ENCFF120MML_2_1_1/ENCFF719MSG_ENCFF120MML_hap2_uniqreads.mpileup ../ENCSR238ZZD_ENCFF337ZBN_1_ENCFF481IQE_2_1_1/ENCFF337ZBN_ENCFF481IQE_hap1_uniqreads.mpileup ../ENCSR238ZZD_ENCFF337ZBN_1_ENCFF481IQE_2_1_1/ENCFF337ZBN_ENCFF481IQE_hap2_uniqreads.mpileup" \
        INPUT_MMAP_READS_PILEUP_FILES="../ENCSR238ZZD_ENCFF719MSG_1_ENCFF120MML_2_1_1/ENCFF719MSG_ENCFF120MML_hap1_mmapreads.mpileup ../ENCSR238ZZD_ENCFF719MSG_1_ENCFF120MML_2_1_1/ENCFF719MSG_ENCFF120MML_hap2_mmapreads.mpileup ../ENCSR238ZZD_ENCFF337ZBN_1_ENCFF481IQE_2_1_1/ENCFF337ZBN_ENCFF481IQE_hap1_mmapreads.mpileup ../ENCSR238ZZD_ENCFF337ZBN_1_ENCFF481IQE_2_1_1/ENCFF337ZBN_ENCFF481IQE_hap2_mmapreads.mpileup" \
        PREFIX=ENCSR238ZZD \
        FDR_CUTOFF=0.10 \
        VCF_SAMPLE_ID=sge_Aug_encodev2_2_local

The main output file is ENCSR238ZZD_interestingHets.FDR-0.10.betabinom.chrs1-22.6-tot_0-min_cnt.tsv

(3) Aggregate across genomic elements

Makefile options (can be specified in `PIPELINE_aggregate_over_genomic_regions.mk` or as command-line arguments)

Dependencies, system parameters/paths:

PL: path to AlleleSeq2
BEDTOOLS_intersectBed: Bedtools intersectBed
SAMTOOLS: Samtools

Other options:

PREFIX: prefix for output file names
Cntthresh_tot: threshold for the total number of reads mapped to hetSNV
Cntthresh_min: threshold for the minimal number of reads mapped to each allele
FDR_CUTOFF: FDR threshold COUNTS_FILE: filtered read count file generated in (1) or (2). REGIONS_FILE: path to a .bed file with genomic elments (e.g. gene, cCRE) coordinates
UNIQ_ALN_FILES: .bam(s) with uniquely mapping reads generated in (1)
MMAP_ALN_FILES: .bam(s) with multimapping reads generated in (1)

Example:

make -f   PIPELINE_aggregate_over_genomic_regions.mk \
	  PREFIX=ENCSR238ZZD \
	  REGIONS_FILE="gencode.v24_all_genes.bed" \
	  COUNTS_FILE=../ENCSR238ZZD_combined/ENCSR238ZZD_filtered_counts.tsv \
	  UNIQ_ALN_FILES='../ENCSR238ZZD_ENCFF719MSG_1_ENCFF120MML_2_1_1/ENCFF719MSG_ENCFF120MML_ASE-params_crdsorted_uniqreads_over_hetSNVs.bam ../ENCSR238ZZD_ENCFF337ZBN_1_ENCFF481IQE_2_1_1/ENCFF337ZBN_ENCFF481IQE_ASE-params_crdsorted_uniqreads_over_hetSNVs.bam' \
	  MMAP_ALN_FILES='../ENCSR238ZZD_ENCFF719MSG_1_ENCFF120MML_2_1_1/ENCFF719MSG_ENCFF120MML_ASE-params_crdsorted_mmapreads_over_hetSNVs.bam ../ENCSR238ZZD_ENCFF337ZBN_1_ENCFF481IQE_2_1_1/ENCFF337ZBN_ENCFF481IQE_ASE-params_crdsorted_mmapreads_over_hetSNVs.bam' \
	  FDR_CUTOFF=0.10

The main output file listing AS genomic elements is ENCSR238ZZD_interesting_regions.FDR-0.10.betabinom.chrs1-22.6-tot_0-min_cnt.tsv

(4) Additional scripts to calculate hap-specific read coverage are provided under the directory calc_read_coverage

(5) Scripts to query the ENTEx catalog are provided under the directory query

The catalog file is hetSNVs_default_AS.tsv.

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
calc_read_coverage		calc_read_coverage
docs		docs
query		query
.gitignore		.gitignore
FalsePos.py		FalsePos.py
FalsePos_regions.py		FalsePos_regions.py
PIPELINE.mk		PIPELINE.mk
PIPELINE_aggregate_over_genomic_regions.mk		PIPELINE_aggregate_over_genomic_regions.mk
PIPELINE_aggregated_counts.mk		PIPELINE_aggregated_counts.mk
README.md		README.md
STAR_custom_parameters_sample_file		STAR_custom_parameters_sample_file
alleledb_alleleseqBetabinomial.R		alleledb_alleleseqBetabinomial.R
alleledb_calcOverdispersion.R		alleledb_calcOverdispersion.R
binom.py		binom.py
calculate_rd.py		calculate_rd.py
feed_sorted_check_names.py		feed_sorted_check_names.py
filter_by_counts.py		filter_by_counts.py
filter_by_pval.py		filter_by_pval.py
filter_cnv_sites.py		filter_cnv_sites.py
filter_non-autosomal_chr.py		filter_non-autosomal_chr.py
filter_phase_warnings.py		filter_phase_warnings.py
filter_regions_by_counts.py		filter_regions_by_counts.py
filter_regions_by_pval.py		filter_regions_by_pval.py
filter_regions_non-autosomal_chr.py		filter_regions_non-autosomal_chr.py
filter_regions_w_mmaps.py		filter_regions_w_mmaps.py
filter_sites_w_mmaps.py		filter_sites_w_mmaps.py
get_hetSNV_bed.py		get_hetSNV_bed.py
intersect2counts.py		intersect2counts.py
makePersonalGenome.mk		makePersonalGenome.mk
pileup2counts.py		pileup2counts.py
plot_AllelicRatio_distribution.R		plot_AllelicRatio_distribution.R
plot_AllelicRatio_distribution.py		plot_AllelicRatio_distribution.py
read_pileup.py		read_pileup.py
refbed2hapcoords.py		refbed2hapcoords.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AlleleSeq2

Generate personal genomes, STAR indices and other helper files:

Makefile options (can be specified in `makePersonalGenome.mk` or as command-line arguments)

Dependencies, system parameters/paths:

Other options:

Example:

(1) Calling AS hetSNVs from a single sample

Makefile options (can be specified in `PIPELINE.mk` or as command-line arguments):

Dependencies:

python2

R

Dependencies, system parameters/paths:

Other options:

Example:

(2) Pool replicates/tissues

Makefile options (can be specified in `PIPELINE_aggregated_counts.mk` or as command-line arguments)

Dependencies, system parameters/paths:

Other options:

Example:

(3) Aggregate across genomic elements

Makefile options (can be specified in `PIPELINE_aggregate_over_genomic_regions.mk` or as command-line arguments)

Dependencies, system parameters/paths:

Other options:

Example:

(4) Additional scripts to calculate hap-specific read coverage are provided under the directory calc_read_coverage

(5) Scripts to query the ENTEx catalog are provided under the directory query

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AlleleSeq2

Generate personal genomes, STAR indices and other helper files:

Makefile options (can be specified in makePersonalGenome.mk or as command-line arguments)

Dependencies, system parameters/paths:

Other options:

Example:

(1) Calling AS hetSNVs from a single sample

Makefile options (can be specified in PIPELINE.mk or as command-line arguments):

Dependencies:

python2

R

Dependencies, system parameters/paths:

Other options:

Example:

(2) Pool replicates/tissues

Makefile options (can be specified in PIPELINE_aggregated_counts.mk or as command-line arguments)

Dependencies, system parameters/paths:

Other options:

Example:

(3) Aggregate across genomic elements

Makefile options (can be specified in PIPELINE_aggregate_over_genomic_regions.mk or as command-line arguments)

Dependencies, system parameters/paths:

Other options:

Example:

(4) Additional scripts to calculate hap-specific read coverage are provided under the directory calc_read_coverage

(5) Scripts to query the ENTEx catalog are provided under the directory query

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Makefile options (can be specified in `makePersonalGenome.mk` or as command-line arguments)

Makefile options (can be specified in `PIPELINE.mk` or as command-line arguments):

Makefile options (can be specified in `PIPELINE_aggregated_counts.mk` or as command-line arguments)

Makefile options (can be specified in `PIPELINE_aggregate_over_genomic_regions.mk` or as command-line arguments)

Packages