Commit 5e1e8c3

Merge pull request #48 from databio/dev
Dev
2 parents d895ca0 + 8f1cbac commit 5e1e8c3

31 files changed, 2168 additions, 1288 deletions

CHANGELOG.md

Lines changed: 12 additions & 1 deletion
@@ -1,7 +1,18 @@
 # Change log
 All notable changes to this project will be documented in this file.
 
-## [0.7.0] -- Unreleased
+## [0.7.0] -- 2018-06-25
+
+### Added
+- Added containerization feature
+- Run with either [docker](https://www.docker.com/) or [singularity](https://singularity.lbl.gov/)
+- Added early bowtie2 index check
+
+### Changed
+- Renamed pipeline
+- Improved summary figure reporting
+- Integrated summary results into pipeline interface
+
 
 ## [0.6.1] -- 2017-12-15
 
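The changelog lists an early bowtie2 index check, but its implementation is not part of the diff shown here. As a hedged illustration only, a check of this kind can confirm that all six bowtie2 index files exist before any alignment starts. The function name `check_bt2_index` is hypothetical, and the sketch assumes the standard bowtie2 index suffixes (`.1` to `.4`, `.rev.1`, `.rev.2`, with either the `.bt2` or the large-index `.bt2l` extension). It is not PEPATAC's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fail fast if no bowtie2 index exists for PREFIX.
# Not PEPATAC's implementation; suffixes follow bowtie2's index naming.

check_bt2_index() {
    local prefix="$1"
    local ext suffix missing
    # bowtie2 writes either small (.bt2) or large (.bt2l) index files.
    for ext in bt2 bt2l; do
        missing=0
        for suffix in 1 2 3 4 rev.1 rev.2; do
            # -s: the file must exist and be non-empty.
            if [ ! -s "${prefix}.${suffix}.${ext}" ]; then
                missing=1
                break
            fi
        done
        if [ "$missing" -eq 0 ]; then
            echo "found ${ext} index for ${prefix}"
            return 0
        fi
    done
    echo "bowtie2 index not found for prefix: ${prefix}" >&2
    return 1
}
```

Running such a check before trimming lets a misconfigured genome path fail in seconds rather than after the slower preprocessing steps, for example `check_bt2_index "$BT2_PREFIX" || exit 1`, where `BT2_PREFIX` is a placeholder for the index prefix.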
Makefile

Lines changed: 10 additions & 3 deletions
@@ -1,6 +1,13 @@
 microtest:
-	python $$CODEBASE/ATACseq/pipelines/ATACseq.py -I $$MICROTEST/data/atacR1.fq.gz -I2 $$MICROTEST/data/atacR2.fq.gz -G hg19 -O $$HOME/scratch -S atac_test --single-or-paired paired -R
+	python $$CODEBASE/pepatac/pipelines/pepatac.py -I $$MICROTEST/data/atacR1.fq.gz -I2 $$MICROTEST/data/atacR2.fq.gz -G hg19 -O $$HOME/scratch -S atac_test --single-or-paired paired -R
 test:
-	python pipelines/ATACseq.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C ATACseq.yaml --genome-size hs --prealignments rCRSd human_repeats -I examples/test_data/liver-CD31_test_R1.fastq.gz -I2 examples/test_data/liver-CD31_test_R2.fastq.gz
+	python pipelines/pepatac.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C pepatac.yaml --genome-size hs --prealignments rCRSd human_repeats -I examples/test_data/liver-CD31_test_R1.fastq.gz -I2 examples/test_data/liver-CD31_test_R2.fastq.gz
 changtest:
-	python pipelines/ATACseq.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C $HOME/code/ATACseq/examples/chang_project/ATACseq.yaml -gs mm -I examples/test_data/liver-CD31_test_R1.fastq.gz -I2 examples/test_data/liver-CD31_test_R2.fastq.gz
+	python pipelines/pepatac.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C $HOME/code/pepatac/examples/chang_project/pepatac.yaml -gs mm -I examples/test_data/liver-CD31_test_R1.fastq.gz -I2 examples/test_data/liver-CD31_test_R2.fastq.gz
+
+
+docker:
+	docker build -t databio/pepatac -f containers/pepatac.Dockerfile .
+
+singularity:
+	singularity build $${SIMAGES}pepatac docker://databio/pepatac
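The new `docker` and `singularity` targets only build images. The sketch below is a hedged example of how a pipeline command might then be run inside either one. The function name `pepatac_in_container` and the variables `PEPATAC_RUNTIME` and `PEPATAC_DRY_RUN` are hypothetical. Because the Dockerfile installs tools but does not copy the pipeline code into the image, the docker branch mounts the current directory (assumed to be a clone of the repository) at the same path. Like the Makefile's `$${SIMAGES}pepatac`, it assumes `SIMAGES` ends with a slash. By default it only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print (default) or run a command inside the
# databio/pepatac image with docker or singularity.

pepatac_in_container() {
    local runtime="${PEPATAC_RUNTIME:-docker}"
    case "$runtime" in
        docker)
            # Mount the working directory at the same path so paths in
            # the pipeline command resolve identically inside the container.
            set -- docker run --rm -v "$PWD:$PWD" -w "$PWD" databio/pepatac "$@"
            ;;
        singularity)
            # Image path mirrors the Makefile target: ${SIMAGES}pepatac
            set -- singularity exec "${SIMAGES:-}pepatac" "$@"
            ;;
        *)
            echo "unknown runtime: $runtime" >&2
            return 2
            ;;
    esac
    if [ "${PEPATAC_DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `pepatac_in_container python pipelines/pepatac.py -P 3 -M 100 -O test_out -R -S liver -G hg19 -Q paired -C pepatac.yaml ...` prints the docker form of the `test` target, and setting `PEPATAC_DRY_RUN=0` executes it instead.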

README.md

Lines changed: 217 additions & 61 deletions
Large diffs are not rendered by default.

config/pipeline_interface.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-ATACseq.py:
-  name: ATACseq
+pepatac.py:
+  name: PEPATAC
   looper_args: True
   required_input_files: [read1, read2]
   all_input_files: [read1, read2]

config/protocol_mappings.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-ATAC: ATACseq.py
-ATAC-SEQ: ATACseq.py
+ATAC: pepatac.py
+ATAC-SEQ: pepatac.py
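`protocol_mappings.yaml` tells looper which pipeline script handles a given sample protocol, and `pipeline_interface.yaml` then describes that script. The toy lookup below is a hedged illustration of the first step only: it reads flat `KEY: value` lines and returns the script mapped to a protocol. The function name `lookup_pipeline` is hypothetical, upper-casing the protocol before matching is an assumption suggested by the all-caps keys, and looper's real resolution logic parses full YAML.

```shell
#!/usr/bin/env bash
# Toy protocol -> pipeline lookup over flat "KEY: value" lines.
# Hypothetical helper; looper's actual logic parses the full YAML file.

lookup_pipeline() {
    local mappings_file="$1"
    local protocol
    # Assumption: protocols are compared in upper case.
    protocol="$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
    awk -v key="$protocol" '
        {
            # Split on the first colon and trim surrounding whitespace.
            idx = index($0, ":")
            if (idx == 0) next
            k = substr($0, 1, idx - 1)
            v = substr($0, idx + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; found = 1; exit }
        }
        # Non-zero exit status when the protocol has no mapping.
        END { if (!found) exit 1 }
    ' "$mappings_file"
}
```

With the mappings from this commit, both `ATAC` and `ATAC-SEQ` resolve to `pepatac.py`, so renaming the script only required changing the right-hand side of each line.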

containers/pepatac.Dockerfile

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
+# Pull base image
+FROM phusion/baseimage:0.10.1
+
+# Who maintains this image
+LABEL maintainer Jason Smith "[email protected]"
+
+# Version info
+LABEL version 0.8.1
+
+# Use baseimage-docker's init system.
+CMD ["/sbin/my_init"]
+
+# Install dependencies
+RUN apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes \
+    curl \
+    default-jre \
+    default-jdk \
+    git \
+    libcommons-math3-java \
+    libcurl4-gnutls-dev \
+    libjbzip2-java \
+    libpng-dev \
+    libssl-dev \
+    libtbb2 \
+    libtbb-dev \
+    openssl \
+    pigz \
+    python \
+    python-pip python-dev build-essential \
+    wget
+
+# Install MySQL server
+RUN DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes mysql-server \
+    mysql-client \
+    libmysqlclient-dev
+
+# Install python tools
+RUN pip install --upgrade pip
+RUN pip install virtualenv && \
+    pip install numpy && \
+    pip install MACS2 && \
+    pip install pararead && \
+    pip install piper
+
+# Install R
+RUN DEBIAN_FRONTEND=noninteractive apt-get --assume-yes install r-base r-base-dev && \
+    echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile && \
+    Rscript -e "install.packages('devtools')" && \
+    Rscript -e "devtools::install_github('pepkit/pepr')" && \
+    Rscript -e "install.packages('gtable')" && \
+    Rscript -e "install.packages('argparser')" && \
+    Rscript -e "install.packages('ggplot2')" && \
+    Rscript -e "install.packages('gplots')" && \
+    Rscript -e "install.packages('grid')" && \
+    Rscript -e "install.packages('scales')" && \
+    Rscript -e "install.packages('data.table')" && \
+    Rscript -e "install.packages('stringr')"
+
+
+# Install bedtools
+RUN DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes \
+    ant \
+    bedtools
+
+# Install fastqc
+WORKDIR /home/tools/
+RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.7.zip && \
+    unzip fastqc_v0.11.7.zip && \
+    cd /home/tools/FastQC && \
+    chmod 755 fastqc && \
+    ln -s /home/tools/FastQC/fastqc /usr/bin/
+
+# Install htslib
+WORKDIR /home/src/
+RUN wget https://github.com/samtools/htslib/releases/download/1.7/htslib-1.7.tar.bz2 && \
+    tar xf htslib-1.7.tar.bz2 && \
+    cd /home/src/htslib-1.7 && \
+    ./configure --prefix /home/tools/ && \
+    make && \
+    make install
+
+# Install samtools
+WORKDIR /home/src/
+RUN wget https://github.com/samtools/samtools/releases/download/1.7/samtools-1.7.tar.bz2 && \
+    tar xf samtools-1.7.tar.bz2 && \
+    cd /home/src/samtools-1.7 && \
+    ./configure && \
+    make && \
+    make install && \
+    ln -s /home/src/samtools-1.7/samtools /usr/bin/
+
+# Install bowtie2
+WORKDIR /home/src/
+RUN wget https://downloads.sourceforge.net/project/bowtie-bio/bowtie2/2.3.4.1/bowtie2-2.3.4.1-source.zip && \
+    unzip bowtie2-2.3.4.1-source.zip && \
+    cd /home/src/bowtie2-2.3.4.1 && \
+    make && \
+    make install && \
+    ln -s /home/src/bowtie2-2.3.4.1/bowtie2 /usr/bin/
+
+# Install picard
+WORKDIR /home/tools/bin
+RUN wget https://github.com/broadinstitute/picard/releases/download/2.18.0/picard.jar && \
+    chmod +x picard.jar
+
+# Install UCSC tools
+WORKDIR /home/tools/
+RUN wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig && \
+    wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig && \
+    wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigCat && \
+    wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedSort && \
+    chmod +x /home/tools/bedGraphToBigWig && \
+    chmod +x /home/tools/wigToBigWig && \
+    chmod +x /home/tools/bigWigCat && \
+    chmod +x /home/tools/bedSort && \
+    ln -s /home/tools/bedGraphToBigWig /usr/bin/ && \
+    ln -s /home/tools/wigToBigWig /usr/bin/ && \
+    ln -s /home/tools/bigWigCat /usr/bin/ && \
+    ln -s /home/tools/bedSort /usr/bin/
+
+# Install Skewer
+WORKDIR /home/src/
+RUN git clone git://github.com/relipmoc/skewer.git && \
+    cd /home/src/skewer && \
+    make && \
+    make install
+
+# OPTIONAL REQUIREMENTS
+# Install F-seq
+WORKDIR /home/src/
+RUN wget https://github.com/aboyle/F-seq/archive/master.zip && \
+    unzip master.zip && \
+    cd /home/src/F-seq-master && \
+    ant && \
+    cd dist~/ && \
+    tar xf fseq.tgz && \
+    ln -s /home/src/F-seq-master/dist~/fseq/bin/fseq /usr/bin/
+
+# Install Trimmomatic
+WORKDIR /home/src/
+RUN wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.36.zip && \
+    unzip Trimmomatic-0.36.zip && \
+    chmod +x Trimmomatic-0.36/trimmomatic-0.36.jar
+
+# Set environment variables
+ENV PATH=/home/tools/bin:/home/tools/:/home/tools/bin/kentUtils/:/home/src/F-seq-master/dist~/fseq/bin:/home/src/bowtie2-2.3.4.1:/home/src/skewer:/home/src/samtools-1.7:/home/src/Trimmomatic-0.36/:/home/src/htslib-1.7:$PATH \
+    TRIMMOMATIC=/home/src/Trimmomatic-0.36/trimmomatic-0.36.jar \
+    PICARD=/home/tools/bin/picard.jar \
+    R_LIBS_USER=/usr/local/lib/R/site-library/
+
+# Define default command
+WORKDIR /home/
+CMD ["/bin/bash"]
+
+# Clean up APT when done.
+RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
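After `make docker`, one quick way to confirm the image is usable is to check that each installed tool resolves on `PATH` and that the jar paths exported by `ENV` exist. The function below is a hedged sketch: the name `check_tools` and its default tool list are assumptions drawn from the install steps above, not a test shipped with the repository.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the databio/pepatac image: report any tool
# missing from PATH and any jar variable that points to a missing file.

check_tools() {
    local status=0 tool jar
    # With no arguments, check tools installed by containers/pepatac.Dockerfile.
    if [ "$#" -eq 0 ]; then
        set -- fastqc samtools bowtie2 bedtools macs2 skewer fseq \
            bedGraphToBigWig wigToBigWig bigWigCat bedSort pigz Rscript
    fi
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            status=1
        fi
    done
    # TRIMMOMATIC and PICARD are set by ENV in the Dockerfile; skip if unset.
    for jar in "${TRIMMOMATIC:-}" "${PICARD:-}"; do
        if [ -n "$jar" ] && [ ! -f "$jar" ]; then
            echo "missing jar: $jar" >&2
            status=1
        fi
    done
    return "$status"
}
```

One way to run it inside the image is `docker run --rm databio/pepatac bash -c "$(declare -f check_tools); check_tools"`. When reading the Dockerfile itself, note that Docker applies only the last `CMD` instruction, so `CMD ["/bin/bash"]` takes the place of the earlier `CMD ["/sbin/my_init"]`.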

example_cmd.txt

Lines changed: 16 additions & 73 deletions
@@ -1,90 +1,33 @@
-# Example commands of using pepATAC through pypiper.
-# For the example commands of using pepATAC with looper, please see the xxx Users Guide.
+# Example commands of using PEPATAC through pypiper.
+# For the example commands of using PEPATAC with looper, please see the xxx Users Guide.
 
 INPUT=/path/to/sequencing_results/fastq_files
 
-# run pepATAC on a human paired-end reads dataset using 5 threads:
-python pipelines/ATACseq.py -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $INPUT/ATACseq_results_PE_R1.fastq.gz -I2 $INPUT/ATACseq_results_PE_R2.fastq.gz
+# run PEPATAC on a human paired-end reads dataset using 5 threads:
+python pipelines/pepatac.py -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $INPUT/pepatac_results_PE_R1.fastq.gz -I2 $INPUT/pepatac_results_PE_R2.fastq.gz
 
-# run pepATAC on multiple datasets at the same time: <- this could be wrong as I don't see an explaination of how to use -I and -I2 with multiple samples
-python pipelines/ATACseq.py -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $INPUT/ATACseq_results1_PE_R1.fastq.gz $INPUT/ATACseq_results2_PE_R1.fastq.gz $INPUT/ATACseq_results3_PE_R1.fastq.gz -I2 $INPUT/ATACseq_results1_PE_R2.fastq.gz $INPUT/ATACseq_results2_PE_R2.fastq.gz $INPUT/ATACseq_results3_PE_R2.fastq.gz
+# run PEPATAC on multiple datasets at the same time: <- this could be wrong as I don't see an explaination of how to use -I and -I2 with multiple samples
+python pipelines/pepatac.py -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $INPUT/pepatac_results1_PE_R1.fastq.gz $INPUT/pepatac_results2_PE_R1.fastq.gz $INPUT/pepatac_results3_PE_R1.fastq.gz -I2 $INPUT/pepatac_results1_PE_R2.fastq.gz $INPUT/pepatac_results2_PE_R2.fastq.gz $INPUT/pepatac_results3_PE_R2.fastq.gz
 
 # run multiple samples with a for loop:
 declare -a sample_name_arr=("sample1","sample2","sample3")
 for sample_name in "${sample_name_arr[@]}"
 do
     file1=$INPUT/{$file1}_PE_R1.fastq.gz
     file2=${file1/R1/R2}
-    python pipelines/ATACseq.py -P 5 -O output_folder -S $sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $file1 -I2 $file2
+    python pipelines/pepatac.py -P 5 -O output_folder -S $sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $file1 -I2 $file2
 done
 
-# run pepATAC on a mouse single-end reads dataset using 8 threads:
-python pipelines/ATACseq.py -P 8 -O output_folder -S output_sample_name -G mm10 -Q single -C ATACseq.yaml -gs mm -I $INPUT/ATACseq_results_PE_R1.fastq.gz
+# run PEPATAC on a mouse single-end reads dataset using 8 threads:
+python pipelines/pepatac.py -P 8 -O output_folder -S output_sample_name -G mm10 -Q single -C pepatac.yaml -gs mm -I $INPUT/pepatac_results_PE_R1.fastq.gz
 
-# run pepATAC with different trimming tools then default trimmomatic, currectly supports skewer and pyadapt:
-python pipelines/ATACseq.py --skewer TRUE -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $INPUT/ATACseq_results_PE_R1.fastq.gz -I2 $INPUT/ATACseq_results_PE_R2.fastq.gz
-python pipelines/ATACseq.py --pyadapt TRUE -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $INPUT/ATACseq_results_PE_R1.fastq.gz -I2 $INPUT/ATACseq_results_PE_R2.fastq.gz
-
-# re-run pepATAC and over-write the previous output:
-python pipelines/ATACseq.py -N -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $INPUT/ATACseq_results_PE_R1.fastq.gz -I2 $INPUT/ATACseq_results_PE_R2.fastq.gz
-
-# continue to run pepATAC since a locked step (usually locked due to failure):
-python pipelines/ATACseq.py -R -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C ATACseq.yaml -gs hs -I $INPUT/ATACseq_results_PE_R1.fastq.gz -I2 $INPUT/ATACseq_results_PE_R2.fastq.gz
-
-
-
-
-# check xxxx for full list of parameter usage
-
-# full list of parameters are listed below:
-python ATACseq.py
-usage: ATACseq.py [-h] [-N] [-I2 INPUT_FILES2 [INPUT_FILES2 ...]]
-                  [-M MEMORY_LIMIT] [-Q SINGLE_OR_PAIRED] [-S SAMPLE_NAME]
-                  [-P NUMBER_OF_CORES] [-D] [-I INPUT_FILES [INPUT_FILES ...]]
-                  [-F] [-R] [-C CONFIG_FILE] [-O PARENT_OUTPUT_FOLDER]
-                  [-G GENOME_ASSEMBLY] [-gs GENOME_SIZE]
-                  [--frip-ref-peaks FRIP_REF_PEAKS] [--pyadapt] [--skewer]
-                  [--prealignments PREALIGNMENTS [PREALIGNMENTS ...]] [-V]
-
-Pipeline
-optional arguments:
-  -C CONFIG_FILE, --config CONFIG_FILE
-                        pipeline config file in YAML format; relative paths
-                        are considered relative to the pipeline script.
-                        defaults to ATACseq.yaml
-  -D, --dirty           Make all cleanups manual
-  -F, --follow          Run all follow commands, even if command is not run
-  --frip-ref-peaks FRIP_REF_PEAKS
-                        Reference peak set for calculating FRIP
-  -G GENOME_ASSEMBLY, --genome GENOME_ASSEMBLY
-                        identifier for genome assempbly (required)
-  -gs GENOME_SIZE, --genome-size GENOME_SIZE
-                        genome size for MACS2
-  -h, --help            show this help message and exit
-  -I INPUT_FILES [INPUT_FILES ...], --input INPUT_FILES [INPUT_FILES ...]
-                        One or more primary input files (required)
-  -I2 INPUT_FILES2 [INPUT_FILES2 ...], --input2 INPUT_FILES2 [INPUT_FILES2 ...]
-                        One or more secondary input files (if they exists);
-                        for example, second read in pair.
-  -M MEMORY_LIMIT, --mem MEMORY_LIMIT
-                        Memory string for processes that accept memory limits
-                        (like java)
-  -N, --new-start       Fresh start mode, overwrite all
-  -O PARENT_OUTPUT_FOLDER, --output-parent PARENT_OUTPUT_FOLDER
-                        parent output directory of the project (required).
-  -P NUMBER_OF_CORES, --cores NUMBER_OF_CORES
-                        number of cores to use for parallel processes
-  -Q SINGLE_OR_PAIRED, --single-or-paired SINGLE_OR_PAIRED
-                        single or paired end? default: single
-  -R, --recover         Recover mode, overwrite locks
-  -S SAMPLE_NAME, --sample-name SAMPLE_NAME
-                        unique name for output subfolder and files (required)
-  --pyadapt             Use pyadapter_trim for trimming? [Default: False]
-  --skewer              Use skewer for trimming? [Default: False]
-  --prealignments PREALIGNMENTS [PREALIGNMENTS ...]
-                        List of reference genomes to align to before primary
-                        alignment.
-  -V, --version         show program's version number and exit'
+# run PEPATAC with different trimming tools then default trimmomatic, currectly supports skewer and pyadapt:
+python pipelines/pepatac.py --skewer TRUE -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $INPUT/pepatac_results_PE_R1.fastq.gz -I2 $INPUT/pepatac_results_PE_R2.fastq.gz
+python pipelines/pepatac.py --pyadapt TRUE -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $INPUT/pepatac_results_PE_R1.fastq.gz -I2 $INPUT/pepatac_results_PE_R2.fastq.gz
 
+# re-run PEPATAC and over-write the previous output:
+python pipelines/pepatac.py -N -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $INPUT/pepatac_results_PE_R1.fastq.gz -I2 $INPUT/pepatac_results_PE_R2.fastq.gz
 
+# continue to run PEPATAC since a locked step (usually locked due to failure):
+python pipelines/pepatac.py -R -P 5 -O output_folder -S output_sample_name -G hg38 -Q paired -C pepatac.yaml -gs hs -I $INPUT/pepatac_results_PE_R1.fastq.gz -I2 $INPUT/pepatac_results_PE_R2.fastq.gz
 
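The for-loop example in `example_cmd.txt` is unchanged by this commit and, as written, does not do what its comment says. `("sample1","sample2","sample3")` creates a bash array with a single element, because commas do not separate array items. In addition, `file1=$INPUT/{$file1}_PE_R1.fastq.gz` refers to `file1` rather than `sample_name`. A corrected sketch follows. It assumes files are named `<sample>_PE_R1.fastq.gz` and `<sample>_PE_R2.fastq.gz` under `$INPUT`, and it prints each command (a dry run) instead of executing it. The function name `print_pepatac_cmds` is hypothetical.

```shell
#!/usr/bin/env bash
# Corrected sketch of the example_cmd.txt for-loop. Dry run: it prints
# one pepatac.py command per sample instead of executing it.

print_pepatac_cmds() {
    local input_dir="$1"
    shift
    local sample_name file1 file2
    for sample_name in "$@"; do
        # Derive both read files from the sample name itself.
        file1="${input_dir}/${sample_name}_PE_R1.fastq.gz"
        file2="${input_dir}/${sample_name}_PE_R2.fastq.gz"
        echo "python pipelines/pepatac.py -P 5 -O output_folder -S ${sample_name}" \
            "-G hg38 -Q paired -C pepatac.yaml -gs hs -I ${file1} -I2 ${file2}"
    done
}

# Bash array items are separated by whitespace, not commas.
sample_name_arr=("sample1" "sample2" "sample3")

# Assumes INPUT is set as at the top of example_cmd.txt.
print_pepatac_cmds "${INPUT:-/path/to/sequencing_results/fastq_files}" "${sample_name_arr[@]}"
```

Printing first makes it easy to check the generated paths. To actually run the commands, the output can be piped to `bash`, or `echo` can be replaced by the `python` invocation itself.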
examples/chang_project/README.md

Lines changed: 3 additions & 3 deletions
@@ -2,14 +2,14 @@
 
 This folder contains an skeleton template with configuration options already set for the Chang lab compute environment. To set up a new project in the Chang lab compute environment, follow these instructions:
 
-1. Follow the **installing** instructions in the main README to get prerequisites (install looper, pypiper, and clone the ATACseq repository).
+1. Follow the **installing** instructions in the main README to get prerequisites (install looper, pypiper, and clone the PEPATAC repository).
 2. Copy this folder ([examples/chang_project](examples/chang_project/)) and name the new folder for your project.
-3. In your new folder, edit `project_config.yaml` to set the `metadata.pipelines_dir` option to the location of your cloned ATACseq repository.
+3. In your new folder, edit `project_config.yaml` to set the `metadata.pipelines_dir` option to the location of your cloned PEPATAC repository.
 4. Edit `project_config.yaml` to set the `data_sources.R1` and `data_sources.R2` to point to where you store fastq files. Your files must be named in some systematic pattern that can be created by populating sample variables, like `{sample_name}`. Detailed instructions are available here: [using looper derived columns](http://looper.readthedocs.io/en/latest/advanced.html#pointing-to-flexible-data-with-derived-columns).
 5. Make any other (optional) changes you want to `project_config.yaml`.
 6. Modify `project_annotation.csv` to include your sample list.
 7. Run the project with `looper run path/to/project_config.yaml`.
 
-Essentially, all this does differently from the default is that we have provided a configuration file. See the `pipeline_config` section in the [project config file](examples/chang_project/project_config.yaml) -- we simply set this to `ATACseq_chang.yaml` for your project, and then include [ATACseq_chang.yaml](examples/chang_project/ATACseq_chang.yaml) parallel to the project config file.
+Essentially, all this does differently from the default is that we have provided a configuration file. See the `pipeline_config` section in the [project config file](examples/chang_project/project_config.yaml) -- we simply set this to `pepatac_chang.yaml` for your project, and then include [pepatac_chang.yaml](examples/chang_project/pepatac_chang.yaml) parallel to the project config file.
 
 Once you have it set up, you have all the power of looper for your project. It's simple to submit to a cluster, summarize your results, clean, and monitor your project. You can find additional details on what you can do with this in the [looper docs](http://looper.readthedocs.io/).
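Step 4 of the README relies on looper's derived columns: a `data_sources` entry such as `R1` holds a path template, and looper fills in `{sample_name}` (and other placeholders) from each row of the sample annotation. Looper performs this internally, so the function below is only a hedged illustration of the substitution for a single placeholder. The function name `expand_template` and the example template path are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration of derived-column expansion for a single placeholder.
# Hypothetical helper; looper performs this substitution itself.

expand_template() {
    local template="$1" sample_name="$2"
    local placeholder='{sample_name}'
    # Quoting the pattern makes bash match it literally, for every occurrence.
    printf '%s\n' "${template//"$placeholder"/$sample_name}"
}

# Hypothetical value for data_sources.R1 in project_config.yaml.
r1_template='/data/fastq/{sample_name}_R1.fastq.gz'
for sample in liver-CD31 sample2; do
    expand_template "$r1_template" "$sample"
done
```

This is why the README requires a systematic naming pattern: every sample's files must be reachable by substituting its annotation values into the same template.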

examples/chang_project/ATACseq_chang.yaml renamed to examples/chang_project/pepatac_chang.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Configuration file for ATACseq pipeline based on pypiper
+# PEPATAC configuration file for an ATACseq pipeline based on pypiper
 
 # basic tools
 # public tools

examples/chang_project/project_config.yaml

Lines changed: 2 additions & 2 deletions
@@ -33,6 +33,6 @@ implied_columns:
 prealignments: null
 
 pipeline_config:
-  ATACseq.py: ATACseq_chang.yaml # Use this to load Chang Lab settings
-  #ATACseq.py: null # Use this to load default environment settings
+  pepatac.py: pepatac_chang.yaml # Use this to load Chang Lab settings
+  #pepatac.py: null # Use this to load default environment settings
 