Skip to content

Commit ad34491

Browse files
authored
Merge pull request #143 from databio/dev
Dev
2 parents 10e75d1 + 21144a2 commit ad34491

File tree

9 files changed

+406
-260
lines changed

9 files changed

+406
-260
lines changed

PEPATACr/R/PEPATACr.R

Lines changed: 303 additions & 216 deletions
Large diffs are not rendered by default.

docs/annotation.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
The pipeline uses reference data at various stages. If you're using a common genome assembly, these resources are pre-built and can be easily downloaded using `refgenie pull`, as described in the setup instructions. If the resources are not available, you'll have to build them. Read [how to build `refgenie` assets](http://refgenie.databio.org/en/latest/build/) in the `refgenie` docs. You may also [learn about the current buildable assets](http://refgenie.databio.org/en/latest/available_assets/) to which `refgenie` knows the recipe.
44

5-
##Use a custom `feat_annotation` asset
5+
## Use a custom `feat_annotation` asset
66

77
The pipeline will calculate the fraction of reads in genomic features using the `refgenie` `feat_annotation` asset, but you can also specify this file yourself at the command line (`--anno-name`).
88

docs/changelog.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,12 @@
11
# Change log
22
All notable changes to this project will be documented in this file.
33

4+
## [0.9.2] -- 2020-08-04
5+
6+
### Changed
7+
- Reduce memory requirements of consensus peak generation
8+
- Enable multiple genome projects for peak count tables
9+
410
## [0.9.1] -- 2020-07-13
511

612
### Changed

docs/usage.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -19,9 +19,9 @@ usage: pepatac.py [-h] [-R] [-N] [-D] [-F] [-T] [--silent] [--verbosity V]
1919
[--peak-type {fixed,variable}] [--extend EXTEND]
2020
[--frip-ref-peaks FRIP_REF_PEAKS] [--motif] [--sob]
2121
[--no-scale] [--prioritize] [--keep] [--noFIFO] [--lite]
22-
[-V]
22+
[--skipqc] [-V]
2323
24-
PEPATAC version 0.9.1
24+
PEPATAC version 0.9.2
2525
2626
optional arguments:
2727
-h, --help show this help message and exit
@@ -86,6 +86,8 @@ optional arguments:
8686
--noFIFO Do NOT use named pipes during prealignments
8787
--lite Only keep minimal, essential output to conserve disk
8888
space.
89+
--skipqc Skip FastQC. Useful for bugs in FastQC that appear
90+
with some sequence read files.
8991
-V, --version show program's version number and exit
9092
9193
required named arguments:

mkdocs.yml

Lines changed: 2 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,10 @@ theme: databio
22

33
site_name: PEPATAC
44
site_author: Jason Smith
5-
site_url: http://code.databio.org/PEPATAC/
5+
site_url: http://pepatac.databio.org/
66
site_logo: img/pepatac_logo_white.svg
77
repo_url: https://github.com/databio/pepatac/
8-
google_analytics: ['UA-127092878-1', 'code.databio.org/PEPATAC']
8+
google_analytics: ['UA-127092878-1', 'pepatac.databio.org']
99

1010
markdown_extensions:
1111
- fontawesome_markdown # pip install --user fontawesome-markdown
@@ -47,6 +47,3 @@ navbar:
4747
- text: Software & Data
4848
icon: fa-code fa-lg
4949
href: http://databio.org/software/
50-
# - text: Contact us
51-
# icon: fa-envelope
52-
# href: contact

pepatac_output_schema.yaml

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -52,11 +52,12 @@ properties:
5252
consensus_peaks_file:
5353
title: "Consensus peaks file"
5454
description: "A set of consensus peaks across samples."
55-
thumbnail_path: "summary/{name}_consensusPeaks.png"
56-
path: "summary/{name}_consensusPeaks.narrowPeak"
57-
type: image
55+
thumbnail_path: "summary/{name}_*_consensusPeaks.png"
56+
path: "summary/{name}_*_consensusPeaks.narrowPeak"
57+
type: string
5858
counts_table:
5959
title: "Project peak coverage file"
6060
description: "Project peak coverages: chr_start_end X sample"
61-
path: "summary/{name}_peaks_coverage.tsv"
62-
type: link
61+
thumbnail_path: "summary/{name}_*_peaks_coverage.png"
62+
path: "summary/{name}_*_peaks_coverage.tsv"
63+
type: string

pipelines/pepatac.py

Lines changed: 51 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55

66
__author__ = ["Jin Xu", "Nathan Sheffield", "Jason Smith"]
77
__email__ = "[email protected]"
8-
__version__ = "0.9.1"
8+
__version__ = "0.9.2"
99

1010

1111
from argparse import ArgumentParser
@@ -123,6 +123,10 @@ def parse_arguments():
123123
help="Only keep minimal, essential output to conserve "
124124
"disk space.")
125125

126+
parser.add_argument("--skipqc", dest="skipqc", action='store_true',
127+
help="Skip FastQC. Useful for bugs in FastQC "
128+
"that appear with some sequence read files.")
129+
126130
parser.add_argument("-V", "--version", action="version",
127131
version="%(prog)s {v}".format(v=__version__))
128132

@@ -712,6 +716,13 @@ def main():
712716
trimming_prefix = out_fastq_pre
713717
trimmed_fastq = trimming_prefix + "_R1_trim.fastq"
714718
trimmed_fastq_R2 = trimming_prefix + "_R2_trim.fastq"
719+
fastqc_folder = os.path.join(param.outfolder, "fastqc")
720+
fastqc_report = os.path.join(fastqc_folder,
721+
trimming_prefix + "_R1_trim_fastqc.html")
722+
fastqc_report_R2 = os.path.join(fastqc_folder,
723+
trimming_prefix + "_R2_trim_fastqc.html")
724+
if ngstk.check_command(tools.fastqc):
725+
ngstk.make_dir(fastqc_folder)
715726

716727
# Create trimming command(s).
717728
if args.trimmer == "pyadapt":
@@ -726,7 +737,7 @@ def main():
726737
("-o", out_fastq_pre),
727738
"-u"
728739
]
729-
cmd = build_command(trim_cmd_chunks)
740+
trim_cmd = build_command(trim_cmd_chunks)
730741

731742
elif args.trimmer == "skewer":
732743
# Create the primary skewer command.
@@ -762,7 +773,7 @@ def main():
762773
for old, new in skewer_filename_pairs]
763774

764775
# Pypiper submits the commands serially.
765-
cmd = [trimming_command] + trimming_renaming_commands
776+
trim_cmd = [trimming_command] + trimming_renaming_commands
766777

767778
else:
768779
# Default to trimmomatic.
@@ -780,13 +791,42 @@ def main():
780791
trimming_prefix + "_R2_unpaired.fq" if args.paired_end else "",
781792
"ILLUMINACLIP:" + res.adapters + ":2:30:10"
782793
]
783-
cmd = build_command(trim_cmd_chunks)
794+
trim_cmd = build_command(trim_cmd_chunks)
795+
796+
def check_trim():
797+
pm.info("Evaluating read trimming")
798+
799+
if args.paired_end and not trimmed_fastq_R2:
800+
pm.warning("Specified paired-end but no R2 file")
801+
802+
n_trim = float(ngstk.count_reads(trimmed_fastq, args.paired_end))
803+
pm.report_result("Trimmed_reads", int(n_trim))
804+
try:
805+
rr = float(pm.get_stat("Raw_reads"))
806+
except:
807+
pm.warning("Can't calculate trim loss rate without raw read result.")
808+
else:
809+
pm.report_result(
810+
"Trim_loss_rate", round((rr - n_trim) * 100 / rr, 2))
811+
812+
# Also run a fastqc (if installed/requested)
813+
if fastqc_folder and not args.skipqc:
814+
if fastqc_folder and os.path.isabs(fastqc_folder):
815+
ngstk.make_sure_path_exists(fastqc_folder)
816+
cmd = (tools.fastqc + " --noextract --outdir " +
817+
fastqc_folder + " " + trimmed_fastq)
818+
pm.run(cmd, fastqc_report, nofail=False)
819+
pm.report_object("FastQC report r1", fastqc_report)
820+
821+
if args.paired_end and trimmed_fastq_R2:
822+
cmd = (tools.fastqc + " --noextract --outdir " +
823+
fastqc_folder + " " + trimmed_fastq_R2)
824+
pm.run(cmd, fastqc_report_R2, nofail=False)
825+
pm.report_object("FastQC report r2", fastqc_report_R2)
784826

785827
if not os.path.exists(rmdup_bam) or args.new_start:
786-
pm.run(cmd, trimmed_fastq,
787-
follow=ngstk.check_trim(
788-
trimmed_fastq, args.paired_end, trimmed_fastq_R2,
789-
fastqc_folder=os.path.join(param.outfolder, "fastqc")))
828+
pm.debug("trim_cmd: {}".format(trim_cmd))
829+
pm.run(trim_cmd, trimmed_fastq, follow=check_trim)
790830

791831
pm.clean_add(os.path.join(fastq_folder, "*.fastq"), conditional=True)
792832
pm.clean_add(os.path.join(fastq_folder, "*.log"), conditional=True)
@@ -875,8 +915,9 @@ def main():
875915
pm.clean_add(tempdir)
876916

877917
# If there are no prealignments, unmap_fq1 will be unzipped
878-
if pypiper.is_gzipped_fastq(unmap_fq1):
918+
if os.path.exists(unmap_fq1 + ".gz"):
879919
unmap_fq1 = unmap_fq1 + ".gz"
920+
if os.path.exists(unmap_fq2 + ".gz"):
880921
unmap_fq2 = unmap_fq2 + ".gz"
881922

882923
bt2_index = os.path.join(rgc.seek(args.genome_assembly, BT2_IDX_KEY))
@@ -1742,7 +1783,7 @@ def report_peak_count():
17421783
black_local = os.path.join(raw_folder,
17431784
args.genome_assembly +
17441785
"_blacklist.bed")
1745-
cmd = ("ln -sf " + res.feat_annotation + " " + black_local)
1786+
cmd = ("ln -sf " + res.blacklist + " " + black_local)
17461787
pm.run(cmd, black_local)
17471788
else:
17481789
print("Skipping peak filtering...")

tools/PEPATAC_summarizer.R

Lines changed: 29 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -110,7 +110,6 @@ if (dir.exists(argv$results)) {
110110
quit()
111111
}
112112

113-
114113
# Get assets
115114
assets <- PEPATACr::createAssetsSummary(prj, argv$output, results_subdir)
116115
if (nrow(assets) == 0) {
@@ -172,7 +171,7 @@ if (!file.exists(complexity_path) || argv$new_start) {
172171
complexity_flag <- TRUE
173172
}
174173
} else {
175-
warning("Project level library complexity plot already exists.")
174+
message("Project level library complexity plot already exists.")
176175
message(paste0("Project library complexity plot: ", complexity_path, "\n"))
177176
complexity_flag <- TRUE
178177
}
@@ -185,15 +184,15 @@ if (summarizer_flag && complexity_flag) {
185184
for (genome in unique(genomes)) {
186185
file_name <- paste0("_", genome,"_consensusPeaks.narrowPeak")
187186
consensus_path <- file.path(summary_dir, paste0(project_name, file_name))
188-
if (file.exists(consensus_path)) {
187+
if (file.exists(consensus_path) && !argv$new_start) {
189188
message(paste0("Consensus peak set (", genome, "): ",
190189
consensus_path, "\n"))
191190
}
192191
}
193192

194193
# Calculate consensus peaks
195194
if (!file.exists(consensus_path) || argv$new_start) {
196-
write(paste0("Creating consensus peak set..."), stdout())
195+
#write(paste0("Creating consensus peak set..."), stdout())
197196
consensus_paths <- PEPATACr::consensusPeaks(prj, argv$output,
198197
argv$results, assets)
199198
if (!length(consensus_paths) == 0) {
@@ -213,24 +212,35 @@ if (!file.exists(consensus_path) || argv$new_start) {
213212
}
214213
}
215214

215+
# Report existing counts tables
216+
# TODO: move genome handling out of the called function?
217+
for (genome in unique(genomes)) {
218+
file_name <- paste0("_", genome,"_peaks_coverage.tsv")
219+
counts_path <- file.path(summary_dir, paste0(project_name, file_name))
220+
if (file.exists(counts_path) && !argv$new_start) {
221+
message(paste0("Peak counts table (", genome, "): ",
222+
counts_path, "\n"))
223+
}
224+
}
225+
216226
# Create count matrix
217-
counts_path <- file.path(summary_dir,
218-
paste0(project_name, "_peaks_coverage.tsv"))
219227
if (!file.exists(counts_path) || argv$new_start) {
220-
write(paste0("Creating gene count table..."), stdout())
221-
counts_path <- PEPATACr::peakCounts(prj, argv$output, argv$results, assets)
222-
if (!is.null(counts_path) && file.exists(counts_path)) {
223-
message("Counts table: ", counts_path, "\n")
224-
icon <- PEPATACr::fileIcon()
225-
output_file <- file.path(summary_dir,
226-
paste0(project_name, "_peaks_coverage.png"))
227-
png(filename = output_file, height = 275, width=275,
228-
bg="transparent")
229-
suppressWarnings(print(icon))
230-
invisible(dev.off())
228+
#write(paste0("Creating peak count table(s)..."), stdout())
229+
counts_paths <- PEPATACr::peakCounts(prj, argv$output, argv$results, assets)
230+
if (!length(counts_paths) == 0) {
231+
for (counts_table in counts_paths) {
232+
if (file.exists(counts_table)) {
233+
message("Counts table: ", counts_table, "\n")
234+
icon <- PEPATACr::fileIcon()
235+
output_file <- file.path(summary_dir,
236+
paste0(project_name, "_", genome, "_peaks_coverage.png"))
237+
png(filename = output_file, height = 275, width=275,
238+
bg="transparent")
239+
suppressWarnings(print(icon))
240+
invisible(dev.off())
241+
}
242+
}
231243
}
232-
} else {
233-
message("Counts table: ", counts_path, "\n")
234244
}
235245

236246
################################################################################

usage.txt

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,9 +11,9 @@ usage: pepatac.py [-h] [-R] [-N] [-D] [-F] [-T] [--silent] [--verbosity V]
1111
[--peak-type {fixed,variable}] [--extend EXTEND]
1212
[--frip-ref-peaks FRIP_REF_PEAKS] [--motif] [--sob]
1313
[--no-scale] [--prioritize] [--keep] [--noFIFO] [--lite]
14-
[-V]
14+
[--skipqc] [-V]
1515

16-
PEPATAC version 0.9.1
16+
PEPATAC version 0.9.2
1717

1818
optional arguments:
1919
-h, --help show this help message and exit
@@ -78,6 +78,8 @@ optional arguments:
7878
--noFIFO Do NOT use named pipes during prealignments
7979
--lite Only keep minimal, essential output to conserve disk
8080
space.
81+
--skipqc Skip FastQC. Useful for bugs in FastQC that appear
82+
with some sequence read files.
8183
-V, --version show program's version number and exit
8284

8385
required named arguments:

0 commit comments

Comments
 (0)