Changes from 54 commits
68 commits
c7701d4
update dyanmic truth paths
nikellepetrillo Mar 19, 2025
153c27f
update buckekt names
nikellepetrillo Mar 19, 2025
9913d47
update buckekt names
nikellepetrillo Mar 19, 2025
8d2b80c
update buckekt names
nikellepetrillo Mar 19, 2025
2d4021c
update buckekt names
nikellepetrillo Mar 19, 2025
ed60c07
update buckekt names
nikellepetrillo Mar 19, 2025
316e448
add warp-pipeline-dev as billing project
nikellepetrillo Mar 19, 2025
244fece
try to fix results path
nikellepetrillo Mar 19, 2025
aaec3d4
try to fix results path
nikellepetrillo Mar 19, 2025
1c801dd
try to fix results path
nikellepetrillo Mar 19, 2025
2abe71f
fix truth path
nikellepetrillo Mar 20, 2025
c6d2b26
fix truth path
nikellepetrillo Mar 20, 2025
7a3de2d
fix truth path
nikellepetrillo Mar 20, 2025
10663a2
fix truth path
nikellepetrillo Mar 20, 2025
94647ad
add --billing-project=warp-pipeline-dev
nikellepetrillo Mar 20, 2025
e031748
--billing-project=terra-f8e3de20
nikellepetrillo Mar 20, 2025
daba981
cleaning up extra debugging code
nikellepetrillo Mar 20, 2025
92e3742
cleaning up extra debugging code
nikellepetrillo Mar 20, 2025
b388e0e
Merge branch 'develop' into np_update_truth_paths
nikellepetrillo Mar 20, 2025
6707aee
Merge branch 'develop' into np_update_truth_paths
nikellepetrillo Mar 20, 2025
7a4e1a0
update slideseq inputs to private bucket to test out dynamic truth fu…
nikellepetrillo Mar 21, 2025
100b0e4
Merge remote-tracking branch 'origin/np_update_truth_paths' into np_u…
nikellepetrillo Mar 21, 2025
2eef91b
revert
nikellepetrillo Mar 21, 2025
755636f
update optimus sci inputs
nikellepetrillo Mar 21, 2025
551e1e9
update plumbing multi snss2
nikellepetrillo Apr 9, 2025
8481de5
update slideseq sci and plumbing
nikellepetrillo Apr 9, 2025
5346ce3
update snm3c
nikellepetrillo Apr 14, 2025
e37e1fd
Merge branch 'develop' into np_snm3c_slideseq_ss2sn_update_inputs
nikellepetrillo Apr 14, 2025
8c35fd0
update snm3c
nikellepetrillo Apr 14, 2025
334dfba
updates
nikellepetrillo Apr 15, 2025
0f93c2d
Merge branch 'develop' into np_snm3c_slideseq_ss2sn_update_inputs
nikellepetrillo Apr 15, 2025
03a519d
Merge branch 'develop' into np_update_truth_paths
nikellepetrillo Apr 15, 2025
5516cb1
revert
nikellepetrillo Apr 15, 2025
71e54f8
Merge pull request #1565 from broadinstitute/np_snm3c_slideseq_ss2sn_…
nikellepetrillo Apr 15, 2025
755170e
update sn3mc
nikellepetrillo Apr 16, 2025
0fe8557
update sn3mc
nikellepetrillo Apr 16, 2025
5faaaac
update sn3mc
nikellepetrillo Apr 16, 2025
9a6175e
update sn3mc
nikellepetrillo Apr 16, 2025
c8fa692
Merge branch 'develop' into np_update_truth_paths
nikellepetrillo Apr 16, 2025
96809d1
fix paired tag
nikellepetrillo Apr 17, 2025
03c109b
Merge remote-tracking branch 'origin/np_update_truth_paths' into np_u…
nikellepetrillo Apr 17, 2025
3352429
fix paired tag
nikellepetrillo Apr 17, 2025
29cacfa
fix atac
nikellepetrillo Apr 17, 2025
66a7f65
update sci snm3c
nikellepetrillo Apr 17, 2025
ec20e2f
update sci snm3c
nikellepetrillo Apr 17, 2025
f3dc83f
update atac
nikellepetrillo Apr 18, 2025
56ab9d6
update imputation and imp. beagle
nikellepetrillo Apr 18, 2025
1ece649
Merge pull request #1577 from broadinstitute/np_imputation_update_tru…
nikellepetrillo Apr 22, 2025
5f4c5f3
update paired tag
nikellepetrillo Apr 22, 2025
52b7bc6
Merge branch 'develop' into np_update_truth_paths
nikellepetrillo Apr 22, 2025
c4fb6e9
update imputation
nikellepetrillo Apr 22, 2025
7c41901
Merge remote-tracking branch 'origin/np_update_truth_paths' into np_u…
nikellepetrillo Apr 22, 2025
24a1584
update imputation
nikellepetrillo Apr 22, 2025
837fc79
Np update wgs np update truth paths (#1588)
nikellepetrillo May 27, 2025
487fe81
Migrate ExomegermlineSS inputs (#1595)
nikellepetrillo Jun 4, 2025
ab5c90f
update sample name maps
nikellepetrillo Jul 2, 2025
5015700
revert bge
nikellepetrillo Jul 3, 2025
282e757
revert gather_vcfs_high_memory.json
nikellepetrillo Jul 4, 2025
7c57915
revert gather_vcfs_high_memory.json
nikellepetrillo Jul 7, 2025
6b1d0b3
fix bge sci
nikellepetrillo Jul 9, 2025
490b171
update ug pipelines
nikellepetrillo Jul 9, 2025
3fbb03b
Merge pull request #1627 from broadinstitute/np_jg_np_update_truth_paths
nikellepetrillo Jul 10, 2025
39d262d
add gs://gatk-best-practices to public bucket identifiers
nikellepetrillo Jul 11, 2025
c8a89ef
forgot some instances of broad-gotc
nikellepetrillo Jul 14, 2025
1066cdd
fix VerifyNA12878.wdl
nikellepetrillo Jul 14, 2025
4ceda45
Merge pull request #1628 from broadinstitute/ultima_np_update_input_p…
nikellepetrillo Jul 15, 2025
8aa15f2
Merge pull request #1631 from broadinstitute/np_jg_input_fix
nikellepetrillo Jul 15, 2025
4ac288d
np update Reprocessing input paths (#1636)
nikellepetrillo Jul 25, 2025
4 changes: 4 additions & 0 deletions .dockstore.yml
@@ -144,6 +144,10 @@ workflows:
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestPairedTag.wdl

- name: TestPeakCalling
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestPeakCalling.wdl

- name: TestReblockGVCF
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestReblockGVCF.wdl
65 changes: 65 additions & 0 deletions .github/workflows/test_peakcalling.yml
@@ -0,0 +1,65 @@
name: Test PeakCalling

# Controls when the workflow will run
on:
pull_request:
branches: [ "develop", "staging", "master" ]
# Only run if files in these paths changed:
####################################
# SET PIPELINE SPECIFIC PATHS HERE #
####################################
paths:
# anything in the pipelines folder
- 'pipelines/skylab/peak_calling/**'
# tasks from the pipeline WDL and their dependencies
- 'tasks/broad/Utilities.wdl'
# verification WDL and its dependencies
- 'verification/VerifyPeakCalling.wdl'
- 'verification/VerifyTasks.wdl'
# test WDL and its dependencies
- 'verification/test-wdls/TestPeakCalling.wdl'
- 'tasks/broad/TerraCopyFilesFromCloudToCloud.wdl'
# this file, the subworkflow file, and the firecloud_api script
- '.github/workflows/test_peakcalling.yml'
- '.github/workflows/warp_test_workflow.yml'
- 'scripts/firecloud_api/firecloud_api.py'


# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
inputs:
useCallCache:
description: 'Use call cache (default: true)'
required: false
default: "true"
updateTruth:
description: 'Update truth files (default: false)'
required: false
default: "false"
testType:
description: 'Specify the type of test (Plumbing or Scientific)'
required: false
type: choice
options:
- Plumbing
- Scientific
truthBranch:
description: 'Specify the branch for truth files (default: master)'
required: false
default: "master"


jobs:
TestPeakCalling:
uses: ./.github/workflows/warp_test_workflow.yml
with:
pipeline_name: TestPeakCalling
dockstore_pipeline_name: PeakCalling
pipeline_dir: pipelines/skylab/peak_calling
use_call_cache: ${{ github.event.inputs.useCallCache || 'true' }}
update_truth: ${{ github.event.inputs.updateTruth || 'false' }}
test_type: ${{ github.event.inputs.testType }}
truth_branch: ${{ github.event.inputs.truthBranch || 'master' }}
secrets:
PDT_TESTER_SA_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
DOCKSTORE_TOKEN: ${{ secrets.DOCKSTORE_TOKEN }}
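The `workflow_dispatch` inputs declared above can also be exercised by hand. A minimal sketch using the GitHub CLI — the flag values and the `--ref develop` branch are illustrative assumptions, and the actual `gh` invocation is left commented out because it needs an authenticated session inside the repo checkout:

```shell
# Compose the dispatch inputs for a manual TestPeakCalling run.
# Values mirror the workflow_dispatch defaults above; testType has no
# default, so it is passed explicitly here.
INPUTS="-f useCallCache=true -f updateTruth=false -f testType=Plumbing -f truthBranch=master"

# Requires an authenticated gh session inside the warp checkout:
# gh workflow run test_peakcalling.yml --ref develop $INPUTS

echo "$INPUTS"
```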
10 changes: 5 additions & 5 deletions .github/workflows/warp_test_workflow.yml
@@ -211,6 +211,7 @@ jobs:
done

echo "Error: The Dockstore Commit Hash does not match the GitHub Commit Hash after 15 minutes of retries!"

exit 1
env:
GITHUB_COMMIT_HASH: ${{ env.GITHUB_COMMIT_HASH }}
@@ -249,19 +250,18 @@ jobs:
TEST_TYPE="${{ env.testType }}"
INPUTS_DIR="${{ inputs.pipeline_dir }}/test_inputs/$TEST_TYPE"
echo "Running tests with test type: $TEST_TYPE"

TRUTH_PATH="gs://broad-gotc-test-storage/${{ inputs.dockstore_pipeline_name }}/truth/$(echo "$TEST_TYPE" | tr '[:upper:]' '[:lower:]')/$TRUTH_BRANCH"
echo "Truth path: $TRUTH_PATH"

RESULTS_PATH="gs://pd-test-results/${{ inputs.dockstore_pipeline_name }}/results/$CURRENT_TIME"


# Submit all jobs first and store their submission IDs
for input_file in "$INPUTS_DIR"/*.json; do
test_input_file=$(python3 scripts/firecloud_api/UpdateTestInputs.py --truth_path "$TRUTH_PATH" \
test_input_file=$(python3 scripts/firecloud_api/UpdateTestInputs.py \
--results_path "$RESULTS_PATH" \
--inputs_json "$input_file" \
--update_truth "$UPDATE_TRUTH_BOOL" \
--branch_name "$BRANCH_NAME" )
--branch_name "$TRUTH_BRANCH" \
--dockstore_pipeline_name "${{ inputs.dockstore_pipeline_name }}" )
echo "Uploading the test input file: $test_input_file"

# Create the submission_data.json file for this input_file
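The `TRUTH_PATH` that the step above builds can be reproduced in isolation. A small sketch with illustrative sample values (`PeakCalling`, `Plumbing`, and `master` are assumptions standing in for `inputs.dockstore_pipeline_name`, `env.testType`, and the truth branch, not values from a real run):

```shell
# Rebuild TRUTH_PATH exactly as the workflow step does, with sample values.
dockstore_pipeline_name="PeakCalling"   # illustrative: inputs.dockstore_pipeline_name
TEST_TYPE="Plumbing"                    # illustrative: env.testType
TRUTH_BRANCH="master"                   # illustrative: default truth branch

# The test type is lower-cased before being used as a path segment.
TRUTH_PATH="gs://broad-gotc-test-storage/${dockstore_pipeline_name}/truth/$(echo "$TEST_TYPE" | tr '[:upper:]' '[:lower:]')/${TRUTH_BRANCH}"
echo "$TRUTH_PATH"
```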
4 changes: 2 additions & 2 deletions README.md
@@ -1,7 +1,7 @@

### WDL Analysis Research Pipelines
### Warp Analysis Research Pipelines

The WDL Analysis Research Pipelines (WARP) repository is a collection of cloud-optimized pipelines for processing biological data from the Broad Institute Data Sciences Platform and collaborators.
The Warp Analysis Research Pipelines (WARP) repository is a collection of cloud-optimized pipelines for processing biological data from the Broad Institute Data Sciences Platform and collaborators.

WARP provides robust, standardized data analysis for the Broad Institute Genomics Platform and large consortia like the Human Cell Atlas and the BRAIN Initiative. WARP pipelines are rigorously scientifically validated, high scale, reproducible and open source, released under the [BSD 3-Clause license](https://github.com/broadinstitute/warp/blob/master/LICENSE).

Empty file.
80 changes: 64 additions & 16 deletions beta-pipelines/skylab/slidetags/SlideTags.wdl
@@ -11,42 +11,40 @@ workflow SlideTags {
input {
# slide-tags inputs
String id
Array[String] fastq_paths
Array[String] spatial_fastq
Array[String] pucks
Array[String] rna_paths
String sb_path

# Optimus Inputs
String cloud_provider = "gcp"
String input_id
Int expected_cells = 3000 ## copied from Multiome ?
String counting_mode = "sn_rna"
Array[File] gex_r1_fastq
Array[File] gex_r2_fastq
Array[File]? gex_i1_fastq
File tar_star_reference
File annotations_gtf
File? mt_genes
File gex_whitelist
String cloud_provider = "gcp"
String input_id
Int expected_cells = 3000
String counting_mode = "sn_rna"
Int tenx_chemistry_version = 3
Int emptydrops_lower = 100
Boolean force_no_check = false
Boolean ignore_r1_read_length = false
String star_strand_mode = "Reverse"
Boolean count_exons = false
File gex_whitelist
String? soloMultiMappers
String? gex_nhash_id
File? mt_genes

String docker = "us.gcr.io/broad-gotc-prod/slide-tags:1.1.0"
}

parameter_meta {
fastq_paths: "Array of paths to spatial fastq files"
spatial_fastq: "Array of paths to spatial fastq files"
pucks: "Array of paths to puck files"
docker: "Docker image to use"
}

# Call the Optimus workflow
# Call the optimus workflow
call optimus.Optimus as Optimus {
input:
cloud_provider = cloud_provider,
@@ -70,19 +68,69 @@ workflow SlideTags {
soloMultiMappers = soloMultiMappers,
gex_expected_cells = expected_cells
}

call SpatialCount.count as spatial_count {
input:
fastq_paths = fastq_paths,
fastq_paths = spatial_fastq,
pucks = pucks,
docker = docker
docker = docker,
input_id = input_id
}

call Positioning.generate_positioning as positioning {
input:
rna_paths = rna_paths,
rna_paths = [Optimus.h5ad_output_file, Optimus.library_metrics],
sb_path = spatial_count.sb_counts,
docker = docker
docker = docker,
input_id = input_id
}

output {
# Version of Optimus pipeline
String optimus_pipeline_version_out = Optimus.pipeline_version_out
File optimus_genomic_reference_version = Optimus.genomic_reference_version

# Optimus Metrics outputs
File optimus_cell_metrics = Optimus.cell_metrics
File optimus_gene_metrics = Optimus.gene_metrics
File? optimus_cell_calls = Optimus.cell_calls

# Optimus Star outputs
File optimus_library_metrics = Optimus.library_metrics
File optimus_bam = Optimus.bam
File optimus_matrix = Optimus.matrix
File optimus_matrix_row_index = Optimus.matrix_row_index
File optimus_matrix_col_index = Optimus.matrix_col_index
File? optimus_aligner_metrics = Optimus.aligner_metrics
File? optimus_mtx_files = Optimus.mtx_files
File? optimus_filtered_mtx_files = Optimus.filtered_mtx_files
File? optimus_multimappers_EM_matrix = Optimus.multimappers_EM_matrix
File? optimus_multimappers_Uniform_matrix = Optimus.multimappers_Uniform_matrix
File? optimus_multimappers_Rescue_matrix = Optimus.multimappers_Rescue_matrix
File? optimus_multimappers_PropUnique_matrix = Optimus.multimappers_PropUnique_matrix

# Optimus H5ad
File optimus_h5ad_output_file = Optimus.h5ad_output_file

# Optimus Cellbender outputs
File? cb_cell_barcodes_csv = Optimus.cell_barcodes_csv
File? cb_checkpoint_file = Optimus.checkpoint_file
Array[File]? cb_h5_array = Optimus.h5_array
Array[File]? cb_html_report_array = Optimus.html_report_array
File? cb_log = Optimus.log
Array[File]? cb_metrics_csv_array = Optimus.metrics_csv_array
String? cb_output_directory = Optimus.output_directory
File? cb_summary_pdf = Optimus.summary_pdf

# Spatial/Positioning outputs
File spatial_output_h5 = spatial_count.sb_counts
File spatial_output_log = spatial_count.spatial_log
File positioning_seurat_qs = positioning.seurat_qs
File positioning_coords_csv = positioning.coords_csv
File positioning_coords2_csv = positioning.coords2_csv
File positioning_summary_pdf = positioning.summary_pdf
File positioning_intermediates = positioning.intermediates_file
File positioning_positioning_log = positioning.positioning_log
}

}
42 changes: 30 additions & 12 deletions beta-pipelines/skylab/slidetags/scripts/positioning.wdl
@@ -4,6 +4,7 @@ task generate_positioning {
input {
Array[String] rna_paths
String sb_path
String input_id
Int mem_GiB = 128
Int disk_GiB = 128
Int nthreads = 16
@@ -13,15 +14,17 @@
set -euo pipefail
set -x
echo "<< starting spatial-count >>"

Rscript -e "install.packages(c('optparse', 'BiocManager'), repos='https://cloud.r-project.org'); BiocManager::install('IRanges')"

gcloud config set storage/process_count 16 # is this set by user?
gcloud config set storage/thread_count 2 # is this set by user?

# Download the scripts -- these need to be changed -- also need to add to docker
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/ee005109446f58764509ee47ff51c212ce8dabe3/positioning/positioning.R
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/6a78716aa08a9f2506c06844f7e3fd491b03aa8b/positioning/load_matrix.R
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/a7fc86abbdd3d46461c500e7d024315d88a97e9a/positioning/run-positioning.R
# Download the scripts -- these need to be changed -- also need to add to docker
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/d89176cf21e072fe8b5aad3a1454ad194fca7c9a/slide-tags/run-positioning.R
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/d89176cf21e072fe8b5aad3a1454ad194fca7c9a/slide-tags/positioning.R
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/d89176cf21e072fe8b5aad3a1454ad194fca7c9a/slide-tags/helpers.R

echo "RNA: ~{sep=' ' rna_paths}"
echo "SB: ~{sb_path}"

@@ -47,12 +50,12 @@

# Download the SB
echo "Downloading SB:"
mkdir SB
gcloud storage cp ~{sb_path} SB
gcloud storage cp ~{sb_path} .
baseSB=`basename ~{sb_path}`

# Run the script
echo ; echo "Running run-positioning.R"
Rscript run-positioning.R RNA SB output
Rscript run-positioning.R RNA $baseSB output

# Upload the results
ls output/*
@@ -65,19 +68,34 @@

echo; echo "Writing logs:"
echo; echo "RNA size:"; du -sh RNA
echo; echo "SB size:"; du -sh SB
echo; echo "SB size:"; du -sh $baseSB
echo; echo "output size:"; du -sh output
echo; echo "FREE SPACE:"; df -h

echo "tar files/logs"
cat stdout stderr > positioning.log
tar -zcvf output.tar.gz output

# Rename and move files
mv output/* .
mv summary.pdf ~{input_id}_summary.pdf
mv seurat.qs ~{input_id}_seurat.qs
mv coords.csv ~{input_id}_coords.csv
mv coords2.csv ~{input_id}_coords2.csv

ls
tar -zcvf output.tar.gz matrix.csv.gz cb_whitelist.txt spatial_metadata.json
mv output.tar.gz ~{input_id}_intermediates.tar.gz
mv positioning.log ~{input_id}_positioning.log
echo "<< completed positioning >>"
>>>

output {
File output_file = "output.tar.gz"
File positioning_log = "positioning.log"
File seurat_qs = "~{input_id}_seurat.qs"
File coords_csv = "~{input_id}_coords.csv"
File coords2_csv = "~{input_id}_coords2.csv"
File summary_pdf = "~{input_id}_summary.pdf"
File intermediates_file = "~{input_id}_intermediates.tar.gz"
File positioning_log = "~{input_id}_positioning.log"
}

runtime {
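The rename block added above follows one convention: every deliverable is prefixed with the workflow's `input_id`, so outputs from different samples cannot collide when copied to a shared bucket. A standalone sketch of the same pattern (the `input_id` value and file names are illustrative):

```shell
# Prefix each deliverable with the sample's input_id, mirroring the
# rename block in generate_positioning above.
input_id="sample123"   # illustrative; supplied as a workflow input in the WDL
touch summary.pdf seurat.qs coords.csv coords2.csv

for f in summary.pdf seurat.qs coords.csv coords2.csv; do
  mv "$f" "${input_id}_${f}"
done

ls "${input_id}"_*
```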
18 changes: 11 additions & 7 deletions beta-pipelines/skylab/slidetags/scripts/spatial-count.wdl
@@ -4,6 +4,7 @@ task count {
input {
Array[String] fastq_paths
Array[String] pucks
String input_id
Int mem_GiB = 64
Int disk_GiB = 128
Int nthreads = 1
@@ -18,8 +19,9 @@ task count {
gcloud config set storage/process_count 16
gcloud config set storage/thread_count 2

# Download the script -- put this script into a docker
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/5c74e9e6148102081827625b9ce91ec2b7ba3541/spatial-count/spatial-count.jl
# Download the script -- put this script into a docker
# This needs to be changed when Matthew finalizes changes to these scripts
wget https://raw.githubusercontent.com/MacoskoLab/Macosko-Pipelines/d89176cf21e072fe8b5aad3a1454ad194fca7c9a/slide-tags/spatial-count.jl

echo "FASTQs: ~{length(fastq_paths)} paths provided"
echo "Pucks: ~{length(pucks)} puck(s) provided"
@@ -69,16 +71,18 @@
echo; echo "pucks size:"; du -sh pucks
echo; echo "output size:"; du -sh SBcounts.h5
echo; echo "FREE SPACE:"; df -h

cat stdout stderr > spatial-count.log

mv SBcounts.h5 ~{input_id}_SBcounts.h5
cat stdout stderr > ~{input_id}_spatial-count.log
echo "<< completed spatial-count >>"

>>>

output {
File sb_counts = "SBcounts.h5"
File spatial_log = "spatial-count.log"

File sb_counts = "~{input_id}_SBcounts.h5"
File spatial_log = "~{input_id}_spatial-count.log"
}

runtime {
docker: docker
memory: "~{mem_GiB} GB"