Commit 2dc9a83

Merge pull request #1699 from broadinstitute/develop
dev -> staging
2 parents 41fb1d9 + 2a491d1 commit 2dc9a83

47 files changed, +476 -52 lines changed


.dockstore.yml

Lines changed: 5 additions & 1 deletion
@@ -11,7 +11,7 @@ workflows:
 
 - name: ArrayImputationQC
 subclass: WDL
-primaryDescriptorPath: /pipelines/broad/arrays/imputation_beagle/ArrayImputationQC.wdl
+primaryDescriptorPath: /pipelines/wdl/arrays/imputation_beagle/input_qc/ArrayImputationQC.wdl
 
 - name: atac
 subclass: WDL
@@ -173,6 +173,10 @@ workflows:
 primaryDescriptorPath: /beta-pipelines/broad/Utilities/SplitMultisampleVCF/SplitMultisampleVCF.wdl
 readMePath: /beta-pipelines/broad/Utilities/SplitMultisampleVCF/README.md
 
+- name: TestArrayImputationQC
+subclass: WDL
+primaryDescriptorPath: /verification/test-wdls/TestArrayImputationQC.wdl
+
 - name: TestATAC
 subclass: WDL
 primaryDescriptorPath: /verification/test-wdls/TestATAC.wdl
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+name: Test ArrayImputationQC
+
+# Controls when the workflow will run
+on:
+pull_request:
+branches: [ "develop", "staging", "master" ]
+# Only run if files in these paths changed:
+####################################
+# SET PIPELINE SPECIFIC PATHS HERE #
+####################################
+paths:
+- 'pipelines/wdl/arrays/imputation_beagle/input_qc/**'
+- 'tasks/wdl/ImputationBeagleQcTasks.wdl'
+- 'verification/VerifyArrayImputationQC.wdl'
+- 'verification/test-wdls/TestArrayImputationQC.wdl'
+- 'tasks/wdl/Utilities.wdl'
+- 'tasks/wdl/TerraCopyFilesFromCloudToCloud.wdl'
+- '.github/workflows/test_array_imputation_qc.yml'
+- '.github/workflows/warp_test_workflow.yml'
+
+
+# Allows you to run this workflow manually from the Actions tab
+workflow_dispatch:
+inputs:
+useCallCache:
+description: 'Use call cache (default: true)'
+required: false
+default: "true"
+updateTruth:
+description: 'Update truth files (default: false)'
+required: false
+default: "false"
+testType:
+description: 'Specify the type of test (Plumbing or Scientific)'
+required: false
+type: choice
+options:
+- Plumbing
+- Scientific
+truthBranch:
+description: 'Specify the branch for truth files (default: master)'
+required: false
+default: "master"
+
+env:
+# pipeline configuration
+PIPELINE_NAME: TestArrayImputationQC
+DOCKSTORE_PIPELINE_NAME: ArrayImputationQC
+PIPELINE_DIR: "pipelines/wdl/arrays/imputation_beagle/input_qc"
+
+# workspace configuration
+TESTING_WORKSPACE: WARP Tests
+WORKSPACE_NAMESPACE: warp-pipelines
+
+# service account configuration
+SA_JSON_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
+
+
+
+jobs:
+TestArrayImputationQC:
+uses: ./.github/workflows/warp_test_workflow.yml
+with:
+pipeline_name: TestArrayImputationQC
+dockstore_pipeline_name: ArrayImputationQC
+pipeline_dir: pipelines/wdl/arrays/imputation_beagle/input_qc
+use_call_cache: ${{ github.event.inputs.useCallCache || 'true' }}
+update_truth: ${{ github.event.inputs.updateTruth || 'false' }}
+test_type: ${{ github.event.inputs.testType }}
+truth_branch: ${{ github.event.inputs.truthBranch || 'master' }}
+secrets:
+PDT_TESTER_SA_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
+DOCKSTORE_TOKEN: ${{ secrets.DOCKSTORE_TOKEN }}
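The `with:` block relies on the `||` operator in GitHub Actions expressions to supply defaults. `github.event.inputs` is only populated for `workflow_dispatch` runs, so on `pull_request` runs each lookup is empty and the right-hand literal is used instead. The Python sketch below emulates that fallback for the four inputs; `resolve` and `reusable_workflow_inputs` are hypothetical names, and this is an illustration of the semantics, not how GitHub evaluates expressions.

```python
from typing import Mapping, Optional


def resolve(inputs: Optional[Mapping[str, str]], key: str, default: str = "") -> str:
    """Emulate `${{ github.event.inputs.<key> || '<default>' }}` for string inputs."""
    # Only workflow_dispatch runs carry inputs; on pull_request the lookup is empty.
    value = (inputs or {}).get(key)
    # Like `||`, fall back when the value is missing or an empty string.
    return value or default


def reusable_workflow_inputs(inputs: Optional[Mapping[str, str]]) -> dict:
    """Hypothetical mirror of the `with:` values passed to warp_test_workflow.yml."""
    return {
        "use_call_cache": resolve(inputs, "useCallCache", "true"),
        "update_truth": resolve(inputs, "updateTruth", "false"),
        # No `||` fallback in the workflow, so pull_request runs pass an empty string.
        "test_type": resolve(inputs, "testType"),
        "truth_branch": resolve(inputs, "truthBranch", "master"),
    }


# A pull_request run has no dispatch inputs, so every value comes from a fallback.
pr_values = reusable_workflow_inputs(None)
```

Because `'false'` is a non-empty string, a manually dispatched `useCallCache: false` still passes through as `"false"` rather than being replaced by the default. `test_type` has no fallback, so on `pull_request` runs it is passed as an empty value; how `warp_test_workflow.yml` handles that is not shown in this commit.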

.github/workflows/test_imputation_beagle.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ on:
 ####################################
 paths:
 - 'pipelines/wdl/arrays/imputation_beagle/**'
+- '!pipelines/wdl/arrays/imputation_beagle/input_qc/**'
 - 'structs/imputation/ImputationBeagleStructs.wdl'
 - 'tasks/wdl/ImputationTasks.wdl'
 - 'tasks/wdl/ImputationBeagleTasks.wdl'
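The new `!pipelines/wdl/arrays/imputation_beagle/input_qc/**` entry keeps changes under `input_qc/` from triggering the full ImputationBeagle test; the Test ArrayImputationQC workflow lists that directory in its own `paths`. GitHub Actions evaluates `paths` patterns in order, and a negative pattern that matches after a positive one excludes the file. A minimal Python sketch of that rule, assuming `fnmatch.fnmatchcase` as a stand-in for GitHub's glob syntax (its `*` also crosses `/`, so it only approximates `**`); `path_included` and `workflow_triggered` are hypothetical helpers:

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence


def path_included(path: str, patterns: Sequence[str]) -> bool:
    """The last matching pattern wins; a leading '!' makes it an exclusion."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        # Approximation: fnmatch's '*' matches '/', unlike GitHub's single '*'.
        if fnmatchcase(path, glob):
            included = not negated
    return included


def workflow_triggered(changed: Iterable[str], patterns: Sequence[str]) -> bool:
    """The workflow runs if at least one changed file survives the filters."""
    return any(path_included(p, patterns) for p in changed)


PATTERNS = [
    "pipelines/wdl/arrays/imputation_beagle/**",
    "!pipelines/wdl/arrays/imputation_beagle/input_qc/**",
    "tasks/wdl/ImputationBeagleTasks.wdl",
]
```

A pull request that touches only `input_qc/ArrayImputationQC.wdl` is filtered out, while one that also edits `tasks/wdl/ImputationBeagleTasks.wdl` still triggers the workflow, because that second file matches a positive pattern.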

all_of_us/phasing/AoU_VCF.py renamed to all_of_us/phasing/AoU_VCF_v9.py

Lines changed: 6 additions & 3 deletions
@@ -147,7 +147,7 @@ def run_subprocess(cmd, errorMessage):
 ((mt.filters.contains('LowQual')) | # VARIABLE!!! Probably a checkbox?
 (mt.filters.contains('NO_HQ_GENOTYPES')) | # VARIABLE!!! Probably a checkbox?
 (mt.filters.contains('ExcessHet'))) | # VARIABLE!!! Probably a checkbox?
-(mt.variant_qc.call_rate < 0.9) | # VARIABLE!!!
+(mt.variant_qc.call_rate < 0.95) | # VARIABLE!!!
 (mt.variant_qc.gq_stats.mean < 30.0), # VARIABLE!!! #### Changed from 30.0 to 1.0 ####
 keep=False)
 
@@ -160,7 +160,7 @@ def run_subprocess(cmd, errorMessage):
 HC = mt_fil.infor.homozygote_count,
 AVSAD = mt_fil.average_variant_sum_AD))
 
-fields_to_drop = ['filters', 'variant_qc', 'infor', 'maximum_variant_AC', 'defined_AD', 'average_variant_sum_AD']
+fields_to_drop = ['variant_qc', 'infor', 'maximum_variant_AC', 'defined_AD', 'average_variant_sum_AD']
 
 mt_fil = mt_fil.drop(*fields_to_drop)
 
@@ -173,6 +173,9 @@ def run_subprocess(cmd, errorMessage):
 ##INFO=<ID=HC,Number=R,Type=Integer,Description="Number of homozygotes per allele. One element per allele, including the reference.">
 ##INFO=<ID=AVSAD,Number=1,Type=Float,Description="Mean sum of allelic depts. Proxies DP.">
 ##INFO=<ID=GQ,Number=1,Type=Float,Description="Mean Genotype Quality">
+##FILTER=<ID=LowQual,Description="Low quality score">
+##FILTER=<ID=ExcessHet,Description="Excess heteroygotes">
+##FILTER=<ID=NO_HQ_GENOTYPES,Description="No high-quality genotypes">
 """
 vcf_metadata += "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t0000000000"
 
@@ -214,4 +217,4 @@ def run_subprocess(cmd, errorMessage):
 fp.flush()
 fp.close()
 
-run_subprocess(f"gcloud storage cp {local_fname2} {rep_url}", f"Error copying report - {local_fname2} to {rep_url}")
+run_subprocess(f"gcloud storage cp {local_fname2} {rep_url}", f"Error copying report - {local_fname2} to {rep_url}")
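In AoU_VCF_v9.py a variant is removed (`keep=False`) if it carries any of the `LowQual`, `NO_HQ_GENOTYPES` or `ExcessHet` filters, has a call rate below the threshold, or has a mean GQ below 30.0; this commit raises the call-rate threshold from 0.9 to 0.95. The plain-Python sketch mirrors that condition for a single variant so the effect of the new cutoff is easy to check. It is not Hail code: `call_rate` and `drop_variant` are hypothetical helpers, and call rate is taken as the fraction of samples with a non-missing genotype, roughly as Hail's `variant_qc` defines it.

```python
from typing import Iterable, Optional, Sequence

# Filter IDs that remove a variant in the AoU_VCF_v9.py row filter.
DROP_FILTERS = {"LowQual", "NO_HQ_GENOTYPES", "ExcessHet"}


def call_rate(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of samples with a called genotype (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def drop_variant(filters: Iterable[str], rate: float, mean_gq: float,
                 min_call_rate: float = 0.95, min_mean_gq: float = 30.0) -> bool:
    """Mirror the keep=False condition: any single criterion removes the variant."""
    return (bool(DROP_FILTERS & set(filters))
            or rate < min_call_rate
            or mean_gq < min_mean_gq)
```

With 20 samples and 2 missing calls the call rate is exactly 0.9: such a variant passed the old `< 0.9` test but is dropped under `< 0.95`. With 1 missing call (0.95) it is kept under both thresholds.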
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# aou_9.0.0
+2025-10-02 (Date of Last Commit)
+
+* This is the first version of the pipeline used for AoU v9 phasing

all_of_us/phasing/filter_and_qc_variants.wdl

Lines changed: 4 additions & 2 deletions
@@ -127,9 +127,11 @@ workflow RunAoUAnvilMergeFilterAndQc {
 # However, since this can be a lightweight VM, overriding is unlikely to be necessary.
 
 # The docker to be used on the VM. This will need both Hail and Google Cloud SDK installed.
-String hail_docker = "gcr.io/broad-dsde-methods/aou-auxiliary/hail_dataproc_wdl:0.2.130"
+String hail_docker = "gcr.io/broad-dsde-methods/aou-auxiliary/hail_dataproc_wdl:0.2.134"
 }
 
+String pipeline_version = "aou_9.0.0"
+
 # Ensure that trailing slash is included in the output bucket path
 String output_bucket_path_with_trailing_slash = sub(output_bucket_path, "/$", "") + "/"
 
@@ -182,7 +184,7 @@ task FilterAndQCVariants {
 String worker_machine_type = "n1-highmem-4"
 Int num_workers = 2
 Int num_preemptible_workers = 50
-Int time_to_live_minutes = 2880 # two days
+Int time_to_live_minutes = 5760 # four days
 RuntimeAttr? runtime_attr_override
 String gcs_subnetwork_name
 
pipeline_versions.txt

Lines changed: 11 additions & 11 deletions
@@ -1,12 +1,12 @@
-ArrayImputationQC 1.2.1 2025-10-01
+ArrayImputationQC 1.2.2 2025-10-07
 ArrayImputationQuotaConsumed 1.1.0 2025-09-29
 BuildIndices 4.2.1 2025-09-26
 CramToUnmappedBams 1.1.3 2024-08-02
-ExomeGermlineSingleSample 3.2.5 2025-08-11
-ExomeReprocessing 3.3.5 2025-08-11
-IlluminaGenotypingArray 1.12.25 2025-08-11
+ExomeGermlineSingleSample 3.2.6 2025-10-09
+ExomeReprocessing 3.3.6 2025-10-09
+IlluminaGenotypingArray 1.12.26 2025-10-09
 Imputation 1.1.23 2025-10-03
-ImputationBeagle 2.2.2 2025-10-03
+ImputationBeagle 2.2.3 2025-10-07
 JointGenotyping 1.7.3 2025-08-11
 MultiSampleSmartSeq2SingleNucleus 2.2.2 2025-06-20
 Multiome 6.1.3 2025-08-15
@@ -15,14 +15,14 @@ PairedTag 2.1.9 2025-09-19
 PeakCalling 1.0.1 2025-08-11
 Pipeline Name Version Date of Last Commit
 RNAWithUMIsPipeline 1.0.19 2025-08-11
-ReblockGVCF 2.4.2 2025-08-11
+ReblockGVCF 2.4.3 2025-10-09
 SlideSeq 3.6.3 2025-06-20
 SlideTags 1.0.4 2025-10-03
 UltimaGenomicsJointGenotyping 1.2.3 2025-08-11
-UltimaGenomicsWholeGenomeCramOnly 1.1.1 2025-08-11
-UltimaGenomicsWholeGenomeGermline 1.2.0 2025-03-17
-VariantCalling 2.2.6 2025-08-11
-WholeGenomeGermlineSingleSample 3.3.5 2025-08-11
-WholeGenomeReprocessing 3.3.5 2025-08-11
+UltimaGenomicsWholeGenomeCramOnly 1.1.2 2025-10-09
+UltimaGenomicsWholeGenomeGermline 1.2.1 2025-10-09
+VariantCalling 2.2.7 2025-10-09
+WholeGenomeGermlineSingleSample 3.3.6 2025-10-09
+WholeGenomeReprocessing 3.3.6 2025-10-09
 atac 2.9.3 2025-09-19
 snm3C 4.1.1 2025-09-19

pipelines/wdl/arrays/imputation_beagle/ImputationBeagle.changelog.md

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+# 2.2.3
+2025-10-07 (Date of Last Commit)
+
+* Update input_qc_version to 1.2.2 to match latest changes in InputQC wdl
+
 # 2.2.2
 2025-10-03 (Date of Last Commit)
 
pipelines/wdl/arrays/imputation_beagle/ImputationBeagle.wdl

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ import "../../../../tasks/wdl/ImputationTasks.wdl" as tasks
 import "../../../../tasks/wdl/ImputationBeagleTasks.wdl" as beagleTasks
 
 workflow ImputationBeagle {
-String pipeline_version = "2.2.2"
-String input_qc_version = "1.2.1"
+String pipeline_version = "2.2.3"
+String input_qc_version = "1.2.2"
 String quota_consumed_version = "1.1.0"
 
 input {
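ImputationBeagle.wdl pins `input_qc_version`, and the 2.2.3 changelog entry bumps it to 1.2.2 so that it matches the ArrayImputationQC release recorded in pipeline_versions.txt. A consistency check of that kind could be sketched as follows. This is a hypothetical helper, not part of WARP; it assumes pipeline_versions.txt rows are whitespace-separated `Name Version Date` triples (the exact delimiter is not visible on this page) and uses a simple regular expression instead of a real WDL parser.

```python
import re
from typing import Dict, Optional


def parse_pipeline_versions(text: str) -> Dict[str, str]:
    """Map pipeline name -> version from 'Name Version Date' rows."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skips blank lines and the 'Pipeline Name Version Date of Last Commit' row.
        if len(parts) != 3:
            continue
        name, version, _date = parts
        versions[name] = version
    return versions


def pinned_string(wdl_text: str, var: str) -> Optional[str]:
    """Return the value of the first `String <var> = "..."` declaration, if any."""
    match = re.search(rf'String\s+{re.escape(var)}\s*=\s*"([^"]*)"', wdl_text)
    return match.group(1) if match else None
```

After this commit, `pinned_string(wdl_text, "input_qc_version")` and `parse_pipeline_versions(versions_text)["ArrayImputationQC"]` should both yield `1.2.2`.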

pipelines/wdl/arrays/imputation_beagle/README.md

Lines changed: 6 additions & 4 deletions
@@ -97,13 +97,15 @@ Upon successful completion, the workflow produces a fully imputed multi-sample V
 
 ## ArrayImputationQuotaConsumed summary
 
-The ArrayImputationQuotaConsumed pipeline is used by the All of Us/AnVIL Imputation Service and calculates the number of samples in the input multi-sample VCF, which is the metric used by the service for ImputationBeagle pipeline quota.
+The ArrayImputationQuotaConsumed pipeline is used by the _All of Us_ + AnVIL Imputation Service and calculates the number of samples in the input multi-sample VCF, which is the metric used by the service for ImputationBeagle (array_imputation) pipeline quota.
 
 
 ## ArrayImputationQC summary
 
-The ArrayImputationQuotaConsumed pipeline is used by the All of Us/AnVIL Imputation Service and runs various qc checks on the input multi-sample VCF, checks include:
+The ArrayImputationQC pipeline is used by the _All of Us_ + AnVIL Imputation Service and runs various Quality Control checks on the input multi-sample VCF. Checks include:
 - vcf version 4.x
 - hg38 chromosome names
-- variants on each of the hg38 canonical chromosomes.
-- is not a WGS array vcf
+- variants on at least one of the hg38 canonical chromosomes
+- is not a WGS array vcf (has no more than 10 million records)
+- sorted positions
+- BGZF compression
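Several of the ArrayImputationQC checks listed in the README can be approximated with the Python standard library. The sketch covers the VCF 4.x header, variants on at least one canonical hg38 contig, the 10-million-record WGS cutoff, sorted positions and BGZF compression; the chromosome-naming check is left out, and this is not the pipeline's own implementation. Assumptions: "canonical" means chr1 to chr22, chrX and chrY; "sorted" is checked only as non-decreasing positions within each contig, with each contig appearing in one contiguous run, not against a reference contig order; the BGZF test only looks for the `BC` extra subfield at its standard offset in the first block header. All function names are hypothetical.

```python
import gzip
from typing import Iterable

# Assumption: the canonical hg38 contigs are chr1-chr22, chrX and chrY.
CANONICAL_HG38 = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}
# From the README: more than 10 million records is treated as a WGS VCF.
MAX_RECORDS = 10_000_000


def is_bgzf_header(header: bytes) -> bool:
    """Check the first 18 bytes of a file for a BGZF block header."""
    if len(header) < 18:
        return False
    # gzip magic bytes, DEFLATE compression method, FEXTRA flag bit set.
    if header[:2] != b"\x1f\x8b" or header[2] != 8 or not header[3] & 0x04:
        return False
    # Simplification: only looks for the 'BC' subfield at its usual offset.
    return header[12:14] == b"BC"


def check_vcf_lines(lines: Iterable[str]) -> dict:
    """Run a subset of the listed checks over decompressed VCF text lines."""
    version_ok = False
    n_records = 0
    seen = set()  # contigs encountered so far
    last_contig, last_pos = None, -1
    sorted_ok = True
    for line in lines:
        if line.startswith("##fileformat="):
            version_ok = line.startswith("##fileformat=VCFv4.")
            continue
        if line.startswith("#"):
            continue  # other meta-information lines and the #CHROM header
        chrom, pos_field = line.split("\t", 2)[:2]
        pos = int(pos_field)
        n_records += 1
        if chrom != last_contig:
            # Returning to a contig seen earlier means records are not grouped.
            if chrom in seen:
                sorted_ok = False
            seen.add(chrom)
            last_contig = chrom
        elif pos < last_pos:
            sorted_ok = False
        last_pos = pos
    return {
        "vcf_v4": version_ok,
        "has_canonical_variants": bool(seen & CANONICAL_HG38),
        "not_wgs": n_records <= MAX_RECORDS,
        "sorted": sorted_ok,
        "n_records": n_records,
    }


def check_vcf_file(path: str) -> dict:
    """Apply both sketches to a gzip- or BGZF-compressed VCF on disk."""
    with open(path, "rb") as fh:
        bgzf = is_bgzf_header(fh.read(18))
    # gzip.open can read BGZF, which is a series of concatenated gzip members.
    with gzip.open(path, "rt") as fh:
        result = check_vcf_lines(fh)
    result["bgzf"] = bgzf
    return result
```

Streaming the file line by line keeps memory flat even for inputs near the 10-million-record cutoff; a plain `gzip`-compressed VCF reads fine but reports `bgzf: False`, since ordinary gzip output lacks the `BC` subfield.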
