Introduce methods for standardized callset evaluation #804

kjaisingh · 2025-04-25T20:23:39Z

Description

This PR changes the methods used for callset evaluation. It includes the following components:

Introduce a new WDL for callset evaluations.
Add stratification options to SVConcordance.wdl, which the callset evaluations now use.
Allow Vapor to be run on CRAM files stored in requester-pays buckets.
Provide an option to modify the SV size limit for Vapor in the long-read comparisons.
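
For illustration only, the size limit in the last item can be thought of as a simple cap on variant length applied before records reach Vapor. The sketch below is not the code in this PR: the record layout (plain dicts with `svtype` and `svlen` keys) and the function name `filter_by_size` are hypothetical, and passing through records that have no length is an assumption. Only the parameter name `max_variant_size` is taken from the commit history.

```python
from typing import Dict, List, Optional


def filter_by_size(records: List[Dict], max_variant_size: Optional[int] = None) -> List[Dict]:
    """Keep records whose absolute length is within the cap.

    A cap of None disables filtering. Records with no length
    (for example breakends) are kept; that choice is an assumption.
    """
    kept = []
    for rec in records:
        svlen = rec.get("svlen")
        if max_variant_size is None or svlen is None or abs(svlen) <= max_variant_size:
            kept.append(rec)
    return kept


# Hypothetical records for demonstration.
records = [
    {"id": "sv1", "svtype": "DEL", "svlen": -500},
    {"id": "sv2", "svtype": "DUP", "svlen": 12000},
    {"id": "sv3", "svtype": "BND", "svlen": None},
]

small = filter_by_size(records, max_variant_size=5000)
print([r["id"] for r in small])  # ['sv1', 'sv3']
```

Making the cap optional keeps the default behaviour unchanged for callers that do not set it.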

Testing

Two workspaces, DRAGEN-SV-Comparison and DRAGEN-SV-Evaluation, demonstrate successful runs of each of the WDLs and scripts.
Validated all WDLs with womtool.

Pre-Merge Changes Required

Create a new callset evaluation WDL.
- Use the modified MakeGqRecalibratorTrainingSetFromPacBio as a base.
- Rename MakeGqRecalibratorTrainingSetFromPacBio to use a prefix of AllOfUs and deprecate it into the internal repository.
- Modify input parameters to be agnostic to the type of comparison needed: raw calls, joint callset, CNV sites or CNV calls.
- Add input parameters to optionally do Vapor preprocessing, if Vapor is required.
- Add input parameters to optionally standardize input VCFs.
- Remove differentiation between loose and strict modes.
- Integrate pre/post-processing functionality from all the various new scripts & WDLs into this.
Include a Python notebook for analyzing results from the callset evaluation WDL, with user-level parameters that govern how the WDL's output is processed.
- Modify input parameters to be agnostic to the type of comparison needed: raw calls, joint callset, CNV sites or CNV calls.
- Use the pacbio_support_summary_table for Vapor results instead of the vapor_output_json.
- Generalize the stratification field definition.
- Include optional sections that can be run depending on the comparison mode, e.g. short-read caller concordance, caller & depth only coverage, or subsetting results to samples. Look through archived slides of the comparisons done so far for ideas.
Add documentation that outlines how to use the long-read evaluations.
Decide on the future of the DRAGEN-CNV standardizer (std_dragen_cnv.py); if it is included, add it to the standardizer options and documentation.
Remove automated sync of the MakeGqRecalibratorTrainingSetFromPacBio WDL to Dockstore.
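
As a rough sketch of what the mode-agnostic parameters and the generalized stratification field could look like in the notebook, the example below groups concordance rows by whichever fields the user names. All of it is hypothetical: `ComparisonMode`, `EvalConfig`, `concordance_by_stratum`, and the row layout are illustrative names, not identifiers from this PR. Only the four comparison types come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class ComparisonMode(Enum):
    # The four comparison types named in the task list.
    RAW_CALLS = "raw_calls"
    JOINT_CALLSET = "joint_callset"
    CNV_SITES = "cnv_sites"
    CNV_CALLS = "cnv_calls"


@dataclass
class EvalConfig:
    mode: ComparisonMode
    # Any row fields can be used to stratify; nothing is hard-coded.
    stratify_by: Tuple[str, ...] = ("svtype",)
    # Names of optional notebook sections to run for this mode.
    optional_sections: Tuple[str, ...] = ()


def concordance_by_stratum(rows: List[Dict], config: EvalConfig) -> Dict[tuple, float]:
    """Return the fraction of concordant rows within each stratum."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for row in rows:
        key = tuple(row[name] for name in config.stratify_by)
        totals[key] += 1
        if row["concordant"]:
            matched[key] += 1
    return {key: matched[key] / totals[key] for key in totals}


# Hypothetical rows for demonstration.
rows = [
    {"svtype": "DEL", "size_bin": "<1kb", "concordant": True},
    {"svtype": "DEL", "size_bin": "<1kb", "concordant": False},
    {"svtype": "INS", "size_bin": "<1kb", "concordant": True},
]

config = EvalConfig(mode=ComparisonMode.RAW_CALLS)
print(concordance_by_stratum(rows, config))  # {('DEL',): 0.5, ('INS',): 1.0}
```

Passing `stratify_by=("svtype", "size_bin")` would split the same rows by type and size bin without any change to the function.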


kjaisingh added 30 commits November 14, 2024 13:06

Initial commit to test dockstore sync

e823d20

Initial work - WIP

6127ef1

Merge branch 'main' into kj_dragensv_benchmarking

c3df02c

Initial implementation of DragenStandardizer

4f24d0f

Added automated sync

9a7953d

Circumvented linting errors

c05f593

Initialized new std_dragen file

5792480

Updated WDL & standardizer to output std_dragen_vcf

06abb34

Resolved linting errors

40b7fb5

Modified WDLs across workflows to integrate dragen

9c457d1

Updated WDL input params

cce70ae

Modified dragen_std to print

9897ed0

Modified standardizer to align with manta

2fb4b7d

Python linting errors

37a3bc7

Added MATEID indexing to drop paired mates

dd0d1a5

Added indexing for vcf's without it

3d3133f

Modified vapor wdl to remove unnecessary ref inputs

6b4c425

Initial commit for PreprocessDragenVcf

284f18f

Removed irrelevant inputs from vapor WDLs

53d0c67

Added OAUTH_TOKEN to localize files

311048a

Initial commit for CombineVcfs

4fa2ba2

Minor differences

d7f6ada

Modified passing of arguments to SVCluster

8134dc6

Further formatting & naming changes

16c6641

Removed /src/ from script path

0327cc8

Modified to take in fai as well

e595de0

Added ref_dict to SVCluster call

5876c27

Added index files

9135045

Updated combinevcf WDL syntactically

0d2c9fb

Added index file to output of combinevcfs

1ff0ad2

kjaisingh added 12 commits April 25, 2025 16:41

Undo code comment to circumvent linting error

88bc3e8

Removed folder creation

18feaad

Merge branch 'main' into kj_callset_evaluations

2e5c30b

Updated MainVcfQc to optionally run duplicate identification

ae2f8cf

Minor change to reflect updated wdl via dockstore sync

95e2eaa

Updated wdl structure

1c26497

Added manta qual to standardizer

65d78a3

Modified fam script

7f71807

Undo change to family analysis script

74b4150

Updated to use optional sample ID list to subset pre concordance

f22089d

Cleaned up workflow definitions

aee48ec

Updated per sample WDL

c4f673c

kjaisingh added methods and removed enhancement New feature or request labels Jul 30, 2025

kjaisingh changed the title ~~Enable standardized callset evaluations in GATK-SV~~ Enable standardized callset evaluations Jul 31, 2025

kjaisingh changed the title ~~Enable standardized callset evaluations~~ Provide methods for standardized callset evaluation Jul 31, 2025

kjaisingh changed the title ~~Provide methods for standardized callset evaluation~~ Introduce methods for standardized callset evaluation Aug 14, 2025

kjaisingh added 13 commits September 29, 2025 10:16

Resolved merge conflicts

dd70bac

Trigger dockstore sync

af8692a

Rectified use of max_variant_size

1d437eb

Removed sorting in svconcordancesimple

56231b5

Updated workflow to use proper names from svconcordance.wdl

65eee68

Initial commit of concordance by sample

5a2a74e

Minor changes to workflow

e326cdb

Tar files upon completion

01fde62

Linting changes

8c5bbf1

Explicitly pass private sites/sample removal params

43b94e7

Added notebooks to branch

89b122a

Initial version of SVConcordanceBySample including conversion from DUP to INS

329531b

Moved new task in Utils to bottom of file

7c787ba
