@kjaisingh
Collaborator

@kjaisingh kjaisingh commented Sep 9, 2025

Description

This PR is intended to eliminate stochasticity in the MedianCov workflow by including a default seed of 42 when randomly subsampling, which occurs in two places:

  • In covPerSample(), we downsample to 1M bins if the input matrix has more bins than this.
  • In covPerBin(), we downsample to 500 samples if there are many samples for a given bin.
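
For illustration, the general idea of seeded subsampling can be sketched as follows. This is a minimal Python sketch, not the PR's actual code: the workflow's real implementation is not shown in this conversation and may be written in a different language. The helper names `subsample` and `median_of_subsample` and the constant names are hypothetical; only the seed of 42 and the limits of 1M bins and 500 samples come from the description above.

```python
import random
import statistics

# Values taken from the PR description; the constant names are hypothetical.
DEFAULT_SEED = 42
MAX_BINS = 1_000_000   # limit described for covPerSample()
MAX_SAMPLES = 500      # limit described for covPerBin()


def subsample(values, limit, seed=DEFAULT_SEED):
    """Return all values if there are at most `limit`; otherwise return a
    random subset of size `limit` that is reproducible for a given seed."""
    values = list(values)
    if len(values) <= limit:
        return values
    # A local generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return rng.sample(values, limit)


def median_of_subsample(values, limit, seed=DEFAULT_SEED):
    """Median of the (possibly subsampled) values."""
    return statistics.median(subsample(values, limit, seed))
```

With a fixed seed, repeated calls on the same input select the same subset, so the resulting median cannot vary from run to run. Using a local generator rather than seeding a global one is a design choice in this sketch, so that unrelated random calls elsewhere do not shift the selection.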

Testing

  • The following workspace contains 3 distinct, uncached runs of EvidenceQc, a workflow which calls upon MedianCov.
    • Note that in each, the bincov_median output file is identical, despite there being over 29M bins in the input file.
    • We know from past experience that the coverage value for the NA20320 sample has often switched between 19 and 20 across runs.
  • Validated all WDLs with womtool.

@kjaisingh kjaisingh added the enhancement New feature or request label Sep 9, 2025
@kjaisingh kjaisingh self-assigned this Sep 9, 2025
@kjaisingh kjaisingh marked this pull request as ready for review September 10, 2025 17:57
@mwalker174
Collaborator

This is useful, but let's hold off on this for now.

@kjaisingh kjaisingh removed their assignment Nov 5, 2025