Description
I’d like to hear about your experience running the [WholeGenomeGermlineSingleSample v3.3.4](https://github.com/broadinstitute/warp/WholeGenomeGermlineSingleSample) workflow on Terra. I consistently encounter “out of memory” errors at the MarkDuplicates stage.
From my understanding, the memory_multiplier parameter controls this step. I have experimented with several values:
34, 68, 70, 80 → all returned out-of-memory errors.
100, 250, 300 → returned the error “Invalid value for field ‘resource.properties.machineType’”, which I believe indicates that GCP rejected the request due to excessive resource allocation.
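To make the values above concrete, here is a rough sketch of the arithmetic as I understand it. It assumes the task requests a fixed base amount of memory multiplied by memory_multiplier. The 7.5 GiB base and the 624 GiB per-VM ceiling are my own illustrative assumptions; I have not confirmed either against the workflow source or against GCP limits for my project.

```python
# Assumption: requested RAM = base * memory_multiplier.
# ASSUMED_BASE_GIB and ASSUMED_MAX_VM_GIB are illustrative guesses,
# not values confirmed from the WDL or from GCP documentation.
ASSUMED_BASE_GIB = 7.5
ASSUMED_MAX_VM_GIB = 624.0


def requested_memory_gib(multiplier, base_gib=ASSUMED_BASE_GIB):
    """Memory the task would request under the assumed base size."""
    return base_gib * multiplier


def classify(multiplier, max_gib=ASSUMED_MAX_VM_GIB):
    """Say whether the request fits under the assumed per-VM ceiling."""
    if requested_memory_gib(multiplier) > max_gib:
        return "exceeds assumed VM ceiling"
    return "within assumed VM ceiling"


if __name__ == "__main__":
    # The multipliers I tried for MarkDuplicates.
    for m in (34, 68, 70, 80, 100, 250, 300):
        mem = requested_memory_gib(m)
        print(f"memory_multiplier={m:>3} -> {mem:7.1f} GiB ({classify(m)})")
```

If these assumptions are roughly right, the multipliers up to 80 would still map to a machine that can be provisioned, while 100 and above would ask for more memory than a single VM offers, which would be consistent with the machineType error.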
Since I am working with large uBAM files (400 samples, total size is about 30 TB), I am unsure how best to configure these parameters to complete the workflow successfully. I have attached my current inputs.json file below for your reference.
Please advise on how to properly set the parameters (particularly memory and disk sizing) so that the workflow can run successfully on large inputs. I’d also be happy to provide any additional details that would help in troubleshooting.
I greatly appreciate any insight you can share.
inputs.json:
{
"WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.read_length":"${151}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.gatk_docker":"${}",
"WholeGenomeGermlineSingleSample.BamToGvcf.make_bamout":"${false}",
"WholeGenomeGermlineSingleSample.fingerprint_genotypes_file":"gs://dsde-data-na12878-public/NA12878.hg38.reference.fingerprint.vcf",
"WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.memory_multiplier":"${4}",
"WholeGenomeGermlineSingleSample.references":"${{"contamination_sites_ud":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.contam.UD","contamination_sites_bed":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.contam.bed","contamination_sites_mu":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.contam.mu","calling_interval_list":"gs://gcp-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list","reference_fasta":{"ref_dict":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict","ref_fasta":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta","ref_fasta_index":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai","ref_alt":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt","ref_sa":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa","ref_amb":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb","ref_bwt":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt","ref_ann":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann","ref_pac":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac"},"known_indels_sites_vcfs":["gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz","gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"],"known_indels_sites_indices":["gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi","gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"],"dbsnp_vcf":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf","dbsnp_vcf_index":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx","evaluation_interval_list":"gs://gcp-public-data--broad-references/hg38/v0/wgs_evaluation_regions.hg38.interval_list","haplotype_database_file":"gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt"}}",
"WholeGenomeGermlineSingleSample.sample_and_unmapped_bams":"${{ "sample_name": this.sample_name_id, "base_file_name": this.base_file_name, "flowcell_unmapped_bams": this.flowcell_unmapped_bams, "final_gvcf_base_name": this.final_gvcf_base_name, "unmapped_bam_suffix": ".bam" }}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.SortSampleBam.memory_multiplier":"${34}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBamFiles.additional_disk":"${1000}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.read_name_regex":"${null}",
"WholeGenomeGermlineSingleSample.cloud_provider":"gcp",
"WholeGenomeGermlineSingleSample.BamToGvcf.SortBamout.additional_disk":"${1000}",
"WholeGenomeGermlineSingleSample.CollectRawWgsMetrics.additional_disk":"${1000}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.memory_multiplier":"${8}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.BaseRecalibrator.gatk_docker":"${}",
"WholeGenomeGermlineSingleSample.BamToGvcf.HaplotypeCallerGATK4.memory_multiplier":"${8}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.ApplyBQSR.additional_disk":"${1000}",
"WholeGenomeGermlineSingleSample.BamToGvcf.make_gvcf":"${true}",
"WholeGenomeGermlineSingleSample.wgs_coverage_interval_list":"gs://gcp-public-data--broad-references/hg38/v0/wgs_coverage_regions.hg38.interval_list",
"WholeGenomeGermlineSingleSample.BamToGvcf.SortBamout.memory_multiplier":"${20}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.additional_disk":"${1500}",
"WholeGenomeGermlineSingleSample.papi_settings":"${{"preemptible_tries":3,"agg_preemptible_tries":3}}",
"WholeGenomeGermlineSingleSample.BamToCram.ValidateCram.memory_multiplier":"${4}",
"WholeGenomeGermlineSingleSample.scatter_settings":"${{"haplotype_scatter_count":50,"break_bands_at_multiples_of":1000000}}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.MarkDuplicates.memory_multiplier":"${80}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBamFiles.memory_multiplier":"${4}",
"WholeGenomeGermlineSingleSample.UnmappedBamToAlignedBam.GatherBqsrReports.gatk_docker":"${}",
"WholeGenomeGermlineSingleSample.AggregatedBamQC.CheckFingerprintTask.memory_size":"${1000}"
}