Open
Description
Hi,
I ran into an issue where gatk MergeVcfs does not preserve NULL (.) per-sample values in the output VCF. This only seems to happen when the samples are not in alphabetical order in the VCF header and need to be sorted (e.g. ...FORMAT S2 S1 instead of ...FORMAT S1 S2). Please clarify whether this is expected behavior or whether it can be resolved so that all NULL values are preserved. Thank you.
GATK Version: 4.6.2.0
Steps to Reproduce
See attached chr1.input.vcf.gz
- Create input chr1.input.vcf. NOTE: S2 and S1 are not alphabetically sorted in the header, and sample S1 has a null AD value, which should be preserved in the merge
$ cat chr1.input.vcf
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Observed allele depths">
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S2 S1
chr1 100 1 A G 10 PASS . GT 1/1 0/0
chr1 101 2 C A 10 PASS . GT:AD 0/1:5,5 0/0:.
- Run MergeVcfs on chr1.input.vcf
$ gatk MergeVcfs -I chr1.input.vcf -O chr1.output.vcf
Using GATK jar /Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar MergeVcfs -I chr1.input.vcf -O chr1.output.vcf
16:23:08.065 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
16:23:08.087 WARN NativeLibraryLoader - Unable to load libgkl_compression.dylib from native/libgkl_compression.dylib (/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib: dlopen(/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib, 0x0001): tried: '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib' (no such file), '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')))
16:23:08.089 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
16:23:08.092 WARN NativeLibraryLoader - Unable to load libgkl_compression.dylib from native/libgkl_compression.dylib (/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib: dlopen(/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib, 0x0001): tried: '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib' (no such file), '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')))
[Thu Sep 25 16:23:08 EDT 2025] MergeVcfs --INPUT chr1.input.vcf --OUTPUT chr1.output.vcf --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu Sep 25 16:23:08 EDT 2025] Executing as dstreid@MacBookPro on Mac OS X 15.5 aarch64; OpenJDK 64-Bit Server VM 17.0.16+0; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: Version:4.6.2.0
[Thu Sep 25 16:23:08 EDT 2025] picard.vcf.MergeVcfs done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=119537664
Tool returned:
0
- The sample order in the output header has been sorted alphabetically, and S1 has lost its AD value (0/0:. in the input, 0/0 in the output)
$ grep -E "CHROM|101" chr1*vcf | cut -f1,2,9-11
chr1.input.vcf:#CHROM POS FORMAT S2 S1
chr1.input.vcf:chr1 101 GT:AD 0/1:5,5 0/0:.
chr1.output.vcf:#CHROM POS FORMAT S1 S2
chr1.output.vcf:chr1 101 GT:AD 0/0 0/1:5,5
Note: if S1 and S2 do not need to be re-ordered, the issue does not occur and the null AD value is preserved.
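For anyone who wants to check whether the merge changed any per-sample values beyond the textual difference, here is a minimal Python sketch that compares one record by sample name rather than by column position. It is not part of GATK, and `sample_fields` is a hypothetical helper name. It assumes tab-delimited VCF lines and pads dropped trailing FORMAT fields with `.`, following my reading of the VCF 4.2 spec, which allows trailing fields other than GT to be dropped. Under that assumption `0/0` and `0/0:.` compare as equal. Whether MergeVcfs is expected to keep the explicit `.` is still the open question here.

```python
def sample_fields(header_line, record_line):
    """Map sample name -> {FORMAT key: value} for one VCF record.

    Hypothetical helper for illustration. Assumes tab-delimited lines and
    pads dropped trailing FORMAT fields with the missing value ".".
    """
    names = header_line.rstrip("\n").split("\t")[9:]
    cols = record_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    fields = {}
    for name, raw in zip(names, cols[9:]):
        values = raw.split(":")
        # Treat a dropped trailing field as missing (".")
        values += ["."] * (len(keys) - len(values))
        fields[name] = dict(zip(keys, values))
    return fields


# Position 101 from the input and output files shown above
header_in = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS2\tS1"
record_in = "chr1\t101\t2\tC\tA\t10\tPASS\t.\tGT:AD\t0/1:5,5\t0/0:."
header_out = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2"
record_out = "chr1\t101\t2\tC\tA\t10\tPASS\t.\tGT:AD\t0/0\t0/1:5,5"

before = sample_fields(header_in, record_in)
after = sample_fields(header_out, record_out)
print(before == after)  # True: same values per sample once padded
```

With this padding the two records carry the same values per sample, so the difference is limited to the explicit `.` being dropped from the text. Tools that compare the sample columns as raw strings will still see the lines as different.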