
MergeVcfs and SortVcf lose NULL sample values when samples in headers need to be reordered #9275

@DavidStreid


Hi,
I ran into an issue where GATK MergeVcfs does not preserve NULL (.) per-sample FORMAT values in the output VCF. This only seems to happen when the samples are not in alphabetical order in the VCF header and have to be sorted (e.g. ...FORMAT S2 S1 instead of ...FORMAT S1 S2). Could you clarify whether this is expected behavior, or whether it can be resolved so that all NULL values are preserved? Thank you.

GATK Version: 4.6.2.0

Steps to Reproduce

See attached chr1.input.vcf.gz

  1. Create the input file chr1.input.vcf. NOTE: S2 and S1 are not alphabetically sorted in the header, and sample S1 has a null (.) AD value at position 101, which should be preserved by the merge.
$ cat chr1.input.vcf
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Observed allele depths">
##contig=<ID=chr1,length=248956422>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	S2	S1
chr1	100	1	A	G	10	PASS	.	GT	1/1	0/0
chr1	101	2	C	A	10	PASS	.	GT:AD	0/1:5,5	0/0:.
  2. Run MergeVcfs on chr1.input.vcf
$ gatk MergeVcfs -I chr1.input.vcf -O chr1.output.vcf
Using GATK jar /Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar MergeVcfs -I chr1.input.vcf -O chr1.output.vcf
16:23:08.065 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
16:23:08.087 WARN  NativeLibraryLoader - Unable to load libgkl_compression.dylib from native/libgkl_compression.dylib (/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib: dlopen(/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib, 0x0001): tried: '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib' (no such file), '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression10309948940611096282.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')))
16:23:08.089 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/dstreid/Downloads/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
16:23:08.092 WARN  NativeLibraryLoader - Unable to load libgkl_compression.dylib from native/libgkl_compression.dylib (/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib: dlopen(/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib, 0x0001): tried: '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib' (no such file), '/private/var/folders/dc/3r0_b2_x3cqb_8z33n0f97gw0000gp/T/dstreid/libgkl_compression15820731834065721956.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64')))
[Thu Sep 25 16:23:08 EDT 2025] MergeVcfs --INPUT chr1.input.vcf --OUTPUT chr1.output.vcf --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu Sep 25 16:23:08 EDT 2025] Executing as dstreid@MacBookPro on Mac OS X 15.5 aarch64; OpenJDK 64-Bit Server VM 17.0.16+0; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: Version:4.6.2.0
[Thu Sep 25 16:23:08 EDT 2025] picard.vcf.MergeVcfs done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=119537664
Tool returned:
0
  3. Check the output: the sample order in the header has been sorted alphabetically, and S1 has lost its AD value
$ grep -E "CHROM|101" chr1*vcf  | cut -f1,2,9-11
chr1.input.vcf:#CHROM	POS	FORMAT	S2	S1
chr1.input.vcf:chr1	101	GT:AD	0/1:5,5	0/0:.
chr1.output.vcf:#CHROM	POS	FORMAT	S1	S2
chr1.output.vcf:chr1	101	GT:AD	0/0	0/1:5,5
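The grep/cut comparison above lines up columns by position, which is easy to misread once the output header has been reordered. As a minimal sketch, assuming bash and a POSIX awk on uncompressed VCFs, each record can be flattened to one CHROM/POS/sample/FORMAT-key/value line so that input and output are compared by sample name instead. The function name per_sample_fields is a hypothetical helper, not part of GATK, Picard or htsjdk.

```shell
# Hypothetical helper (not part of GATK, Picard or htsjdk): prints one
# tab-separated line per (CHROM, POS, sample, FORMAT key, value) so that two
# VCFs can be compared by sample name rather than by column position.
per_sample_fields() {
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                       # skip meta-information lines
    /^#CHROM/ {                          # map column index to sample name
      for (i = 10; i <= NF; i++) sample[i] = $i
      next
    }
    {
      nkeys = split($9, keys, ":")       # FORMAT column, e.g. GT:AD
      for (i = 10; i <= NF; i++) {
        nvals = split($i, vals, ":")
        for (k = 1; k <= nkeys; k++) {
          # A key listed in FORMAT with no value in this sample column
          # (an omitted trailing field) is reported as <absent>
          v = (k <= nvals) ? vals[k] : "<absent>"
          print $1, $2, sample[i], keys[k], v
        }
      }
    }
  ' "$1" | LC_ALL=C sort
}
```

Usage: diff <(per_sample_fields chr1.input.vcf) <(per_sample_fields chr1.output.vcf). Based on the position 101 record shown in step 3, the S1 AD line should read . for the input and <absent> for the output.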

Note: if S1 and S2 do not need to be reordered, this is not an issue.
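For readers without the attachment, the sketch below recreates chr1.input.vcf from step 1 using printf, whose \t escapes produce real tab separators (copying the listing from a web page can turn tabs into spaces). It also writes a control file with the same data but with the sample columns already in S1, S2 order, matching the case described in the note above. The control file name chr1.sorted.input.vcf is chosen here for illustration only.

```shell
# Recreate the input VCF from step 1 with explicit tab separators.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Observed allele depths">\n'
  printf '##contig=<ID=chr1,length=248956422>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS2\tS1\n'
  printf 'chr1\t100\t1\tA\tG\t10\tPASS\t.\tGT\t1/1\t0/0\n'
  printf 'chr1\t101\t2\tC\tA\t10\tPASS\t.\tGT:AD\t0/1:5,5\t0/0:.\n'
} > chr1.input.vcf

# Control case (hypothetical file name): swap the two sample columns on the
# #CHROM line and on every record so the header is already S1, S2.
# Meta-information (##) lines are printed unchanged.
awk -F'\t' -v OFS='\t' '!/^##/ { t = $10; $10 = $11; $11 = t } { print }' \
  chr1.input.vcf > chr1.sorted.input.vcf
```

Running gatk MergeVcfs -I chr1.sorted.input.vcf -O chr1.sorted.output.vcf next to the original command gives a side-by-side check of the reordered and already-sorted cases.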
