HaplotypeCaller compatibility problem on Windows #9293

@MarcoLotz


Status quo

GATK currently supports macOS and Linux. Windows is not supported directly; it works only through Docker.

While using it as a Java dependency for a multi-platform solution, I have noticed that HaplotypeCaller would not work on Windows because intervals were being interpreted as invalid paths:

For example, an interval such as 1000:1000 would be interpreted as a path. The error thrown was similar to the one described in #5805:

Illegal char <:> at index 2: /C:/Users/User/AppData/Local/Temp/
java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/Users/User/AppData/Local/Temp/
	at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:199)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:175)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
	at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92)
	at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:231)
	at java.base/java.nio.file.Path.of(Path.java:148)
	at java.base/java.nio.file.Paths.get(Paths.java:69)
	at htsjdk.io.HtsPath.getURIForString(HtsPath.java:250)
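The bottom frames of the trace show `java.nio.file.Paths.get` rejecting the string on the Windows file system. A minimal sketch for probing which strings the local file system accepts as paths is given here; `PathProbe` and `rejectionReason` are hypothetical names for illustration and are not part of GATK or htsjdk. The result depends on the platform: on Linux and macOS a colon is a legal path character, whereas the trace above shows Windows rejecting `/C:/Users/User/AppData/Local/Temp/` at index 2.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical probe (not GATK or htsjdk code): asks the default file system whether a string parses as a path. */
final class PathProbe {

    /** Returns null if the string is accepted as a path, otherwise the rejection message. */
    static String rejectionReason(String candidate) {
        try {
            Paths.get(candidate); // same JDK entry point that appears in the stack trace
            return null;
        } catch (InvalidPathException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The first two strings come from the report above; the third is an ordinary file name.
        String[] candidates = {"1000:1000", "/C:/Users/User/AppData/Local/Temp/", "intervals.list"};
        for (String candidate : candidates) {
            String reason = rejectionReason(candidate);
            System.out.println(candidate + " -> " + (reason == null ? "accepted" : "rejected: " + reason));
        }
    }
}
```

Running this on both Windows and Linux should make the platform difference visible without involving GATK at all.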

I am using the following configuration:
GATK version: 4.6.1.0
Windows version: 11
Java version: 21

Workaround:

The workaround is to write each interval to a file and have HaplotypeCaller read the interval from that file instead. The downside is increased disk I/O and therefore lower throughput.

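// Assumes intervalFile (File) and intervalArg (String) are declared earlier.
// On Windows, write the interval to a temporary .list file and pass its path,
// because the raw interval string is rejected as an invalid path.
if (IS_WINDOWS) {
  intervalFile = File.createTempFile("gatk_interval_", ".list");
  intervalFile.deleteOnExit(); // delete the temporary file when the JVM exits
  java.nio.file.Files.writeString(intervalFile.toPath(), interval + "\n");
  intervalArg = intervalFile.getAbsolutePath();
} else {
  // Other platforms accept the interval string directly.
  intervalArg = interval;
}

// Build the HaplotypeCaller argument list.
List<String> args = new ArrayList<>();
args.add("--input");
args.add(cramFile.getAbsolutePath());
args.add("--reference");
args.add(referenceFile.getAbsolutePath());
args.add("--intervals");
args.add(intervalArg);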
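For anyone who wants to try the workaround in isolation, the fragment above can be wrapped in a small self-contained helper. This is only a sketch under assumptions: `IntervalArgs`, `isWindows` and `forPlatform` are hypothetical names, the OS check via the `os.name` system property is one possible way to define `IS_WINDOWS`, and `Files.writeString` requires Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical helper (not part of GATK) that picks the value to pass to --intervals. */
final class IntervalArgs {

    /** Assumed OS check; the original snippet does not show how IS_WINDOWS is defined. */
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /**
     * Returns the interval unchanged when windows is false. Otherwise writes it to a
     * temporary .list file and returns the absolute path of that file.
     */
    static String forPlatform(String interval, boolean windows) {
        if (!windows) {
            return interval;
        }
        try {
            Path file = Files.createTempFile("gatk_interval_", ".list");
            file.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.writeString(file, interval + "\n");
            return file.toAbsolutePath().toString();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not write interval file", e);
        }
    }

    public static void main(String[] args) {
        // "1000:1000" is the interval quoted in the report above.
        String intervalArg = forPlatform("1000:1000", isWindows());
        System.out.println("--intervals " + intervalArg);
    }
}
```

Passing the platform flag as a parameter keeps the file-writing branch testable on any operating system.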

Reasoning:

Coming from a Spark background, I believe the decision to support only Linux and macOS is meant to focus engineering resources on providing consistent production support for cluster workloads, which is absolutely on point. It is likely, however, that users will want to run small-scale, ad-hoc (non-production) analyses on their Windows machines without having to submit such a small load to a cluster.

PR:

I am submitting a PR that addresses these shortcomings and improves stability on Windows. While the goal is not to provide extensive production support on Windows, it should at least fix most of the known platform-related bugs.
