Description
Status-quo
GATK currently supports macOS and Linux. Windows is supported only through Docker, not natively.
While using GATK as a Java dependency in a multi-platform solution, I noticed that HaplotypeCaller does not work on Windows because intervals are interpreted as invalid paths.
For example, an interval such as 1000:1000 is treated as a path. The error thrown is similar to the one described in #5805:
Illegal char <:> at index 2: /C:/Users/User/AppData/Local/Temp/
java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/Users/User/AppData/Local/Temp/
at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:199)
at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:175)
at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92)
at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:231)
at java.base/java.nio.file.Path.of(Path.java:148)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at htsjdk.io.HtsPath.getURIForString(HtsPath.java:250)
I am using the following configuration:
GATK version: 4.6.1.0
Windows version: 11
Java version: 21
Workaround:
The workaround is to write each interval to a file and have HaplotypeCaller read the intervals from that file instead. The downside is, of course, increased disk I/O and therefore lower throughput.
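For reference, a self-contained sketch of this workaround could look like the following. The class and method names (IntervalArgs, isWindows, resolveIntervalArg, buildArgs) are hypothetical and not part of GATK, and the os.name check is only one assumed way of detecting Windows. The sketch only builds the argument list; how that list is handed to HaplotypeCaller is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of GATK: builds HaplotypeCaller arguments
// and routes the interval through a temporary file on Windows.
final class IntervalArgs {

    // Assumption: the os.name system property is enough to detect Windows.
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).contains("win");
    }

    // Returns the value to pass after --intervals.
    // On Windows the interval is written to a temporary .list file and the
    // file path is returned; elsewhere the interval string is returned as-is.
    static String resolveIntervalArg(String interval, boolean windows) {
        if (!windows) {
            return interval;
        }
        try {
            Path intervalFile = Files.createTempFile("gatk_interval_", ".list");
            intervalFile.toFile().deleteOnExit();
            Files.writeString(intervalFile, interval + "\n");
            return intervalFile.toAbsolutePath().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assembles the same three arguments used in this issue.
    static List<String> buildArgs(String inputPath, String referencePath,
                                  String interval, boolean windows) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(inputPath);
        args.add("--reference");
        args.add(referencePath);
        args.add("--intervals");
        args.add(resolveIntervalArg(interval, windows));
        return args;
    }
}
```

A caller would use something like IntervalArgs.buildArgs(cramPath, referencePath, interval, IntervalArgs.isWindows()). Passing the platform flag as a parameter keeps the file-based branch testable on any operating system. The excerpt below shows the same logic inline, as it appears in my calling code.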
if (IS_WINDOWS) {
    intervalFile = File.createTempFile("gatk_interval_", ".list");
    intervalFile.deleteOnExit();
    java.nio.file.Files.writeString(intervalFile.toPath(), interval + "\n");
    intervalArg = intervalFile.getAbsolutePath();
} else {
    intervalArg = interval;
}
List<String> args = new ArrayList<>();
args.add("--input");
args.add(cramFile.getAbsolutePath());
args.add("--reference");
args.add(referenceFile.getAbsolutePath());
args.add("--intervals");
args.add(intervalArg);
Reasoning:
Coming from a Spark background, I assume the decision to support only Linux and macOS was made to focus engineering resources on consistent production support for cluster workloads, which is absolutely on point. It is likely, however, that some users want to run small-scale, ad-hoc (non-production) analyses on their Windows machines without having to submit such a small load to a cluster.
PR:
I am submitting a PR that addresses these shortcomings and improves stability on Windows. The goal is not to provide extensive production support on Windows, but the PR should at least fix most of the known platform-related bugs.