Description
We've used MDT for a number of years with a lot of success, running on workstations with NVIDIA GPUs. Some time back, I posted about OpenCL Intel processing not being multi-threaded. While that problem was solved (the runs now use many threads), the CPU output never comes close to the GPU numbers. Here's a screenshot showing a GPU-derived image on the left and a CPU / Singularity image on the right:
A bit of decoding: the arrow points to a section showing the values under the cursor across multiple runs. Two NVIDIA runs on different machines give identical results of 0.5777... (good). There's also a run on one of those machines with the OpenCL device set to device 1, 'CPU - pthread-Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz (Portable Computing Language)'. This comes very, very close (0.578), which I'll attribute to different floating-point units. Now, the GPU run took under a minute and the CPU one took 934 minutes on a 20-core machine, which was a bit insane, but at least the numbers lined up.
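For scale, a quick NumPy sketch (using the voxel value from the screenshot purely as an illustration) of how much discrepancy precision alone can plausibly account for:

```python
import numpy as np

# Value from the double-precision GPU runs (purely illustrative here).
v64 = 0.5777527
# The same value rounded through single precision.
v32 = float(np.float32(v64))

print(f"float64: {v64:.9f}")
print(f"float32: {v32:.9f}")
# A single round-trip preserves roughly 7 significant digits; accumulated
# differences between floating-point implementations can plausibly show up
# around the third or fourth decimal (0.5777 vs 0.578), but nowhere near
# a 0.5777 vs 0.0256 mismatch.
```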
Moving on, though, to attempts to bundle this in Singularity. I've tried the supplied script, my own script, and many variants of each. In the end, I get images that are either a constant 0.5 everywhere within the mask, or images like the one on the right that show something of the brain but whose values are way, way off (0.0256 in the example here). It's almost as if the values are being cast into the wrong format. The runs are quick, since the threading works well here, but the output is clearly wrong. I'd love to be able to fix this, but I've thrown everything I can think of at it and hit a wall.
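To illustrate why a wrong-format cast is a plausible suspect, here is a minimal sketch of what a wrong-dtype read does to a value. This is only an illustration of the failure mode (again reusing the value from the screenshot), not a claim about where in MDT or Singularity it would happen:

```python
import numpy as np

# Hypothetical "good" voxel value from the GPU run.
correct = np.float64(0.5777527)

# Reinterpret its 8 raw bytes as two float32 values -- a classic
# wrong-dtype read. Neither resulting number resembles the original,
# which is the kind of gross mismatch seen in the Singularity output,
# as opposed to the rounding-level difference between the GPU and
# native-CPU runs.
misread = np.frombuffer(correct.tobytes(), dtype=np.float32)
print(misread)
```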
