Description
I am not sure if this is expected behavior, but it seems that `sum(vec(A))` can be much faster than `sum(A)` when `A` is a multi-dimensional array. Please see https://discourse.julialang.org/t/summing-a-vector-is-faster-than-summing-a-multi-dimensional-array-of-the-same-length-using-cuda/116711 for an MWE. Although that post is from about a year ago, I can still reproduce the results with CUDA v5.7.3.
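For convenience, here is a rough sketch of the kind of comparison I mean. It is not necessarily identical to the MWE in the linked post: the array shape and element type are arbitrary choices for illustration, and it assumes BenchmarkTools is installed and a CUDA-capable GPU is available.

```julia
using CUDA, BenchmarkTools

# Assumed shape and element type, chosen only for illustration.
A = CUDA.rand(Float32, 128, 128, 128)

# Both calls reduce the same 128^3 elements; vec(A) reshapes A to one dimension.
# CUDA.@sync makes the timing include the GPU work, not just the kernel launch.
@btime CUDA.@sync sum($A)
@btime CUDA.@sync sum(vec($A))
```

Both reductions should return the same value up to floating-point rounding, so only the timings are of interest here.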