-
Notifications
You must be signed in to change notification settings - Fork 10
Description
Currently the leading dimension for the descriptor is calculated as ny+2 see for example calls such as:
! The descriptor has shape (ny + 2, nx, nz) for R2C output
call c_f_pointer(descriptor%data(1), d_dev, &
[self%ny_loc + 2, self%nx_loc, self%nz_loc])For even ny, the expressions ny+2 and 2*(ny/2 + 1 are identical. But for odd ny they diverge:
ny = 65
2*(ny/2+1) = 2*(65/2 + 1) = 66
ny+2 = 65 + 2 = 67 <--- incorrect
Using ny + 2 with odd ny causes c_f_pointer to describe an array one element larger than the actual cuFFT buffer, meaning any access to the last element reads/writes past the end of the allocation.
ny +2 is an algebraic simplification of of 2*(ny/2+ 1) that only holds true for even ny. The R2C FFT definition has output floor(ny/2) + 1 complex modes that requires 2*(floor(ny/2) +1) reals of storage (when you have ny/2+1 complex numbers in memory it's actually stored as [re, im, re, im, re, im]). Therefore in the above code block, the descriptor actually has shape (2*(ny/2 + 1), nx, nz) for R2C output. The leading dimension is 2*(ny/2+1).
We can see NVIDIA's cuFFTMp example on this: https://docs.nvidia.com/hpc-sdk/compilers/fortran-cuda-interfaces/cf-lib-examples.html#using-cufftmp-from-either-openacc-or-cuda-fortran
Here they use local_rshape = [2*(nz/2+1), ny, my_nx]
Fix: replace ny/2 with 2*(ny/2 +1)