Commit 5389798

Fix freezing in of constant arrays with negative strides (#589)
This PR fixes a bug in freezing certain views of NumPy arrays into kernels. The current implementation copies the bytes of the source array in viewed (logical) order but keeps the view's strides, which may be negative, without also moving the data pointer to the end of the array in that case. With this change, the physical array placed in constant memory is the contiguous logical equivalent of the original array, with positive strides. This is simpler than working through the pointer arithmetic: the view has to be copied anyway, so its inherent "viewness" is already lost when the data moves to the device.
1 parent a0a3328 commit 5389798
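
The failure mode described above can be illustrated with a small host-side model. This is only a sketch under simplifying assumptions, not numba-cuda code: `read_strided` is a hypothetical helper that mimics "base pointer plus index times stride" addressing over a bytes buffer, and out-of-bounds reads are reported as `None` instead of touching memory.

```python
import struct

ITEM = 8  # bytes per little-endian int64 element


def read_strided(buf, base, stride, count):
    """Hypothetical helper: read `count` int64 items at base + i * stride."""
    out = []
    for i in range(count):
        off = base + i * stride
        if 0 <= off <= len(buf) - ITEM:
            out.append(struct.unpack_from("<q", buf, off)[0])
        else:
            out.append(None)  # out of bounds; on a device this would be invalid
    return out


# Physical bytes of [1, 2, 3]. A reversed view reads them with stride -8,
# starting from the last element (byte offset 16).
original = struct.pack("<3q", 1, 2, 3)
view = read_strided(original, base=2 * ITEM, stride=-ITEM, count=3)

# Shape of the bug: the bytes are copied in viewed order, so the frozen
# buffer holds [3, 2, 1], but the negative stride is kept while the
# pointer stays at offset 0.
frozen = struct.pack("<3q", 3, 2, 1)
buggy = read_strided(frozen, base=0, stride=-ITEM, count=3)

# Shape of the fix: the same bytes described with a positive stride.
fixed = read_strided(frozen, base=0, stride=ITEM, count=3)

print(view, buggy, fixed)
```

In this model only the first element of the buggy read is in bounds, while the positive-stride description of the same bytes yields the logical contents of the view.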

2 files changed: +34 -0 lines changed


numba_cuda/numba/cuda/target.py

Lines changed: 9 additions & 0 deletions
@@ -7,6 +7,7 @@
 from llvmlite import ir
 import warnings
 import importlib.util
+import numpy as np

 from numba.cuda import types
 from numba.cuda import HAS_NUMBA
@@ -280,6 +281,14 @@ def make_constant_array(self, builder, aryty, arr):
         addrspace.
         """

+        # Ensure we have a contiguous buffer with non-negative strides. views with
+        # negative strides must be materialized so that the
+        # constant bytes and the data pointer/strides are consistent.
+        if any(s < 0 for s in arr.strides) or not (
+            arr.flags.c_contiguous or arr.flags.f_contiguous
+        ):
+            arr = np.ascontiguousarray(arr)
+
         lmod = builder.module

         constvals = [
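
The effect of the new guard can be checked on the host with NumPy alone. The sketch below lifts the condition from the hunk above into a standalone function; `normalize_for_freezing` is a hypothetical name used only for illustration and does not exist in numba-cuda.

```python
import numpy as np


def normalize_for_freezing(arr):
    """Hypothetical helper mirroring the guard added to make_constant_array."""
    if any(s < 0 for s in arr.strides) or not (
        arr.flags.c_contiguous or arr.flags.f_contiguous
    ):
        # Materialize the view as a C-contiguous copy in logical order.
        arr = np.ascontiguousarray(arr)
    return arr


base = np.array([1, 2, 3], dtype=np.int64)
reversed_view = base[::-1]                        # strides == (-8,)
strided_view = np.arange(6, dtype=np.int64)[::2]  # strides == (16,)

r = normalize_for_freezing(reversed_view)
s = normalize_for_freezing(strided_view)
same = normalize_for_freezing(base)  # already contiguous, returned as-is

print(r.strides, r.tolist())
print(s.strides, s.tolist())
print(same is base)
```

Arrays that are already C- or F-contiguous with non-negative strides pass through without a copy, so the extra materialization only applies to views such as reversed or stepped slices.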

numba_cuda/numba/cuda/tests/cudapy/test_array_methods.py

Lines changed: 25 additions & 0 deletions
@@ -5,6 +5,7 @@
 from numba import cuda
 from numba.cuda.testing import CUDATestCase
 import unittest
+from numba.cuda import config


 def reinterpret_array_type(byte_arr, start, stop, output):
@@ -14,6 +15,15 @@ def reinterpret_array_type(byte_arr, start, stop, output):


 class TestCudaArrayMethods(CUDATestCase):
+    def setUp(self):
+        self.old_nrt_setting = config.CUDA_ENABLE_NRT
+        config.CUDA_ENABLE_NRT = True
+        super(TestCudaArrayMethods, self).setUp()
+
+    def tearDown(self):
+        config.CUDA_ENABLE_NRT = self.old_nrt_setting
+        super(TestCudaArrayMethods, self).tearDown()
+
     def test_reinterpret_array_type(self):
         """
         Reinterpret byte array as int32 in the GPU.
@@ -33,6 +43,21 @@ def test_reinterpret_array_type(self):
         got = output[0]
         self.assertEqual(expect, got)

+    def test_array_copy(self):
+        val = np.array([1, 2, 3])[::-1]
+
+        @cuda.jit
+        def kernel(out):
+            q = val.copy()
+            for i in range(len(out)):
+                out[i] = q[i]
+
+        out = cuda.to_device(np.zeros(len(val), dtype="float64"))
+
+        kernel[1, 1](out)
+        for i, j in zip(out.copy_to_host(), val):
+            self.assertEqual(i, j)
+

 if __name__ == "__main__":
     unittest.main()
