This PR adds device-side implementations of part of the NRT C API. It is a first step towards supporting allocation and reference counting / garbage collection on the device. The aim is to provide a foundation to build on, rather than feature completeness for any particular piece of functionality.
Combined with the change to the `CUDATargetContext` object, this allows launching of kernels like this:
```python
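from numba import cuda
import numpy as np


@cuda.jit
def f(x):
    # Returns an array slice, i.e. a refcounted value when NRT is enabled.
    return x[:5]


# The kernel links the device-side NRT implementation from nrt.cu.
@cuda.jit('void()', link=['nrt.cu'])
def g():
    x = cuda.shared.array(10, dtype=np.int32)
    f(x)


# Launch with one block of one thread.
g[1, 1]()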
```
Notes on the implementation:
- Currently, no memsys is used. How to expose it is still to be discussed, bearing in mind that it may need to be used outside of Numba to free objects that persist after the kernel finishes executing.
- Basic tests are added. `test_nrt.py` mainly enables NRT and tests that a refcounted variable can be passed into, and returned from, a second function. Another test mocks up a device-side allocation and tests that allocation statistics are collected correctly; it is xfailed until stats are functional.
- NRT functions are linked when any of the NRT-specific functions are found in the jitted PTX.
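
As a rough illustration of the last point, the check could look something like the sketch below. This is not the code in this PR: the symbol list is an assumed, illustrative subset of NRT C API names, and `needs_nrt` and `link_files` are hypothetical helpers that only show the idea of scanning the PTX text and appending `nrt.cu` to the link list when a match is found.

```python
import re

# Assumption: an illustrative subset of NRT C API symbol names.
# The set actually checked by the PR may differ.
NRT_SYMBOLS = (
    "NRT_Allocate",
    "NRT_Free",
    "NRT_MemInfo_alloc",
    "NRT_incref",
    "NRT_decref",
)


def needs_nrt(ptx, symbols=NRT_SYMBOLS):
    """Return True if any NRT symbol appears as a whole identifier in the PTX."""
    names = "|".join(re.escape(s) for s in symbols)
    # PTX identifiers may contain word characters and '$', so refuse matches
    # that are merely a substring of a longer identifier.
    pattern = re.compile(r"(?<![\w$])(?:" + names + r")(?![\w$])")
    return pattern.search(ptx) is not None


def link_files(ptx, user_links=()):
    """Hypothetical helper: add 'nrt.cu' to the link list only when needed."""
    links = list(user_links)
    if needs_nrt(ptx) and "nrt.cu" not in links:
        links.append("nrt.cu")
    return links
```

Matching whole identifiers rather than plain substrings avoids linking NRT for a user function whose name merely contains an NRT symbol name.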
---------
Co-authored-by: Graham Markall <[email protected]>
Co-authored-by: Michael Yh Wang <[email protected]>