
[FEA] Support for __dlpack_capi__ #531


Description

@gmarkall

As part of a general request for all array and data containers in CUDA and Python to support the __dlpack_capi__ API, Numba-CUDA should implement it as well.

An efficient / fast implementation may be hard to achieve with Numba-CUDA device arrays in their present form: the shape, strides, data pointer, etc. are stored on a Python subclass of the base device array class, and the base device array itself is a plain PyObject struct, as outlined below:

/* DeviceArray PyObject implementation. Note that adding more members here is
* presently prohibited because mapped and managed arrays derive from both
* DeviceArray and NumPy's ndarray, which is also a C extension class - the
* layout of the object cannot be resolved if this class also has members beyond
* PyObject_HEAD. */
class DeviceArray {
    PyObject_HEAD
};
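
For illustration only, the kind of C-level layout that would make a fast implementation straightforward might look like the following. This DeviceArrayWithMetadata is hypothetical, not part of the codebase, and is precisely what the constraint in the comment above rules out:

/* Hypothetical only: a layout like this would let __dlpack_capi__ read
 * metadata without touching the CPython attribute protocol, but CPython
 * rejects classes (such as the mapped and managed arrays) that inherit
 * from two C extension bases which both add fields beyond PyObject_HEAD
 * ("multiple bases have instance lay-out conflict"). */
class DeviceArrayWithMetadata {
    PyObject_HEAD
    void *data;        /* raw device pointer */
    int32_t ndim;
    int64_t *shape;    /* length ndim */
    int64_t *strides;  /* length ndim */
};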

Therefore, a __dlpack_capi__ implementation would need to go back to CPython to get the necessary data, rather than reading it directly from the DeviceArray struct. Some refactoring might allow us to extend this structure to retain the necessary information at the C layer, but exploration is needed to discover what changes are possible and necessary.
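
As a minimal sketch of that attribute-lookup path, the following populates a caller-provided DLTensor (from dlpack.h) by going back through CPython. The helper name and signature are assumptions, since __dlpack_capi__ itself is not specified here; the attribute names (shape, strides, device_ctypes_pointer) mirror Numba's Python-level device array API:

#include <Python.h>
#include <dlpack/dlpack.h>  /* DLTensor, DLDevice, DLDataType */

/* Sketch only: fill a caller-provided DLTensor for a Numba-CUDA device
 * array via CPython attribute lookups, since the C-level DeviceArray
 * struct carries no metadata of its own. Returns 0 on success, -1 on
 * error with a Python exception set. */
static int
fill_dltensor(PyObject *ary, DLTensor *dt,
              int64_t *shape_buf, int64_t *strides_buf, int max_ndim)
{
    int ok = -1;
    PyObject *shape = PyObject_GetAttrString(ary, "shape");
    PyObject *strides = PyObject_GetAttrString(ary, "strides");
    PyObject *ptrobj = PyObject_GetAttrString(ary, "device_ctypes_pointer");
    PyObject *ptrval = ptrobj ? PyObject_GetAttrString(ptrobj, "value") : NULL;

    if (shape && strides && ptrval) {
        Py_ssize_t ndim = PyTuple_Size(shape);
        if (ndim >= 0 && ndim <= max_ndim) {
            for (Py_ssize_t i = 0; i < ndim; i++) {
                shape_buf[i] = PyLong_AsLongLong(PyTuple_GetItem(shape, i));
                /* Numba reports strides in bytes; DLPack wants them in
                 * elements, so real code must divide by the itemsize
                 * (omitted here for brevity). */
                strides_buf[i] = PyLong_AsLongLong(PyTuple_GetItem(strides, i));
            }
            dt->data = PyLong_AsVoidPtr(ptrval);
            dt->ndim = (int32_t)ndim;
            dt->shape = shape_buf;
            dt->strides = strides_buf;
            dt->byte_offset = 0;
            dt->device.device_type = kDLCUDA;
            dt->device.device_id = 0;      /* real code: query the array */
            dt->dtype.code = kDLFloat;     /* placeholder; real code derives */
            dt->dtype.bits = 64;           /* this from the array's dtype */
            dt->dtype.lanes = 1;
            ok = 0;
        }
    }
    Py_XDECREF(ptrval);
    Py_XDECREF(ptrobj);
    Py_XDECREF(strides);
    Py_XDECREF(shape);
    return (ok == 0 && !PyErr_Occurred()) ? 0 : -1;
}

The caller owns shape_buf and strides_buf; a real implementation would also need to tie the lifetime of the producing array to the exported tensor (e.g. via DLManagedTensor's manager_ctx and deleter).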

There is likely still some performance gain from implementing this API, since it bypasses the construction of new Python objects that an implementation of e.g. __dlpack__ would require.

I am unsure whether the request also implies that kernel launches should operate on objects exposing __dlpack_capi__, but it seems that this may provide a path to faster launches.

cc @oleksandr-pavlyk
