
Commit d6399f2

Update documentation for link kwarg in declare_device()
1 parent 83abcee commit d6399f2

4 files changed: 83 additions and 31 deletions.


docs/source/user/cuda_ffi.rst

Lines changed: 59 additions & 15 deletions
@@ -11,7 +11,7 @@ of a Python kernel call to a foreign device function are:
 
 - The device function implementation in a foreign language (e.g. CUDA C).
 - A declaration of the device function in Python.
-- A kernel that links with and calls the foreign function.
+- A kernel that calls the foreign function.
 
 .. _device-function-abi:
 
@@ -83,7 +83,7 @@ For example, when:
 
 .. code::
 
-   mul = cuda.declare_device('mul_f32_f32', 'float32(float32, float32)')
+   mul = cuda.declare_device('mul_f32_f32', 'float32(float32, float32)', link="functions.cu")
 
 is declared, calling ``mul(a, b)`` inside a kernel will translate into a call to
 ``mul_f32_f32(a, b)`` in the compiled code.
@@ -134,15 +134,63 @@ where ``result`` and ``array`` are both arrays of ``float32`` data.
 Linking and Calling functions
 -----------------------------
 
-The ``link`` keyword argument of the :func:`@cuda.jit <numba.cuda.jit>`
-decorator accepts a list of file names specified by absolute path or a path
-relative to the current working directory. Files whose name ends in ``.cu``
-will be compiled with the `NVIDIA Runtime Compiler (NVRTC)
-<https://docs.nvidia.com/cuda/nvrtc/index.html>`_ and linked into the kernel as
-PTX; other files will be passed directly to the CUDA Linker.
+The ``link`` keyword argument to the :func:`declare_device
+<numba.cuda.declare_device>` function accepts *Linkable Code* items. Either a
+single Linkable Code item can be passed, or multiple items in a list, tuple, or
+set.
+
+A Linkable Code item is either:
+
+* A string indicating the location of a file in the filesystem, or
+* A :class:`LinkableCode <numba.cuda.LinkableCode>` object, for linking code
+  that exists in memory.
+
+Supported code formats that can be linked are:
+
+* PTX source code (``*.ptx``)
+* CUDA C/C++ source code (``*.cu``)
+* CUDA ELF Fat Binaries (``*.fatbin``)
+* CUDA ELF Cubins (``*.cubin``)
+* CUDA ELF archives (``*.a``)
+* CUDA Object files (``*.o``)
+* CUDA LTOIR files (``*.ltoir``)
+
+CUDA C/C++ source code will be compiled with the `NVIDIA Runtime Compiler
+(NVRTC) <https://docs.nvidia.com/cuda/nvrtc/index.html>`_ and linked into the
+kernel as either PTX or LTOIR, depending on whether LTO is enabled. Other files
+will be passed directly to the CUDA Linker.
+
+:class:`LinkableCode <numba.cuda.LinkableCode>` objects are initialized using
+the parameters of their base class:
 
-For example, the following kernel calls the ``mul()`` function declared above
-with the implementation ``mul_f32_f32()`` in a file called ``functions.cu``:
+.. autoclass:: numba.cuda.LinkableCode
+
+However, one should instantiate an instance of the class that represents the
+type of item being linked:
+
+.. autoclass:: numba.cuda.PTXSource
+.. autoclass:: numba.cuda.CUSource
+.. autoclass:: numba.cuda.Fatbin
+.. autoclass:: numba.cuda.Cubin
+.. autoclass:: numba.cuda.Archive
+.. autoclass:: numba.cuda.Object
+.. autoclass:: numba.cuda.LTOIR
+
+Legacy ``@cuda.jit`` decorator ``link`` support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``link`` keyword argument of the :func:`@cuda.jit <numba.cuda.jit>`
+decorator also accepts a list of Linkable Code items, which will then be linked
+into the kernel. This facility is provided for backwards compatibility; it is
+recommended that Linkable Code items are always specified in the
+:func:`declare_device <numba.cuda.declare_device>` call, so that the user of the
+declared API is not burdened with specifying the items to link themselves when
+writing a kernel.
+
+As an example of how this legacy mechanism looked at the point of use: the
+following kernel calls the ``mul()`` function declared above with the
+implementation ``mul_f32_f32()`` as if it were in a file called ``functions.cu``
+that had not been declared as part of the ``link`` argument in the declaration:
 
 .. code::
 
@@ -153,17 +201,13 @@ with the implementation ``mul_f32_f32()`` in a file called ``functions.cu``:
        if i < len(r):
            r[i] = mul(x[i], y[i])
 
-
 C/C++ Support
 -------------
 
 Support for compiling and linking of CUDA C/C++ code is provided through the use
 of NVRTC subject to the following considerations:
 
-- It is only available when using the NVIDIA Bindings. See
-  :envvar:`NUMBA_CUDA_USE_NVIDIA_BINDING`.
-- A suitable version of the NVRTC library for the installed version of the
-  NVIDIA CUDA Bindings must be available.
+- A suitable version of the NVRTC library must be available.
 - The CUDA include path is assumed by default to be ``/usr/local/cuda/include``
   on Linux and ``$env:CUDA_PATH\include`` on Windows. It can be modified using
   the environment variable :envvar:`NUMBA_CUDA_INCLUDE_PATH`.
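
Reviewer note: the list of supported formats added to cuda_ffi.rst above can be read as a mapping from file extension to code kind. The following is a minimal standalone sketch of that reading, not Numba's implementation. The names `SUPPORTED_FORMATS`, `describe_linkable`, and `needs_nvrtc` are hypothetical and exist only for illustration; Numba's real dispatch lives in its own linker code and may differ.

```python
import os

# Hypothetical table built from the formats listed in the documentation
# change above; this is not a Numba data structure.
SUPPORTED_FORMATS = {
    ".ptx": "PTX source code",
    ".cu": "CUDA C/C++ source code",
    ".fatbin": "CUDA ELF fat binary",
    ".cubin": "CUDA ELF cubin",
    ".a": "CUDA ELF archive",
    ".o": "CUDA object file",
    ".ltoir": "CUDA LTOIR file",
}


def describe_linkable(path):
    """Return a description of the format implied by a file's extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return SUPPORTED_FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported link file extension: {ext!r}") from None


def needs_nvrtc(path):
    """Per the docs, only CUDA C/C++ source is compiled with NVRTC first."""
    return os.path.splitext(path)[1].lower() == ".cu"
```

Under this reading, a path such as `ffi/functions.cu` would be compiled with NVRTC before linking, while an archive such as `lib.a` would go straight to the CUDA Linker.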

numba_cuda/numba/cuda/cudadrv/linkable_code.py

Lines changed: 13 additions & 9 deletions
@@ -2,8 +2,12 @@
 
 
 class LinkableCode:
-    """An object that can be passed in the `link` list argument to `@cuda.jit`
-    kernels to supply code to be linked from memory."""
+    """An object that holds code to be linked from memory.
+
+    :param data: A buffer containing the data to link.
+    :param name: The name of the file to be referenced in any compilation or
+                 linking errors that may be produced.
+    """
 
     def __init__(self, data, name=None):
         self.data = data
@@ -15,49 +19,49 @@ def name(self):
 
 
 class PTXSource(LinkableCode):
-    """PTX Source code in memory"""
+    """PTX source code in memory."""
 
     kind = FILE_EXTENSION_MAP["ptx"]
     default_name = "<unnamed-ptx>"
 
 
 class CUSource(LinkableCode):
-    """CUDA C/C++ Source code in memory"""
+    """CUDA C/C++ source code in memory."""
 
     kind = "cu"
     default_name = "<unnamed-cu>"
 
 
 class Fatbin(LinkableCode):
-    """A fatbin ELF in memory"""
+    """An ELF Fatbin in memory."""
 
     kind = FILE_EXTENSION_MAP["fatbin"]
     default_name = "<unnamed-fatbin>"
 
 
 class Cubin(LinkableCode):
-    """A cubin ELF in memory"""
+    """An ELF Cubin in memory."""
 
     kind = FILE_EXTENSION_MAP["cubin"]
     default_name = "<unnamed-cubin>"
 
 
 class Archive(LinkableCode):
-    """An archive of objects in memory"""
+    """An archive of objects in memory."""
 
     kind = FILE_EXTENSION_MAP["a"]
     default_name = "<unnamed-archive>"
 
 
 class Object(LinkableCode):
-    """An object file in memory"""
+    """An object file in memory."""
 
     kind = FILE_EXTENSION_MAP["o"]
     default_name = "<unnamed-object>"
 
 
 class LTOIR(LinkableCode):
-    """An LTOIR file in memory"""
+    """An LTOIR file in memory."""
 
     kind = "ltoir"
     default_name = "<unnamed-ltoir>"
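
Reviewer note: the pattern in linkable_code.py (a base class holding `data` and an optional `name`, with subclasses supplying `kind` and `default_name`) can be sketched in isolation as below. This is a simplified standalone illustration, not the Numba source. The `_name` attribute and the fallback logic in the `name` property are assumptions, since the diff shows only part of the class, and only two subclasses are reproduced.

```python
class LinkableCode:
    """Sketch of an object that holds code to be linked from memory."""

    kind = None
    default_name = "<unnamed>"

    def __init__(self, data, name=None):
        self.data = data
        self._name = name  # assumed storage; not visible in the diff

    @property
    def name(self):
        # Assumed behaviour: fall back to the per-class placeholder
        # when the caller did not supply a name.
        return self._name or self.default_name


class CUSource(LinkableCode):
    """CUDA C/C++ source code in memory."""

    kind = "cu"
    default_name = "<unnamed-cu>"


class LTOIR(LinkableCode):
    """An LTOIR file in memory."""

    kind = "ltoir"
    default_name = "<unnamed-ltoir>"


# Illustrative device function source; the function itself is hypothetical.
cu_text = 'extern "C" __device__ int f(int *r) { *r = 1; return 0; }'
anonymous = CUSource(cu_text)
named = CUSource(cu_text, name="functions.cu")
```

Giving the item a name is useful mainly for diagnostics: per the new docstring, the name is what appears in any compilation or linking errors.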

numba_cuda/numba/cuda/decorators.py

Lines changed: 1 addition & 0 deletions
@@ -181,6 +181,7 @@ def declare_device(name, sig, link=None):
     :param name: The name of the foreign function.
     :type name: str
     :param sig: The Numba signature of the function.
+    :param link: External code to link when calling the function.
     """
     if link is None:
         link = tuple()
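
Reviewer note: the updated documentation says `link` accepts either a single Linkable Code item or several items in a list, tuple, or set, and the context lines above show `None` being replaced by an empty tuple. One way such input could be normalized is sketched below. The helper `normalize_link` is hypothetical and is not taken from the Numba source; how `declare_device` actually handles the argument beyond the lines shown is not visible in this diff.

```python
def normalize_link(link=None):
    """Return the link argument as a tuple of Linkable Code items.

    Hypothetical helper: accepts None, a single item (for example a path
    string), or a list, tuple, or set of items.
    """
    if link is None:
        return tuple()
    if isinstance(link, (list, tuple, set)):
        return tuple(link)
    # Anything else is treated as a single item. A string must be handled
    # here rather than iterated, or it would be split into characters.
    return (link,)
```

The isinstance check is what keeps a bare path such as `"functions.cu"` from being treated as a sequence of characters.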

numba_cuda/numba/cuda/tests/doc_examples/test_ffi.py

Lines changed: 10 additions & 7 deletions
@@ -15,16 +15,18 @@ def test_ex_linking_cu(self):
         import numpy as np
         import os
 
-        # Declaration of the foreign function
-        mul = cuda.declare_device('mul_f32_f32', 'float32(float32, float32)')
-
         # Path to the source containing the foreign function
         # (here assumed to be in a subdirectory called "ffi")
         basedir = os.path.dirname(os.path.abspath(__file__))
         functions_cu = os.path.join(basedir, 'ffi', 'functions.cu')
 
-        # Kernel that links in functions.cu and calls mul
-        @cuda.jit(link=[functions_cu])
+        # Declaration of the foreign function
+        mul = cuda.declare_device('mul_f32_f32', 'float32(float32, float32)',
+                                  link=functions_cu)
+
+        # A kernel that calls mul; functions.cu is linked automatically due to
+        # the call to mul.
+        @cuda.jit
         def multiply_vectors(r, x, y):
             i = cuda.grid(1)
 
@@ -54,14 +56,15 @@ def test_ex_from_buffer(self):
 
         # magictoken.ex_from_buffer_decl.begin
         signature = 'float32(CPointer(float32), int32)'
-        sum_reduce = cuda.declare_device('sum_reduce', signature)
+        sum_reduce = cuda.declare_device('sum_reduce', signature,
+                                         link=functions_cu)
         # magictoken.ex_from_buffer_decl.end
 
         # magictoken.ex_from_buffer_kernel.begin
         import cffi
         ffi = cffi.FFI()
 
-        @cuda.jit(link=[functions_cu])
+        @cuda.jit
         def reduction_caller(result, array):
             array_ptr = ffi.from_buffer(array)
             result[()] = sum_reduce(array_ptr, len(array))
