@@ -11,7 +11,7 @@ of a Python kernel call to a foreign device function are:
1111
1212- The device function implementation in a foreign language (e.g. CUDA C).
1313- A declaration of the device function in Python.
14- - A kernel that links with and calls the foreign function.
14+ - A kernel that calls the foreign function.
1515
1616.. _device-function-abi:
1717
@@ -83,7 +83,7 @@ For example, when:
8383
8484.. code::
8585
86- mul = cuda.declare_device('mul_f32_f32', 'float32(float32, float32)')
86+ mul = cuda.declare_device('mul_f32_f32', 'float32(float32, float32)', link="functions.cu")
8787
8888 is declared, calling ``mul(a, b)`` inside a kernel will translate into a call to
8989``mul_f32_f32(a, b)`` in the compiled code.
@@ -134,15 +134,63 @@ where ``result`` and ``array`` are both arrays of ``float32`` data.
134134Linking and Calling functions
135135-----------------------------
136136
137- The ``link`` keyword argument of the :func:`@cuda.jit <numba.cuda.jit>`
138- decorator accepts a list of file names specified by absolute path or a path
139- relative to the current working directory. Files whose name ends in ``.cu``
140- will be compiled with the `NVIDIA Runtime Compiler (NVRTC)
141- <https://docs.nvidia.com/cuda/nvrtc/index.html>`_ and linked into the kernel as
142- PTX; other files will be passed directly to the CUDA Linker.
137+ The ``link`` keyword argument to the :func:`declare_device
138+ <numba.cuda.declare_device>` function accepts *Linkable Code* items. Either a
139+ single Linkable Code item can be passed, or multiple items in a list, tuple, or
140+ set.
141+
142+ A Linkable Code item is either:
143+
144+ * A string indicating the location of a file in the filesystem, or
145+ * A :class:`LinkableCode <numba.cuda.LinkableCode>` object, for linking code
146+ that exists in memory.
147+
148+ Supported code formats that can be linked are:
149+
150+ * PTX source code (``*.ptx``)
151+ * CUDA C/C++ source code (``*.cu``)
152+ * CUDA ELF Fat Binaries (``*.fatbin``)
153+ * CUDA ELF Cubins (``*.cubin``)
154+ * CUDA ELF archives (``*.a``)
155+ * CUDA Object files (``*.o``)
156+ * CUDA LTOIR files (``*.ltoir``)
157+
158+ CUDA C/C++ source code will be compiled with the `NVIDIA Runtime Compiler
159+ (NVRTC) <https://docs.nvidia.com/cuda/nvrtc/index.html>`_ and linked into the
160+ kernel as either PTX or LTOIR, depending on whether LTO is enabled. Other files
161+ will be passed directly to the CUDA Linker.
162+
163+ :class:`LinkableCode <numba.cuda.LinkableCode>` objects are initialized using
164+ the parameters of their base class:
143165
144- For example, the following kernel calls the ``mul()`` function declared above
145- with the implementation ``mul_f32_f32()`` in a file called ``functions.cu``:
166+ .. autoclass:: numba.cuda.LinkableCode
167+
168+ However, one should instantiate the subclass that represents the type of
169+ item being linked:
170+
171+ .. autoclass:: numba.cuda.PTXSource
172+ .. autoclass:: numba.cuda.CUSource
173+ .. autoclass:: numba.cuda.Fatbin
174+ .. autoclass:: numba.cuda.Cubin
175+ .. autoclass:: numba.cuda.Archive
176+ .. autoclass:: numba.cuda.Object
177+ .. autoclass:: numba.cuda.LTOIR
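
As a minimal sketch of linking code that exists in memory, the following wraps CUDA C source for ``mul_f32_f32`` in a ``CUSource`` object and passes it to ``declare_device``. It assumes that ``CUSource`` accepts the source as a string, and that the device function follows the ABI described under ``device-function-abi`` (the result is written through the first parameter, and ``0`` is returned to signal that no exception occurred). The names ``MUL_SOURCE``, ``build_multiply_kernel`` and ``multiply_vectors`` are hypothetical. Numba is imported inside the function so that the module can be loaded on a machine without a GPU.

```python
# CUDA C source held in memory. The signature follows the device function ABI:
# the result is stored through the first parameter, and the return value of 0
# signals that no Python exception occurred.
MUL_SOURCE = r"""
extern "C" __device__ int
mul_f32_f32(float* return_value, float x, float y)
{
    *return_value = x * y;
    return 0;
}
"""


def build_multiply_kernel():
    """Declare ``mul`` against the in-memory source and return a kernel using it."""
    # Imported here so that defining this function does not require a GPU.
    from numba import cuda

    # Assumption: CUSource accepts the CUDA C source as a string.
    mul = cuda.declare_device(
        "mul_f32_f32",
        "float32(float32, float32)",
        link=cuda.CUSource(MUL_SOURCE),
    )

    # No ``link`` argument is needed here: the declaration carries it.
    @cuda.jit
    def multiply_vectors(r, x, y):
        i = cuda.grid(1)
        if i < len(r):
            r[i] = mul(x[i], y[i])

    return multiply_vectors
```

On a machine with a CUDA GPU, ``build_multiply_kernel()[1, 32](r, x, y)`` would launch the kernel on ``float32`` arrays of up to 32 elements.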
178+
179+ Legacy ``@cuda.jit`` decorator ``link`` support
180+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
181+
182+ The ``link`` keyword argument of the :func:`@cuda.jit <numba.cuda.jit>`
183+ decorator also accepts a list of Linkable Code items, which will then be linked
184+ into the kernel. This facility is provided for backwards compatibility; it is
185+ recommended that Linkable Code items are always specified in the
186+ :func:`declare_device <numba.cuda.declare_device>` call, so that the user of the
187+ declared API is not burdened with specifying the items to link themselves when
188+ writing a kernel.
189+
190+ As an example of how this legacy mechanism looks at the point of use, the
191+ following kernel calls the ``mul()`` function declared above. It assumes that
192+ the implementation ``mul_f32_f32()`` is in a file called ``functions.cu`` that
193+ was not passed to the ``link`` argument of the declaration:
146194
147195.. code::
148196
@@ -153,17 +201,13 @@ with the implementation ``mul_f32_f32()`` in a file called ``functions.cu``:
153201 if i < len(r):
154202 r[i] = mul(x[i], y[i])
155203
156-
157204 C/C++ Support
158205-------------
159206
160207Support for compiling and linking of CUDA C/C++ code is provided through the use
161208of NVRTC subject to the following considerations:
162209
163- - It is only available when using the NVIDIA Bindings. See
164- :envvar:`NUMBA_CUDA_USE_NVIDIA_BINDING`.
165- - A suitable version of the NVRTC library for the installed version of the
166- NVIDIA CUDA Bindings must be available.
210+ - A suitable version of the NVRTC library must be available.
167211- The CUDA include path is assumed by default to be ``/usr/local/cuda/include``
168212 on Linux and ``$env:CUDA_PATH\include`` on Windows. It can be modified using
169213 the environment variable :envvar:`NUMBA_CUDA_INCLUDE_PATH`.
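
A minimal sketch of overriding the include path from Python. It assumes that the variable must be set before Numba is first imported, on the understanding that Numba reads its ``NUMBA_*`` settings from the environment at import time. ``/opt/cuda/include`` is a hypothetical location.

```python
import os

# Hypothetical location of the CUDA headers; adjust for the local installation.
CUDA_INCLUDE_DIR = "/opt/cuda/include"

# Set the variable before the first import of numba so that it is picked up
# when Numba reads its configuration from the environment.
os.environ["NUMBA_CUDA_INCLUDE_PATH"] = CUDA_INCLUDE_DIR
```

Exporting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.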