|
1 | 1 | .. meta:: |
2 | | - :description: rocFFT documentation and API reference library |
3 | | - :keywords: rocFFT, ROCm, API, documentation |
| 2 | + :description: How to load and store callbacks in rocFFT |
| 3 | + :keywords: rocFFT, ROCm, API, documentation, callbacks |
4 | 4 |
|
5 | 5 | .. _load-store-callbacks: |
6 | 6 |
|
7 | 7 | ******************************************************************** |
8 | | -Load and Store Callbacks |
| 8 | +Load and store callbacks |
9 | 9 | ******************************************************************** |
10 | 10 |
|
11 | 11 | rocFFT includes experimental functionality to call user-defined device functions |
12 | | -when loading input from global memory at the start of a transform, or |
13 | | -when storing output to global memory at the end of a transform. |
| 12 | +when loading input from global memory at the transform start or |
| 13 | +when storing output to global memory at the transform end. |
14 | 14 |
|
15 | | -These user-defined callback functions may be optionally supplied |
| 15 | +These optional user-defined callback functions can be supplied |
16 | 16 | to the library using |
17 | 17 | :cpp:func:`rocfft_execution_info_set_load_callback` and |
18 | 18 | :cpp:func:`rocfft_execution_info_set_store_callback`. |
19 | 19 |
|
20 | 20 | Device functions supplied as callbacks must load and store element |
21 | | -data types that are appropriate for the transform being performed. |
22 | | - |
23 | | -+-------------------------+--------------------+----------------------+ |
24 | | -|Transform type | Load element type | Store element type | |
25 | | -+=========================+====================+======================+ |
26 | | -|Complex-to-complex, | `_Float16_2` | `_Float16_2` | |
27 | | -|half-precision | | | |
28 | | -+-------------------------+--------------------+----------------------+ |
29 | | -|Complex-to-complex, | `float2` | `float2` | |
30 | | -|single-precision | | | |
31 | | -+-------------------------+--------------------+----------------------+ |
32 | | -|Complex-to-complex, | `double2` | `double2` | |
33 | | -|double-precision | | | |
34 | | -+-------------------------+--------------------+----------------------+ |
35 | | -|Real-to-complex, | `float` | `float2` | |
36 | | -|single-precision | | | |
37 | | -+-------------------------+--------------------+----------------------+ |
38 | | -|Real-to-complex, | `_Float16` | `_Float16_2` | |
39 | | -|half-precision | | | |
40 | | -+-------------------------+--------------------+----------------------+ |
41 | | -|Real-to-complex, | `double` | `double2` | |
42 | | -|double-precision | | | |
43 | | -+-------------------------+--------------------+----------------------+ |
44 | | -|Complex-to-real, | `_Float16_2` | `_Float16` | |
45 | | -|half-precision | | | |
46 | | -+-------------------------+--------------------+----------------------+ |
47 | | -|Complex-to-real, | `float2` | `float` | |
48 | | -|single-precision | | | |
49 | | -+-------------------------+--------------------+----------------------+ |
50 | | -|Complex-to-real, | `double2` | `double` | |
51 | | -|double-precision | | | |
52 | | -+-------------------------+--------------------+----------------------+ |
| 21 | +data types appropriate for the transform being executed. |
| 22 | + |
| 23 | ++-------------------------+----------------------+------------------------+ |
| 24 | +|Transform type | Load element type | Store element type | |
| 25 | ++=========================+======================+========================+ |
| 26 | +|Complex-to-complex, | ``_Float16_2`` | ``_Float16_2`` | |
| 27 | +|half-precision | | | |
| 28 | ++-------------------------+----------------------+------------------------+ |
| 29 | +|Complex-to-complex, | ``float2`` | ``float2`` | |
| 30 | +|single-precision | | | |
| 31 | ++-------------------------+----------------------+------------------------+ |
| 32 | +|Complex-to-complex, | ``double2`` | ``double2`` | |
| 33 | +|double-precision | | | |
| 34 | ++-------------------------+----------------------+------------------------+ |
| 35 | +|Real-to-complex, | ``float`` | ``float2`` | |
| 36 | +|single-precision | | | |
| 37 | ++-------------------------+----------------------+------------------------+ |
| 38 | +|Real-to-complex, | ``_Float16`` | ``_Float16_2`` | |
| 39 | +|half-precision | | | |
| 40 | ++-------------------------+----------------------+------------------------+ |
| 41 | +|Real-to-complex, | ``double`` | ``double2`` | |
| 42 | +|double-precision | | | |
| 43 | ++-------------------------+----------------------+------------------------+ |
| 44 | +|Complex-to-real, | ``_Float16_2`` | ``_Float16`` | |
| 45 | +|half-precision | | | |
| 46 | ++-------------------------+----------------------+------------------------+ |
| 47 | +|Complex-to-real, | ``float2`` | ``float`` | |
| 48 | +|single-precision | | | |
| 49 | ++-------------------------+----------------------+------------------------+ |
| 50 | +|Complex-to-real, | ``double2`` | ``double`` | |
| 51 | +|double-precision | | | |
| 52 | ++-------------------------+----------------------+------------------------+ |
53 | 53 |
|
54 | 54 | The callback function signatures must match the specifications |
55 | 55 | below. |
56 | 56 |
|
57 | 57 | .. code-block:: c |
58 | 58 |
|
59 | | - T load_callback(T* buffer, size_t offset, void* callback_data, void* shared_memory); |
60 | | - void store_callback(T* buffer, size_t offset, T element, void* callback_data, void* shared_memory); |
| 59 | + Tdata load_callback(Tdata* buffer, size_t offset, void* callback_data, void* shared_memory); |
| 60 | + void store_callback(Tdata* buffer, size_t offset, Tdata element, void* callback_data, void* shared_memory); |
61 | 61 |
|
62 | | -The parameters for the functions are defined as: |
| 62 | +The parameters for the functions are as follows: |
63 | 63 |
|
64 | | -* `T`: The data type of each element being loaded or stored from the |
| 64 | +* ``Tdata``: The data type of each element being loaded or stored from the |
65 | 65 | input or output. |
66 | | -* `buffer`: Pointer to the input (for load callbacks) or |
| 66 | +* ``buffer``: Pointer to the input (for load callbacks) or |
67 | 67 | output (for store callbacks) in device memory that was passed to |
68 | 68 | :cpp:func:`rocfft_execute`. |
69 | | -* `offset`: The offset of the location being read from or written |
70 | | - to. This counts in elements, from the `buffer` pointer. |
71 | | -* `element`: For store callbacks only, the element to be stored. |
72 | | -* `callback_data`: A pointer value accepted by |
| 69 | +* ``offset``: The offset of the location being read from or written |
| 70 | + to. This counts by elements from the ``buffer`` pointer. |
| 71 | +* ``element``: For store callbacks only, the element to be stored. |
| 72 | +* ``callback_data``: A pointer value accepted by |
73 | 73 | :cpp:func:`rocfft_execution_info_set_load_callback` and |
74 | 74 | :cpp:func:`rocfft_execution_info_set_store_callback` which is passed |
75 | 75 | through to the callback function. |
76 | | -* `shared_memory`: A pointer to an amount of shared memory requested |
77 | | - when the callback is set. Shared memory is not supported, |
78 | | - and this parameter is always null. |
| 76 | +* ``shared_memory``: A pointer to an amount of shared memory requested |
| 77 | + when the callback is set. Shared memory is not supported, |
| 78 | + so this parameter is always null. |
79 | 79 |
|
80 | 80 | Callback functions are called exactly once for each element being |
81 | | -loaded or stored in a transform. Note that multiple kernels may be |
| 81 | +loaded or stored in a transform. Multiple kernels can be |
82 | 82 | launched to decompose a transform, which means that separate kernels |
83 | | -may call the load and store callbacks for a transform if both are |
| 83 | +might call the load and store callbacks for a transform if both are |
84 | 84 | specified. |
85 | 85 |
|
86 | 86 | Callback functions are only supported for transforms that do not use planar format for input or output. |
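
As a rough illustration of the callback signatures and parameters described above, the following host-only C sketch emulates how a kernel might invoke a load callback and a store callback exactly once per element. It is not rocFFT code: real callbacks must be device functions built with a HIP compiler and registered through :cpp:func:`rocfft_execution_info_set_load_callback` and :cpp:func:`rocfft_execution_info_set_store_callback`. The names ``cplx_f32`` (a host stand-in for ``float2``), ``scale_params``, and ``emulate_transform_io`` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for the float2 element type
   (single-precision complex-to-complex row of the table). */
typedef struct { float x; float y; } cplx_f32;

/* Hypothetical user data, reached through the callback_data pointer. */
typedef struct { float scale; } scale_params;

/* Load callback shape: Tdata load_callback(Tdata*, size_t, void*, void*).
   Reads one element at buffer[offset] and scales it before returning it. */
cplx_f32 load_callback(cplx_f32* buffer, size_t offset,
                       void* callback_data, void* shared_memory)
{
    assert(shared_memory == NULL); /* documented as always null */
    const scale_params* p = (const scale_params*)callback_data;
    cplx_f32 v = buffer[offset];   /* offset counts elements, not bytes */
    v.x *= p->scale;
    v.y *= p->scale;
    return v;
}

/* Store callback shape: void store_callback(Tdata*, size_t, Tdata, void*, void*).
   Scales the element and writes it to buffer[offset]. */
void store_callback(cplx_f32* buffer, size_t offset, cplx_f32 element,
                    void* callback_data, void* shared_memory)
{
    assert(shared_memory == NULL); /* documented as always null */
    const scale_params* p = (const scale_params*)callback_data;
    element.x *= p->scale;
    element.y *= p->scale;
    buffer[offset] = element;
}

/* Emulation only: copies input to output, calling each callback exactly
   once per element. A real transform would compute an FFT between the
   load and the store, possibly in separate kernels. */
void emulate_transform_io(cplx_f32* in, cplx_f32* out, size_t n,
                          void* load_data, void* store_data)
{
    for (size_t i = 0; i < n; ++i) {
        cplx_f32 v = load_callback(in, i, load_data, NULL);
        store_callback(out, i, v, store_data, NULL);
    }
}
```

In this sketch the two callbacks receive independent ``callback_data`` pointers, mirroring how separate data can be supplied when each callback is set. Because the load and store callbacks might run in different kernels, it is safest not to rely on state shared between them.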
87 | | - |
88 | | -Runtime compilation |
89 | | -=================== |
90 | | - |
91 | | -rocFFT includes many kernels for common FFT problems. Some plans may |
92 | | -require additional kernels aside from what is built in to the |
93 | | -library. In these cases, rocFFT will compile optimized kernels for |
94 | | -the plan when the plan is created. |
95 | | - |
96 | | -Compiled kernels are stored in memory by default and will be reused |
97 | | -if they are required again for plans in the same process. |
98 | | - |
99 | | -If the ``ROCFFT_RTC_CACHE_PATH`` environment variable is set to a |
100 | | -writable file location, rocFFT will write compiled kernels to this |
101 | | -location. rocFFT will read kernels from this location for plans in |
102 | | -other processes that need runtime-compiled kernels. rocFFT will |
103 | | -create the specified file if it does not already exist. |