-
Notifications
You must be signed in to change notification settings - Fork 45
Description
Describe the bug
As observed in NVIDIA/cccl#4248:
Consider an extern "C" device function accepting two structs as inputs:
struct MyStruct {
char a;
char b;
char c;
};
extern "C" __device__ void foo(MyStruct a, MyStruct b) {
}Here's the corresponding PTX:
.visible .func foo(
.param .align 1 .b8 foo_param_0[3],
.param .align 1 .b8 foo_param_1[3]
)
{
ret;
}Now, consider defining this device function using numba.cuda, using @gpu_struct to define the type of the arguments, and keeping in mind that gpu_struct uses StructModel to define the underlying numba type:
import numba
from numba import cuda
import numpy as np
from cuda.parallel.experimental.struct import gpu_struct
@gpu_struct
class MyStruct:
a: np.int8
b: np.int8
c: np.int8
def op(a, b):
pass
tp = numba.typeof(MyStruct(1, 2, 3))
ptx, _ = cuda.compile(op, (tp, tp))
print(ptx)Here's the output of the above script:
.visible .func (.param .b64 func_retval0) op(
.param .b32 op_param_0,
.param .b32 op_param_1,
.param .b32 op_param_2,
.param .b32 op_param_3,
.param .b32 op_param_4,
.param .b32 op_param_5
)
{
.reg .b64 %rd<2>;
mov.u64 %rd1, 0;
st.param.b64 [func_retval0+0], %rd1;
ret;
}Comparing the two PTXs, we see an ABI difference: on the C++ side, struct arguments are of type .b8[N] where N is the size of the struct. On the numba side however, we note that structs have been decomposed into their members, and the function accepts a .b32 for each member. (reason for promotion to 32-bit).
This is because the C ABI calling convention is packing struct value arguments according to the Numba calling convention, not the C one. It just uses the default argument packer:
numba-cuda/numba_cuda/numba/cuda/target.py
Line 306 in fbbc040
| arginfo = self._get_arg_packer(argtys) |
Steps/Code to reproduce bug
Reproducer in code above.
Expected behavior
The C ABI calling convention should pack struct values according to the C ABI calling convention, yielding the same function prototypes as CUDA C/C++ code compiled with NVCC.
This will probably require the implementation of a new arg packer for the C calling convention.
Environment details (please complete the following information):
All environments.
Additional context
No other context.