Skip to content

[BUG] C ABI passes struct values incorrectly when compiling with compile() #175

@gmarkall

Description

@gmarkall

Describe the bug

As observed in NVIDIA/cccl#4248:

Consider an extern "C" device function accepting two structs as inputs:

struct MyStruct {
    char a;
    char b;
    char c;
};

extern "C" __device__ void foo(MyStruct a, MyStruct b) {

}

Here's the corresponding PTX:

.visible .func foo(
        .param .align 1 .b8 foo_param_0[3],
        .param .align 1 .b8 foo_param_1[3]
)
{

        ret;

}

Now, consider defining this device function using numba.cuda, using @gpu_struct to define the type of the arguments, and keeping in mind that gpu_struct uses StructModel to define the underlying numba type:

import numba
from numba import cuda
import numpy as np

from cuda.parallel.experimental.struct import gpu_struct

@gpu_struct
class MyStruct:
    a: np.int8
    b: np.int8
    c: np.int8

def op(a, b):
    pass

tp = numba.typeof(MyStruct(1, 2, 3))
ptx, _ = cuda.compile(op, (tp, tp))

print(ptx)

Here's the output of the above script:

.visible .func  (.param .b64 func_retval0) op(
        .param .b32 op_param_0,
        .param .b32 op_param_1,
        .param .b32 op_param_2,
        .param .b32 op_param_3,
        .param .b32 op_param_4,
        .param .b32 op_param_5
)
{
        .reg .b64       %rd<2>;


        mov.u64         %rd1, 0;
        st.param.b64    [func_retval0+0], %rd1;
        ret;

}

Comparing the two PTXs, we see an ABI difference: on the C++ side, struct arguments are of type .b8[N] where N is the size of the struct. On the numba side however, we note that structs have been decomposed into their members, and the function accepts a .b32 for each member. (reason for promotion to 32-bit).

This is because the C ABI calling convention is packing struct value arguments according to the Numba calling convention, not the C one. It just uses the default argument packer:

arginfo = self._get_arg_packer(argtys)

Steps/Code to reproduce bug

Reproducer in code above.

Expected behavior

The C ABI calling convention should pack struct values according to the C ABI calling convention, yielding the same function prototypes as CUDA C/C++ code compiled with NVCC.

This will probably require the implementation of a new arg packer for the C calling convention.

Environment details (please complete the following information):

All environments.

Additional context

No other context.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions