Description
`compile_internal` caches unconditionally: specifically, it calls `compile_subroutine` with the default `caching=True`. It would be nice to expose the `caching=` argument.
numba-cuda/numba_cuda/numba/cuda/core/base.py, line 955 (at a0a3328)
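As a rough sketch (hypothetical, not the current signature), exposing the flag could be as simple as forwarding a `caching` keyword from `compile_internal` to `compile_subroutine`:

```python
# Hypothetical sketch only: compile_internal would accept and forward an
# opt-out flag instead of always relying on compile_subroutine's
# caching=True default.
def compile_internal(self, builder, impl, sig, args, locals={}, caching=True):
    cres = self.compile_subroutine(builder, impl, sig, locals=locals,
                                   caching=caching)
    return self.call_internal(builder, cres.fndesc, sig, args)
```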
We run into a subtle but insidious issue in CCCL because of this. Here's a reproducer of the issue we see. For the program below, I expect the output to be:

```
I got 4
I got 8
```

but the actual output is:

```
I got 4
I got 4
```
```python
import numpy as np
import warnings

from numba import cuda, types
from numba.core.extending import intrinsic
from numba.core.errors import NumbaPerformanceWarning

# Suppress performance warnings about grid size
warnings.filterwarnings('ignore', category=NumbaPerformanceWarning)

results = []

for num in [4, 8]:
    def function():
        print("I got", num)

    sig = types.void()

    @intrinsic
    def impl(typingctx):
        def codegen(context, builder, impl_sig, args):
            result = context.compile_internal(builder, function, sig, [])
            return result
        return sig, codegen

    @cuda.jit
    def wrapped_func():
        return impl()

    @cuda.jit
    def test_kernel():
        idx = cuda.grid(1)
        if idx == 0:
            # Call the linked function and store result
            wrapped_func()

    test_kernel.forall(1)()
```

The issue is that `function` is captured as a global in `compile_internal`, and its caching mechanism isn't equipped to handle that appropriately: the cache key only looks at the `__code__` of the different function instances, which are all identical. Thus, after the first compilation it always retrieves the cached result.
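To make the caching-key point concrete, here's a minimal illustration in plain CPython (no numba involved): the `def` statement inside the loop creates a distinct function object on each iteration, but it reuses the same compiled code object, so a cache keyed on `__code__` cannot tell the two apart.

```python
# Minimal illustration: both function objects created by the loop share the
# same __code__ object, so a cache keyed on __code__ alone treats them as
# identical even though they reference different values of the global `num`.
fns = []
for num in [4, 8]:
    def function():
        print("I got", num)
    fns.append(function)

assert fns[0] is not fns[1]                # distinct function objects...
assert fns[0].__code__ is fns[1].__code__  # ...but an identical cache key
```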
As a note, a workaround is to use `compile_subroutine` directly:

```python
cres = context.compile_subroutine(builder, op, sig, caching=False)
result = context.call_internal(builder, cres.fndesc, sig, [])
```
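In the reproducer above, that maps to replacing the `compile_internal` call inside `codegen` (where `op` in the snippet corresponds to `function` here), for example:

```python
def codegen(context, builder, impl_sig, args):
    # Compile without caching so each loop iteration gets its own
    # compilation of `function`, then call the result explicitly.
    cres = context.compile_subroutine(builder, function, sig, caching=False)
    return context.call_internal(builder, cres.fndesc, sig, [])
```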