Description
Describe the bug
NVVM IR generation for warp sync intrinsics was fixed by #231 (for issue #228). That fix covers only some of the intrinsics: the warp vote intrinsics, documented at https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html?highlight=data%2520movement#vote, also require the mode parameter to be a constant int.
These are used in the implementation of the following functions in Numba-CUDA:
- cuda.all_sync()
- cuda.any_sync()
- cuda.eq_sync()
- cuda.ballot_sync()
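Each of these functions corresponds to one fixed mode value, which is why the mode can always be emitted as a constant. The sketch below is an illustrative host-side model of what each vote computes, assuming the mode encoding 0 = all, 1 = any, 2 = uni, 3 = ballot from the NVVM IR spec; `vote_model` and `VOTE_MODE` are hypothetical names, not Numba-CUDA code.

```python
WARP_SIZE = 32

# Assumed mode operand of llvm.nvvm.vote.sync for each Numba-CUDA function
# (0 = all, 1 = any, 2 = uni, 3 = ballot, per the NVVM IR spec).
VOTE_MODE = {"all_sync": 0, "any_sync": 1, "eq_sync": 2, "ballot_sync": 3}


def vote_model(func_name, mask, predicates):
    """Hypothetical reference model of one warp vote over 32 lane predicates."""
    lanes = [lane for lane in range(WARP_SIZE) if (mask >> lane) & 1]
    votes = [bool(predicates[lane]) for lane in lanes]
    mode = VOTE_MODE[func_name]
    if mode == 0:  # all: true iff every participating lane votes true
        return all(votes)
    if mode == 1:  # any: true iff at least one participating lane votes true
        return any(votes)
    if mode == 2:  # uni: true iff all participating lanes agree
        return len(set(votes)) <= 1
    # ballot: bit N is set iff lane N is in the mask and votes true
    return sum(1 << lane for lane, vote in zip(lanes, votes) if vote)


odd_lanes = [lane % 2 for lane in range(WARP_SIZE)]
print(hex(vote_model("ballot_sync", 0xFFFFFFFF, odd_lanes)))
```

Since the mode never varies at runtime for a given function, passing it through a runtime argument (and truncating it) buys nothing.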
Steps/Code to reproduce bug
This issue can be observed for all_sync() (the others are similar) by saving the following as repro.py:

```python
from numba import cuda, types

def use_all_sync(ary):
    i = cuda.grid(1)
    val = cuda.all_sync(0xFFFFFFFF, ary[i])
    ary[i] = val

args = (types.int32[::1],)
ptx, resty = cuda.compile_ptx(use_all_sync, args, cc=(10, 0))
print(ptx)
```

and running it with:

```shell
NUMBA_DUMP_LLVM=1 python repro.py
```
The output will contain something similar to:

```llvm
%".5" = trunc i64 4294967295 to i32
%".6" = trunc i64 0 to i32
%".7" = icmp ne i32 %"predicate", 0
%".8" = call {i32, i1} @"llvm.nvvm.vote.sync"(i32 %".5", i32 %".6", i1 %".7")
```
These instructions set up the arguments for, and call, llvm.nvvm.vote.sync(). The argument %".6" is the mode parameter. Instead of being a computed value, it should be the constant i32 0, so the generated IR should look something like:
```llvm
%".5" = trunc i64 4294967295 to i32
%".6" = icmp ne i32 %"predicate", 0
%".7" = call {i32, i1} @"llvm.nvvm.vote.sync"(i32 %".5", i32 0, i1 %".6")
```
(Note that the values are renumbered because the trunc of i64 0 to i32 is no longer needed.)
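A minimal sketch of producing this IR shape with llvmlite's IR builder (the library Numba uses to build LLVM IR). It is not the actual Numba-CUDA fix: `build_vote_wrapper` and the function name `all_sync_sketch` are hypothetical, and the mask is taken as an i32 directly for brevity. The point is only that passing the mode as an `ir.Constant` makes the call carry an immediate `i32 0` with no `trunc`.

```python
from llvmlite import ir

I1, I32 = ir.IntType(1), ir.IntType(32)
RET_TY = ir.LiteralStructType([I32, I1])  # {i32, i1}


def build_vote_wrapper(func_name, mode):
    """Emit a function that calls llvm.nvvm.vote.sync with a constant mode."""
    module = ir.Module(name="vote_sketch")
    # declare {i32, i1} @llvm.nvvm.vote.sync(i32 mask, i32 mode, i1 predicate)
    vote = ir.Function(module, ir.FunctionType(RET_TY, [I32, I32, I1]),
                       name="llvm.nvvm.vote.sync")
    # Hypothetical wrapper taking the mask and predicate as runtime values
    wrapper = ir.Function(module, ir.FunctionType(RET_TY, [I32, I32]),
                          name=func_name)
    mask, predicate = wrapper.args
    mask.name, predicate.name = "mask", "predicate"
    builder = ir.IRBuilder(wrapper.append_basic_block(name="entry"))
    pred = builder.icmp_unsigned("!=", predicate, ir.Constant(I32, 0))
    # The mode is an immediate i32 constant, so no trunc of a runtime i64 is emitted
    result = builder.call(vote, [mask, ir.Constant(I32, mode), pred])
    builder.ret(result)
    return module


print(build_vote_wrapper("all_sync_sketch", 0))
```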
Before the fix, the example may print the PTX, but it may also crash NVVM.
Expected behavior
A constant argument should be generated for the mode parameter, as in the corrected IR shown above.
Additional context
The fix will look very similar to the one for #288.
I think the intrinsic_wrapper.py file will no longer be needed after this fix, and it can be removed.
NVBug: 5122224.