Open
Labels
feature request (New feature or request)
Description
Presently PTX can be inlined, but it requires writing NVVM IR with llvmlite inside a Numba extension (either with typing and a lowering function that generates an ir.InlineAsm instruction, or an overload and an intrinsic with the necessary code generation).
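For reference, the code generation step that the current approach requires can be sketched roughly as follows. This is a minimal, standalone illustration using only llvmlite's IR builder, not the code of any existing Numba extension: the module name and the function name `tanh_approx` are placeholders, and in a real extension the `ir.InlineAsm` call would sit inside the codegen function of an intrinsic or a lowering function rather than in a hand-built module.

```python
from llvmlite import ir

# Placeholder module and function, used only to have somewhere to emit IR.
f32 = ir.FloatType()
module = ir.Module(name="inline_ptx_sketch")
fnty = ir.FunctionType(f32, [f32])
func = ir.Function(module, fnty, name="tanh_approx")
builder = ir.IRBuilder(func.append_basic_block("entry"))

# The inline assembly node: PTX snippet plus an LLVM-style constraint string.
# "=f" is the float32 output ($0) and "f" is the float32 input ($1).
asm = ir.InlineAsm(fnty, "tanh.approx.f32 $0, $1;", "=f,f")

# Calling the InlineAsm object emits the asm call with the given arguments.
result = builder.call(asm, [func.args[0]])
builder.ret(result)

# Textual LLVM IR for inspection; nothing is compiled or run on a GPU here.
llvm_ir = str(module)
```

Even for a single instruction, the author has to spell out the LLVM function type, build the `ir.InlineAsm` object, and wire it into Numba's typing and lowering machinery, which is the overhead this request aims to remove.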
It would be nicer to be able to write inline PTX more simply. There are two possible ways this could be done:
- Through Pythonic intrinsics for PTX instructions, if these were to exist.
- Through a simpler API that can be used directly in a kernel.
For the latter option, the usage could look like:

    @cuda.jit
    def f(r, x):
        arg = x[0]
        result = inline_ptx("tanh.approx.f32 $0, $1;", "=f,f", (arg,))
        r[0] = result

This mimics the CUDA C++ API for inline PTX, where the assembly snippet, constraints, and arguments all need to be provided.
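
To make the constraint string in the example concrete, here is a small sketch of how `"=f,f"` could be read. The helper `describe_constraints` and the table `PTX_REGISTER_CLASSES` are hypothetical names invented for this illustration and are not part of Numba or of the proposed API; the register-class letters are the ones commonly used for inline PTX constraints.

```python
# Hypothetical helper, not part of Numba: split a comma-separated
# constraint string into output and input register classes.
PTX_REGISTER_CLASSES = {
    "h": "16-bit integer",
    "r": "32-bit integer",
    "l": "64-bit integer",
    "f": "32-bit float",
    "d": "64-bit float",
}


def describe_constraints(constraints):
    """Return (outputs, inputs) as lists of register-class descriptions."""
    outputs, inputs = [], []
    for item in constraints.split(","):
        item = item.strip()
        is_output = item.startswith("=")  # "=" marks a write-only output
        letter = item.lstrip("=")
        if letter not in PTX_REGISTER_CLASSES:
            raise ValueError(f"unsupported constraint: {item!r}")
        target = outputs if is_output else inputs
        target.append(PTX_REGISTER_CLASSES[letter])
    return outputs, inputs


outputs, inputs = describe_constraints("=f,f")
```

Under this reading, `$0` in the snippet refers to the single float32 output and `$1` to the single float32 input, which matches the one-element argument tuple `(arg,)` in the proposed call.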