[FEA] Support for simpler ways to inline PTX #532

@gmarkall

Description

Presently, PTX can be inlined, but it requires writing NVVM IR with llvmlite inside a Numba extension: either a typing and a lowering function that generate an ir.InlineAsm instruction, or an overload and an intrinsic containing the necessary code generation.
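
For reference, a minimal sketch of the intrinsic-based route described above, assuming numba.extending.intrinsic is usable with the CUDA target; the names tanh_approx and kernel are illustrative:

from llvmlite import ir
from numba import cuda, types
from numba.extending import intrinsic

@intrinsic
def tanh_approx(typingctx, x):
    # Only handle float32 arguments; returning None rejects other types.
    if x != types.float32:
        return None

    sig = types.float32(types.float32)

    def codegen(context, builder, sig, args):
        # Emit the PTX instruction via an LLVM inline asm expression.
        # (tanh.approx.f32 requires a sufficiently recent GPU architecture.)
        fty = ir.FunctionType(ir.FloatType(), [ir.FloatType()])
        asm = ir.InlineAsm(fty, "tanh.approx.f32 $0, $1;", "=f,f")
        return builder.call(asm, args)

    return sig, codegen

@cuda.jit
def kernel(r, x):
    r[0] = tanh_approx(x[0])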

It would be nicer to be able to write inline PTX more simply. There are two possible ways this could be done:

  • Through Pythonic intrinsics for PTX instructions, if these were to exist.
  • Through a simpler API that can be used directly in a kernel.

For the latter option, the usage could look like:

@cuda.jit
def f(r, x):
    arg = x[0]
    result = inline_ptx("tanh.approx.f32 $0, $1;", "=f,f", (arg,))
    r[0] = result

This mimics the CUDA C++ API for inline PTX, where the assembly snippet, constraints, and arguments all need to be provided.

cc @leofang @oleksandr-pavlyk @benhg
