luisa-python-lang (WIP)

A new Python DSL frontend for LuisaCompute. It will be integrated into the LuisaCompute Python package once it's ready.

Introduction

import luisa_lang as lc

@lc.struct
class AABB:
    min: lc.float3
    max: lc.float3

    @lc.trace
    def __init__(self, min: lc.float3, max: lc.float3):
        self.min = min
        self.max = max

    @lc.trace
    def size(self) -> lc.float3:
        return self.max - self.min


# create a struct instance on the host
aabb = AABB(lc.float3(0.0, 0.0, 0.0), lc.float3(1.0, 1.0, 1.0))
# you can call the method on the host
size = aabb.size()

device = lc.Device('cuda')  # or 'cpu', 'metal', etc.
buf_aabb = device.create_buffer(AABB, 1)  # create a buffer to hold the struct on the device
buf_aabb[0] = aabb  # copy the struct to the device buffer
buf_size = device.create_buffer(lc.float3, 1)  # create a buffer to hold the size on the device

@lc.kernel
def compute_aabb_size(aabb_buf: lc.Buffer[AABB], size_buf: lc.Buffer[lc.float3]):
    i = lc.dispatch_id().x
    aabb = aabb_buf[i]
    size_buf[i] = aabb.size()  # call the method on the device

stream = device.create_stream()  # create a stream to execute the kernel
stream.submit([
    compute_aabb_size(buf_aabb, buf_size).dispatch(1)
]).synchronize()  # submit the kernel to the stream and wait for it to finish

print(f"AABB size: {buf_size[0]}")  # print the size of the AABB on the host

Basic Syntax

Types

Scalar types:

Scalar types are immutable and passed by value. They can be used in arithmetic operations, comparisons, and other expressions. LuisaCompute supports the following scalar types:

  • lc.i8, lc.int8, lc.byte, lc.ubyte: 8-bit signed and unsigned integers.
  • lc.i16, lc.int16, lc.short, lc.ushort: 16-bit signed and unsigned integers.
  • lc.i32, lc.int32, lc.int, lc.uint: 32-bit signed and unsigned integers.
  • lc.i64, lc.int64, lc.long, lc.ulong: 64-bit signed and unsigned integers.
  • lc.bool: Boolean type.
  • lc.f32, lc.float, lc.f64, lc.double: 32-bit (f32, float) and 64-bit (f64, double) floating point types.
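
A minimal sketch of scalar usage, based on the constructors that appear elsewhere in this README (the exact result type of comparisons is an assumption):

a = lc.i32(3)
b = lc.f32(1.5)
c = b * lc.f32(2.0)     # arithmetic between scalars of the same type
flag = a > lc.i32(1)    # comparisons; assumed to yield lc.bool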

Vector types

Compound types such as vectors and matrices are mutable and passed by reference. LuisaCompute supports the following vector types:

  • lc.byte2, lc.byte3, lc.byte4: 2, 3, and 4-component byte vectors.
  • lc.ubyte2, lc.ubyte3, lc.ubyte4: 2, 3, and 4-component unsigned byte vectors.
  • lc.short2, lc.short3, lc.short4: 2, 3, and 4-component signed short vectors.
  • lc.ushort2, lc.ushort3, lc.ushort4: 2, 3, and 4-component unsigned short vectors.
  • lc.int2, lc.int3, lc.int4: 2, 3, and 4-component signed integer vectors.
  • lc.uint2, lc.uint3, lc.uint4: 2, 3, and 4-component unsigned integer vectors.
  • lc.long2, lc.long3, lc.long4: 2, 3, and 4-component signed long vectors.
  • lc.ulong2, lc.ulong3, lc.ulong4: 2, 3, and 4-component unsigned long vectors.
  • lc.float2, lc.float3, lc.float4: 2, 3, and 4-component floating point vectors.
  • lc.double2, lc.double3, lc.double4: 2, 3, and 4-component double precision floating point vectors.
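
A minimal sketch of vector usage, based on the operations shown elsewhere in this README (component access beyond .x is an assumption):

v = lc.float3(1.0, 2.0, 3.0)
w = lc.float3(4.0, 5.0, 6.0)
u = v + w        # component-wise arithmetic
u.x += 1.0       # component access
alias_v = v      # vectors are compound types, so this binds a reference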

Matrix types

  • lc.float2x2, lc.float3x3, lc.float4x4: 2x2, 3x3, and 4x4 floating point matrices.
  • lc.double2x2, lc.double3x3, lc.double4x4: 2x2, 3x3, and 4x4 double precision floating point matrices.
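
A minimal sketch of matrix usage; only the identity() constructor and lc.copy() appear elsewhere in this README, so nothing beyond those is assumed here:

m = lc.double4x4.identity()   # identity constructor, as used in the struct example below
m_ref = m                     # matrices are compound types, so this binds a reference
m_copy = lc.copy(m)           # explicit copy; see the assignment semantics section below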

User-defined structs

lc.struct: Used to define user-defined types (similar to C structs). Fields can be of any type, including other structs, vectors, and matrices. Structs are passed by reference and can be instantiated on both the host and the device. However, assigning to a struct field copies the value into the struct, in contrast to ordinary Python objects, where assignment binds a reference.

Example:

@lc.struct
class MyStruct:
    a: lc.int
    b: lc.float3
    c: lc.double4x4

    @lc.trace
    def __init__(self, a: lc.int, b: lc.float3, c: lc.double4x4):
        self.a = a
        self.b = b
        self.c = c
    
    def get_a(self) -> lc.int:
        return self.a

# struct can be instantiated on the host or device
s = MyStruct(42, lc.float3(1.0, 2.0, 3.0), lc.double4x4.identity())
a = s.get_a()  # Call method on the host as well
v = lc.float3(1.0, 2.0, 3.0) + s.b  # Vector addition
# assigning to struct field COPIES the value into the struct
s.b = v

Value and Reference Semantics

Parameter Passing

To ensure that host and device code behave consistently, LuisaCompute uses a mix of value and reference semantics that largely resembles Python's behavior, with some differences. Scalar types are passed by value: when you pass a scalar to a function or assign it to another variable, a copy is made. Compound types (vectors, matrices, and structs) are passed by reference: when you pass them to a function or assign them to another variable, the original object is referenced rather than copied. However, when you assign a value to a field of a struct, the value is copied into the struct.

Let's take a look at some examples to illustrate this behavior:

@lc.struct
class MyStruct:
    a: lc.int
    b: lc.float3

    @lc.trace
    def __init__(self, a: lc.int, b: lc.float3):
        self.a = a
        self.b = b


@lc.func  # the semantics are the same for `@lc.func` and `@lc.trace`
def inc_a(s: MyStruct, x: lc.int) -> lc.int:
    # This will modify the original struct's 'a' field
    s.a += 1
    x += 1
    return s.a


@lc.kernel
def kernel_example():
    s = MyStruct(10, lc.float3(1.0, 2.0, 3.0))
    i = lc.int(5)
    new_a = inc_a(s, i)  # This will modify 's.a' in the original struct
    lc.print(new_a)  # Should print 11
    lc.print(s.a)  # Should also print 11, as 's' is modified by reference
    lc.print(i) # should print 5 since scalars are passed by value and immutable

Dynamic and Static Control Flow

Assignment Behavior

The semantics of the assignment operator = are the same as in Python: when assigning immutable types (scalars), the value is copied into the variable, while for mutable types (vectors, matrices, structs), the reference is copied. The only difference is that assigning to a field of a struct copies the value into the struct. However, not all references can be implemented on the GPU. LuisaCompute detects such cases and asks you to rewrite the assignment by explicitly copying the value with the lc.copy() function.

The behavior is summarized in the following table:

| Type | Assignment | Field/Index Assignment | Function Argument Passing | @lc.trace Return | @lc.func Return |
|------|------------|------------------------|---------------------------|------------------|-----------------|
| Python object | Reference | Reference | Reference | Reference | Reference |
| Scalar (e.g. lc.int) | Value | N/A | Value | Value | Value |
| Compound type (e.g. lc.float3, lc.float4x4) | Reference | Copy | Reference | Reference | Value |

Let's take a look at an example:

@lc.kernel
def assignment_example():
    s = MyStruct(10, lc.float3(1.0, 2.0, 3.0))
    v = lc.float3(4.0, 5.0, 6.0)
    t = s # t is a reference to s as in Python
    t.b = v # v is copied into t.b.
    v.x += 1.0
    lc.print(t.b, s.b) # should print (4.0, 5.0, 6.0) twice
    lc.print(v) # should print (5.0, 5.0, 6.0)
    t2 = lc.copy(s) # t2 is a copy of s, not a reference
    t2.a += 1
    lc.print(t2.a, s.a) # should print 11, 10

Transient vs Persistent Values

Values in LuisaCompute fall into two categories: transient and persistent. Transient values are similar to rvalues in C++: they are temporary values that have not yet been bound to any variable. Persistent values are similar to lvalues in C++: they are bound to a variable and can be used as assignment targets.

Since physical references might not be supported on the GPU, it is not possible to dynamically create a reference to a persistent value. For example:

@lc.kernel
def transient_vs_persistent():
    # `dynamic_cond` below stands for a runtime lc.bool value,
    # while `cond` near the end is a plain Python value known at compile time.
    v1 = lc.float3(1.0, 2.0, 3.0)
    v2 = lc.float3(4.0, 5.0, 6.0)

    # the following code is allowed since both `v1 + 1.0` and `v2 + 1.0` are transient values.
    if dynamic_cond:
        dynamic = v1 + 1.0
    else:
        dynamic = v2 + 1.0
        
    # the following code is not allowed, since such a dynamically created reference to a persistent value cannot be implemented on the GPU:
    if dynamic_cond:
        dynamic = v1
    else:
        dynamic = v2

    # instead, you can either use lc.copy() to create a copy of the value:
    if dynamic_cond:
        dynamic = lc.copy(v1)
    else:
        dynamic = lc.copy(v2)

    # or use a static condition:
    if lc.comptime(cond):
        dynamic = v1
    else:
        dynamic = v2

Functions and Methods

Functions and methods in LuisaCompute are defined using the @lc.func or @lc.trace decorators. Both decorators transform the Python function into a LuisaCompute function that can be executed on both the host (native Python) and the device (LuisaCompute backend). The difference is that @lc.trace inlines the function body into the caller each time it is called, while @lc.func creates a separate function on the device.

In terms of usage, @lc.trace has a minor restriction on dynamic control flow (only a single dynamic return statement is allowed), while @lc.func has no restrictions. However, @lc.trace allows you to return references to variables, while @lc.func does not (since it is not possible to return a reference to a local variable on the device).

| Decorator | Inline Function Body | Dynamic Control Flow Restrictions | Multiple Return Statements | Return References | Example Usage |
|-----------|----------------------|-----------------------------------|----------------------------|-------------------|---------------|
| @lc.func | No | Any control flow is allowed | Multiple returns allowed | No | Use for larger functions or when dynamic control flow is needed |
| @lc.trace | Yes | Cannot have a dynamic return statement within dynamic loops | Single dynamic return statement only; multiple static return statements allowed (guarded under if lc.comptime(...)) | Yes | Use for small functions or when performance is critical |
| @lc.kernel | No | None | Multiple returns allowed; cannot return values | No | Entry point for compute kernels |

# lc.func has no restrictions on dynamic control flow, so you can use it like this:
@lc.func
def my_abs(x: lc.float) -> lc.float:
    if x < 0.0:
        return -x
    else:
        return x

# However, lc.trace allows only a single dynamic return statement, so the following code is not allowed:
@lc.trace
def my_abs_trace(x: lc.float) -> lc.float:
    if x < 0.0:
        return -x
    else:
        return x

# Instead, you can use a single return statement:
@lc.trace
def my_abs_trace(x: lc.float) -> lc.float:
    if x < 0.0:
        x = -x
    return x
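
The table above also allows multiple static return statements in @lc.trace when they are guarded by lc.comptime. A hedged sketch, where do_scale is a hypothetical plain Python flag rather than a DSL value:

@lc.trace
def scale_or_pass(x: lc.float, do_scale: bool) -> lc.float:
    # the lc.comptime branch is resolved during tracing, so only one of these
    # returns ends up in the generated code
    if lc.comptime(do_scale):
        return x * 2.0
    else:
        return x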

Generic Functions

Technically, all functions in LuisaCompute are generic functions, meaning that they can accept arguments of any type. The compiler instantiates the function upon the first call with the given argument types. The type hints of a LuisaCompute function are ignored by the compiler; however, you are encouraged to write them for better readability.

@lc.func
def generic_add(a, b):
    return a + b

generic_add(lc.f32(1.0), lc.f32(2.0))    # instantiates the function for f32
generic_add(lc.int32(1), lc.int32(2))    # instantiates the function for int32

Meta-Programming: Building Dynamic Computation Graph

LuisaCompute has excellent support for meta-programming, allowing you to build your GPU program and computation graph dynamically in a natural, Pythonic way.

Kernel Lifecycle

### Stage 1: Define the kernel
### At this stage, the kernel is just a normal Python function. Nothing has been compiled yet. The `lc.kernel` decorator injects some code to aid compilation later.

from typing import Literal

type Op = Literal['+', '-', '*', '/']

@lc.kernel
def vecop(a: lc.Buffer[lc.float3], b: lc.Buffer[lc.float3], c: lc.Buffer[lc.float3], op: Op):
    i = lc.dispatch_id().x
    va = a[i]
    vb = b[i]
    if lc.comptime(op == '+'):
        print('Adding vectors')  # this line will be executed during Stage 2
        c[i] = va + vb
    elif lc.comptime(op == '-'):
        c[i] = va - vb
    elif lc.comptime(op == '*'):
        c[i] = va * vb
    elif lc.comptime(op == '/'):
        c[i] = va / vb
    else:
        raise ValueError(f"Unsupported operation: {op}")

### Stage 2: Kernel Instantiation and Symbolic Tracing
### When you call the kernel, the compiler first replaces all DSL variables in the arguments with symbolic variables (here, `a`, `b`, and `c` are replaced with symbolic buffers), while normal Python values are passed as-is (here, `op` is a plain Python value). The compiler then traces the kernel body and generates a computation graph specific to the provided arguments.

### `buf_a`, `buf_b`, and `buf_c` are lc.Buffer[lc.float3] objects created with device.create_buffer(...).
compiled_kernel = vecop(buf_a, buf_b, buf_c, op='+')
# prints: Adding vectors

### In this case, the generated kernel looks like this:
@lc.kernel
def vecop_add(a: lc.Buffer[lc.float3], b: lc.Buffer[lc.float3], c: lc.Buffer[lc.float3]):
    i = lc.dispatch_id().x
    va = a[i]
    vb = b[i]
    c[i] = va + vb
### Note that the other operations are removed since they are not used in this instantiation.

### Stage 3: Kernel Compilation
### After the specialized kernel is generated, it is sent to the backend for code generation. The resulting artifact can then be dispatched to the device for execution.
stream.submit([
    compiled_kernel.dispatch(1024) 
]).synchronize()

Comptime Expressions

Compile-time expressions are directives that instruct the compiler to specialize the code based on the values provided during Stage 2 of the kernel lifecycle.

v: lc.bool = ...
if v:
    print("Since v is symbolic during Stage 2 compilation, this code will be included in the compiled kernel.")
else:
    print("This code will be included in the kernel as well since at this stage we are not sure if v is True or False.")

if lc.comptime(True):
    print("This code will be included in the compiled kernel.")
else:
    print("This code will be excluded from the compiled kernel.")

PyTrees

PyTrees are containers with a tree-like structure, where each leaf node can be either a DSL value or a Python object. PyTrees are used to pass complex data structures to functions and kernels. The compiler inspects the contents of the PyTree and generates specialized code according to the provided tree structure.

Let's take a look at an example of using PyTrees to pass a tree-like structure to a function:

from typing import List

@lc.pytree
class MyTree:
    v: lc.float3
    arr: List[lc.int32]

# MyTree.arr is not a valid DSL type, since you normally cannot use Python lists on the GPU.
# However, you can still pass it to a function or kernel, as long as the length of the list remains constant inside the kernel.

@lc.func
def foo(tree: MyTree) -> lc.float3:
    s = lc.float3(0.0, 0.0, 0.0)
    # use lc.comptime to hint that len(tree.arr) is known at kernel compile time
    for i in lc.comptime(range(len(tree.arr))):
        s += tree.v * lc.float32(tree.arr[i])
    return s

tree1 = MyTree(lc.float3(1.0, 2.0, 3.0), [lc.int32(1), lc.int32(2), lc.int32(3)])
tree2 = MyTree(lc.float3(4.0, 5.0, 6.0), [lc.int32(4), lc.int32(5)])

# These two calls will generate two versions of the `foo` function internally, based on the length of the `arr` field in each tree.
foo(tree1)
foo(tree2)
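
Since pytrees can also be passed to kernels, here is a hedged sketch of dispatching foo over a buffer; buf_out, and the reuse of the device and stream from the introduction, are assumptions for illustration:

# Hedged sketch: the Python list inside the tree is captured at instantiation time,
# so this kernel is specialized for len(tree1.arr) == 3.
@lc.kernel
def foo_kernel(out: lc.Buffer[lc.float3], tree: MyTree):
    i = lc.dispatch_id().x
    out[i] = foo(tree)

buf_out = device.create_buffer(lc.float3, 1024)  # hypothetical output buffer
stream.submit([
    foo_kernel(buf_out, tree1).dispatch(1024)
]).synchronize()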

Exporting Functions to IR or C++ (AOT Compilation)

@lc.func
def generic_add(a, b):
    return a + b

compiler = lc.Compiler(backend='cpp')  # target C++ source generation
compiler.compile(generic_add, example_inputs=(lc.f32(1.0), lc.f32(2.0)))  # specialize generic_add for f32 inputs
compiler.output('add_f32.cpp')  # write the generated source to a file
