Description
The code that lowers a tensor pointer load to TritonGen 2D block loads starts from a parameterization of the 2D block tile size and permutes it based on properties of the DPAS layout and the inputs. From the result it generates the final set of 2D block loads, shuffles the outputs, and packs/unpacks the LLVM registers for the subsequent users. #3000 introduces some code duplication, because the 2D block load for the DPAS layout is similar in many ways but also quite different. To resolve this duplication and make the code easier to read, I am introducing a struct that keeps track of the 2D block load parameters. The existing code in LoadStoreOpToLLVM.cpp will permute the struct instead of directly modifying the MLIR values. This should further reduce duplication between the DPAS layout and DotDpas layout, and let us easily dump debug information about the loads being generated. That matters because the TritonGen loads are immediately lowered to SPIR-V function calls and are not written in any intermediate IR.
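To make the proposal concrete, here is a minimal sketch of what such a parameter struct could look like. Every name in it (`BlockLoadParams`, its fields, `withTranspose`, `str`) is hypothetical and chosen for illustration only; the actual struct in LoadStoreOpToLLVM.cpp may track different fields and apply different permutations. The sketch uses plain integers so it compiles without MLIR.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical bundle of 2D block load parameters. The field names are
// assumptions for this sketch, not the names used in the real lowering.
struct BlockLoadParams {
  unsigned tileHeight;     // rows covered by one 2D block load
  unsigned tileWidth;      // columns covered by one load, in elements
  unsigned numBlocks;      // number of blocks fetched by one load
  unsigned elemSizeInBits; // width of a single element
  bool transpose;          // whether the load is issued transposed

  // Total number of elements returned by one load.
  unsigned numElements() const { return tileHeight * tileWidth * numBlocks; }

  // Human-readable dump, useful because the generated loads never appear
  // in an intermediate IR that could be inspected instead.
  std::string str() const {
    std::ostringstream os;
    os << "BlockLoadParams{tileHeight=" << tileHeight
       << ", tileWidth=" << tileWidth << ", numBlocks=" << numBlocks
       << ", elemSizeInBits=" << elemSizeInBits
       << ", transpose=" << (transpose ? "true" : "false") << "}";
    return os.str();
  }
};

// Example permutation: return a copy with height and width swapped and the
// transpose flag toggled. The input is left untouched, so each permutation
// step stays a pure function of the parameters.
inline BlockLoadParams withTranspose(BlockLoadParams p) {
  assert(p.numBlocks == 1 && "this sketch only handles single-block loads");
  std::swap(p.tileHeight, p.tileWidth);
  p.transpose = !p.transpose;
  return p;
}
```

With this shape, the lowering would build one `BlockLoadParams` value, apply layout-dependent permutations such as `withTranspose` to it, and only then materialize MLIR values, while `str()` could be logged at each step for debugging.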