Commit a312d98

Use block loads for post-dpas vector computation 4/4
1 parent 6d5a462 commit a312d98

1 file changed: +6 -0 lines changed

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

Lines changed: 6 additions & 0 deletions
@@ -558,6 +558,12 @@ struct LoadOpConversion
         delinearize(rewriter, loc, warpId, warpsPerCTA, dpasOrder);

     if (hasDpasLayout) {
+      // A block load with the DPAS layout but without the DotDpasLayout is
+      // expected to follow the ordering of the DPAS output. For a 2D block
+      // load, the rows are distributed across work items/SIMD lanes and the
+      // column vectors are available for each work item to process. This layout
+      // aligns to the DPAS layout as the DPAS operation output layout
+      // distributes rows across work items.
       if (isTransposeRequired) {
         // TODO: this would likely require a shuffle to match the expected
         // ordering coming out of the DPAS layout and requires more
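
The added comment describes a data distribution that a small stand-alone model may make easier to picture. The sketch below is only one reading of that comment, not the code in LoadStoreOpToLLVM.cpp: it assumes a row-major tile whose column count equals the number of SIMD lanes, so that every row is spread across the lanes and lane `j` ends up holding column `j` as a vector. The function name `distributeRowsAcrossLanes` and the use of `int` elements are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not the Triton lowering: split a row-major tile of
// rows x cols elements so that lane j owns column j. Each row is thereby
// spread across the lanes, and each lane holds a column vector of `rows`
// elements. Assumes the lane count equals `cols`.
std::vector<std::vector<int>> distributeRowsAcrossLanes(
    const std::vector<int> &tile, std::size_t rows, std::size_t cols) {
  assert(tile.size() == rows * cols && "tile must hold rows x cols elements");
  std::vector<std::vector<int>> perLane(cols, std::vector<int>(rows));
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t lane = 0; lane < cols; ++lane)
      perLane[lane][r] = tile[r * cols + lane]; // element (r, lane)
  return perLane;
}
```

Under this reading, a value loaded with the DPAS layout already sits in the same per-lane arrangement as a DPAS result, which would be why no reordering is needed unless a transpose is required.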
