[LoadStoreOpToLLVM] Reuse 2D block IO load lowering for both regular pointer and block pointer to clean up duplicate code. #5500

chengjunlu · 2025-11-18T08:07:54Z

Use the common code path, built on the linear layout utilities, to lower loads to 2D block IO.

For block pointers, this adds support for layouts other than DPAS.
For block pointers, it also adds support for loading a transposed matrix in the general case.

Copilot

Pull Request Overview

This PR refactors the 2D block I/O load lowering code to consolidate duplicate logic between regular pointer and block pointer handling. The refactoring creates shared utilities for unpacking block pointer structures and computing memory access parameters, enabling both pointer types to use the same code path.

Key Changes:

Introduces helper functions (unpackLLBlockPointer, getBases, getPitch, getBaseOffsets) to extract and process pointer metadata uniformly
Removes the specialized rewriteTensorPointerLoad function (~800 lines) in favor of the unified lowering path
Extends support to more layout types and transposed matrix loads for block pointers


Copilot · 2025-11-18T08:08:51Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+      int stride = getStride(ptr, memoryRowMajor ? 0 : 1);
+      baseHeightInt = (stride == 0 ? 1 : tileHeight);
+      baseHeight = b.i32_val(baseHeightInt);
+      baseWidth = b.i32_val(std::max<unsigned>(64u, vBlocks * tileWidth));


Duplicate assignment to baseWidth: the first assignment on line 1444 is immediately overwritten by lines 1445-1446. The first assignment has no effect and should be removed.

Suggested change

-      baseWidth = b.i32_val(std::max<unsigned>(64u, vBlocks * tileWidth));
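For reference, the arithmetic in the snippet above can be reproduced on the host as a minimal sketch. `SurfaceDims` and `computeSurfaceDims` are hypothetical names that do not appear in the PR, and the snippet does not state the units of the width, so none are assumed here. The point is only that `baseWidth` needs a single assignment.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the two values computed in the snippet.
struct SurfaceDims {
  unsigned baseWidth;
  unsigned baseHeight;
};

// Mirrors the arithmetic in the diff: a zero stride collapses the height to 1
// (presumably a single row that is reused), and the width is clamped to at
// least 64.
inline SurfaceDims computeSurfaceDims(int stride, unsigned tileHeight,
                                      unsigned tileWidth, unsigned vBlocks) {
  assert(tileHeight > 0 && tileWidth > 0 && vBlocks > 0);
  SurfaceDims dims;
  dims.baseHeight = (stride == 0) ? 1u : tileHeight;
  // Assigned exactly once; an earlier assignment would be a dead store.
  dims.baseWidth = std::max<unsigned>(64u, vBlocks * tileWidth);
  return dims;
}
```

With `stride == 0`, `tileHeight == 8`, `tileWidth == 16` and `vBlocks == 2`, the sketch yields a height of 1 and a width clamped up to 64.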

Copilot · 2025-11-18T08:08:52Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+          /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), // The offsetX is the
+                                                           // number of original
+                                                           // elements. The 2d
+                                                           // block io requires
+                                                           // the offsetX is the
+                                                           // number of packed
+                                                           // elements.


[nitpick] The multi-line comment explaining the offsetX calculation should be placed above the argument rather than inline to improve code readability and follow common commenting practices.

Suggested change

-          /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), // The offsetX is the
-                                                           // number of original
-                                                           // elements. The 2d
-                                                           // block io requires
-                                                           // the offsetX is the
-                                                           // number of packed
-                                                           // elements.
+          // The offsetX is the number of original elements. The 2d
+          // block io requires the offsetX is the number of packed
+          // elements.
+          /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)),
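The unit conversion described in that code comment can be illustrated with a small host-side sketch. `toPackedOffsetX` is a hypothetical helper, not a function from the PR; it only restates the unsigned division that `b.udiv(offsetX, b.i32_val(numPackedVals))` emits, and it assumes `offsetX` is a multiple of the packing factor.

```cpp
#include <cassert>
#include <cstdint>

// Converts an X offset counted in original elements into an offset counted in
// packed elements. numPackedVals is the number of original elements that share
// one packed element (for example, four 8-bit values in one 32-bit element).
inline std::uint32_t toPackedOffsetX(std::uint32_t offsetX,
                                     std::uint32_t numPackedVals) {
  assert(numPackedVals > 0 && "packing factor must be positive");
  // Unsigned division, like b.udiv in the lowering. Any remainder is dropped,
  // so this sketch assumes offsetX is a multiple of numPackedVals.
  return offsetX / numPackedVals;
}
```

For example, an offset of 32 original elements with four values per packed element corresponds to a packed offset of 8.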

Copilot · 2025-11-18T08:08:52Z

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

+                newVal = targetInfo.shuffleIdx(
+                    rewriter, loc, oldVal,
+                    b.urem(threadId, b.i32_val(tileWidth)));


[nitpick] Inconsistent formatting: this multi-line function call should follow the assignment pattern used elsewhere in the file (e.g., lines 1605-1606) for consistency.

Suggested change

-                newVal = targetInfo.shuffleIdx(
-                    rewriter, loc, oldVal,
-                    b.urem(threadId, b.i32_val(tileWidth)));
+                Value shuffleIdx = b.urem(threadId, b.i32_val(tileWidth));
+                newVal = targetInfo.shuffleIdx(
+                    rewriter, loc, oldVal, shuffleIdx);
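To make the index computation concrete, here is a host-side simulation of the shuffle, assuming `shuffleIdx` has the usual indexed-shuffle meaning (each lane reads the value held by the lane whose ID is given as the index). `shuffleByTileWidth` is a hypothetical name used only for this sketch, and the vector stands in for the per-lane values of one subgroup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates an indexed subgroup shuffle on the host: lane i receives the value
// held by lane (i % tileWidth).
template <typename T>
std::vector<T> shuffleByTileWidth(const std::vector<T> &laneValues,
                                  std::size_t tileWidth) {
  assert(tileWidth > 0 && tileWidth <= laneValues.size());
  std::vector<T> result(laneValues.size());
  for (std::size_t threadId = 0; threadId < laneValues.size(); ++threadId) {
    // Same arithmetic as b.urem(threadId, b.i32_val(tileWidth)).
    std::size_t srcLane = threadId % tileWidth;
    result[threadId] = laneValues[srcLane];
  }
  return result;
}
```

With eight lanes and a tile width of 4, lanes 4 to 7 receive copies of the values held by lanes 0 to 3, so the first `tileWidth` values are replicated across the subgroup.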


chengjunlu requested a review from Copilot November 18, 2025 08:07

chengjunlu linked an issue Nov 18, 2025 that may be closed by this pull request

[BLOCK IO] Clean up the code to unify the block IO lowering for both tensor descriptor (block ptr) and tensor of pointers. #5442

Open

Copilot AI reviewed Nov 18, 2025


chengjunlu marked this pull request as draft November 18, 2025 08:12

[LoadStoreOpToLLVM] Reuse 2D block IO load lowering for both regular …

f0d6179

…pointer and block pointer to clean up duplicate code. Signed-off-by: Lu,Chengjun <[email protected]>
