-
Notifications
You must be signed in to change notification settings - Fork 76
[LoadStoreOpToLLVM] Reuse 2D block IO load lowering for both regular pointer and block pointer to clean up duplicate code. #5500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the 2D block I/O load lowering code to consolidate duplicate logic between regular pointer and block pointer handling. The refactoring creates shared utilities for unpacking block pointer structures and computing memory access parameters, enabling both pointer types to use the same code path.
Key Changes:
- Introduces helper functions (
unpackLLBlockPointer,getBases,getPitch,getBaseOffsets) to extract and process pointer metadata uniformly - Removes the specialized
rewriteTensorPointerLoadfunction (~800 lines) in favor of the unified lowering path - Extends support to more layout types and transposed matrix loads for block pointers
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int stride = getStride(ptr, memoryRowMajor ? 0 : 1); | ||
| baseHeightInt = (stride == 0 ? 1 : tileHeight); | ||
| baseHeight = b.i32_val(baseHeightInt); | ||
| baseWidth = b.i32_val(std::max<unsigned>(64u, vBlocks * tileWidth)); |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate assignment to baseWidth where the first assignment on line 1444 is immediately overwritten by line 1445-1446. The first assignment should be removed as it has no effect.
| baseWidth = b.i32_val(std::max<unsigned>(64u, vBlocks * tileWidth)); |
| /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), // The offsetX is the | ||
| // number of original | ||
| // elements. The 2d | ||
| // block io requires | ||
| // the offsetX is the | ||
| // number of packed | ||
| // elements. |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The multi-line comment explaining the offsetX calculation should be placed above the argument rather than inline to improve code readability and follow common commenting practices.
| /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), // The offsetX is the | |
| // number of original | |
| // elements. The 2d | |
| // block io requires | |
| // the offsetX is the | |
| // number of packed | |
| // elements. | |
| // The offsetX is the number of original elements. The 2d | |
| // block io requires the offsetX is the number of packed | |
| // elements. | |
| /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), |
| newVal = targetInfo.shuffleIdx( | ||
| rewriter, loc, oldVal, | ||
| b.urem(threadId, b.i32_val(tileWidth))); |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Inconsistent formatting: this multi-line function call should align with the assignment pattern used elsewhere in the file (e.g., line 1605-1606) for consistency.
| newVal = targetInfo.shuffleIdx( | |
| rewriter, loc, oldVal, | |
| b.urem(threadId, b.i32_val(tileWidth))); | |
| Value shuffleIdx = b.urem(threadId, b.i32_val(tileWidth)); | |
| newVal = targetInfo.shuffleIdx( | |
| rewriter, loc, oldVal, shuffleIdx); |
…pointer and block pointer to clean up duplicate code. Signed-off-by: Lu,Chengjun <[email protected]>
Use the common code to lower the load to 2D block IO with linear layout utils.