

@chengjunlu
Contributor

Use the common code path, built on the linear layout utilities, to lower loads to 2D block IO.

  1. It supports layouts other than DPAS for block pointers.
  2. It supports loading transposed matrices through block pointers in the general case.
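To illustrate item 2, here is a minimal host-side sketch, assuming a 2D tensor described by its shape and element strides. It shows how a block load over a column-major tensor amounts to a load over a surface whose rows are the tensor's columns. The struct Surface2D and the function makeSurface are hypothetical names for illustration only; they are not the backend's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical surface description for a 2D block load; field names are
// illustrative and not part of the Triton Intel backend.
struct Surface2D {
  uint32_t width;   // elements along the memory-contiguous dimension
  uint32_t height;  // elements along the strided dimension
  uint32_t pitch;   // elements between consecutive surface rows
  bool transposed;  // true if surface rows are tensor columns
};

// For a logical M x N tensor with element strides (strideM, strideN), the
// contiguous dimension becomes the surface width. A row-major tensor
// (strideN == 1) maps directly; a column-major one (strideM == 1) maps with
// rows and columns swapped, so the block load yields a transposed tile.
Surface2D makeSurface(uint32_t M, uint32_t N, uint32_t strideM,
                      uint32_t strideN) {
  if (strideN == 1)
    return {N, M, strideM, false};
  assert(strideM == 1 && "one dimension must be contiguous");
  return {M, N, strideN, true};
}
```

For example, makeSurface(64, 32, 1, 64) describes a 64 x 32 column-major tensor as a 64-wide, 32-high surface with pitch 64 that is flagged as transposed.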


Copilot AI left a comment


Pull Request Overview

This PR refactors the 2D block I/O load lowering code to consolidate duplicate logic between regular pointer and block pointer handling. The refactoring creates shared utilities for unpacking block pointer structures and computing memory access parameters, enabling both pointer types to use the same code path.

Key Changes:

  • Introduces helper functions (unpackLLBlockPointer, getBases, getPitch, getBaseOffsets) to extract and process pointer metadata uniformly
  • Removes the specialized rewriteTensorPointerLoad function (~800 lines) in favor of the unified lowering path
  • Extends support to more layout types and transposed matrix loads for block pointers
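The helpers listed above extract pointer metadata in a uniform way. The following sketch shows the kind of arithmetic such helpers could perform, assuming a flattened block pointer that holds a base address, a shape, strides, and offsets, as a Triton block pointer does. The struct BlockPtr2D and the functions pitchBytes, widthBytes, and surfaceOffsets are hypothetical stand-ins for illustration. They are not the real unpackLLBlockPointer, getPitch, or getBaseOffsets, which operate on MLIR values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical flattened view of a 2D block pointer (base, shape, strides,
// offsets). Field order and helper names are illustrative only.
struct BlockPtr2D {
  uint64_t base;                   // base address in bytes
  std::array<int64_t, 2> shape;    // tensor extent per dimension
  std::array<int64_t, 2> strides;  // element stride per dimension
  std::array<int32_t, 2> offsets;  // block origin in elements
};

// Row pitch in bytes: the stride of the non-contiguous dimension.
int64_t pitchBytes(const BlockPtr2D &p, bool rowMajor, int64_t elemBytes) {
  return (rowMajor ? p.strides[0] : p.strides[1]) * elemBytes;
}

// Surface width in bytes: the extent of the contiguous dimension.
int64_t widthBytes(const BlockPtr2D &p, bool rowMajor, int64_t elemBytes) {
  return (rowMajor ? p.shape[1] : p.shape[0]) * elemBytes;
}

// Block origin as (x, y) in surface coordinates, where x runs along the
// contiguous dimension.
std::array<int32_t, 2> surfaceOffsets(const BlockPtr2D &p, bool rowMajor) {
  return rowMajor ? std::array<int32_t, 2>{p.offsets[1], p.offsets[0]}
                  : std::array<int32_t, 2>{p.offsets[0], p.offsets[1]};
}
```

Because the same three values (pitch, width, origin) can be derived for either memory order, a single lowering path can consume them whether the source is a block pointer or a tensor of pointers.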


int stride = getStride(ptr, memoryRowMajor ? 0 : 1);
baseHeightInt = (stride == 0 ? 1 : tileHeight);
baseHeight = b.i32_val(baseHeightInt);
baseWidth = b.i32_val(std::max<unsigned>(64u, vBlocks * tileWidth));

Copilot AI Nov 18, 2025


Duplicate assignment to baseWidth: the first assignment on line 1444 is immediately overwritten by lines 1445-1446, so it has no effect and should be removed.

Suggested change
- baseWidth = b.i32_val(std::max<unsigned>(64u, vBlocks * tileWidth));
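The baseHeight selection in the snippet above can be modeled on the host. One plausible reading is that a zero stride along the strided dimension means every row of the tile aliases the same memory, so a one-row surface is enough. This sketch assumes that reading. rowAddress and baseHeightFor are hypothetical helpers, not code from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Byte address of row r of a tile whose rows are strideElems elements apart.
uint64_t rowAddress(uint64_t base, int64_t strideElems, uint32_t r,
                    uint32_t elemBytes) {
  return base + static_cast<uint64_t>(strideElems * static_cast<int64_t>(r) *
                                      static_cast<int64_t>(elemBytes));
}

// Mirrors `baseHeightInt = (stride == 0 ? 1 : tileHeight)`: with a zero
// stride every row aliases row 0, so a one-row surface covers the tile.
uint32_t baseHeightFor(int64_t stride, uint32_t tileHeight) {
  return stride == 0 ? 1u : tileHeight;
}
```

With a zero stride, rowAddress returns the base address for every row, which is consistent with the snippet selecting a height of 1 in that case.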

Comment on lines +1629 to +1635
/*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), // The offsetX is the
// number of original
// elements. The 2d
// block io requires
// the offsetX is the
// number of packed
// elements.

Copilot AI Nov 18, 2025


[nitpick] The multi-line comment explaining the offsetX calculation should be placed above the argument rather than inline to improve code readability and follow common commenting practices.

Suggested change
- /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)), // The offsetX is the
- // number of original
- // elements. The 2d
- // block io requires
- // the offsetX is the
- // number of packed
- // elements.
+ // The offsetX is the number of original elements. The 2d
+ // block io requires the offsetX is the number of packed
+ // elements.
+ /*x*/ b.udiv(offsetX, b.i32_val(numPackedVals)),
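The unit conversion described in that comment reduces to a division. In the sketch below, offsetX counts original elements, and each packed element holds numPackedVals of them. For example, four 8-bit values fit in a 32-bit packed element, so an offset of 12 elements becomes 3 packed elements. packedOffsetX and numPackedValsFor are hypothetical helpers. The exactness assertion is an assumption added for the sketch; the reviewed code does not show such a check.

```cpp
#include <cassert>
#include <cstdint>

// Number of original elements packed into one packed element.
uint32_t numPackedValsFor(uint32_t packedBits, uint32_t elemBits) {
  return packedBits / elemBits;
}

// Converts an offset in original elements to an offset in packed elements,
// using unsigned division like b.udiv. The division is only exact when
// offsetX is a multiple of numPackedVals; this sketch asserts that.
uint32_t packedOffsetX(uint32_t offsetX, uint32_t numPackedVals) {
  assert(numPackedVals != 0 && offsetX % numPackedVals == 0);
  return offsetX / numPackedVals;
}
```

For 16-bit elements in 32-bit packed elements, numPackedValsFor(32, 16) gives 2 and packedOffsetX(64, 2) gives 32.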

Comment on lines +1670 to +1672
newVal = targetInfo.shuffleIdx(
rewriter, loc, oldVal,
b.urem(threadId, b.i32_val(tileWidth)));

Copilot AI Nov 18, 2025


[nitpick] Inconsistent formatting: this multi-line function call should follow the assignment pattern used elsewhere in the file (e.g., lines 1605-1606) for consistency.

Suggested change
- newVal = targetInfo.shuffleIdx(
- rewriter, loc, oldVal,
- b.urem(threadId, b.i32_val(tileWidth)));
+ Value shuffleIdx = b.urem(threadId, b.i32_val(tileWidth));
+ newVal = targetInfo.shuffleIdx(
+ rewriter, loc, oldVal, shuffleIdx);
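The shuffle above can be simulated on the host to show its data movement. In this model, each lane reads the value held by lane threadId % tileWidth. As a result, the values of the first tileWidth lanes are replicated across the sub-group. This is a simulation only. shuffleByTileWidth is a hypothetical name, and the sketch assumes threadId is the lane index within the sub-group. The real code emits a GPU shuffle through targetInfo.shuffleIdx.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side model of an index shuffle: lane threadId receives the value
// held by lane (threadId % tileWidth).
std::vector<int> shuffleByTileWidth(const std::vector<int> &laneVals,
                                    uint32_t tileWidth) {
  std::vector<int> out(laneVals.size());
  for (uint32_t threadId = 0; threadId < laneVals.size(); ++threadId) {
    uint32_t shuffleIdx = threadId % tileWidth;  // b.urem(threadId, tileWidth)
    out[threadId] = laneVals[shuffleIdx];
  }
  return out;
}
```

With a 16-lane sub-group where lane i holds i and tileWidth is 8, lanes 8 through 15 receive the values 0 through 7.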

@chengjunlu marked this pull request as draft on November 18, 2025, 08:12
…pointer and block pointer to clean up duplicate code.

Signed-off-by: Lu,Chengjun <[email protected]>


Development

Successfully merging this pull request may close these issues.

[BLOCK IO] Clean up the code to unify the block IO lowering for both tensor descriptors (block ptr) and tensors of pointers.
