Skip to content

Conversation

@januszjah
Copy link
Contributor

@januszjah januszjah commented Nov 17, 2025

This PR improves constant expression folding for pitch and width parameters in Intel GPU block I/O operations. The changes introduce a more robust constant evaluation mechanism that handles multiple levels of type casts and operation folding, addressing issue #5338.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves constant expression folding for pitch and width parameters in Intel GPU block I/O operations. The changes introduce a more robust constant evaluation mechanism that handles multiple levels of type casts and operation folding, addressing issue #5338.

  • Adds a new tryConstEval function that recursively evaluates constants through casts and foldable operations
  • Replaces the previous skipTrunc-based pitch validation with the new evaluation approach
  • Adds width validation using the same constant evaluation mechanism
  • Updates test kernel to mark stride parameters as constexpr to enable better compile-time evaluation

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp Implements new constant evaluation helpers and applies them to pitch/width validation in block I/O conversion
python/test/unit/language/test_block_pointer.py Marks stride parameters as constexpr in test kernel to support improved constant folding

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +54 to +56
if (succeeded(def->fold(results)) && results.size() == 1) {
if (auto val = dyn_cast_or_null<Value>(results[0]))
return val;
}
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fold result could be an Attribute containing a constant value, not just a Value. If results[0] is an Attribute, it should be converted to a constant Value rather than being ignored. This would cause tryConstEval to miss foldable constant attributes.

Copilot uses AI. Check for mistakes.
@etiotto etiotto linked an issue Nov 17, 2025 that may be closed by this pull request
return v;
}

static std::optional<int64_t> tryConstEval(Value v, int depth = 16) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we port these changes to Utility.cpp and improve mlir::triton::intel::getFoldedConstantValue??

Copy link
Contributor Author

@januszjah januszjah Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will increase work done while compilation because getFoldedConstantValue is used in isConstant in many places and my version does multiple skips and folds instead of single fold, however IDK if it's a problem and how obsessive we are about cycles at this step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is depth 16 the minimum depth required for this case to work? I also noticed that getFoldedConstantValue do fold one more time when results is empty, is it beneficial to do something like that here?
Maybe we can improve getFoldedConstantValue with a smaller default depth? No matter what, IMO these new functions should be part of Utility.cpp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16 is just protection here, it usually ends in 2-3 passes and it will quit early. Doing second try while folding may be beneficial, good catch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[2Dblock] fix minicore test's runtime checks

4 participants