[CIR][CIRGen][Builtin][X86] Lower AVX masked load intrinsics #1763

RiverDave · 2025-07-27T06:53:37Z

For these intrinsics there only seems to be one function where the IR emmited seems to diverge:

for _mm_load_sbh loads a single 16-bit bfloat (__bf16) value from memory into the lowest element of a 128-bit bfloat vector (__m128bh), leaving the remaining lanes unchanged or filled with a passthrough value. It is implemented using a masked load with only the first lane enabled.

source for intrinsics with similar behaviour

In the CIR lowering of _mm_load_sbh, we are currently emitting the mask of intrinsic (llvm.masked.load) operand as an explicit constant vector:

<8 x i1> <true, false, false, false, false, false, false, false>

whereas OG lowers:

<8 x i1> bitcast (<1 x i8> splat (i8 1) to <8 x i1>)

I believe both things are semantically equal so:

Is it acceptable for CIR and OG to diverge in this way for masked loads, or should we aim for parity in how the mask is represented, even if that reduces readability in CIR?

bcardosolopes · 2025-07-28T20:12:31Z

Is it acceptable for CIR and OG to diverge in this way for masked loads, or should we aim for parity in how the mask is represented, even if that reduces readability in CIR?

Sine they are semantically equivalent it's ok if we diverge in cases like this (the change needed is orthogonal to this PR). However, we'd like to converge eventually, in the meantime we should track it in an issue (so someone can do follow up work that touches parts not related with this PR) and also writing a comment in the tests on how this looks like in OG. (Btw, we have a issue label for IR-differences).

bcardosolopes · 2025-07-28T20:13:21Z

Needs fixing conflict resolution

RiverDave · 2025-07-29T01:29:54Z

Is it acceptable for CIR and OG to diverge in this way for masked loads, or should we aim for parity in how the mask is represented, even if that reduces readability in CIR?

Sine they are semantically equivalent it's ok if we diverge in cases like this (the change needed is orthogonal to this PR). However, we'd like to converge eventually, in the meantime we should track it in an issue (so someone can do follow up work that touches parts not related with this PR) and also writing a comment in the tests on how this looks like in OG. (Btw, we have a issue label for IR-differences).

Sounds amazing! Looking forward on working on those. Related to this I've opened #1767 (I don't have permissions to add the label yet). I've also added a brief note on the test referencing the issue as suggested.

For these intrinsics there only seems to be one function where the IR emmited seems to diverge: for `_mm_load_sbh` loads a single 16-bit bfloat (__bf16) value from memory into the lowest element of a 128-bit bfloat vector (__m128bh), leaving the remaining lanes unchanged or filled with a passthrough value. It is implemented using a masked load with only the first lane enabled. [source for intrinsics with similar behaviour](https://gist.github.com/leopck/86799fee6ceb9649d0ebe32c1c6e5b85) In the CIR lowering of `_mm_load_sbh`, we are currently emitting the mask of intrinsic (`llvm.masked.load`) operand as an explicit constant vector: ``` llvm <8 x i1> <true, false, false, false, false, false, false, false> ``` whereas OG lowers: ```llvm <8 x i1> bitcast (<1 x i8> splat (i8 1) to <8 x i1>) ``` I believe both things are semantically equal so: Is it acceptable for CIR and OG to diverge in this way for masked loads, or should we aim for parity in how the mask is represented, even if that reduces readability in CIR?

RiverDave requested review from lanza and bcardosolopes as code owners July 27, 2025 06:53

bcardosolopes approved these changes Jul 28, 2025

View reviewed changes

RiverDave force-pushed the avx-masked-loads branch from 98e2765 to a97ee5e Compare July 29, 2025 00:50

RiverDave mentioned this pull request Jul 29, 2025

Explicit mask vs splat bitcast in masked load for _mm_load_sbh #1767

Open

[CIR][CIRGen][Builtin][X86] Lower remaining AVX masked load intrinsics

9ad8f03

RiverDave force-pushed the avx-masked-loads branch from a97ee5e to 9ad8f03 Compare July 29, 2025 01:28

bcardosolopes merged commit 46686b7 into llvm:main Jul 29, 2025
6 of 9 checks passed

RiverDave deleted the avx-masked-loads branch September 13, 2025 22:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CIR][CIRGen][Builtin][X86] Lower AVX masked load intrinsics #1763

[CIR][CIRGen][Builtin][X86] Lower AVX masked load intrinsics #1763

Uh oh!

RiverDave commented Jul 27, 2025 •

edited

Loading

Uh oh!

bcardosolopes commented Jul 28, 2025 •

edited

Loading

Uh oh!

bcardosolopes commented Jul 28, 2025

Uh oh!

RiverDave commented Jul 29, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

[CIR][CIRGen][Builtin][X86] Lower AVX masked load intrinsics #1763

[CIR][CIRGen][Builtin][X86] Lower AVX masked load intrinsics #1763

Uh oh!

Conversation

RiverDave commented Jul 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bcardosolopes commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bcardosolopes commented Jul 28, 2025

Uh oh!

RiverDave commented Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RiverDave commented Jul 27, 2025 •

edited

Loading

bcardosolopes commented Jul 28, 2025 •

edited

Loading

RiverDave commented Jul 29, 2025 •

edited

Loading