[AMDGPU] Document "relaxed buffer OOB mode", update HSA default #134734

krzysz00 · 2025-04-07T21:18:41Z

This commit adds documentation for the relaxed-buffer-oob-mode subtarget feature so that users are aware of the performance implications of the change.

It also enables relaxed buffer OOB mode for HSA programs, which don't have this correctness requirement.

This commit adds documentation for the relaxed-buffer-oob-mode subtarget feature so that users are aware of the performance implications of the change. It also enables relaxed buffer OOB mode for HSA programs, which don't have this correctness requirement.

llvmbot · 2025-04-07T21:19:14Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Krzysztof Drewniak (krzysz00)

Changes

This commit adds documentation for the relaxed-buffer-oob-mode subtarget feature so that users are aware of the performance implications of the change.

It also enables relaxed buffer OOB mode for HSA programs, which don't have this correctness requirement.

Full diff: https://github.com/llvm/llvm-project/pull/134734.diff

4 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+35)
(modified) llvm/docs/ReleaseNotes.md (+5)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.cpp (+2-1)
(modified) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll (+2-2)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d1535960a0257..9ca86aaa95b44 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1136,6 +1136,41 @@ is conservatively correct for OpenCL.
                              other operations within the same address space.
      ======================= ===================================================
 
+Relaxed Buffer OOB (Out Of Bounds) Mode
+---------------------------------------
+
+Instructions that load from or store to buffer resources (and thus, by extension
+buffer fat pointers and buffer strided pointers) generally implement handling for
+out of bounds (OOB) memory accesses, including those that are partially OOB,
+if the buffer resource resource has the required flags set.
+
+When operating on more than 32 bits of data, the `voffset` used for the access
+will be range-checked for each 32-bit word independently. This check uses saturating
+arithmetic and interprets the offset as an unsigned value.
+
+The behavior described above conflicts with the ABI requirements of certain graphics
+APIs that require out of bounds accesses to be handled strictly so that accessed
+that begin out of bounds but then access in-bounds elements (such as loading A
+``<4 x i32>`` beginning at offset ``-4``) still load the three in-bounds integers.
+
+Similarly, buffer fat pointers permit operating types such as `<8 x i8>` which
+must be accessed (and bounds-checked) 4 bytes at a time. Non-word-aligned
+accesses to such types from near the end of a buffer resource (such as starting
+a load of an ``<8xi8>`` from an offset of ``6`` on an 8-byte buffer) will treat
+the initial two bytes to be loaded/stored as out of bounds, even though, under
+a strict interpretation of the bounds-checking semantics, they would be out of bounds.
+
+These violations of strict bounds-checking semantics for buffer resources require
+usage of less-vectorized code to ensure correctness. Ifthis strict conformance
+is not required, the target feature ``relaxed-oob-buffer-mode`` should be enabled
+(using ``-mcpu``, ``-offload-arch`` or ``-mattr``).
+
+``relaxed-buffer-oob-mode`` permits unaligned memory acceses through a buffer resource
+to propagate to nearby elemennts, causing them to become out of bounds as well.
+
+``relaxed-buffer-oob-mode`` is **enabled** on HSA targets by default to preserve
+compute performance and existing ABI expectations.
+
 LLVM IR Intrinsics
 ------------------
 
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 58cf71b947083..411c469d32b09 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -92,6 +92,11 @@ Changes to the AMDGPU Backend
 
 * Bump the default `.amdhsa_code_object_version` to 6. ROCm 6.3 is required to run any program compiled with COV6.
 
+* Turn on strict buffer OOB checking on non-AMDHSA OSs. This improves the correctness
+  of buffer accesses in some cases at the cost of performance for programs that do not
+  contain unaligned out-of-bounds accesses. The old behavior may be restored with the
+  `relaxed-buffer-oob-mode` feature.
+
 Changes to the ARM Backend
 --------------------------
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 53f5c1efd14eb..1bd2230b626ee 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -71,7 +71,8 @@ GCNSubtarget &GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
   // Turn on features that HSA ABI requires. Also turn on FlatForGlobal by
   // default
   if (isAmdHsaOS())
-    FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,";
+    FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,"
+              "+relaxed-buffer-oob-mode,";
 
   FullFS += "+enable-prt-strict-null,"; // This is overridden by a disable in FS
 
diff --git a/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll b/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll
index ede2e4066c263..01239b9946e64 100644
--- a/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll
+++ b/llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll
@@ -1,5 +1,5 @@
-; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -mattr=+relaxed-buffer-oob-mode -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-RELAXED %s
-; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-STRICT %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-RELAXED %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=load-store-vectorizer -mattr=-relaxed-buffer-oob-mode -S -o - %s | FileCheck --check-prefixes=CHECK,CHECK-OOB-STRICT %s
 
 target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7"

arsenm · 2025-04-08T01:39:53Z

My understanding that this was to model different behavior of different sub targets. If this is a software defined usage mode it should not be a subtarget feature

krzysz00 · 2025-04-08T04:05:17Z

I'll let @piotrAMD give the proper context here but, last I checked, this was introduced to provide different behavior for some Vulkan cases that needed to be stricter about vectorization / OOB - which means we don't need it on for HSA ... though it's something it's reasonable for people to opt in to.

Is this a language issue, where we don't want to be telling people they can toggle this (in which case, there's comments in AMDGPU.td that should be reexamined) or some more fundamental complaint?

piotrAMD · 2025-04-08T08:32:20Z

My understanding that this was to model different behavior of different sub targets. If this is a software defined usage mode it should not be a subtarget feature

Is the suggestion to use a regular function attribute rather than subtarget feature? I settled on subtarget feature mainly because 'allowsMisalignedMemoryAccessesImpl' operates in the context of subtarget rather than function.

shiltian · 2025-04-16T01:44:25Z

llvm/docs/AMDGPUUsage.rst

+
+The behavior described above conflicts with the ABI requirements of certain graphics
+APIs that require out of bounds accesses to be handled strictly so that accessed
+that begin out of bounds but then access in-bounds elements (such as loading A


Suggested change

that begin out of bounds but then access in-bounds elements (such as loading A

that begin out of bounds but then access in-bounds elements (such as loading a

shiltian · 2025-04-16T01:45:18Z

llvm/docs/AMDGPUUsage.rst

+The behavior described above conflicts with the ABI requirements of certain graphics
+APIs that require out of bounds accesses to be handled strictly so that accessed
+that begin out of bounds but then access in-bounds elements (such as loading A
+``<4 x i32>`` beginning at offset ``-4``) still load the three in-bounds integers.


Just out of curiosity, will it be <undef, i32, i32, 32> or <i32, i32, i32, undef>?

<0, i32, i32, i32> if you're using whatever the strict Vulkan API is

Ordinary, the left end gives you <0, 0, 0, 0,>

shiltian · 2025-04-16T01:45:48Z

llvm/docs/AMDGPUUsage.rst

+Similarly, buffer fat pointers permit operating types such as `<8 x i8>` which
+must be accessed (and bounds-checked) 4 bytes at a time. Non-word-aligned
+accesses to such types from near the end of a buffer resource (such as starting
+a load of an ``<8xi8>`` from an offset of ``6`` on an 8-byte buffer) will treat


Suggested change

a load of an ``<8xi8>`` from an offset of ``6`` on an 8-byte buffer) will treat

a load of an ``<8 x i8>`` from an offset of ``6`` on an 8-byte buffer) will treat

shiltian · 2025-04-16T01:46:17Z

llvm/docs/AMDGPUUsage.rst

+a strict interpretation of the bounds-checking semantics, they would be out of bounds.
+
+These violations of strict bounds-checking semantics for buffer resources require
+usage of less-vectorized code to ensure correctness. Ifthis strict conformance


Suggested change

usage of less-vectorized code to ensure correctness. Ifthis strict conformance

usage of less-vectorized code to ensure correctness. If this strict conformance

shiltian · 2025-04-16T01:48:23Z

llvm/docs/AMDGPUUsage.rst

+These violations of strict bounds-checking semantics for buffer resources require
+usage of less-vectorized code to ensure correctness. Ifthis strict conformance
+is not required, the target feature ``relaxed-oob-buffer-mode`` should be enabled
+(using ``-mcpu``, ``-offload-arch`` or ``-mattr``).


How does the --offload-arch work? Like --offload-arch=gfx1200:relaxed-oob-buffer-mode+? I think we only allow specific sub target features to be appended to target id.

krzysz00 · 2025-06-09T19:25:13Z

Wanted to ping here - there's been wider discussion about how relaxed buffer OOB mode isn't really a subtarget feature

I think it could be made a function attribute - and, more relevantly, should be flipped. The relaxed mode is the expected/default behavior, and the strict mode is only needed to block certain optimizations in some Vulkan contexts - as far as I understand things

krzysz00 requested review from jayfoad, arsenm, shiltian and piotrAMD April 7, 2025 21:18

llvmbot added backend:AMDGPU llvm:transforms labels Apr 7, 2025

shiltian reviewed Apr 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Document "relaxed buffer OOB mode", update HSA default #134734

[AMDGPU] Document "relaxed buffer OOB mode", update HSA default #134734

Uh oh!

krzysz00 commented Apr 7, 2025

Uh oh!

llvmbot commented Apr 7, 2025 •

edited

Loading

Uh oh!

arsenm commented Apr 8, 2025

Uh oh!

krzysz00 commented Apr 8, 2025

Uh oh!

piotrAMD commented Apr 8, 2025

Uh oh!

shiltian Apr 16, 2025

Uh oh!

shiltian Apr 16, 2025

Uh oh!

krzysz00 Apr 16, 2025

Uh oh!

shiltian Apr 16, 2025

Uh oh!

shiltian Apr 16, 2025

Uh oh!

shiltian Apr 16, 2025

Uh oh!

krzysz00 commented Jun 9, 2025

Uh oh!

Uh oh!

	that begin out of bounds but then access in-bounds elements (such as loading A
	that begin out of bounds but then access in-bounds elements (such as loading a

	a load of an ``<8xi8>`` from an offset of ``6`` on an 8-byte buffer) will treat
	a load of an ``<8 x i8>`` from an offset of ``6`` on an 8-byte buffer) will treat

	usage of less-vectorized code to ensure correctness. Ifthis strict conformance
	usage of less-vectorized code to ensure correctness. If this strict conformance

[AMDGPU] Document "relaxed buffer OOB mode", update HSA default #134734

Are you sure you want to change the base?

[AMDGPU] Document "relaxed buffer OOB mode", update HSA default #134734

Uh oh!

Conversation

krzysz00 commented Apr 7, 2025

Uh oh!

llvmbot commented Apr 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Apr 8, 2025

Uh oh!

krzysz00 commented Apr 8, 2025

Uh oh!

piotrAMD commented Apr 8, 2025

Uh oh!

shiltian Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

krzysz00 Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

krzysz00 commented Jun 9, 2025

Uh oh!

Uh oh!

llvmbot commented Apr 7, 2025 •

edited

Loading