Add `GroupMemoryBarrierWithGroupSync` tests #442

Open

inbelic wants to merge 14 commits into llvm:main from inbelic:inbelic/group-barrier

Contributor

inbelic commented Sep 17, 2025

Adds basic tests of the blocking behaviour of the GroupMemoryBarrierWithGroupSync intrinsic.

The test cases are sized so that each thread group must be evaluated over multiple waves. With the barriers removed, each scenario fails on WARP, although observing a failure depends on how the waves are scheduled.

Resolves: #142.

Finn Plummer added 2 commits

September 17, 2025 09:42


          Add GroupMemoryBarrierWithGroupSync tests

8eeab5f


          self-review: mark as unsupported for clang

19d4d42

spall approved these changes

Collaborator

spall left a comment

lgtm but you should probably get a review from someone who knows more about this function.

bogner reviewed

Collaborator

bogner left a comment

The test case looks reasonable, if a bit hard to follow in terms of where the individual result values are coming from. A few comments on the pipeline definition.

test/WaveOps/GroupMemoryBarrierWithSync.test Outdated

+                - Name: ExpectedOut
+                  Format: Int32
+                  Stride: 16
+                  Data: [ 9, 90, 900, 9000,  9, 90, 1800, 18000, 10, 20, 120, 1120, 1, 1000, 100, 10 ]

Collaborator

bogner Sep 18, 2025

I think it's clearer if you split the lines so that the components of the vector each get their own line

Suggested change

      
-                  Data: [ 9, 90, 900, 9000,  9, 90, 1800, 18000, 10, 20, 120, 1120, 1, 1000, 100, 10 ]
+                  Data: [
+                    9, 90, 900, 9000,
+                    9, 90, 1800, 18000,
+                    10, 20, 120, 1120,
+                    1, 1000, 100, 10
+                  ]

test/WaveOps/GroupMemoryBarrierWithSync.test Outdated

+                  DispatchSize: [1, 1, 1]
+              Buffers:
+                - Name: In
+                  Format: Int32

Collaborator

bogner Sep 18, 2025

The HLSL uses uint for each of the buffers, but here we say Int32. We should either make the HLSL use signed ints or use UInt32 here.

test/WaveOps/GroupMemoryBarrierWithSync.test Outdated

+              Buffers:
+                - Name: In
+                  Format: Int32
+                  Stride: 4

Collaborator

bogner Sep 18, 2025

"Stride" is in bytes, and since the elements here are uint4 vectors that means that this should be 16. In any case, Stride is really for structs, and you can just use "Channels: 4" here to say we have 4-element vectors.
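For reference, the stride arithmetic in the comment above can be checked with a short sketch. It assumes 32-bit (4-byte) components; `stride_bytes` is a hypothetical helper for illustration and is not part of the test framework.

```python
# Hypothetical helper, not part of the test framework: computes the size in
# bytes of one buffer element made of `channels` components.
def stride_bytes(channels: int, bits_per_channel: int = 32) -> int:
    return channels * (bits_per_channel // 8)


# A uint4 element has four 32-bit components.
print(stride_bytes(4))  # 16
# A scalar uint has one 32-bit component.
print(stride_bytes(1))  # 4
```

A stride of 4 would therefore describe a scalar uint rather than a uint4 vector, which is the mismatch the comment points out.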

test/WaveOps/GroupMemoryBarrierWithSync.test Outdated

+                  Data: [ 1, 10, 100, 1000]
+                - Name: Out
+                  Format: Int32
+                  Stride: 16

Collaborator

bogner Sep 18, 2025

Same here (though this stride matches the data type), better to just say "Channels: 4" instead of specifying stride.


          review: correct format/channels and readability

d84771e

damyanp requested changes

Collaborator

damyanp left a comment

We've been discussing offline, but I figured I'd get something into the record to see if @bogner or anyone else has opinions.

It looks like this test passes if you remove all of the barriers from the shader. This isn't surprising, because with only 4 threads per thread group there's not really anything for the barrier to synchronize.

We should come up with something where it'll only pass if the barriers are actually in place.

Finn Plummer and others added 2 commits

September 18, 2025 14:40


          review: update examples for larger group size

99d6735

It was noted that if the group size is not larger than the wave size,
the barrier is redundant.

The HLSL wave size can be at most 128. Hence, we initialize a group of 512
threads so that it is forced to evaluate over multiple waves.

We also remove the divergent test case, as it is not applicable because
it is undefined behaviour.
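The sizing argument in this commit message can be sketched as simple arithmetic. The list of wave sizes below is an assumption made for illustration (the message itself only states the maximum of 128), and `min_waves` is a hypothetical helper.

```python
import math

# Assumed set of possible wave sizes; only the maximum (128) comes from the
# commit message, the rest are illustrative.
ASSUMED_WAVE_SIZES = [4, 8, 16, 32, 64, 128]


def min_waves(group_size: int, wave_size: int) -> int:
    """Fewest waves needed to cover a thread group of `group_size` threads."""
    return math.ceil(group_size / wave_size)


for group_size in (4, 512):
    counts = {w: min_waves(group_size, w) for w in ASSUMED_WAVE_SIZES}
    print(group_size, counts)
```

Under these assumptions a 4-thread group always fits in a single wave, while a 512-thread group needs at least 4 waves even at the largest wave size, so the barrier has something to synchronize.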


          review: update to xfail format

f8e9588

Co-authored-by: Justin Bogner <[email protected]>

Contributor Author

inbelic commented Sep 18, 2025

I have updated the test cases so that they fail locally for me on WARP if the barriers are removed.

I had to increase the group size so that it is larger than the wave size. Whether the data race is then observed depends on how the waves are scheduled, but it at least fails consistently for me.
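As a loose CPU-side analogy of the pattern being tested (not the shader from this PR), the sketch below has each thread write to a shared array, wait at a barrier, and then read a neighbour's slot. `run_group`, the group size of 8, and the written values are all hypothetical; Python's `threading.Barrier` stands in for the group barrier.

```python
import threading


def run_group(group_size: int = 8) -> list:
    """Simulate one thread group: write, synchronize, then read a neighbour."""
    shared = [0] * group_size          # stands in for groupshared memory
    out = [0] * group_size             # stands in for the output buffer
    barrier = threading.Barrier(group_size)

    def thread_main(tid: int) -> None:
        shared[tid] = tid * 10         # phase 1: every thread writes its slot
        barrier.wait()                 # all writes finish before any read
        out[tid] = shared[(tid + 1) % group_size]  # phase 2: read neighbour

    threads = [threading.Thread(target=thread_main, args=(t,))
               for t in range(group_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(run_group(8))  # [10, 20, 30, 40, 50, 60, 70, 0]
```

Without the `barrier.wait()` call a thread could read its neighbour's slot before it has been written, and whether that happens depends on scheduling, which mirrors why the failure without barriers is only observed for some wave schedules.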

Finn Plummer added 2 commits

September 18, 2025 15:31


          fix name

1ac3cf3


          remove xfail: had the incorrect barrier intrinsic

9d04292

damyanp approved these changes

Collaborator

damyanp left a comment

I think this is probably doing more than we need to do to convince ourselves that the barrier works, but other than that lgtm.

test/WaveOps/GroupMemoryBarrierWithGroupSync.test Outdated

Comment on lines 43 to 44

    
                // Strided Read/Write:
                Indices[ThreadID.x][ThreadID.y] = ThreadID.x + ThreadID.y * 128;

Collaborator

damyanp Sep 19, 2025

Do we really need any more tests than we have after Out[1]? Arguably Out[0] is enough too, but it does seem that the other one is a bit more interesting.

Finn Plummer added 3 commits

September 19, 2025 09:34


          review: simplify test to remove confusing example

7e22586


          add metal failure

8820f92


          add intel failures

f4bef33

inbelic added the test-all label

inbelic and others added 3 commits

September 19, 2025 14:49


          Merge branch 'main' into inbelic/group-barrier

e0d2c66


          bump run onto clang fix

72f6663


          remove XFAIL for vulkan. is specific to d3d12

05d9c7b

inbelic removed the test-all label


          add clang xfail

efab0f4
