Initial commit for basic SER scene with toggle #917

chauvuha · 2025-06-27T21:42:04Z

Initial commit for basic SER scene with toggle

Implemented a scene that demonstrates Shader Execution Reordering (SER) and the performance gains it can give in multi-material scenarios:

The scene contains two geometry instances, each assigned a unique materialID (0 and 1) via the shader table.
Threads are reordered based on the HitObject, with materialID as the sort key.
The scene shows a 30-50% improvement in FPS, depending on the number of objects and the spacing between them.
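
To make the intuition behind sorting by materialID concrete, here is a small CPU-side toy model. It is only a sketch under stated assumptions: a "wave" is modeled as a fixed-size group of consecutive threads, and its cost is the number of distinct material IDs it contains. The names DivergentCost and ReorderByKey are hypothetical and not part of the sample, and real GPU scheduling is considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy model: each wave pays one unit of cost per distinct
// material ID it contains, since divergent shader paths run one after another.
std::size_t DivergentCost(const std::vector<std::uint32_t>& materialIds,
                          std::size_t waveSize)
{
    assert(waveSize > 0);
    std::size_t cost = 0;
    for (std::size_t base = 0; base < materialIds.size(); base += waveSize) {
        const std::size_t end = std::min(base + waveSize, materialIds.size());
        std::set<std::uint32_t> distinct(
            materialIds.begin() + static_cast<std::ptrdiff_t>(base),
            materialIds.begin() + static_cast<std::ptrdiff_t>(end));
        cost += distinct.size();
    }
    return cost;
}

// Hypothetical stand-in for reordering: group threads with equal keys together.
std::vector<std::uint32_t> ReorderByKey(std::vector<std::uint32_t> materialIds)
{
    std::stable_sort(materialIds.begin(), materialIds.end());
    return materialIds;
}
```

In this model, 64 threads alternating between materials 0 and 1 cost 4 units across two 32-wide waves before reordering and 2 units after. Actual gains depend on the hardware and on the workload.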

...es/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Raytracing.hlsl

...sktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Screenshot_small.png

...12RaytracingBasicShaderExecutionReordering/D3D12RaytracingBasicShaderExecutionReordering.cpp

Samples/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/dxc.exe

amarpMSFT · 2025-06-28T00:22:51Z

...es/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Raytracing.hlsl

+        uint materialID = hit.LoadLocalRootTableConstant(16);
+        uint hintBits = 1;
+
+        // Reorder threads based on the hit object and material ID (0 - cube, 1 - complex).


When I run the sample, it looks like if I just call MaybeReorderThread(hit) the perf win is ~the same as MaybeReorderThread(hit, materialID, hintBits).

And doing MaybeReorderThread(materialID, hintBits) provides no perf win (slight perf loss).

So the sort hint isn't actually helpful based on the content here - worth looking into why.
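
As background on the hintBits argument in the calls compared above, here is a CPU-side sketch of how a hint-bit count narrows a sort key. It assumes that only the lowest hintBits bits of the hint are considered, which is how I understand the SER coherence hint to work. EffectiveHint is a hypothetical helper and not part of the sample or of the HLSL API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the lowest numHintBits bits of a hint.
// Assumption: bits above numHintBits are ignored when sorting.
std::uint32_t EffectiveHint(std::uint32_t hint, std::uint32_t numHintBits)
{
    assert(numHintBits <= 32);
    if (numHintBits == 0) return 0;      // no hint bits: every thread looks alike
    if (numHintBits == 32) return hint;  // avoid shifting by the full type width
    return hint & ((1u << numHintBits) - 1u);
}
```

With materialID limited to 0 and 1, a single hint bit is enough to tell the two materials apart, which matches hintBits = 1 in the snippet above.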

The original texture load wasn’t heavy enough for the benefits of reordering to stand out. I’ve replaced it with a heavier workload to better demonstrate the divergence and performance impact for now.

With the latest version I still see the same: the perf win comes only from sorting based on the hit. E.g. MaybeReorderThread(hit) provides a win, but materialID makes no difference (whether or not it is combined with hit).

I'm running on a 4090. It's also possible it behaves differently on 5xxx?

Ah, I see. On my 5xxx, I did see a difference in perf. The more times I loop the texture sampling, the larger the perf difference becomes.

Interesting, so MaybeReorderThread(materialID) gives a perf win? Does hit alone also give a perf win? Do both together win more?

Yes, I saw MaybeReorderThread(materialID) giving a perf win. I would say hit >= (hit, materialID) > materialID. I also just tried forcing a heavier workload with math.

I'd stick with maybe 10 texture lookups or so if that still shows wins. 1000 is pretty unrealistic.

amarpMSFT · 2025-06-30T22:27:22Z

...es/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Raytracing.hlsl

-            float2 offset = float2(i * 0.01, i * 0.01);
-            colorSum += MaterialTexture.SampleLevel(TextureSampler, uv + offset, 0).rgb;
+            colorSum += MaterialTexture.SampleLevel(TextureSampler, uv, 0).rgb;
+            colorSum = sin(colorSum) + cos(colorSum);


On my 4090, this colorSum = sin(colorSum) + cos(colorSum); seems to hit an NVIDIA driver bug, crashing in the driver's shader compiler.

Splitting to per component solved it:

            colorSum.r = sin(colorSum.r) + cos(colorSum.r);
            colorSum.g = sin(colorSum.g) + cos(colorSum.g);
            colorSum.b = sin(colorSum.b) + cos(colorSum.b);

Though that is a bit ugly. Could try to get NVIDIA to fix the bug, but it's probably not worth the trouble.

Also, it's possible the compiler/driver will be smart and notice it could just sample the texture once, reusing the result.
If the intent is to sample the texture for each loop, could keep the uv offset you had before?
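
To illustrate the concern about the sample being reused, here is a CPU-side sketch that counts how many distinct coordinates the loop would actually sample. It is a toy model with assumed values: coordinates are stored in hundredths so they compare exactly, the base uv of (0.5, 0.5) is arbitrary, and DistinctSampleCoords is a hypothetical name. Whether a given compiler or driver really collapses identical samples is not something this shows.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical model of the sampling loop: returns how many distinct
// (u, v) coordinates are sampled over the given number of iterations.
std::size_t DistinctSampleCoords(int iterations, bool offsetPerIteration)
{
    assert(iterations >= 0);
    std::set<std::pair<int, int>> coords;
    const int baseU = 50;  // assumed uv.x = 0.50, stored in hundredths
    const int baseV = 50;  // assumed uv.y = 0.50, stored in hundredths
    for (int i = 0; i < iterations; ++i) {
        // Mirrors offset = float2(i * 0.01, i * 0.01) when enabled.
        const int step = offsetPerIteration ? i : 0;
        coords.insert({baseU + step, baseV + step});
    }
    return coords.size();
}
```

Without the per-iteration offset, 10 iterations touch a single coordinate, so the loop is a candidate for being reduced to one sample. With the offset, all 10 coordinates differ.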

At least on the 4090, 10 iterations with offset UVs gives a win as long as the hit is part of the sort (materialID makes no difference).

On the 4090, do your 10 iterations include both the offset UVs and the cosine/sine calculations, or are they just for the offset UVs? (Mine still needs to include the cosine/sine iterations.)

amarpMSFT · 2025-07-01T17:53:35Z

...es/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Raytracing.hlsl

-void MyClosestHitShader(inout RayPayload payload, in MyAttributes attr)
-{
+void ClosestHit_Complex(inout RayPayload payload, in MyAttributes attr)
+    {


Are the complex and simple hit shaders different in any way? I wouldn't have expected each of them to have complex vs simple branches in them. And would have expected at least some more complexity in the one labelled as "complex".

nadaOuf · 2025-07-01T23:30:18Z

...es/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Raytracing.hlsl

+        // Reorder threads based on hitobject
+        dx::MaybeReorderThread(hit);
+
+        // Reorder threads based on material ID (0 - cube, 1 - complex).


I would say either remove the commented-out shader code or, if it is a different option, provide a way to switch to it using user input.

Initial commit for basic SER scene with toggle

8cdb161

chauvuha assigned nadaOuf Jun 27, 2025

amarpMSFT reviewed Jun 28, 2025


chauvuha added 2 commits June 30, 2025 09:14

Fixed: numHintBits var name, removed hlsl.h reference and binaries

d982d64

Remove DirectX from tracking and update .gitignore

a5622da

chauvuha requested review from amarpMSFT and removed request for amarpMSFT June 30, 2025 16:28

chauvuha added 2 commits June 30, 2025 09:40

Replaced workload to demonstrate more divergence

d8210de

Added comments for reordering options

34670da

chauvuha requested a review from amarpMSFT June 30, 2025 16:43

chauvuha added 2 commits June 30, 2025 13:47

deleted stale screenshot

5314122

force heavier workload

2e86be4

amarpMSFT reviewed Jun 30, 2025


chauvuha added 3 commits June 30, 2025 15:31

Reduced number of loops

7a753a9

Include offsetted uvs in sampling texture

89ed840

Separated materialID == 0 to a different closesthit shader

a29d93d

amarpMSFT reviewed Jul 1, 2025


nadaOuf reviewed Jul 1, 2025

