-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Initial commit for basic SER scene with toggle #917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
...es/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Raytracing.hlsl
Outdated
Show resolved
Hide resolved
...sktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/Screenshot_small.png
Outdated
Show resolved
Hide resolved
...12RaytracingBasicShaderExecutionReordering/D3D12RaytracingBasicShaderExecutionReordering.cpp
Outdated
Show resolved
Hide resolved
Samples/Desktop/D3D12Raytracing/src/D3D12RaytracingBasicShaderExecutionReordering/dxc.exe
Outdated
Show resolved
Hide resolved
uint materialID = hit.LoadLocalRootTableConstant(16); | ||
uint hintBits = 1; | ||
|
||
// Reorder threads based on the hit object and material ID (0 - cube, 1 - complex). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the sample, it looks like if I just call MaybeReorderThread(hit) the perf win is ~the same is MaybeReorderThread(hit, materialID, hintBits).
And doing MaybeReorderThread(materialID, hintBits) provides no perf win (slight perf loss).
So the sort hint isn't actually helpful based on the content here - worth looking into why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original texture load wasn’t heavy enough for the benefits of reordering to stand out. I’ve replaced it with a heavier workload to better demonstrate the divergence and performance impact for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the latest version I still see the same, perf win is only coming from sorting based on the hit. e.g. MaybeReorderThread(hit) provides a win, but materialID makes no difference (whether or not combined with hit).
I'm running on a 4090. It's also possible it behaves differently on 5xxx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Using my 5xxx, I did see a difference in perf. The more I loop the texture sampling, the more of a perf divergence it will be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so MaybeReorderThread(materialID) gives a perf win? Does hit alone also give a perf win? Do both together win more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw MaybeReorderThread(materialID) giving a perf win. I would say hit >= (hit, materialID) > materialID. I also just tried forcing a heavier workload with math.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd stick with maybe 10 texture lookups or so if that still shows wins. 1000 is pretty unrealistic.
float2 offset = float2(i * 0.01, i * 0.01); | ||
colorSum += MaterialTexture.SampleLevel(TextureSampler, uv + offset, 0).rgb; | ||
colorSum += MaterialTexture.SampleLevel(TextureSampler, uv, 0).rgb; | ||
colorSum = sin(colorSum) + cos(colorSum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my 4090, this colorSum = sin(colorSum) + cos(colorSum); seems to hit an nvidia driver bug, crashing in driver shader compiler.
Splitting to per component solved it:
colorSum.r = sin(colorSum.r) + cos(colorSum.r);
colorSum.g = sin(colorSum.g) + cos(colorSum.g);
colorSum.b = sin(colorSum.b) + cos(colorSum.b);
Though that is a bit ugly. Could try to get NVIDIA to fix the bug, but it's probably not worth the trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's possible the compiler/driver will be smart and notice it could just sample the texture once, reusing the result.
If the intent is to sample the texture for each loop, could keep the uv offset you had before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on 4090, 10 iterations with offsetted uvs gives a win as long as the hit is part of the sort (materialID makes no difference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the 4090, do your 10 iterations include both the offset UVs and the cosine/sine calculations, or are they just for the offset UVs? (Mine still needs to include the cosine/sine iterations.)
void MyClosestHitShader(inout RayPayload payload, in MyAttributes attr) | ||
{ | ||
void ClosestHit_Complex(inout RayPayload payload, in MyAttributes attr) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the complex and simple hit shaders different in any way? I wouldn't have expected each of them to have complex vs simple branches in them. And would have expected at least some more complexity in the one labelled as "complex".
// Reorder threads based on hitobject | ||
dx::MaybeReorderThread(hit); | ||
|
||
// Reorder threads based on material ID (0 - cube, 1 - complex). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say either remove the commented shader code or if it is a different option then there should be a way to switch to it using user input.
Initial commit for basic SER scene with toggle
Implemented a scene that demonstrates the integration of Shader Execution Reordering (SER) to showcase performance gains in multi-material scenarios: