Skip to content

Initial commit for basic SER scene with toggle #917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

chauvuha
Copy link

Initial commit for basic SER scene with toggle

Implemented a scene that demonstrates the integration of Shader Execution Reordering (SER) to showcase performance gains in multi-material scenarios:

  • Scene contains two geometry instances, each assigned a unique materialID (0 and 1) via the shader table.
  • SER is applied based on the HitObject and sortKey being materialID.
  • This scene shows a 30 - 50% improvement in FPS, depending on the number of objects and spacings between them.

uint materialID = hit.LoadLocalRootTableConstant(16);
uint hintBits = 1;

// Reorder threads based on the hit object and material ID (0 - cube, 1 - complex).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run the sample, it looks like if I just call MaybeReorderThread(hit) the perf win is ~the same is MaybeReorderThread(hit, materialID, hintBits).

And doing MaybeReorderThread(materialID, hintBits) provides no perf win (slight perf loss).

So the sort hint isn't actually helpful based on the content here - worth looking into why.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original texture load wasn’t heavy enough for the benefits of reordering to stand out. I’ve replaced it with a heavier workload to better demonstrate the divergence and performance impact for now.

Copy link
Contributor

@amarpMSFT amarpMSFT Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest version I still see the same, perf win is only coming from sorting based on the hit. e.g. MaybeReorderThread(hit) provides a win, but materialID makes no difference (whether or not combined with hit).

I'm running on a 4090. It's also possible it behaves differently on 5xxx?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Using my 5xxx, I did see a difference in perf. The more I loop the texture sampling, the more of a perf divergence it will be

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so MaybeReorderThread(materialID) gives a perf win? Does hit alone also give a perf win? Do both together win more?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw MaybeReorderThread(materialID) giving a perf win. I would say hit >= (hit, materialID) > materialID. I also just tried forcing a heavier workload with math.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd stick with maybe 10 texture lookups or so if that still shows wins. 1000 is pretty unrealistic.

@chauvuha chauvuha requested review from amarpMSFT and removed request for amarpMSFT June 30, 2025 16:28
@chauvuha chauvuha requested a review from amarpMSFT June 30, 2025 16:43
float2 offset = float2(i * 0.01, i * 0.01);
colorSum += MaterialTexture.SampleLevel(TextureSampler, uv + offset, 0).rgb;
colorSum += MaterialTexture.SampleLevel(TextureSampler, uv, 0).rgb;
colorSum = sin(colorSum) + cos(colorSum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my 4090, this colorSum = sin(colorSum) + cos(colorSum); seems to hit an nvidia driver bug, crashing in driver shader compiler.

Splitting to per component solved it:

        colorSum.r = sin(colorSum.r) + cos(colorSum.r);
        colorSum.g = sin(colorSum.g) + cos(colorSum.g);
        colorSum.b = sin(colorSum.b) + cos(colorSum.b);

Though that is a bit ugly. Could try to get NVIDIA to fix the bug, but it's probably not worth the trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's possible the compiler/driver will be smart and notice it could just sample the texture once, reusing the result.
If the intent is to sample the texture for each loop, could keep the uv offset you had before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least on 4090, 10 iterations with offsetted uvs gives a win as long as the hit is part of the sort (materialID makes no difference).

Copy link
Author

@chauvuha chauvuha Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the 4090, do your 10 iterations include both the offset UVs and the cosine/sine calculations, or are they just for the offset UVs? (Mine still needs to include the cosine/sine iterations.)

void MyClosestHitShader(inout RayPayload payload, in MyAttributes attr)
{
void ClosestHit_Complex(inout RayPayload payload, in MyAttributes attr)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the complex and simple hit shaders different in any way? I wouldn't have expected each of them to have complex vs simple branches in them. And would have expected at least some more complexity in the one labelled as "complex".

// Reorder threads based on hitobject
dx::MaybeReorderThread(hit);

// Reorder threads based on material ID (0 - cube, 1 - complex).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say either remove the commented shader code or if it is a different option then there should be a way to switch to it using user input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants