goeiecool9999 (Collaborator) commented Jul 24, 2025

OpenGL parallel shader compilation fix

Move finishing shader compilation to WaitForCompiled.
If my understanding is correct, ARB_parallel_shader_compile makes glCompileShader non-blocking.
However, before this change Cemu would immediately call glGetShaderiv with GL_COMPILE_STATUS to check for compilation errors, which forces the shader to finish compiling. That means the blocking compilation is just moved down a few lines and no parallelism can take place.
I have changed it so the check doesn't happen until WaitForCompiled. That means multiple glCompileShader calls can be issued before Cemu runs a blocking glGetShaderiv call.
While I have not observed substantial speedups on my driver, I think my changes are theoretically correct.
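To illustrate the ordering argument above, here is a minimal sketch that does not touch real OpenGL. `FakeDriver`, `compile` and `queryStatus` are hypothetical stand-ins (assumptions, not Cemu or GL names): `compile` plays the role of a non-blocking glCompileShader and `queryStatus` the role of a blocking glGetShaderiv status query. The sketch only counts how many compile jobs could overlap under each call order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a driver with parallel shader compilation.
// compile() starts a job without waiting; queryStatus() forces that job to finish.
struct FakeDriver {
    int inFlight = 0;     // jobs started but not yet forced to finish
    int maxInFlight = 0;  // highest overlap seen so far

    void compile(int /*shader*/) {
        ++inFlight;
        maxInFlight = std::max(maxInFlight, inFlight);
    }
    bool queryStatus(int /*shader*/) {
        --inFlight;  // the blocking query forces this job to complete
        return true;
    }
};

// Old pattern: the status is queried right after each compile call.
int maxOverlapEager(const std::vector<int>& shaders) {
    FakeDriver drv;
    for (int s : shaders) {
        drv.compile(s);
        drv.queryStatus(s);
    }
    return drv.maxInFlight;
}

// New pattern: all compiles are issued before the first blocking query.
int maxOverlapDeferred(const std::vector<int>& shaders) {
    FakeDriver drv;
    for (int s : shaders) drv.compile(s);
    for (int s : shaders) drv.queryStatus(s);
    return drv.maxInFlight;
}
```

With the eager order at most one job is ever pending, while the deferred order lets every issued job be pending at once. Whether a real driver actually uses that freedom is up to the driver.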
I have also changed IsCompiled so it returns the extension's GL_COMPLETION_STATUS_ARB, which allows Cemu to query compilation status in the same way as with Vulkan shaders.
I used this change in the shader cache compilation queue logic so that it only forces the first entry to compile when the queue is full and none of the queued shaders have finished compiling. This potentially improves performance when the queue is full and the first entry is slower to compile than shaders later in the queue, because it allows the queue to be refilled immediately rather than drained while the slow shader finishes.
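The queue behaviour described above could look roughly like the following sketch. `Job`, its `done` flag and `makeRoom` are hypothetical names for illustration, not the actual Cemu code. The `done` flag stands in for a non-blocking completion query, and popping the front entry stands in for a blocking wait. Retiring every finished entry in one pass is an assumption; the real change may retire fewer.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical job type; `done` stands in for a non-blocking completion query.
struct Job {
    int id;
    bool done;
};

// Makes room in a full queue and returns the ids that were retired.
// Finished jobs are removed wherever they sit in the queue; only when
// none are finished is the front job forced to complete (the blocking case).
std::vector<int> makeRoom(std::deque<Job>& queue, std::size_t capacity) {
    std::vector<int> retired;
    if (queue.empty() || queue.size() < capacity) return retired;  // not full

    for (auto it = queue.begin(); it != queue.end();) {
        if (it->done) {
            retired.push_back(it->id);
            it = queue.erase(it);
        } else {
            ++it;
        }
    }
    if (retired.empty()) {
        // Stand-in for a blocking wait on the oldest job.
        retired.push_back(queue.front().id);
        queue.pop_front();
    }
    return retired;
}
```

If the front job is slow but a later one has finished, the later one is retired and the caller can enqueue new work without waiting on the slow job.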

misc.

  • Delete shader object after program linkage. This saves gigabytes of memory on my driver when a whole shader cache is compiled without a driver cache.
  • Removed unused isRenderThread argument from PreponeCompilation.
  • Removed redundant LatteShader_FinishCompilation.
  • Use unique_ptr for programBinaryCache.

Exzap (Member) commented Jul 24, 2025

I think Mesa's is the only implementation that actually compiles shaders in glCompileShader; all others delay it to glLinkProgram. And from what I can see that's still blocking, because it retrieves the link status right afterwards.

There are also issues with GL_COMPLETION_STATUS_ARB having broken behavior on Nvidia.
It's probably fine on their current drivers, but considering that OpenGL is there for backwards compatibility, there are likely users who are stuck on affected drivers.

goeiecool9999 (Collaborator, Author) commented Jul 24, 2025

> And from what I can see that's still blocking because it retrieves the link status right afterwards.

I guess it didn't click that you can call glLinkProgram before checking if the shaders compiled successfully. I've changed it now so it issues both compilation and linking immediately and checks both in IsCompiled().
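As a rough sketch of that lifecycle, assuming nothing about the real class beyond the IsCompiled and WaitForCompiled names used in this thread: `FakeProgramState`, `ShaderSketch` and `driverTick` are hypothetical, and the counters stand in for driver-side background work rather than real GL queries.

```cpp
// Hypothetical stand-in for the driver-side state of one shader program.
struct FakeProgramState {
    int compileWorkLeft;  // simulated pending compile work
    int linkWorkLeft;     // simulated pending link work
    bool compileOk;       // what a compile-status query would report
    bool linkOk;          // what a link-status query would report
};

class ShaderSketch {
public:
    // Compile and link are both issued up front; no status is read back here.
    explicit ShaderSketch(FakeProgramState state) : state_(state) {}

    // Simulates the driver making progress in the background.
    void driverTick() {
        if (state_.compileWorkLeft > 0) --state_.compileWorkLeft;
        else if (state_.linkWorkLeft > 0) --state_.linkWorkLeft;
    }

    // Non-blocking poll covering both the compile and the link step.
    bool IsCompiled() const {
        return state_.compileWorkLeft == 0 && state_.linkWorkLeft == 0;
    }

    // Blocking path: wait for all work, then check both results once.
    bool WaitForCompiled() {
        state_.compileWorkLeft = 0;  // stand-in for blocking on the driver
        state_.linkWorkLeft = 0;
        return state_.compileOk && state_.linkOk;
    }

private:
    FakeProgramState state_;
};
```

The point of the shape is that the constructor never reads a status back, so the only places that can block are explicit waits.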

In that thread Nvidia keeps insisting that their drivers are fine and I think I agree with their reasoning. The fact that querying shader completion status always returns 1 and is a blocking operation is not an issue when they say that glCompileShader is expected to be synchronous, and to take a very short amount of time. If all the compilation happens during the linking that'll work too, now that I've changed the code.

goeiecool9999 (Collaborator, Author) commented Jul 24, 2025

In the initial post I was wrong and you're right @Exzap. As long as glCompileShader is light (just parsing for syntax errors) and the intensive compilation is done in glLinkProgram then the current implementation would be fine if it weren't for storeBinary also being called immediately in the constructor, which forces synchronization on glLinkProgram.
If you know of a driver where GL_COMPLETION_STATUS_ARB is actually unreliable (like never becoming 1 or something) then I can remove it. The annoying thing is that the optimisation of looking for finished compilations further down the queue will then be a Vulkan-specific case, and then I'd probably just remove it altogether, since 2 threads can easily stay fed with a 32-job queue.
