Open
Labels
good first issue (Good for newcomers), performance (How fast can we go?)
Description
We currently maximize block utilization (taking the maximum number of threads per block), which may leave SMs underutilized. We should consider first selecting an optimal number of blocks before maximizing the thread count:
config = launch_configuration(kernel.fun)
threads = min(length(ps), config.threads)
# XXX: this kernel performs much better with all blocks active
blocks = max(cld(length(ps), threads), config.blocks)
threads = cld(length(ps), blocks)
I'm sure this will make some kernels perform worse, but it's probably worth testing.
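To make the arithmetic in the snippet above concrete, here is a minimal CPU-only sketch of the same three steps. The helper name `balanced_launch` and the numbers (10,000 elements, a 1024-thread limit, 80 blocks) are assumptions for illustration, not values measured on any device; `max_threads` and `min_blocks` stand in for `config.threads` and `config.blocks`.

```julia
# Hypothetical helper mirroring the proposed launch logic.
# `max_threads` plays the role of `config.threads`,
# `min_blocks` the role of `config.blocks`.
function balanced_launch(n::Integer, max_threads::Integer, min_blocks::Integer)
    # Start from the largest block size the kernel allows.
    threads = min(n, max_threads)
    # Use at least `min_blocks` blocks, even if fewer would cover `n`.
    blocks = max(cld(n, threads), min_blocks)
    # Shrink the block size so that `blocks * threads` just covers `n`.
    threads = cld(n, blocks)
    return (threads = threads, blocks = blocks)
end

# Assumed example values: a small input that would otherwise use few blocks.
small = balanced_launch(10_000, 1024, 80)

# Assumed example values: a large input that already needs many blocks.
large = balanced_launch(1_000_000, 1024, 80)
```

With these assumed inputs, the small case moves from 10 blocks of 1024 threads to 80 blocks of 125 threads, while the large case keeps 1024 threads per block. Note that 125 is not a multiple of the 32-thread warp size, which is one way the change could hurt some kernels.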