small tensor parallelism optimization #796

Alex-GZ · 2025-06-14T09:44:12Z

Just two small changes in tensor parallelism:

- Removed a cross-GPU barrier in tp_broadcast that is unnecessary (in my opinion), so a GPU that finishes its transfer early can start its work sooner. Maybe a synchronization should instead be done before the CPU->GPU transfer, in the 'src_dev>=0' branch.
- Rewrote the barrier so that it costs 2n-2 event wait calls instead of n(n-1), while preserving its function.
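To make the 2n-2 versus n(n-1) figures concrete, here is a small counting model in Python. It is only a sketch under assumptions: the description does not say how the rewritten barrier is arranged, so the gather-then-release layout through one hub device shown here is a guess that happens to produce 2n-2 waits, and n is taken to be the number of GPUs. The function names are hypothetical and nothing here calls the project's code or the CUDA API.

```python
# Counting model only: each (waiter, signaler) pair stands for one
# "stream of `waiter` waits on an event recorded by `signaler`" call.
# Function names are hypothetical; the hub layout is an assumption.

def all_to_all_barrier_waits(n: int) -> list[tuple[int, int]]:
    """Every device waits on an event from every other device: n(n-1) waits."""
    return [(w, s) for w in range(n) for s in range(n) if w != s]


def hub_barrier_waits(n: int, hub: int = 0) -> list[tuple[int, int]]:
    """Gather then release through one device: 2(n-1) = 2n-2 waits."""
    others = [d for d in range(n) if d != hub]
    # Phase 1: the hub waits on one event from each other device.
    gather = [(hub, d) for d in others]
    # Phase 2: the hub records a second event after those waits, and every
    # other device waits on it, so each one transitively waits on all devices.
    release = [(d, hub) for d in others]
    return gather + release


if __name__ == "__main__":
    for n in (3, 4, 6):
        print(n, len(all_to_all_barrier_waits(n)), len(hub_barrier_waits(n)))
```

In this model the saving grows with the device count: 6 versus 4 waits for three devices, 12 versus 6 for four, and 30 versus 10 for six. The hub variant adds a dependency hop through one device, which is a trade-off the counts alone do not capture.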

This PR improved generation T/s on my setup by about 20% in the regime where CUDA API queueing is the bottleneck (<20k context, 123B 4.5bpw model, 6 mostly different GPUs, a CPU with rather slow single-threaded performance, Linux). The impact is probably a bit higher on Windows.

simpler cross gpu-barrier

Ph0rk0z · 2025-06-21T14:21:18Z

Tested it, and it appears to work. Splitting 123B over 4x3090, I go from high 16/17 t/s to 19.x t/s. Went back and forth a few times. That is almost as good as splitting over only 3. I should try that next and see if I get a speedup there too.

Ph0rk0z · 2025-06-21T15:56:34Z

Some more tests: with 4 GPUs there is a gain, with 3 GPUs there is about a 0.5 t/s drop.

Can you check on your setup with odd and even GPU counts? Maybe there is a way to use both approaches depending on device count.

removed potentially unnecessary sync in tp_broadcast

b736e28

