This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit a7d47f8

More updates to front page description of blocked/striped arrangements
Former-commit-id: afe927e
1 parent f864c1f commit a7d47f8

1 file changed: +8 -8 lines changed

cub/cub.cuh

Lines changed: 8 additions & 8 deletions
@@ -366,6 +366,8 @@
  * - <b><em>Blocked arrangement</em></b>. The aggregate tile of items is partitioned
  * evenly across threads in "blocked" fashion with thread<sub><em>i</em></sub>
  * owning the <em>i</em><sup>th</sup> segment of consecutive elements.
+ * Blocked arrangements are often desirable for algorithmic benefits (where
+ * long sequences of items can be processed sequentially within each thread).
  * </td>
  * <td>
  * \par
@@ -377,7 +379,10 @@
  * \par
  * - <b><em>Striped arrangement</em></b>. The aggregate tile of items is partitioned across
  * threads in "striped" fashion, i.e., the \p ITEMS_PER_THREAD items owned by
- * each thread have logical stride \p BLOCK_THREADS between them.
+ * each thread have logical stride \p BLOCK_THREADS between them. Striped arrangements
+ * are often desirable for data movement through global memory (where
+ * [read/write coalescing](http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#coalesced-access-global-memory)</a>
+ * is an important performance consideration).
  * </td>
  * <td>
  * \par
@@ -398,13 +403,8 @@
  * facilitates greater ILP for improved throughput and utilization.
  *
  * \par
- * Furthermore, cub::BlockExchange provides operations for converting between blocked
- * and striped arrangements. Blocked arrangements are often desirable for
- * algorithmic benefits (where long sequences of items can be processed sequentially
- * within each thread). Striped arrangements are often desirable for data movement
- * through global memory (where
- * [read/write coalescing](http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#coalesced-access-global-memory)</a>
- * is an important performance consideration).
+ * Finally, cub::BlockExchange provides operations for converting between blocked
+ * and striped arrangements.
  *
  * \section sec7 (7) Contributors
  *
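
The index arithmetic behind the two arrangements described in the comment text can be sketched on the host in plain C++. This is only an illustration, not CUB code: the helper names `blocked_index`, `striped_index`, and `blocked_to_striped` are hypothetical, and the conversion merely emulates the data movement that the text attributes to cub::BlockExchange. It does not call that class or reproduce its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// NOTE: all names below are hypothetical illustration helpers, not part of CUB.

// Blocked arrangement: thread i owns the i-th run of consecutive tile elements,
// so its j-th item sits at tile offset i * items_per_thread + j.
inline std::size_t blocked_index(std::size_t thread, std::size_t item,
                                 std::size_t items_per_thread) {
    return thread * items_per_thread + item;
}

// Striped arrangement: the items owned by one thread are block_threads apart,
// so thread i's j-th item sits at tile offset j * block_threads + i.
inline std::size_t striped_index(std::size_t thread, std::size_t item,
                                 std::size_t block_threads) {
    return item * block_threads + thread;
}

// Host-side emulation of a blocked-to-striped conversion.
// per_thread[t * items_per_thread + j] models the j-th item held by thread t.
inline std::vector<int> blocked_to_striped(const std::vector<int>& per_thread,
                                           std::size_t block_threads,
                                           std::size_t items_per_thread) {
    assert(per_thread.size() == block_threads * items_per_thread);

    // Step 1: every thread writes its items to a staging tile at blocked offsets
    // (on a GPU this staging area would typically live in shared memory).
    std::vector<int> tile(per_thread.size());
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t j = 0; j < items_per_thread; ++j)
            tile[blocked_index(t, j, items_per_thread)] =
                per_thread[t * items_per_thread + j];

    // Step 2: every thread reads its items back at striped offsets.
    std::vector<int> out(per_thread.size());
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t j = 0; j < items_per_thread; ++j)
            out[t * items_per_thread + j] =
                tile[striped_index(t, j, block_threads)];
    return out;
}
```

With 4 threads and 2 items per thread, a tile holding 0..7 starts out blocked as {0,1}, {2,3}, {4,5}, {6,7} and ends up striped as {0,4}, {1,5}, {2,6}, {3,7}. In the striped form, the j-th items of neighbouring threads are adjacent in the tile, which is the access pattern the comment associates with coalesced global-memory traffic.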