|
1 | 1 | //----------------------------------------------------------------------------- |
2 | 2 |
|
| 3 | +0.9.3 04/30/2013 |
| 4 | + |
| 5 | + - Added new BlockScan algorithm variant BLOCK_SCAN_RAKING_MEMOIZE, which |
| 6 | + trades more register consumption for less shared memory I/O |
| 7 | + - Added block-wide histogram (BlockHisto256) |
| 8 | + - Updates to BlockRadixRank to use BlockScan (which improves performance |
| 9 | + on Kepler due to SHFL instruction) |
| 10 | + - Added device-wide histogram (DeviceHisto256) |
| 11 | + - Fixed compilation errors for some WarpScan entrypoints on SM30+ |
| 12 | + - Allow types other than C++ primitives to be used in WarpScan::*Sum methods |
| 13 | + if they only have operator + overloaded. (Previously they were also required |
| 14 | + to support assignment from int(0).) |
| 15 | + - Update BlockReduce's BLOCK_REDUCE_WARP_REDUCTIONS algorithm to work even |
| 16 | + when block size is not an even multiple of warp size |
| 17 | + - Added work management utility descriptors (GridQueue, GridEvenShare) |
| 18 | + - Refactoring of DeviceAllocator interface and CachingDeviceAllocator |
| 19 | + implementation |
| 20 | + - Misc. documentation updates and corrections. |
| 21 | + |
| 22 | +//----------------------------------------------------------------------------- |
| 23 | + |
3 | 24 | 0.9.2 04/04/2013 |
4 | 25 |
|
5 | | - - Added WarpReduce. WarpReduce uses the SHFL instruction when applicable. |
6 | | - BlockReduce now uses this WarpReduce instead of implementing its own. |
7 | | - |
8 | | - - Misc. fixes for 64-bit Linux compilation warnings and errors. |
9 | | - |
10 | | - - Misc. documentation updates and corrections. |
| 26 | + - Added WarpReduce. WarpReduce uses the SHFL instruction when applicable. |
| 27 | + BlockReduce now uses this WarpReduce instead of implementing its own. |
| 28 | + - Misc. fixes for 64-bit Linux compilation warnings and errors. |
| 29 | + - Misc. documentation updates and corrections. |
11 | 30 |
|
12 | 31 | //----------------------------------------------------------------------------- |
13 | 32 |
|
14 | 33 | 0.9.1 03/09/2013 |
15 | 34 |
|
16 | | - - Fix for ambiguity in BlockScan::Reduce() between generic reduction and |
17 | | - summation. Summation entrypoints are now called ::Sum(), similar |
18 | | - to the convention in BlockScan. |
19 | | - |
20 | | - - Small edits to mainpage documentation and download tracking |
21 | | - |
| 35 | + - Fix for ambiguity in BlockScan::Reduce() between generic reduction and |
| 36 | + summation. Summation entrypoints are now called ::Sum(), similar to the |
| 37 | + convention in BlockScan. |
| 38 | + - Small edits to mainpage documentation and download tracking |
| 39 | + |
22 | 40 | //----------------------------------------------------------------------------- |
23 | 41 |
|
24 | 42 | 0.9.0 03/07/2013 |
25 | 43 |
|
26 | | - - Initial "preview" release. CUB is the first durable, high-performance library |
27 | | - of cooperative block-level, warp-level, and thread-level primitives for CUDA |
28 | | - kernel programming. More primitives and examples coming soon! |
| 44 | + - Initial "preview" release. CUB is the first durable, high-performance library |
| 45 | + of cooperative block-level, warp-level, and thread-level primitives for CUDA |
| 46 | + kernel programming. More primitives and examples coming soon! |
29 | 47 |
|