# CUB 1.16.0

## Summary

CUB 1.16.0 is a major release providing several improvements to the device-scope
algorithms. `DeviceRadixSort` now supports large (64-bit indexed) input data. A
new `UniqueByKey` algorithm has been added to `DeviceSelect`.
`DeviceAdjacentDifference` provides new `SubtractLeft` and `SubtractRight`
functionality.

This release also deprecates several obsolete APIs, including type traits
and `BlockAdjacentDifference` algorithms. Many bug fixes and documentation
updates are also included.

### 64-bit Offsets in `DeviceRadixSort` Public APIs

Users frequently want to process large datasets using CUB’s device-scope
algorithms, but the existing public APIs limit input sizes to what a 32-bit
integer can index. Beginning with this release, CUB is updating these APIs to
support 64-bit offsets, as discussed in NVIDIA/cub#212.

The device-scope algorithms will gain 64-bit offset support incrementally,
starting with the `cub::DeviceRadixSort` family of algorithms. Thanks to
@canonizer for contributing this functionality.

### New `DeviceSelect::UniqueByKey` Algorithm

`cub::DeviceSelect` now provides a `UniqueByKey` algorithm, which has been
ported from Thrust. Thanks to @zasdfgbnm for this contribution.

### New `DeviceAdjacentDifference` Algorithms

The new `cub::DeviceAdjacentDifference` interface, also ported from Thrust,
provides `SubtractLeft` and `SubtractRight` algorithms as CUB kernels.

## Deprecation Notices

### Synchronous CUDA Dynamic Parallelism Support

**A future version of CUB will change the `debug_synchronous` behavior of
device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).**

This will only affect calls to CUB device-scope algorithms launched from
device-side code with `debug_synchronous = true`. Such invocations will continue
to print extra debugging information, but they will no longer synchronize after
kernel launches.

### Deprecated Traits

CUB provided a variety of metaprogramming type traits in order to support C++03.
Since C++14 is now required, these traits have been deprecated in favor of their
STL equivalents, as shown below:

| Deprecated CUB Trait    | Replacement STL Trait |
|-------------------------|-----------------------|
| `cub::If`               | `std::conditional`    |
| `cub::Equals`           | `std::is_same`        |
| `cub::IsPointer`        | `std::is_pointer`     |
| `cub::IsVolatile`       | `std::is_volatile`    |
| `cub::RemoveQualifiers` | `std::remove_cv`      |
| `cub::EnableIf`         | `std::enable_if`      |

CUB now uses the STL traits internally, resulting in a ~6% improvement in
compile time.

### Misnamed `cub::BlockAdjacentDifference` APIs

The algorithms in `cub::BlockAdjacentDifference` have been deprecated, as their
names did not clearly describe their intent. The `FlagHeads` method is
now `SubtractLeft`, and `FlagTails` has been replaced by `SubtractRight`.

## Breaking Changes

- NVIDIA/cub#331: Deprecate the misnamed `BlockAdjacentDifference::FlagHeads`
  and `FlagTails` methods. Use the new `SubtractLeft` and `SubtractRight`
  methods instead.
- NVIDIA/cub#364: Deprecate some obsolete type traits. These should be replaced
  by the equivalent traits in `<type_traits>` as described above.

## New Features

- NVIDIA/cub#331: Port the `thrust::adjacent_difference` kernel and expose it
  as `cub::DeviceAdjacentDifference`.
- NVIDIA/cub#405: Port the `thrust::unique_by_key` kernel and expose it
  as `cub::DeviceSelect::UniqueByKey`. Thanks to @zasdfgbnm for this
  contribution.

## Enhancements

- NVIDIA/cub#340: Allow 64-bit offsets in `DeviceRadixSort` public APIs. Thanks
  to @canonizer for this contribution.
- NVIDIA/cub#400: Significantly reduce `DeviceMergeSort` compilation time.
- NVIDIA/cub#415: Support user-defined `CMAKE_INSTALL_INCLUDEDIR` values in
  Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.

## Bug Fixes

- NVIDIA/cub#381: Fix shared memory alignment in the `dyn_smem` example.
- NVIDIA/cub#393: Fix some collisions with the `min`/`max` macros defined
  in `windows.h`.
- NVIDIA/cub#404: Fix a bad cast in `util_device`.
- NVIDIA/cub#410: Fix CDP issues in `DeviceSegmentedSort`.
- NVIDIA/cub#411: Ensure that the `nv_exec_check_disable` pragma is only used
  with nvcc.
- NVIDIA/cub#418: Fix a `-Wsizeof-array-div` warning on gcc 11. Thanks to
  @robertmaynard for this contribution.
- NVIDIA/cub#420: Fix a new uninitialized-variable warning in `DiscardIterator`
  on gcc 10.
- NVIDIA/cub#423: Fix some collisions with the `small` macro defined
  in `windows.h`.
- NVIDIA/cub#426: Fix some issues with version handling in CUB’s CMake packages.
- NVIDIA/cub#430: Remove documentation for `DeviceSpmv` parameters that are
  absent from public APIs.
- NVIDIA/cub#432: Remove incorrect documentation for `DeviceScan` algorithms
  that guaranteed run-to-run deterministic results for floating-point addition.

# CUB 1.15.0 (NVIDIA HPC SDK 22.1, CUDA Toolkit 11.6)

## Summary
