This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit acff1a1

Merge pull request #433 from allisonvacanti/1.16.0-prep
Update CHANGELOG.md and README.md for 1.16.0.
2 parents 3691562 + 6155a90 commit acff1a1

2 files changed: +120 -2 lines changed

CHANGELOG.md

Lines changed: 118 additions & 1 deletion
@@ -1,4 +1,121 @@
-# CUB 1.15.0
+# CUB 1.16.0
+
+## Summary
+
+CUB 1.16.0 is a major release providing several improvements to the device scope
+algorithms. `DeviceRadixSort` now supports large (64-bit indexed) input data. A
+new `UniqueByKey` algorithm has been added to `DeviceSelect`.
+`DeviceAdjacentDifference` provides new `SubtractLeft` and `SubtractRight`
+functionality.
+
+This release also deprecates several obsolete APIs, including type traits
+and `BlockAdjacentDifference` algorithms. Many bugfixes and documentation
+updates are also included.
+
+### 64-bit Offsets in `DeviceRadixSort` Public APIs
+
+Users frequently want to process large datasets using CUB’s device-scope
+algorithms, but the current public APIs limit input data sizes to those that can
+be indexed by a 32-bit integer. Beginning with this release, CUB is updating
+these APIs to support 64-bit offsets, as discussed in NVIDIA/cub#212.
+
+The device-scope algorithms will be updated with 64-bit offset support
+incrementally, starting with the `cub::DeviceRadixSort` family of algorithms.
+Thanks to @canonizer for contributing this functionality.
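
The size limit described in the section above can be illustrated with a small host-only C++ sketch. It assumes the 32-bit offset in question is a signed `int`. The helper name `fits_in_int32` is hypothetical and is not part of CUB; it only marks the boundary past which a 64-bit offset is needed.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUB API): true when num_items can be
// represented by a signed 32-bit offset.
constexpr bool fits_in_int32(std::int64_t num_items) {
    return num_items >= 0 &&
           num_items <= std::numeric_limits<std::int32_t>::max();
}

// 2^31 - 1 items is the largest count a signed 32-bit offset can express.
static_assert(fits_in_int32(2147483647LL), "largest 32-bit count");
// 3 * 2^30 items (about 3.2 billion) needs a 64-bit offset.
static_assert(!fits_in_int32(3221225472LL), "requires 64-bit offset");
```

A caller holding an input larger than this boundary is the case the 64-bit `DeviceRadixSort` offsets are meant to serve.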
+
+### New `DeviceSelect::UniqueByKey` Algorithm
+
+`cub::DeviceSelect` now provides a `UniqueByKey` algorithm, which has been
+ported from Thrust. Thanks to @zasdfgbnm for this contribution.
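
As a rough guide to what a unique-by-key operation computes, here is a sequential host-side reference in C++. It is a sketch of the semantics as they are commonly documented for `thrust::unique_by_key` (keep the first key/value pair of each run of consecutive equal keys), not the CUB kernel or its API. The function name `unique_by_key_reference` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential reference (not the CUB API): for every run of
// consecutive equal keys, keep the first key and the value paired with it.
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
unique_by_key_reference(const std::vector<Key>& keys,
                        const std::vector<Value>& values) {
    assert(values.size() >= keys.size());  // one value per key is required
    std::vector<Key> out_keys;
    std::vector<Value> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // A run starts at index 0 or wherever the key differs from its
        // immediate predecessor.
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
    }
    return {out_keys, out_values};
}
```

With keys `{0, 2, 2, 9, 5, 5, 5, 8}` and values `{1, 2, 3, 4, 5, 6, 7, 8}`, this reference yields keys `{0, 2, 9, 5, 8}` and values `{1, 2, 4, 5, 8}`.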
+
+### New `DeviceAdjacentDifference` Algorithms
+
+The new `cub::DeviceAdjacentDifference` interface, also ported from Thrust,
+provides `SubtractLeft` and `SubtractRight` algorithms as CUB kernels.
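
The two operations can be sketched as sequential host-side C++ references. This assumes the usual adjacent-difference convention: the left variant subtracts each element's left neighbor and copies the first element unchanged, and the right variant subtracts the right neighbor and copies the last element unchanged. The function names are hypothetical and plain subtraction stands in for a user-supplied difference operator; this is not the CUB API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: out[0] = in[0]; out[i] = in[i] - in[i - 1].
template <typename T>
std::vector<T> subtract_left_reference(const std::vector<T>& in) {
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = (i == 0) ? in[i] : in[i] - in[i - 1];
    }
    return out;
}

// Hypothetical reference: out[n - 1] = in[n - 1]; out[i] = in[i] - in[i + 1].
template <typename T>
std::vector<T> subtract_right_reference(const std::vector<T>& in) {
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = (i + 1 == in.size()) ? in[i] : in[i] - in[i + 1];
    }
    return out;
}
```

For the input `{1, 2, 1, 2, 1, 2, 1, 2}`, the left reference returns `{1, 1, -1, 1, -1, 1, -1, 1}` and the right reference returns `{-1, 1, -1, 1, -1, 1, -1, 2}`.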
+
+## Deprecation Notices
+
+### Synchronous CUDA Dynamic Parallelism Support
+
+**A future version of CUB will change the `debug_synchronous` behavior of
+device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).**
+
+This will only affect calls to CUB device-scope algorithms launched from
+device-side code with `debug_synchronous = true`. Such invocations will continue
+to print extra debugging information, but they will no longer synchronize after
+kernel launches.
+
+### Deprecated Traits
+
+CUB provided a variety of metaprogramming type traits in order to support C++03.
+Since C++14 is now required, these traits have been deprecated in favor of their
+STL equivalents, as shown below:
+
+| Deprecated CUB Trait  | Replacement STL Trait |
+|-----------------------|-----------------------|
+| cub::If               | std::conditional      |
+| cub::Equals           | std::is_same          |
+| cub::IsPointer        | std::is_pointer       |
+| cub::IsVolatile       | std::is_volatile      |
+| cub::RemoveQualifiers | std::remove_cv        |
+| cub::EnableIf         | std::enable_if        |
+
+CUB now uses the STL traits internally, resulting in a ~6% improvement in
+compile time.
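
For code migrating off the deprecated traits, the following C++ sketch exercises each replacement from the table using only `<type_traits>`. The alias `OffsetT` and the function `is_null` are hypothetical names chosen for illustration; the deprecated CUB spellings are intentionally not shown, since their exact member names are not given in this changelog.

```cpp
#include <type_traits>

// std::conditional picks one of two types at compile time.
// OffsetT is a hypothetical alias used only for this illustration.
template <bool UseWide>
using OffsetT = typename std::conditional<UseWide, long long, int>::type;

// std::is_same compares two types for exact equality.
static_assert(std::is_same<OffsetT<true>, long long>::value, "wide offset");
static_assert(std::is_same<OffsetT<false>, int>::value, "narrow offset");

// std::is_pointer and std::is_volatile query type properties.
static_assert(std::is_pointer<int*>::value, "int* is a pointer");
static_assert(std::is_volatile<volatile int>::value, "volatile-qualified");

// std::remove_cv strips top-level const and volatile qualifiers.
static_assert(
    std::is_same<std::remove_cv<const volatile int>::type, int>::value,
    "qualifiers removed");

// std::enable_if restricts this overload to pointer types.
template <typename T>
typename std::enable_if<std::is_pointer<T>::value, bool>::type
is_null(T p) {
    return p == nullptr;
}
```

Because every check except `is_null` is a `static_assert`, a successful compile already confirms the trait behavior.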
+
+### Misnamed `cub::BlockAdjacentDifference` APIs
+
+The algorithms in `cub::BlockAdjacentDifference` have been deprecated, as their
+names did not clearly describe their intent. The `FlagHeads` method is
+now `SubtractLeft`, and `FlagTails` has been replaced by `SubtractRight`.
+
+## Breaking Changes
+
+- NVIDIA/cub#331: Deprecate the misnamed `BlockAdjacentDifference::FlagHeads`
+  and `FlagTails` methods. Use the new `SubtractLeft` and `SubtractRight`
+  methods instead.
+- NVIDIA/cub#364: Deprecate some obsolete type traits. These should be replaced
+  by the equivalent traits in `<type_traits>` as described above.
+
+## New Features
+
+- NVIDIA/cub#331: Port the `thrust::adjacent_difference` kernel and expose it
+  as `cub::DeviceAdjacentDifference`.
+- NVIDIA/cub#405: Port the `thrust::unique_by_key` kernel and expose it
+  as `cub::DeviceSelect::UniqueByKey`. Thanks to @zasdfgbnm for this
+  contribution.
+
+## Enhancements
+
+- NVIDIA/cub#340: Allow 64-bit offsets in `DeviceRadixSort` public APIs. Thanks
+  to @canonizer for this contribution.
+- NVIDIA/cub#400: Implement a significant reduction in `DeviceMergeSort`
+  compilation time.
+- NVIDIA/cub#415: Support user-defined `CMAKE_INSTALL_INCLUDEDIR` values in
+  Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.
+
+## Bug Fixes
+
+- NVIDIA/cub#381: Fix shared memory alignment in `dyn_smem` example.
+- NVIDIA/cub#393: Fix some collisions with the `min`/`max` macros defined
+  in `windows.h`.
+- NVIDIA/cub#404: Fix bad cast in `util_device`.
+- NVIDIA/cub#410: Fix CDP issues in `DeviceSegmentedSort`.
+- NVIDIA/cub#411: Ensure that the `nv_exec_check_disable` pragma is only used on
+  nvcc.
+- NVIDIA/cub#418: Fix `-Wsizeof-array-div` warning on gcc 11. Thanks to
+  @robertmaynard for this contribution.
+- NVIDIA/cub#420: Fix new uninitialized variable warning in `DiscardIterator` on
+  gcc 10.
+- NVIDIA/cub#423: Fix some collisions with the `small` macro defined
+  in `windows.h`.
+- NVIDIA/cub#426: Fix some issues with version handling in CUB’s CMake packages.
+- NVIDIA/cub#430: Remove documentation for `DeviceSpmv` parameters that are
+  absent from public APIs.
+- NVIDIA/cub#432: Remove incorrect documentation for `DeviceScan` algorithms
+  that guaranteed run-to-run deterministic results for floating-point addition.
+
+# CUB 1.15.0 (NVIDIA HPC SDK 22.1, CUDA Toolkit 11.6)

 ## Summary

README.md

Lines changed: 2 additions & 1 deletion
@@ -100,7 +100,8 @@ See the [changelog](CHANGELOG.md) for details about specific releases.
 | CUB Release               | Included In                             |
 | ------------------------- | --------------------------------------- |
-| 1.15.0                    | TBD                                     |
+| 1.16.0                    | TBD                                     |
+| 1.15.0                    | NVIDIA HPC SDK 22.1 & CUDA Toolkit 11.6 |
 | 1.14.0                    | NVIDIA HPC SDK 21.9                     |
 | 1.13.1                    | CUDA Toolkit 11.5                       |
 | 1.13.0                    | NVIDIA HPC SDK 21.7                     |
