This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit 48b5558

Add 2.0.1 and 2.1.0 changelogs.
1 parent b4f7e23 commit 48b5558

1 file changed: +61 -8 lines changed

CHANGELOG.md

Lines changed: 61 additions & 8 deletions
@@ -1,3 +1,56 @@
+# CUB 2.1.0
+
+## Breaking Changes
+
+- NVIDIA/cub#553: Deprecate the `CUB_USE_COOPERATIVE_GROUPS` macro, as all supported CTK
+  distributions provide CG. This macro will be removed in a future version of CUB.
+
+## New Features
+
+- NVIDIA/cub#359: Add new `DeviceBatchMemcpy` algorithm.
+- NVIDIA/cub#565: Add `DeviceMergeSort::StableSortKeysCopy` API. Thanks to David Wendt (@davidwendt)
+  for this contribution.
+- NVIDIA/cub#585: Add SM90 tuning policy for `DeviceRadixSort`. Thanks to Andy Adinets (@canonizer)
+  for this contribution.
+- NVIDIA/cub#586: Introduce a new mechanism to opt-out of compiling CDP support in CUB algorithms by
+  defining `CUB_DISABLE_CDP`.
+- NVIDIA/cub#589: Support 64-bit indexing in `DeviceReduce`.
+- NVIDIA/cub#607: Support 128-bit integers in radix sort.
+
+## Bug Fixes
+
+- NVIDIA/cub#547: Resolve several long-running issues resulting from using multiple versions of CUB
+  within the same process. Adds an inline namespace that encodes CUB version and targeted PTX
+  architectures.
+- NVIDIA/cub#562: Fix bug in `BlockShuffle` resulting from an invalid thread offset. Thanks to
+  @sjfeng1999 for this contribution.
+- NVIDIA/cub#564: Fix bug in `BlockRadixRank` when used with blocks that are not a multiple of 32
+  threads.
+- NVIDIA/cub#579: Ensure that all threads in the logical warp participate in the index-shuffle
+  for `BlockRadixRank`. Thanks to Andy Adinets (@canonizer) for this contribution.
+- NVIDIA/cub#582: Fix reordering in CUB member initializer lists.
+- NVIDIA/cub#589: Fix `DeviceSegmentedSort` when used with `bool` keys.
+- NVIDIA/cub#590: Fix CUB's CMake install rules. Thanks to Robert Maynard (@robertmaynard) for this
+  contribution.
+- NVIDIA/cub#592: Fix overflow in `DeviceReduce`.
+- NVIDIA/cub#598: Fix `DeviceRunLengthEncode` when the first item is a `NaN`.
+- NVIDIA/cub#611: Fix `WarpScanExclusive` for vector types.
+
+## Other Enhancements
+
+- NVIDIA/cub#537: Add detailed and expanded version of
+  a [CUB developer overview](https://github.com/NVIDIA/cub/blob/main/docs/developer_overview.md).
+- NVIDIA/cub#549: Fix `BlockReduceRaking` docs for non-commutative operations. Thanks to Tobias
+  Ribizel (@upsj) for this contribution.
+- NVIDIA/cub#606: Optimize CUB's decoupled-lookback implementation.
+
+# CUB 2.0.1
+
+## Other Enhancements
+
+- Skip device-side synchronization on SM90+. These syncs are a debugging-only feature and not
+  required for correctness, and a warning will be emitted if this happens.
+
 # CUB 2.0.0
 
 ## Summary
@@ -28,7 +81,7 @@ clarifying which operations can and cannot be performed in-place.
   - `CUB_IS_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
   - `CUB_INCLUDE_HOST_CODE`: Replace with `NV_IF_TARGET`.
   - `CUB_INCLUDE_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
-- NVIDIA/cub#486: CUBs CUDA Runtime support macros have been updated to
+- NVIDIA/cub#486: CUB's CUDA Runtime support macros have been updated to
   support `NV_IF_TARGET`. They are now defined consistently across all
   host/device compilation passes. This should not affect most usages of these
   macros, but may require changes for some edge cases.
@@ -53,7 +106,7 @@ clarifying which operations can and cannot be performed in-place.
     - RDC enabled: Macro is defined.
     - RDC not enabled: Macro is not defined.
 - NVIDIA/cub#509: A compile-time error is now emitted when a `__device__`-only
-  lambdas return type is queried from host code (requires libcu++ ≥ 1.9.0).
+  lambda's return type is queried from host code (requires libcu++ ≥ 1.9.0).
   - Due to limitations in the CUDA programming model, the result of this query
     is unreliable, and will silently return an incorrect result. This leads to
     difficult to debug errors.
@@ -66,7 +119,7 @@ clarifying which operations can and cannot be performed in-place.
 - NVIDIA/cub#509: Use the result type of the binary reduction operator for
   accumulating intermediate results in the `DeviceReduce` algorithm, following
   guidance from http://wg21.link/P2322R6.
-  - This change requires host-side introspection of the binary operators
+  - This change requires host-side introspection of the binary operator's
     signature, and device-only extended lambda functions can no longer be used.
   - In addition to the behavioral changes, the interfaces for
     the `Dispatch*Reduce` layer have changed:
@@ -87,7 +140,7 @@ clarifying which operations can and cannot be performed in-place.
   intermediate results in the `DeviceScan`, `DeviceScanByKey`,
   and `DeviceReduceByKey` algorithms, following guidance
   from http://wg21.link/P2322R6.
-  - This change requires host-side introspection of the binary operators
+  - This change requires host-side introspection of the binary operator's
     signature, and device-only extended lambda functions can no longer be used.
   - In addition to the behavioral changes, the interfaces for the `Dispatch`
     layer have changed:
@@ -190,7 +243,7 @@ Several CUB device algorithms are documented to provide deterministic results
 addition). Unfortunately, the implementations of these algorithms contain
 performance optimizations that violate this guarantee.
 The `DeviceReduce::ReduceByKey` and `DeviceScan` algorithms are known to be
-affected. Were currently evaluating the scope and impact of correcting this in
+affected. We're currently evaluating the scope and impact of correcting this in
 a future CUB release. See NVIDIA/cub#471 for details.
 
 ## Bug Fixes
@@ -244,7 +297,7 @@ updates are also included.
 
 ### 64-bit Offsets in `DeviceRadixSort` Public APIs
 
-Users frequently want to process large datasets using CUBs device-scope
+Users frequently want to process large datasets using CUB's device-scope
 algorithms, but the current public APIs limit input data sizes to those that can
 be indexed by a 32-bit integer. Beginning with this release, CUB is updating
 these APIs to support 64-bit offsets, as discussed in NVIDIA/cub#212.
@@ -322,7 +375,7 @@ now `SubtractLeft`, and `FlagTails` has been replaced by `SubtractRight`.
 - NVIDIA/cub#400: Implement a significant reduction in `DeviceMergeSort`
   compilation time.
 - NVIDIA/cub#415: Support user-defined `CMAKE_INSTALL_INCLUDEDIR` values in
-  Thrusts CMake install rules. Thanks for @robertmaynard for this contribution.
+  Thrust's CMake install rules. Thanks for @robertmaynard for this contribution.
 
 ## Bug Fixes
 
@@ -339,7 +392,7 @@ now `SubtractLeft`, and `FlagTails` has been replaced by `SubtractRight`.
   gcc 10.
 - NVIDIA/cub#423: Fix some collisions with the `small` macro defined
   in `windows.h`.
-- NVIDIA/cub#426: Fix some issues with version handling in CUBs CMake packages.
+- NVIDIA/cub#426: Fix some issues with version handling in CUB's CMake packages.
 - NVIDIA/cub#430: Remove documentation for `DeviceSpmv` parameters that are
   absent from public APIs.
 - NVIDIA/cub#432: Remove incorrect documentation for `DeviceScan` algorithms

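Editorial note on the NVIDIA/cub#509 context lines in the CUB 2.0.0 section: they say intermediate results are now accumulated in the result type of the binary reduction operator, following P2322R6, and that this needs host-side introspection of the operator's signature. A minimal host-only C++17 sketch of that idea follows. It is not CUB's implementation, and the names `accum_t`, `WidenSum`, and `reduce_sketch` are hypothetical, introduced only for illustration.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a CUB API): the accumulator type is the decayed
// result of calling op(init, element), computed at compile time on the host.
template <typename Op, typename Init, typename Elem>
using accum_t = std::decay_t<std::invoke_result_t<Op&, Init, Elem>>;

// Hypothetical operator whose result type (64-bit) is wider than the
// 32-bit inputs it will be applied to.
struct WidenSum {
  std::int64_t operator()(std::int64_t a, std::int64_t b) const { return a + b; }
};

// Sequential stand-in for a reduction: the running value is stored in the
// operator's result type rather than in the input or init type.
template <typename T, typename Init, typename Op>
accum_t<Op, Init, T> reduce_sketch(const std::vector<T>& in, Init init, Op op) {
  accum_t<Op, Init, T> acc = init;
  for (const T& x : in) {
    acc = op(acc, x);
  }
  return acc;
}
```

With 32-bit inputs and `WidenSum`, the running sum is held in a 64-bit value, so a total that would overflow `int` is still representable. Because the accumulator type comes from inspecting the operator's signature during host compilation, an operator whose return type cannot be queried from host code, such as the device-only extended lambdas the changelog mentions, does not fit this scheme.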