CUB 1.16.0 #434

alliepiper · 2022-02-08T19:34:56Z

alliepiper
Feb 8, 2022
Maintainer

Summary

CUB 1.16.0 is a major release providing several improvements to the device-scope algorithms. DeviceRadixSort now supports large (64-bit indexed) input data. A new UniqueByKey algorithm has been added to DeviceSelect. DeviceAdjacentDifference provides new SubtractLeft and SubtractRight functionality.

This release also deprecates several obsolete APIs, including type traits and BlockAdjacentDifference algorithms. Many bugfixes and documentation updates are also included.

64-bit Offsets in `DeviceRadixSort` Public APIs

Users frequently want to process large datasets using CUB’s device-scope algorithms, but the current public APIs limit input data sizes to those that can be indexed by a 32-bit integer. Beginning with this release, CUB is updating these APIs to support 64-bit offsets, as discussed in #212.

The device-scope algorithms will be updated with 64-bit offset support incrementally, starting with the cub::DeviceRadixSort family of algorithms. Thanks to @canonizer for contributing this functionality.
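
As a rough illustration of the limit being lifted, the sketch below checks whether an item count fits in a signed 32-bit offset. It assumes the older APIs took a signed 32-bit `int` item count; `fits_in_32bit_offset` is a hypothetical helper written for this note and is not part of CUB.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUB API): returns true when `num_items` can be
// represented by a signed 32-bit offset, i.e. it is at most 2^31 - 1.
constexpr bool fits_in_32bit_offset(std::uint64_t num_items)
{
  return num_items <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// One billion items can still be indexed with 32 bits...
static_assert(fits_in_32bit_offset(1000000000ull), "fits in 32 bits");
// ...but three billion items need a 64-bit offset type.
static_assert(!fits_in_32bit_offset(3000000000ull), "needs 64 bits");
```

Inputs that fail this check are the ones that previously could not be passed to a single sort call.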

New `DeviceSelect::UniqueByKey` Algorithm

cub::DeviceSelect now provides a UniqueByKey algorithm, which has been ported from Thrust. Thanks to @zasdfgbnm for this contribution.
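
For readers unfamiliar with the operation, here is a minimal host-side sketch of what a unique-by-key pass computes, assuming the same semantics as thrust::unique_by_key: for every run of consecutive equal keys, only the first key and its value are kept. `unique_by_key_reference` is a hypothetical CPU helper for illustration; it is not the CUB kernel or its interface.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// CPU reference model (hypothetical helper, not CUB code): keeps the first
// key/value pair from each run of consecutive equal keys.
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
unique_by_key_reference(const std::vector<Key> &keys,
                        const std::vector<Value> &values)
{
  std::vector<Key> keys_out;
  std::vector<Value> values_out;

  for (std::size_t i = 0; i < keys.size(); ++i)
  {
    // An item starts a new run if it is the first item or if its key
    // differs from the key immediately before it.
    if (i == 0 || !(keys[i] == keys[i - 1]))
    {
      keys_out.push_back(keys[i]);
      values_out.push_back(values[i]);
    }
  }

  return std::make_pair(std::move(keys_out), std::move(values_out));
}
```

With keys {0, 2, 2, 9, 5, 5, 5, 8} and values {a, b, c, d, e, f, g, h}, this model keeps keys {0, 2, 9, 5, 8} and values {a, b, d, e, h}. Note that only adjacent duplicates are collapsed, so unsorted input can still contain repeated keys in the output.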

New `DeviceAdjacentDifference` Algorithms

The new cub::DeviceAdjacentDifference interface, also ported from Thrust, provides SubtractLeft and SubtractRight algorithms as CUB kernels.
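
The two directions can be sketched on the host as follows. This is a CPU reference model under the assumption that the semantics match thrust::adjacent_difference: SubtractLeft combines each item with its left neighbor and copies the first item, while SubtractRight combines each item with its right neighbor and copies the last item. The function names are hypothetical and are not CUB APIs.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// CPU reference model (hypothetical, not CUB code):
// out[0] = in[0]; out[i] = op(in[i], in[i - 1]) for i > 0.
template <typename T, typename Op = std::minus<T>>
std::vector<T> subtract_left_reference(const std::vector<T> &in, Op op = Op{})
{
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = (i == 0) ? in[i] : op(in[i], in[i - 1]);
  }
  return out;
}

// CPU reference model (hypothetical, not CUB code):
// out[i] = op(in[i], in[i + 1]) for i < n - 1; out[n - 1] = in[n - 1].
template <typename T, typename Op = std::minus<T>>
std::vector<T> subtract_right_reference(const std::vector<T> &in, Op op = Op{})
{
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = (i + 1 == in.size()) ? in[i] : op(in[i], in[i + 1]);
  }
  return out;
}
```

For the input {1, 2, 1, 2, 1, 2, 1, 2} with the default subtraction operator, the left variant yields {1, 1, -1, 1, -1, 1, -1, 1} and the right variant yields {-1, 1, -1, 1, -1, 1, -1, 2}.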

Deprecation Notices

Synchronous CUDA Dynamic Parallelism Support

A future version of CUB will change the debug_synchronous behavior of device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).

This will only affect calls to CUB device-scope algorithms launched from device-side code with debug_synchronous = true. Such invocations will continue to print extra debugging information, but they will no longer synchronize after kernel launches.

Deprecated Traits

CUB provided a variety of metaprogramming type traits in order to support C++03. Since C++14 is now required, these traits have been deprecated in favor of their STL equivalents, as shown below:

| Deprecated CUB Trait | Replacement STL Trait |
|---|---|
| cub::If | std::conditional |
| cub::Equals | std::is_same |
| cub::IsPointer | std::is_pointer |
| cub::IsVolatile | std::is_volatile |
| cub::RemoveQualifiers | std::remove_cv |
| cub::EnableIf | std::enable_if |

CUB now uses the STL traits internally, resulting in a ~6% improvement in compile time.
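
As a migration aid, the following sketch uses only the standard replacements from the table, written in C++14 style. The names `wide_t` and `is_raw_pointer` are hypothetical examples made up for this note, and the comments only indicate which deprecated CUB trait each standard trait stands in for.

```cpp
#include <type_traits>

// std::conditional (replaces cub::If): pick a type from a compile-time
// condition. Here, types narrower than 4 bytes are widened to int.
template <typename T>
using wide_t = typename std::conditional<(sizeof(T) < 4), int, T>::type;

// std::enable_if (replaces cub::EnableIf) combined with std::is_pointer
// (replaces cub::IsPointer): select an overload based on a type property.
template <typename T>
typename std::enable_if<std::is_pointer<T>::value, bool>::type
is_raw_pointer(T)
{
  return true;
}

template <typename T>
typename std::enable_if<!std::is_pointer<T>::value, bool>::type
is_raw_pointer(T)
{
  return false;
}

// std::is_same (replaces cub::Equals).
static_assert(std::is_same<wide_t<char>, int>::value, "char is widened");
static_assert(std::is_same<wide_t<double>, double>::value, "double is kept");

// std::remove_cv (replaces cub::RemoveQualifiers).
static_assert(
  std::is_same<std::remove_cv<const volatile int>::type, int>::value,
  "cv-qualifiers are stripped");

// std::is_volatile (replaces cub::IsVolatile).
static_assert(std::is_volatile<volatile int>::value, "volatile is detected");
```

Because these traits all come from <type_traits>, code migrated this way no longer depends on CUB's metaprogramming headers.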

Misnamed `cub::BlockAdjacentDifference` APIs

The algorithms in cub::BlockAdjacentDifference have been deprecated, as their names did not clearly describe their intent. The FlagHeads method is now SubtractLeft, and FlagTails has been replaced by SubtractRight.

Breaking Changes

Port adjacent difference into CUB #331: Deprecate the misnamed BlockAdjacentDifference::FlagHeads and FlagTails methods. Use the new SubtractLeft and SubtractRight methods instead.
Remove Equals and friends #364: Deprecate some obsolete type traits. These should be replaced by the equivalent traits in <type_traits> as described above.

New Features

Port adjacent difference into CUB #331: Port the thrust::adjacent_difference kernel and expose it as cub::DeviceAdjacentDifference.
Port unique by key from thrust #405: Port the thrust::unique_by_key kernel and expose it as cub::DeviceSelect::UniqueByKey. Thanks to @zasdfgbnm for this contribution.

Enhancements

64-bit Offsets in DeviceRadixSort #340: Allow 64-bit offsets in DeviceRadixSort public APIs. Thanks to @canonizer for this contribution.
Optimize compilation time for the common case #400: Implement a significant reduction in DeviceMergeSort compilation time.
Support user defined CMAKE_INSTALL_INCLUDEDIR values #415: Support user-defined CMAKE_INSTALL_INCLUDEDIR values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.

Bug Fixes

Fix alignment in dyn_smem example #381: Fix shared memory alignment in dyn_smem example.
Fix some min/max macro collisions with windows.h #393: Fix some collisions with the min/max macros defined in windows.h.
Fix bad cast in util_device. #404: Fix bad cast in util_device.
Fix segmented sort device-side launch #410: Fix CDP issues in DeviceSegmentedSort.
Only use nv_exec_check_disable on nvcc. #411: Ensure that the nv_exec_check_disable pragma is only used on nvcc.
Tell gcc this sizeof division is intended ( -Wsizeof-array-div ) #418: Fix -Wsizeof-array-div warning on gcc 11. Thanks to @robertmaynard for this contribution.
Initialize padding member to silence warning. #420: Fix new uninitialized variable warning in DiscardIterator on gcc 10.
Fix segmented sort compilation in case of windows.h #423: Fix some collisions with the small macro defined in windows.h.
Fix version checks in CMake packages. #426: Fix some issues with version handling in CUB’s CMake packages.
Remove references to alpha/beta from DeviceSpmv docs. #430: Remove documentation for DeviceSpmv parameters that are absent from public APIs.
Remove run-to-run determinism guarantee from the scan documentation #432: Remove incorrect documentation for DeviceScan algorithms that guaranteed run-to-run deterministic results for floating-point addition.
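
The reason behind the last item (#432) is that floating-point addition is not associative, so any algorithm whose grouping of partial sums can vary may produce different results. The host-side sketch below shows the effect with two fixed summation orders. It illustrates the arithmetic only and does not model how DeviceScan actually groups its operations; the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Adds the values from the first element to the last.
double sum_forward(const std::vector<double> &v)
{
  double acc = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i)
  {
    acc += v[i];
  }
  return acc;
}

// Adds the same values from the last element to the first.
double sum_backward(const std::vector<double> &v)
{
  double acc = 0.0;
  for (std::size_t i = v.size(); i > 0; --i)
  {
    acc += v[i - 1];
  }
  return acc;
}
```

For the input {1e20, -1e20, 1.0}, and assuming IEEE 754 double arithmetic without fast-math style optimizations, the forward order cancels the two large values first and returns 1.0, while the backward order absorbs the 1.0 into -1e20 and returns 0.0.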

This discussion was created from the release CUB 1.16.0.
