This repository was archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.16.0 #434
alliepiper announced in Announcements
Summary
CUB 1.16.0 is a major release providing several improvements to the device-scope algorithms. DeviceRadixSort now supports large (64-bit indexed) input data. A new UniqueByKey algorithm has been added to DeviceSelect. DeviceAdjacentDifference provides new SubtractLeft and SubtractRight functionality.
This release also deprecates several obsolete APIs, including type traits and BlockAdjacentDifference algorithms. Many bug fixes and documentation updates are also included.
64-bit Offsets in DeviceRadixSort Public APIs
Users frequently want to process large datasets using CUB's device-scope algorithms, but the current public APIs limit input data sizes to those that can be indexed by a 32-bit integer. Beginning with this release, CUB is updating these APIs to support 64-bit offsets, as discussed in #212.
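To make the size limit concrete, the following host-side C++ sketch (not CUB code) checks whether an item count can still be indexed by a 32-bit offset. It assumes the earlier public APIs took the count as a signed 32-bit int; the helper names fits_in_32bit_offset and key_buffer_bytes are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of CUB): returns true when `num_items`
// can still be indexed by a signed 32-bit offset. Assumption: the older
// public APIs took the item count as a signed 32-bit `int`.
constexpr bool fits_in_32bit_offset(std::uint64_t num_items)
{
    return num_items <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// Hypothetical helper: bytes needed to hold `num_items` keys of
// `key_bytes` bytes each.
constexpr std::uint64_t key_buffer_bytes(std::uint64_t num_items,
                                         std::uint64_t key_bytes)
{
    return num_items * key_bytes;
}

// 2^31 items is one past the largest signed 32-bit value.
static_assert(!fits_in_32bit_offset(std::uint64_t{1} << 31),
              "2^31 items need a 64-bit offset");
```

Under these assumptions, 2^31 four-byte keys occupy 8 GiB, so a key buffer of that size is already too large for a 32-bit item count even though the count itself looks modest.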
The device-scope algorithms will be updated with 64-bit offset support incrementally, starting with the cub::DeviceRadixSort family of algorithms. Thanks to @canonizer for contributing this functionality.
New DeviceSelect::UniqueByKey Algorithm
cub::DeviceSelect now provides a UniqueByKey algorithm, which has been ported from Thrust. Thanks to @zasdfgbnm for this contribution.
New DeviceAdjacentDifference Algorithms
The new cub::DeviceAdjacentDifference interface, also ported from Thrust, provides SubtractLeft and SubtractRight algorithms as CUB kernels.
Deprecation Notices
Synchronous CUDA Dynamic Parallelism Support
A future version of CUB will change the debug_synchronous behavior of device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).
This will only affect calls to CUB device-scope algorithms launched from device-side code with debug_synchronous = true. Such invocations will continue to print extra debugging information, but they will no longer synchronize after kernel launches.
Deprecated Traits
CUB provided a variety of metaprogramming type traits in order to support C++03. Since C++14 is now required, these traits have been deprecated in favor of their STL equivalents, as shown below:
CUB now uses the STL traits internally, resulting in a ~6% improvement in compile time.
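As a rough illustration of what the replacements look like in user code, the sketch below uses only the standard <type_traits> header. The CUB-style trait names mentioned in the comments are assumed correspondences and are not taken from this text; AccumT and is_raw_pointer are hypothetical names chosen for the example.

```cpp
#include <cstdint>
#include <type_traits>

// Select a wider accumulator type for narrow integers.
// (Assumed correspondence: std::conditional in place of a cub::If-style trait.)
template <typename T>
using AccumT = typename std::conditional<(sizeof(T) < 4), std::int32_t, T>::type;

// Compile-time type comparison.
// (Assumed correspondence: std::is_same in place of a cub::Equals-style trait.)
static_assert(std::is_same<AccumT<std::int8_t>, std::int32_t>::value,
              "narrow types are widened");
static_assert(std::is_same<AccumT<double>, double>::value,
              "wide types are left unchanged");

// SFINAE-constrained overloads.
// (Assumed correspondence: std::enable_if and std::is_pointer in place of
// cub::EnableIf-style and cub::IsPointer-style traits.)
template <typename T>
typename std::enable_if<std::is_pointer<T>::value, bool>::type
is_raw_pointer(T) { return true; }

template <typename T>
typename std::enable_if<!std::is_pointer<T>::value, bool>::type
is_raw_pointer(T) { return false; }
```

The sketch sticks to the C++11 spellings (typename ...::type and ::value) so it compiles under the C++14 requirement stated above without relying on later shorthand.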
Misnamed cub::BlockAdjacentDifference APIs
The algorithms in cub::BlockAdjacentDifference have been deprecated, as their names did not clearly describe their intent. The FlagHeads method is now SubtractLeft, and FlagTails has been replaced by SubtractRight.
Breaking Changes
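- Deprecate the misnamed BlockAdjacentDifference::FlagHeads and FlagTails methods. Use the new SubtractLeft and SubtractRight methods instead.
- Deprecate the obsolete type traits. Use the equivalent traits in <type_traits> as described above.
New Features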
- Port the thrust::adjacent_difference kernel and expose it as cub::DeviceAdjacentDifference.
- Port the thrust::unique_by_key kernel and expose it as cub::DeviceSelect::UniqueByKey. Thanks to @zasdfgbnm for this contribution.
Enhancements
- Allow 64-bit offsets in DeviceRadixSort public APIs. Thanks to @canonizer for this contribution.
- Reduce DeviceMergeSort compilation time.
- [...] CMAKE_INSTALL_INCLUDEDIR values in Thrust's CMake install rules. Thanks to @robertmaynard for this contribution.
Bug Fixes
- Fix an issue in the dyn_smem example.
- Fix an issue with the min/max macros defined in windows.h.
- Fix an issue in util_device.
- Fix an issue in DeviceSegmentedSort.
- Ensure that the nv_exec_check_disable pragma is only used on nvcc.
- Fix a -Wsizeof-array-div warning on gcc 11. Thanks to @robertmaynard for this contribution.
- Fix an issue with DiscardIterator on gcc 10.
- Fix an issue with the small macro defined in windows.h.
- [...] DeviceSpmv parameters that are absent from public APIs.
- [...] DeviceScan algorithms that guaranteed run-to-run deterministic results for floating-point addition.
This discussion was created from the release CUB 1.16.0.
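For readers moving from FlagHeads and FlagTails to the SubtractLeft and SubtractRight names used in these notes, the host-side C++ sketch below shows the element-wise semantics assumed here. It is a plain reference loop, not the CUB kernel or its API. The left variant is assumed to follow the thrust::adjacent_difference convention and the right variant is assumed to mirror it; subtract_left and subtract_right are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference for a left difference (assumed semantics):
//   out[0] = in[0];  out[i] = op(in[i], in[i - 1]) for i > 0.
template <typename T, typename Op = std::minus<T>>
std::vector<T> subtract_left(const std::vector<T>& in, Op op = Op())
{
    std::vector<T> out(in);  // copies in[0] through unchanged
    for (std::size_t i = 1; i < in.size(); ++i)
        out[i] = op(in[i], in[i - 1]);
    return out;
}

// Host-side reference for a right difference (assumed semantics):
//   out[i] = op(in[i], in[i + 1]) for i < n - 1;  out[n - 1] = in[n - 1].
template <typename T, typename Op = std::minus<T>>
std::vector<T> subtract_right(const std::vector<T>& in, Op op = Op())
{
    std::vector<T> out(in);  // copies the last element through unchanged
    for (std::size_t i = 0; i + 1 < in.size(); ++i)
        out[i] = op(in[i], in[i + 1]);
    return out;
}
```

With the default std::minus, the input {1, 2, 4, 7} yields {1, 1, 2, 3} for the left difference and {-1, -2, -3, 7} for the right difference. Passing a comparison functor such as std::not_equal_to as op, with a suitable element type, gives flag-style output, which is one way to read the older FlagHeads and FlagTails names.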