This repository was archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.17.0
Summary
CUB 1.17.0 is the final minor release of the 1.X series. It provides a variety of bug fixes and miscellaneous enhancements, detailed below.
Known Issues
“Run-to-run” Determinism Broken
Several CUB device algorithms are documented to provide deterministic results (per device) for non-associative reduction operators (e.g. floating-point addition). Unfortunately, the implementations of these algorithms contain performance optimizations that violate this guarantee. The DeviceReduce::ReduceByKey and DeviceScan algorithms are known to be affected. We’re currently evaluating the scope and impact of correcting this in a future CUB release. See NVIDIA/cub#471 for details.
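The determinism issue comes from floating-point addition not being associative: regrouping the same inputs can change the rounded result. The following is a minimal host-side C++ sketch, not CUB code. `SumSequential` and `SumPairwise` are hypothetical helpers. The pairwise version only stands in for the tree-shaped combining a parallel reduction might use, under the assumption that a different tiling or optimization changes the grouping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right accumulation: ((((0 + a) + b) + c) + d).
float SumSequential(const std::vector<float>& v) {
    float total = 0.0f;
    for (float x : v) {
        total += x;
    }
    return total;
}

// Tree-shaped accumulation over [begin, end): (a + b) + (c + d).
// Hypothetical stand-in for how a parallel reduction might group partials.
float SumPairwise(const std::vector<float>& v, std::size_t begin, std::size_t end) {
    if (end == begin) return 0.0f;
    if (end - begin == 1) return v[begin];
    std::size_t mid = begin + (end - begin) / 2;
    return SumPairwise(v, begin, mid) + SumPairwise(v, mid, end);
}
```

With `{1e8f, 1.0f, -1e8f, 1.0f}`, the spacing between adjacent floats near 1e8 is 8, so each `1e8f + 1.0f` rounds back to `1e8f`. The sequential order yields `1.0f` and the pairwise order yields `0.0f`. Both results are "correct" roundings, which is why a run-to-run guarantee depends on the grouping staying fixed.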
Bug Fixes
- #444: Fixed `DeviceSelect` to work with discard iterators and mixed input/output types.
- #452: Fixed install issue when `CMAKE_INSTALL_LIBDIR` contained nested directories. Thanks to @robertmaynard for this contribution.
- #462: Fixed bug that produced incorrect results from `DeviceSegmentedSort` on sm_61 and sm_70.
- #464: Fixed `DeviceSelect::Flagged` so that flags are normalized to 0 or 1.
- #468: Fixed overflow issues in `DeviceRadixSort` given `num_items` close to 2^32. Thanks to @canonizer for this contribution.
- #498: Fixed compiler regression in `BlockAdjacentDifference`. Thanks to @MKKnorr for this contribution.
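One way to see why flag normalization (#464) matters: a flagged selection can compute each selected item's output position from a prefix sum over the flags, so a flag value of 2 would shift every later item. The following is a simplified host-side sketch under that assumption, not CUB's implementation. `SelectFlagged` and its `normalize` parameter are hypothetical, and flags are assumed non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for flagged selection.
// Output positions come from an exclusive prefix sum over the flags,
// which only works if each flag contributes exactly 0 or 1.
// Flags are assumed non-negative in this sketch.
std::vector<int> SelectFlagged(const std::vector<int>& items,
                               const std::vector<int>& flags,
                               bool normalize) {
    std::vector<std::size_t> offsets(items.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        offsets[i] = running;  // exclusive prefix sum
        int f = normalize ? (flags[i] != 0 ? 1 : 0) : flags[i];
        running += static_cast<std::size_t>(f);
    }
    // The scan total determines the output size.
    std::vector<int> out(running, 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (flags[i] != 0) {
            out[offsets[i]] = items[i];  // scatter selected item
        }
    }
    return out;
}
```

With items `{10, 20, 30, 40}` and flags `{1, 2, 0, 1}`, the normalized run returns `{10, 20, 40}`. Without normalization, the flag value 2 leaves a gap and yields `{10, 20, 0, 40}`.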
Other Enhancements
- #445: Removed device-sync in `DeviceSegmentedSort` when launched via CDP.
- #449: Fixed invalid link in documentation. Thanks to @kshitij12345 for this contribution.
- #450: `BlockDiscontinuity`: Replaced recursive-template loop unrolling with `#pragma unroll`. Thanks to @kshitij12345 for this contribution.
- #451: Replaced the deprecated `TexRefInputIterator` implementation with an alias to `TexObjInputIterator`. This fully removes all usages of the deprecated CUDA texture reference APIs from CUB.
- #456: `BlockAdjacentDifference`: Replaced recursive-template loop unrolling with `#pragma unroll`. Thanks to @kshitij12345 for this contribution.
- #466: The `cub::DeviceAdjacentDifference` API has been updated to use the new `OffsetT` deduction approach described in #212.
- #470: Fixed several doxygen-related warnings. Thanks to @karthikeyann for this contribution.
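To illustrate the change in #450 and #456, here is a generic C++ sketch comparing the two unrolling styles on a fixed-size per-thread array. It is not the actual `BlockDiscontinuity` or `BlockAdjacentDifference` code. `IterateSum` and `SumPragmaUnroll` are hypothetical names. `#pragma unroll` is honored by nvcc and clang, but other host compilers may ignore it, so it is only a hint here.

```cpp
#include <cassert>

// Recursive-template style: one template instantiation per index I,
// so the "loop" is unrolled by the compiler's template expansion.
template <int I, int N>
struct IterateSum {
    static int Sum(const int (&items)[N]) {
        return items[I] + IterateSum<I + 1, N>::Sum(items);
    }
};

// Partial specialization that ends the recursion when I == N.
template <int N>
struct IterateSum<N, N> {
    static int Sum(const int (&)[N]) { return 0; }
};

// Plain loop with a compile-time trip count; the pragma asks the
// compiler to unroll it fully (nvcc/clang), giving equivalent code
// without the extra template machinery.
template <int N>
int SumPragmaUnroll(const int (&items)[N]) {
    int total = 0;
#pragma unroll
    for (int i = 0; i < N; ++i) {
        total += items[i];
    }
    return total;
}
```

Both functions compute the same result for `int a[4] = {1, 2, 3, 4}` (that is, 10). The loop form is shorter and avoids deep template instantiation chains, which is a plausible motivation for the switch.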