Skip to content

Commit 7aa0c00

Browse files
[rocm-libraries] ROCm/rocm-libraries#1036 (commit 539ba71)
[rocPRIM][Code Coverage][Cherry Pick] Increase code coverage for rocPRIM (#1036) In this commit we have added test coverage, made some bug fixes and added a wrapper function for a common pattern in the tests. This `test_kernel_wrapper` function reduces the amount of code used. In ROCm/rocm-libraries@0318f42 we have added coverage of the `warp_scan_shuffle` by forcing dpp off in cmake on the `warp_scan` test. In ROCm/rocm-libraries@30d54c1 we have added the missing test cases for the thread_algos in the thread directory. (Except for thread_operators, these functions are mostly duplicates probably needs to be cleanly moved/deprecated). We also found a bug in `thread_reduce` with the tests and fixed this. In ROCm/rocm-libraries@48780ff ROCm/rocm-libraries@37aa542 ROCm/rocm-libraries@05a9e12 ROCm/rocm-libraries@f5f1dbe ROCm/rocm-libraries@343e275 ROCm/rocm-libraries@8007fe6 we have added unit tests for the files in the types directory. In ROCm/rocm-libraries@40ca3e2 we have introduced the `test_kernel_wrapper` and added the missing `PartitionTwoWayFlag` test. In ROCm/rocm-libraries@7a4e7ba unit tests for the rocprim::tuple type are added. In ROCm/rocm-libraries@29eac9a we have added extra test coverage by introducing a new type that will go into the untested path. We have also added this type to the benchmark to show [this specialization](ROCm/rocm-libraries@29eac9a#diff-22a70b2ad081732e222004ded43ce0db7145ff196f7b723289425dcb5b6c732dR228) is still needed. We also changed the `std::is_integral` to `rocprim::is_integral` to not include `(u)int128_t` for this specialization, this does not impact performance. The specialization enable_if was also slightly changed to make it clearer which path is chosen (does not make a difference in actual executed code). Also custom config and a iterator was added to the tests of merge_sort. In ROCm/rocm-libraries@f5614d5 test coverage was added for the device_scan_common.hpp file. In ROCm/rocm-libraries@1d2a40a test coverage was increased by actually using `const fixed_array` and including a `Level` type which goes into the base path of sample_to_bin_even struct. We also added an iterator as a type to the test. Also the new test wrapper was used. In ROCm/rocm-libraries@c518f42 test coverage was increased for the iterators, a lot of the operators where missing. Also some cleaning up of the tests was done including the wrapper when possible. There was also a bug found for the comperator in `arg_index_iterator` and `texture_cache_iterator`, which was also fixed. The `->` operator is not tested for `test_texture_cache`, `arg_index_iterator`, `transform_iterator` and `zip_iterator`. They currently do not seem to work, I am working on a possible fix but will be added in a later PR.
1 parent 361cf61 commit 7aa0c00

34 files changed

+2901
-1501
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -97,6 +97,8 @@ when the input or inital type was smaller than the output type.
9797
* Fixed an issue where `device_segmented_reduce` reported autotuning throughput being 5x lower than it was in reality.
9898
* Fixed device radix sort not returning the correct required temporary storage when a double buffer contains `nullptr`.
9999
* Fixed constness of equality operators (`==` and `!=`) in `rocprim::key_value_pair`.
100+
* Fixed an issue for the comparison operators in `arg_index_iterator` and `texture_cache_iterator`, where `<` and `>` comparators were swapped.
101+
* Fixed an issue for the `rocprim::thread_reduce` not working correctly with a prefix value.
100102

101103
### Known issues
102104
* When using `rocprim::deterministic_inclusive_scan_by_key` and `rocprim::deterministic_exclusive_scan_by_key` the intermediate values can change order on Navi3x

benchmark/benchmark_device_merge_sort.cpp

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -50,10 +50,16 @@ int main(int argc, char* argv[])
5050
CREATE_BENCHMARK(rocprim::int128_t)
5151
CREATE_BENCHMARK(rocprim::uint128_t)
5252

53-
using custom_float2 = common::custom_type<float, float>;
54-
using custom_double2 = common::custom_type<double, double>;
55-
using custom_int2 = common::custom_type<int, int>;
56-
using custom_char_double = common::custom_type<char, double>; // used by ssbk benchmark
53+
using custom_float2 = common::custom_type<float, float>;
54+
using custom_double2 = common::custom_type<double, double>;
55+
using custom_double2_copy
56+
= common::custom_type_copyable<double,
57+
double>; // specific benchmark for specialization workaround
58+
using custom_int2 = common::custom_type<int, int>;
59+
using custom_char_double = common::custom_type<char, double>; // used by ssbk
60+
using custom_char_double_copy
61+
= common::custom_type_copyable<char,
62+
double>; // specific benchmark for specialization workaround
5763
using custom_longlong_double = common::custom_type<long long, double>;
5864
using huge_float2_1024 = common::custom_huge_type<1024, float, float>;
5965
using huge_float2_2048 = common::custom_huge_type<2048, float, float>;
@@ -69,8 +75,10 @@ int main(int argc, char* argv[])
6975
CREATE_BENCHMARK(huge_float2_2048)
7076
CREATE_BENCHMARK(long long, custom_double2)
7177
CREATE_BENCHMARK(custom_double2, custom_double2)
78+
CREATE_BENCHMARK(custom_double2, custom_double2_copy)
7279
CREATE_BENCHMARK(custom_int2, custom_double2)
7380
CREATE_BENCHMARK(custom_int2, custom_char_double)
81+
CREATE_BENCHMARK(custom_int2, custom_char_double_copy)
7482
CREATE_BENCHMARK(custom_int2, custom_longlong_double)
7583
CREATE_BENCHMARK(rocprim::int128_t, rocprim::int128_t)
7684
CREATE_BENCHMARK(rocprim::uint128_t, rocprim::uint128_t)

benchmark/benchmark_device_merge_sort.hpp

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -140,7 +140,21 @@ struct device_merge_sort_benchmark : public benchmark_utils::autotune_interface
140140
seed.get_0());
141141

142142
std::vector<value_type> values_input(size);
143-
std::iota(values_input.begin(), values_input.end(), 0);
143+
144+
if constexpr(common::is_custom_type_copyable<value_type>::value)
145+
{
146+
for(size_t i = 0; i < size; i++)
147+
{
148+
value_type value;
149+
value.x = static_cast<typename value_type::first_type>(i);
150+
value.y = static_cast<typename value_type::second_type>(i);
151+
values_input[i] = value;
152+
}
153+
}
154+
else
155+
{
156+
std::iota(values_input.begin(), values_input.end(), 0);
157+
}
144158

145159
key_type* d_keys_input;
146160
key_type* d_keys_output;

benchmark/benchmark_utils.hpp

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -694,8 +694,8 @@ struct bench_naming
694694
static std::string format_name(std::string string)
695695
{
696696
format format = get_format();
697-
std::regex r(
698-
"([A-z0-9]*):\\s*((?:common::custom_type<[A-z0-9,]*>)|[A-z:\\(\\)\\.<>\\s0-9]*)(\\}*)");
697+
std::regex r("([A-z0-9]*):\\s*((?:common::custom_type<[A-z0-9,]*>)|(?:common::custom_type_"
698+
"copyable<[A-z0-9,]*>)|[A-z:\\(\\)\\.<>\\s0-9]*)(\\}*)");
699699
// First we perform some checks
700700
bool checks[4] = {false};
701701
for(std::sregex_iterator i = std::sregex_iterator(string.begin(), string.end(), r);
@@ -917,6 +917,16 @@ inline const char* Traits<rocprim::uint128_t>::name()
917917
{
918918
return "rocprim::uint128_t";
919919
}
920+
template<>
921+
inline const char* Traits<common::custom_type_copyable<char, double>>::name()
922+
{
923+
return "common::custom_type_copyable<char,double>";
924+
}
925+
template<>
926+
inline const char* Traits<common::custom_type_copyable<double, double>>::name()
927+
{
928+
return "common::custom_type_copyable<double,double>";
929+
}
920930

921931
inline const char* get_block_scan_algorithm_name(rocprim::block_scan_algorithm alg)
922932
{

common/utils_custom_type.hpp

Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,85 @@ struct custom_type
138138
}
139139
};
140140

141+
template<typename T, typename U = T, bool NonZero = false>
142+
struct custom_type_copyable
143+
{
144+
using first_type = T;
145+
using second_type = U;
146+
147+
using value_type = std::conditional_t<std::is_same<T, U>::value, T, void>;
148+
149+
T x;
150+
U y;
151+
152+
ROCPRIM_HOST_DEVICE constexpr inline custom_type_copyable()
153+
: x(NonZero ? 12 : 0), y(NonZero ? 34 : 0)
154+
{}
155+
156+
ROCPRIM_HOST_DEVICE inline custom_type_copyable(T x, U y) : x(x), y(y) {}
157+
158+
ROCPRIM_HOST_DEVICE inline custom_type_copyable(T xy) : x(xy), y(xy) {}
159+
160+
template<typename V, typename W>
161+
ROCPRIM_HOST_DEVICE inline custom_type_copyable(
162+
const custom_type_copyable<V, W, NonZero>& other)
163+
: x(static_cast<T>(other.x)), y(static_cast<U>(other.y))
164+
{}
165+
166+
ROCPRIM_HOST_DEVICE
167+
inline bool
168+
operator<(const custom_type_copyable& other) const
169+
{
170+
rocprim::less<T> less_T;
171+
rocprim::equal_to<T> equal_to_T;
172+
rocprim::less<U> less_U;
173+
return (less_T(x, other.x) || (equal_to_T(x, other.x) && less_U(y, other.y)));
174+
}
175+
176+
ROCPRIM_HOST_DEVICE
177+
inline bool
178+
operator>(const custom_type_copyable& other) const
179+
{
180+
rocprim::greater<T> greater_T;
181+
rocprim::equal_to<T> equal_to_T;
182+
rocprim::greater<U> greater_U;
183+
return (greater_T(x, other.x) || (equal_to_T(x, other.x) && greater_U(y, other.y)));
184+
}
185+
186+
ROCPRIM_HOST_DEVICE
187+
inline bool
188+
operator==(const custom_type_copyable& other) const
189+
{
190+
rocprim::equal_to<T> equal_to_T;
191+
rocprim::equal_to<U> equal_to_U;
192+
return (equal_to_T(x, other.x) && equal_to_U(y, other.y));
193+
}
194+
195+
ROCPRIM_HOST_DEVICE
196+
inline bool
197+
operator!=(const custom_type_copyable& other) const
198+
{
199+
return !(*this == other);
200+
}
201+
202+
friend inline std::ostream& operator<<(std::ostream& stream, const custom_type_copyable& value)
203+
{
204+
stream << "[" << value.x << "; " << value.y << "]";
205+
return stream;
206+
}
207+
};
208+
209+
static_assert(std::is_trivially_copyable<custom_type_copyable<char, double>>::value,
210+
"custom_type_copyable is not trivially copyable");
211+
212+
template<typename T>
213+
struct is_custom_type_copyable : std::false_type
214+
{};
215+
216+
template<typename T, typename U, bool NonZero>
217+
struct is_custom_type_copyable<common::custom_type_copyable<T, U, NonZero>> : std::true_type
218+
{};
219+
141220
template<unsigned int Size, class T, typename U = T, bool NonZero = false>
142221
struct custom_huge_type : custom_type<T, U, NonZero>
143222
{
@@ -182,6 +261,10 @@ struct is_custom_type<common::custom_type<T, U, NonZero>> : std::true_type
182261
template<unsigned int Size, typename T, typename U, bool NonZero>
183262
struct is_custom_type<common::custom_huge_type<Size, T, U, NonZero>> : std::true_type
184263
{};
264+
265+
template<typename T, typename U, bool NonZero>
266+
struct is_custom_type<common::custom_type_copyable<T, U, NonZero>> : std::true_type
267+
{};
185268
} // namespace common
186269

187270
#endif // COMMON_UTILS_CUSTOM_TYPE_HPP_

rocprim/include/rocprim/config.hpp

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -150,11 +150,13 @@
150150
// Only defined when support is present, in contrast to ROCPRIM_DETAIL_USE_DPP, which should be
151151
// always defined
152152
#if defined(__HIP_DEVICE_COMPILE__) && defined(__AMDGCN__) \
153-
&& (!defined(__GFX6__) && !defined(__GFX7__))
153+
&& (!defined(__GFX6__) && !defined(__GFX7__)) \
154+
&& !(defined(ROCPRIM_TARGET_SPIRV) && ROCPRIM_TARGET_SPIRV == 1)
154155
#define ROCPRIM_DETAIL_HAS_DPP 1
155156
#endif
156157

157-
#if !defined(ROCPRIM_DISABLE_DPP) && defined(ROCPRIM_DETAIL_HAS_DPP) && !ROCPRIM_TARGET_SPIRV
158+
#if(!defined(ROCPRIM_DISABLE_DPP) || ROCPRIM_DISABLE_DPP == 0) \
159+
&& (defined(ROCPRIM_DETAIL_HAS_DPP) && ROCPRIM_DETAIL_HAS_DPP == 1)
158160
#define ROCPRIM_DETAIL_USE_DPP 1
159161
#else
160162
#define ROCPRIM_DETAIL_USE_DPP 0

rocprim/include/rocprim/device/detail/device_histogram.hpp

Lines changed: 21 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -306,8 +306,9 @@ ROCPRIM_DEVICE ROCPRIM_INLINE void
306306
}
307307

308308
template<unsigned int BlockSize, unsigned int ActiveChannels, class Counter>
309-
ROCPRIM_DEVICE ROCPRIM_INLINE void init_histogram(fixed_array<Counter*, ActiveChannels> histogram,
310-
fixed_array<unsigned int, ActiveChannels> bins)
309+
ROCPRIM_DEVICE ROCPRIM_INLINE
310+
void init_histogram(fixed_array<Counter*, ActiveChannels> histogram,
311+
const fixed_array<unsigned int, ActiveChannels> bins)
311312
{
312313
const unsigned int flat_id = ::rocprim::detail::block_thread_id<0>();
313314
const unsigned int block_id = ::rocprim::detail::block_id<0>();
@@ -329,17 +330,17 @@ template<unsigned int BlockSize,
329330
class SampleIterator,
330331
class Counter,
331332
class SampleToBinOp>
332-
ROCPRIM_DEVICE ROCPRIM_INLINE void
333-
histogram_shared(SampleIterator samples,
334-
unsigned int columns,
335-
unsigned int rows,
336-
unsigned int row_stride,
337-
unsigned int rows_per_block,
338-
unsigned int shared_histograms,
339-
fixed_array<Counter*, ActiveChannels> histogram,
340-
fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
341-
fixed_array<unsigned int, ActiveChannels> bins,
342-
unsigned int* block_histogram_start)
333+
ROCPRIM_DEVICE ROCPRIM_INLINE
334+
void histogram_shared(SampleIterator samples,
335+
unsigned int columns,
336+
unsigned int rows,
337+
unsigned int row_stride,
338+
unsigned int rows_per_block,
339+
unsigned int shared_histograms,
340+
fixed_array<Counter*, ActiveChannels> histogram,
341+
const fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
342+
const fixed_array<unsigned int, ActiveChannels> bins,
343+
unsigned int* block_histogram_start)
343344
{
344345
using sample_type = typename std::iterator_traits<SampleIterator>::value_type;
345346
using sample_vector_type = sample_vector<sample_type, Channels>;
@@ -456,13 +457,13 @@ template<unsigned int BlockSize,
456457
class SampleIterator,
457458
class Counter,
458459
class SampleToBinOp>
459-
ROCPRIM_DEVICE ROCPRIM_INLINE void
460-
histogram_global(SampleIterator samples,
461-
unsigned int columns,
462-
unsigned int row_stride,
463-
fixed_array<Counter*, ActiveChannels> histogram,
464-
fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
465-
fixed_array<unsigned int, ActiveChannels> bins_bits)
460+
ROCPRIM_DEVICE ROCPRIM_INLINE
461+
void histogram_global(SampleIterator samples,
462+
unsigned int columns,
463+
unsigned int row_stride,
464+
fixed_array<Counter*, ActiveChannels> histogram,
465+
const fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
466+
const fixed_array<unsigned int, ActiveChannels> bins_bits)
466467
{
467468
using sample_type = typename std::iterator_traits<SampleIterator>::value_type;
468469
using sample_vector_type = sample_vector<sample_type, Channels>;

rocprim/include/rocprim/device/detail/device_merge_sort.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -230,7 +230,7 @@ struct block_permute_values_impl<Value,
230230
ItemsPerThread,
231231
std::enable_if_t<(std::is_trivially_copyable<Value>::value
232232
&& !rocprim::is_floating_point<Value>::value
233-
&& !std::is_integral<Value>::value)>>
233+
&& !rocprim::is_integral<Value>::value)>>
234234
{
235235
static constexpr unsigned int items_per_block = ItemsPerThread * BlockSize;
236236

rocprim/include/rocprim/device/detail/device_merge_sort_mergepath.hpp

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -98,8 +98,9 @@ struct block_merge_impl<
9898
Value,
9999
BlockSize,
100100
ItemsPerThread,
101-
std::enable_if_t<!std::is_trivially_copyable<Value>::value
102-
|| rocprim::is_floating_point<Value>::value || std::is_integral<Value>::value>>
101+
std::enable_if_t<
102+
!std::is_trivially_copyable<Value>::value || rocprim::is_floating_point<Value>::value
103+
|| rocprim::is_integral<Value>::value || std::is_same<Value, ::rocprim::empty_type>::value>>
103104
{
104105

105106
static constexpr bool with_values = !std::is_same<Value, ::rocprim::empty_type>::value;
@@ -264,9 +265,9 @@ struct block_merge_impl<Key,
264265
ItemsPerThread,
265266
std::enable_if_t<std::is_trivially_copyable<Value>::value
266267
&& !rocprim::is_floating_point<Value>::value
267-
&& !std::is_integral<Value>::value>>
268+
&& !rocprim::is_integral<Value>::value
269+
&& !std::is_same<Value, ::rocprim::empty_type>::value>>
268270
{
269-
270271
static constexpr bool with_values = !std::is_same<Value, ::rocprim::empty_type>::value;
271272
static constexpr unsigned int items_per_tile = BlockSize * ItemsPerThread;
272273

@@ -301,7 +302,6 @@ struct block_merge_impl<Key,
301302
const OffsetT* merge_partitions,
302303
storage_type& storage)
303304
{
304-
305305
auto& keys_shared = storage.keys.get();
306306
auto& values_shared = storage.values.get();
307307

rocprim/include/rocprim/device/device_histogram.hpp

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -44,8 +44,8 @@ namespace detail
4444

4545
template<class Config, unsigned int ActiveChannels, class Counter>
4646
ROCPRIM_KERNEL ROCPRIM_LAUNCH_BOUNDS(device_params<Config>().histogram_config.block_size) void
47-
init_histogram_kernel(fixed_array<Counter*, ActiveChannels> histogram,
48-
fixed_array<unsigned int, ActiveChannels> bins)
47+
init_histogram_kernel(fixed_array<Counter*, ActiveChannels> histogram,
48+
const fixed_array<unsigned int, ActiveChannels> bins)
4949
{
5050
static constexpr histogram_config_params params = device_params<Config>();
5151

@@ -59,15 +59,15 @@ template<class Config,
5959
class Counter,
6060
class SampleToBinOp>
6161
ROCPRIM_KERNEL ROCPRIM_LAUNCH_BOUNDS(device_params<Config>().histogram_config.block_size) void
62-
histogram_shared_kernel(SampleIterator samples,
63-
unsigned int columns,
64-
unsigned int rows,
65-
unsigned int row_stride,
66-
unsigned int rows_per_block,
67-
unsigned int shared_histograms,
68-
fixed_array<Counter*, ActiveChannels> histogram,
69-
fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
70-
fixed_array<unsigned int, ActiveChannels> bins)
62+
histogram_shared_kernel(SampleIterator samples,
63+
unsigned int columns,
64+
unsigned int rows,
65+
unsigned int row_stride,
66+
unsigned int rows_per_block,
67+
unsigned int shared_histograms,
68+
fixed_array<Counter*, ActiveChannels> histogram,
69+
const fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
70+
const fixed_array<unsigned int, ActiveChannels> bins)
7171
{
7272
static constexpr histogram_config_params params = device_params<Config>();
7373

@@ -100,12 +100,12 @@ template<class Config,
100100
class Counter,
101101
class SampleToBinOp>
102102
ROCPRIM_KERNEL ROCPRIM_LAUNCH_BOUNDS(device_params<Config>().histogram_config.block_size) void
103-
histogram_global_kernel(SampleIterator samples,
104-
unsigned int columns,
105-
unsigned int row_stride,
106-
fixed_array<Counter*, ActiveChannels> histogram,
107-
fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
108-
fixed_array<unsigned int, ActiveChannels> bins_bits)
103+
histogram_global_kernel(SampleIterator samples,
104+
unsigned int columns,
105+
unsigned int row_stride,
106+
fixed_array<Counter*, ActiveChannels> histogram,
107+
const fixed_array<SampleToBinOp, ActiveChannels> sample_to_bin_op,
108+
const fixed_array<unsigned int, ActiveChannels> bins_bits)
109109
{
110110
static constexpr histogram_config_params params = device_params<Config>();
111111

0 commit comments

Comments
 (0)