This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit e0bc536

dox
Former-commit-id: 3aae638
1 parent a63b9ee commit e0bc536

150 files changed: +318 -255 lines changed


CHANGE_LOG.TXT

Lines changed: 33 additions & 15 deletions
@@ -1,29 +1,47 @@
 //-----------------------------------------------------------------------------

+0.9.3 04/30/2013
+
+  - Added new BlockScan algorithm variant BLOCK_SCAN_RAKING_MEMOIZE, which
+    trades more register consumption for less shared memory I/O)
+  - Added block-wide histogram (BlockHisto256)
+  - Updates to BlockRadixRank to use BlockScan (which improves performance
+    on Kepler due to SHFL instruction)
+  - Added device-wide histogram (DeviceHisto256)
+  - Fixed compilation errors for some WarpScan entrypoints on SM30+
+  - Allow types other than C++ primitives to be used in WarpScan::*Sum methods
+    if they only have operator + overloaded. (Previously they also required
+    to support assignment from int(0).)
+  - Update BlockReduce's BLOCK_REDUCE_WARP_REDUCTIONS algorithm to work even
+    when block size is not an even multiple of warp size
+  - Added work management utility descriptors (GridQueue, GridEvenShare)
+  - Refactoring of DeviceAllocator interface and CachingDeviceAllocator
+    implementation
+  - Misc. documentation updates and corrections.
+
+//-----------------------------------------------------------------------------
+
 0.9.2 04/04/2013

-  - Added WarpReduce. WarpReduce uses the SHFL instruction when applicable.
-    BlockReduce now uses this WarpReduce instead of implementing its own.
-
-  - Misc. fixes for 64-bit Linux compilation warnings and errors.
-
-  - Misc. documentation updates and corrections.
+  - Added WarpReduce. WarpReduce uses the SHFL instruction when applicable.
+    BlockReduce now uses this WarpReduce instead of implementing its own.
+  - Misc. fixes for 64-bit Linux compilation warnings and errors.
+  - Misc. documentation updates and corrections.

 //-----------------------------------------------------------------------------

 0.9.1 03/09/2013

-  - Fix for ambiguity in BlockScan::Reduce() between generic reduction and
-    summation. Summation entrypoints are now called ::Sum(), similar
-    to the convention in BlockScan.
-
-  - Small edits to mainpage documentation and download tracking
-
+  - Fix for ambiguity in BlockScan::Reduce() between generic reduction and
+    summation. Summation entrypoints are now called ::Sum(), similar to the
+    convention in BlockScan.
+  - Small edits to mainpage documentation and download tracking
+
 //-----------------------------------------------------------------------------

 0.9.0 03/07/2013

-  - Intial "preview" release. CUB is the first durable, high-performance library
-    of cooperative block-level, warp-level, and thread-level primitives for CUDA
-    kernel programming. More primitives and examples coming soon!
+  - Intial "preview" release. CUB is the first durable, high-performance library
+    of cooperative block-level, warp-level, and thread-level primitives for CUDA
+    kernel programming. More primitives and examples coming soon!

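The 0.9.3 entry about WarpScan::*Sum accepting types that only overload operator + can be illustrated with a small host-side sketch. This is not CUB code and does not show how WarpScan is implemented; `Vec2` and `SumNonEmpty` are hypothetical names invented for the illustration. The assumption is that a sum can avoid any assignment from int(0) by seeding its running total from the first item instead of from a zero value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical user type: it overloads operator+ but has no conversion from
// int, so "Vec2 total = 0;" would not compile.
struct Vec2 {
    float x;
    float y;
};

inline Vec2 operator+(const Vec2& a, const Vec2& b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Sequential stand-in for a cooperative sum (an assumption for illustration,
// not the CUB implementation). The running total is seeded from the first
// item, so T only needs copy construction and operator+.
template <typename T>
T SumNonEmpty(const T* items, std::size_t count) {
    assert(count > 0);  // there is no zero value to return for empty input
    T total = items[0];
    for (std::size_t i = 1; i < count; ++i) {
        total = total + items[i];
    }
    return total;
}
```

Seeding from the first item means the caller has to guarantee at least one input, which is the trade-off for dropping the zero-initialization requirement.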

cub/block/block_histo_256.cuh

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@

 /**
  * \file
- * cub::BlockHisto256 provides methods for constructing (and compositing into) 256-valued histograms from 8b data partitioned across threads within a CUDA thread block.
+ * cub::BlockHisto256 provides methods for constructing (and compositing into) 256-bin histograms from 8b data partitioned across threads within a CUDA thread block.
  */

 #pragma once
@@ -78,7 +78,7 @@ enum BlockHisto256Algorithm
  */

 /**
- * \brief BlockHisto256 provides methods for constructing (and compositing into) 256-valued histograms from 8b data partitioned across threads within a CUDA thread block. ![](histogram_logo.png)
+ * \brief BlockHisto256 provides methods for constructing (and compositing into) 256-bin histograms from 8b data partitioned across threads within a CUDA thread block. ![](histogram_logo.png)
  *
  * \par Overview
  * A <a href="http://en.wikipedia.org/wiki/Histogram"><em>histogram</em></a>
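
The distinction the comment draws between constructing a 256-bin histogram and compositing into one can be sketched on the host as follows. This is a sequential CPU illustration only, not the BlockHisto256 interface; `Histo256`, `CompositeHisto256`, and `BuildHisto256` are hypothetical names. It assumes that "constructing" means starting from zeroed bins, and "compositing" means adding counts to bins that already hold values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One counter per possible 8-bit sample value (hypothetical alias).
using Histo256 = std::array<unsigned int, 256>;

// Compositing: add the counts for `samples` into an existing histogram,
// leaving whatever the bins already contain in place.
inline void CompositeHisto256(const std::uint8_t* samples, std::size_t count,
                              Histo256& histogram) {
    assert(samples != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        ++histogram[samples[i]];  // an 8-bit value always indexes a valid bin
    }
}

// Constructing: start from zeroed bins, then composite the samples.
inline Histo256 BuildHisto256(const std::uint8_t* samples, std::size_t count) {
    Histo256 histogram{};  // value-initialization zeroes all 256 bins
    CompositeHisto256(samples, count, histogram);
    return histogram;
}
```

Because each sample is exactly 8 bits wide, it maps directly to one of the 256 bins with no range check, which is why the revised wording "256-bin" describes the structure more precisely than "256-valued".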

cub/device/device_histo_256.cuh

Lines changed: 2 additions & 2 deletions
@@ -29,7 +29,7 @@

 /**
  * \file
- * cub::DeviceHisto256 provides variants of device-wide parallel histogram over data residing within global memory.
+ * cub::DeviceHisto256 provides device-wide parallel operations for constructing 256-bin histogram(s) over data samples residing within global memory.
  */

 #pragma once
@@ -169,7 +169,7 @@ __global__ void FinalizeHisto256Kernel(
  */

 /**
- * \brief DeviceHisto256 provides variants of device-wide parallel histogram over data residing within global memory. ![](histogram_logo.png)
+ * \brief DeviceHisto256 provides device-wide parallel operations for constructing 256-bin histogram(s) over samples data residing within global memory. ![](histogram_logo.png)
  */
 struct DeviceHisto256
 {

cub/device/device_reduce.cuh

Lines changed: 2 additions & 3 deletions
@@ -29,8 +29,7 @@

 /**
  * \file
- * cub::DeviceReduce provides variants of parallel reduction data residing
- * within a CUDA device's global memory.
+ * cub::DeviceReduce provides device-wide parallel operations for reducing data items residing within a CUDA device's global memory.
  */

 #pragma once
@@ -158,7 +157,7 @@ __global__ void SingleBlockReduceKernel(
  */

 /**
- * \brief DeviceReduce provides variants of parallel reduction data residing within a CUDA device's global memory. ![](reduce_logo.png)
+ * \brief DeviceReduce provides device-wide parallel operations for reducing data items residing within a CUDA device's global memory. ![](reduce_logo.png)
  */
 struct DeviceReduce
 {

docs/Doxyfile

Lines changed: 2 additions & 0 deletions
@@ -956,12 +956,14 @@ HTML_EXTRA_STYLESHEET = extra_stylesheet.css
 # the files will be copied as-is; there are no commands or markers available.

 HTML_EXTRA_FILES = download_cub.html
+HTML_EXTRA_FILES += images/nvresearch.png
 HTML_EXTRA_FILES += images/download-icon.png
 HTML_EXTRA_FILES += images/groups-icon.png
 HTML_EXTRA_FILES += images/github-icon-747d8b799a48162434b2c0595ba1317e.png
 HTML_EXTRA_FILES += images/favicon.ico
 HTML_EXTRA_FILES += images/favicon.png
 HTML_EXTRA_FILES += images/tab_b_alt.png
+HTML_EXTRA_FILES += images/generic_abstraction.png
 HTML_EXTRA_FILES += images/simt_abstraction.png
 HTML_EXTRA_FILES += images/kernel_abstraction.png
 HTML_EXTRA_FILES += images/devfun_abstraction.png

docs/html/annotated.html

Lines changed: 4 additions & 4 deletions
@@ -113,7 +113,7 @@
 <img src="transpose_logo.png" alt="transpose_logo.png"/>
 </div>
 </td></tr>
-<tr id="row_0_6_"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="classcub_1_1_block_histo256.html" target="_self">BlockHisto256</a></td><td class="desc"><a class="el" href="classcub_1_1_block_histo256.html" title="BlockHisto256 provides methods for constructing (and compositing into) 256-valued histograms from 8b ...">BlockHisto256</a> provides methods for constructing (and compositing into) 256-valued histograms from 8b data partitioned across threads within a CUDA thread block. </p>
+<tr id="row_0_6_"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="classcub_1_1_block_histo256.html" target="_self">BlockHisto256</a></td><td class="desc"><a class="el" href="classcub_1_1_block_histo256.html" title="BlockHisto256 provides methods for constructing (and compositing into) 256-bin histograms from 8b dat...">BlockHisto256</a> provides methods for constructing (and compositing into) 256-bin histograms from 8b data partitioned across threads within a CUDA thread block. </p>
 <div class="image">
 <img src="histogram_logo.png" alt="histogram_logo.png"/>
 </div>
@@ -152,12 +152,12 @@
 <tr id="row_0_14_"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="structcub_1_1_cast.html" target="_self">Cast</a></td><td class="desc">Default cast functor</td></tr>
 <tr id="row_0_15_" class="even"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="classcub_1_1_device.html" target="_self">Device</a></td><td class="desc">Properties of a given CUDA device and the corresponding PTX bundle</td></tr>
 <tr id="row_0_16_"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="classcub_1_1_device_allocator.html" target="_self">DeviceAllocator</a></td><td class="desc">Abstract base allocator class for device memory allocations</td></tr>
-<tr id="row_0_17_" class="even"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="structcub_1_1_device_histo256.html" target="_self">DeviceHisto256</a></td><td class="desc"><a class="el" href="structcub_1_1_device_histo256.html" title="DeviceHisto256 provides variants of device-wide parallel histogram over data residing within global m...">DeviceHisto256</a> provides variants of device-wide parallel histogram over data residing within global memory. </p>
+<tr id="row_0_17_" class="even"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="structcub_1_1_device_histo256.html" target="_self">DeviceHisto256</a></td><td class="desc"><a class="el" href="structcub_1_1_device_histo256.html" title="DeviceHisto256 provides device-wide parallel operations for constructing 256-bin histogram(s) over sa...">DeviceHisto256</a> provides device-wide parallel operations for constructing 256-bin histogram(s) over samples data residing within global memory. </p>
 <div class="image">
 <img src="histogram_logo.png" alt="histogram_logo.png"/>
 </div>
 </td></tr>
-<tr id="row_0_18_"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="structcub_1_1_device_reduce.html" target="_self">DeviceReduce</a></td><td class="desc"><a class="el" href="structcub_1_1_device_reduce.html" title="DeviceReduce provides variants of parallel reduction data residing within a CUDA device&#39;s global memo...">DeviceReduce</a> provides variants of parallel reduction data residing within a CUDA device's global memory. </p>
+<tr id="row_0_18_"><td class="entry"><img src="ftv2blank.png" alt="&#160;" width="16" height="22" /><img src="ftv2node.png" alt="o" width="16" height="22" /><img src="ftv2cl.png" alt="C" width="24" height="22" /><a class="el" href="structcub_1_1_device_reduce.html" target="_self">DeviceReduce</a></td><td class="desc"><a class="el" href="structcub_1_1_device_reduce.html" title="DeviceReduce provides device-wide parallel operations for reducing data items residing within a CUDA ...">DeviceReduce</a> provides device-wide parallel operations for reducing data items residing within a CUDA device's global memory. </p>
 <div class="image">
 <img src="reduce_logo.png" alt="reduce_logo.png"/>
 </div>
@@ -198,7 +198,7 @@
 <!-- HTML footer for doxygen 1.8.3.1-->
 <!-- start footer part -->
 <hr class="footer"/><address class="footer"><small>
-Generated on Tue Apr 30 2013 01:43:33 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
+Generated on Tue Apr 30 2013 15:22:27 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
 <img class="footer" src="doxygen.png" alt="doxygen"/>
 </a> 1.8.3.1
 <br>

docs/html/block__discontinuity_8cuh.html

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@
 <!-- HTML footer for doxygen 1.8.3.1-->
 <!-- start footer part -->
 <hr class="footer"/><address class="footer"><small>
-Generated on Tue Apr 30 2013 01:43:33 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
+Generated on Tue Apr 30 2013 15:22:26 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
 <img class="footer" src="doxygen.png" alt="doxygen"/>
 </a> 1.8.3.1
 <br>

docs/html/block__exchange_8cuh.html

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@
 <!-- HTML footer for doxygen 1.8.3.1-->
 <!-- start footer part -->
 <hr class="footer"/><address class="footer"><small>
-Generated on Tue Apr 30 2013 01:43:33 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
+Generated on Tue Apr 30 2013 15:22:26 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
 <img class="footer" src="doxygen.png" alt="doxygen"/>
 </a> 1.8.3.1
 <br>

docs/html/block__histo__256_8cuh.html

Lines changed: 3 additions & 3 deletions
@@ -112,7 +112,7 @@
 <tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="nested-classes"></a>
 Classes</h2></td></tr>
 <tr class="memitem:"><td class="memItemLeft" align="right" valign="top">class &#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="classcub_1_1_block_histo256.html">cub::BlockHisto256&lt; BLOCK_THREADS, ITEMS_PER_THREAD, ALGORITHM &gt;</a></td></tr>
-<tr class="memdesc:"><td class="mdescLeft">&#160;</td><td class="mdescRight"><a class="el" href="classcub_1_1_block_histo256.html" title="BlockHisto256 provides methods for constructing (and compositing into) 256-valued histograms from 8b ...">BlockHisto256</a> provides methods for constructing (and compositing into) 256-valued histograms from 8b data partitioned across threads within a CUDA thread block. </p>
+<tr class="memdesc:"><td class="mdescLeft">&#160;</td><td class="mdescRight"><a class="el" href="classcub_1_1_block_histo256.html" title="BlockHisto256 provides methods for constructing (and compositing into) 256-bin histograms from 8b dat...">BlockHisto256</a> provides methods for constructing (and compositing into) 256-bin histograms from 8b data partitioned across threads within a CUDA thread block. </p>
 <div class="image">
 <img src="histogram_logo.png" alt="histogram_logo.png"/>
 <div class="caption">
@@ -134,12 +134,12 @@
 <tr class="separator:a0f61554b5c901fcc01adb8af3d9aacca"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table>
 <a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
-<div class="textblock"><p><a class="el" href="classcub_1_1_block_histo256.html" title="BlockHisto256 provides methods for constructing (and compositing into) 256-valued histograms from 8b ...">cub::BlockHisto256</a> provides methods for constructing (and compositing into) 256-valued histograms from 8b data partitioned across threads within a CUDA thread block. </p>
+<div class="textblock"><p><a class="el" href="classcub_1_1_block_histo256.html" title="BlockHisto256 provides methods for constructing (and compositing into) 256-bin histograms from 8b dat...">cub::BlockHisto256</a> provides methods for constructing (and compositing into) 256-bin histograms from 8b data partitioned across threads within a CUDA thread block. </p>
 </div></div><!-- contents -->
 <!-- HTML footer for doxygen 1.8.3.1-->
 <!-- start footer part -->
 <hr class="footer"/><address class="footer"><small>
-Generated on Tue Apr 30 2013 01:43:33 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
+Generated on Tue Apr 30 2013 15:22:26 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
 <img class="footer" src="doxygen.png" alt="doxygen"/>
 </a> 1.8.3.1
 <br>

docs/html/block__load_8cuh.html

Lines changed: 1 addition & 1 deletion
@@ -207,7 +207,7 @@
 <!-- HTML footer for doxygen 1.8.3.1-->
 <!-- start footer part -->
 <hr class="footer"/><address class="footer"><small>
-Generated on Tue Apr 30 2013 01:43:33 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
+Generated on Tue Apr 30 2013 15:22:26 for CUB by &#160;<a href="http://www.doxygen.org/index.html">
 <img class="footer" src="doxygen.png" alt="doxygen"/>
 </a> 1.8.3.1
 <br>
