This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit c60ec5b

Doc updates for new version
Former-commit-id: 1cf8089
1 parent 1843f06 commit c60ec5b

144 files changed: 592 additions, 308 deletions


CHANGE_LOG.TXT

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 //-----------------------------------------------------------------------------

+1.2.0 02/25/2014
+- New features:
+
+//-----------------------------------------------------------------------------
+
 1.1.1 12/11/2013
 - New features:
 - Added TexObjInputIterator, TexRefInputIterator, CacheModifiedInputIterator, and CacheModifiedOutputIterator types for loading & storing arbitrary types through the cache hierarchy. Compatible with Thrust API.

LICENSE.TXT

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 Copyright (c) 2010-2011, Duane Merrill. All rights reserved.
-Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

README.md

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,7 @@
 <hr>
 <h3>About CUB</h3>

-Current release: v1.1.1 (December 11, 2013)
+Current release: v1.2.0 (February 25, 2014)

 We recommend the [CUB Project Website](http://nvlabs.github.com/cub) and the [cub-users discussion forum](http://groups.google.com/group/cub-users) for further information and examples.

@@ -84,6 +84,7 @@ See [CUB Project Website](http://nvlabs.github.com/cub) for more information.

 | Date | Version |
 | ---- | ------- |
+| 02/25/2014 | [CUB v1.2.0 Primary Release](https://github.com/NVlabs/cub/archive/1.2.0.zip) |
 | 12/10/2013 | [CUB v1.1.1 Primary Release](https://github.com/NVlabs/cub/archive/1.1.1.zip) |
 | 08/08/2013 | [CUB v1.0.1 Primary Release](https://github.com/NVlabs/cub/archive/1.0.1.zip) |
 | 05/07/2013 | [CUB v0.9.4 Update Release](https://github.com/NVlabs/cub/archive/0.9.4.zip) |
@@ -104,7 +105,7 @@ CUB is available under the "New BSD" open-source license:

 ```
 Copyright (c) 2010-2011, Duane Merrill. All rights reserved.
-Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

cub/block/block_discontinuity.cuh

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill. All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -61,7 +61,7 @@ namespace cub {
  * \blockcollective{BlockDiscontinuity}
  * \par
  * The code snippet below illustrates the head flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -274,7 +274,7 @@ public:
  *
  * \par
  * The code snippet below illustrates the head-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -352,7 +352,7 @@ public:
  *
  * \par
  * The code snippet below illustrates the head-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -445,7 +445,7 @@ public:
  *
  * \par
  * The code snippet below illustrates the tail-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
@@ -524,7 +524,7 @@ public:
  *
  * \par
  * The code snippet below illustrates the tail-flagging of 512 integer items that
- * are partitioned in a [<em>blocked arrangement</em>](index.html#sec4sec3) across 128 threads
+ * are partitioned in a [<em>blocked arrangement</em>](index.html#sec5sec3) across 128 threads
  * where each thread owns 4 consecutive items.
  * \par
  * \code
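
The hunks above only retarget documentation links, but the comments they touch describe head- and tail-flagging of 512 items held in a blocked arrangement (128 threads, 4 consecutive items each). As a rough host-side reference for what such flags mean, the C++ sketch below computes them sequentially. It is not CUB's implementation: the names `owner_thread`, `flag_heads`, and `flag_tails` are hypothetical, inequality is assumed as the flagging predicate, and the sketch assumes the first item is always a head and the last item always a tail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the thread that owns linear item `i` in a blocked
// arrangement where each thread holds `items_per_thread` consecutive items.
inline std::size_t owner_thread(std::size_t i, std::size_t items_per_thread) {
    assert(items_per_thread > 0);
    return i / items_per_thread;
}

// Sequential reference for head flags: an item is flagged when it differs
// from its predecessor. Assumption: item 0 is always flagged.
inline std::vector<int> flag_heads(const std::vector<int>& items) {
    std::vector<int> flags(items.size(), 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        flags[i] = (i == 0 || items[i] != items[i - 1]) ? 1 : 0;
    }
    return flags;
}

// Sequential reference for tail flags: an item is flagged when it differs
// from its successor. Assumption: the last item is always flagged.
inline std::vector<int> flag_tails(const std::vector<int>& items) {
    std::vector<int> flags(items.size(), 0);
    for (std::size_t i = 0; i < items.size(); ++i) {
        flags[i] = (i + 1 == items.size() || items[i] != items[i + 1]) ? 1 : 0;
    }
    return flags;
}
```

With 4 items per thread, `owner_thread(511, 4)` evaluates to 127, which matches the 512-item, 128-thread layout described in the comments.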

cub/block/block_exchange.cuh

Lines changed: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill. All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -60,10 +60,10 @@ namespace cub {
  * yet most block-wide operations prefer a "blocked" partitioning of items across threads
  * (where consecutive items belong to a single thread).
  * - BlockExchange supports the following types of data exchanges:
- * - Transposing between [<em>blocked</em>](index.html#sec4sec3) and [<em>striped</em>](index.html#sec4sec3) arrangements
- * - Transposing between [<em>blocked</em>](index.html#sec4sec3) and [<em>warp-striped</em>](index.html#sec4sec3) arrangements
- * - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec4sec3)
- * - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec4sec3)
+ * - Transposing between [<em>blocked</em>](index.html#sec5sec3) and [<em>striped</em>](index.html#sec5sec3) arrangements
+ * - Transposing between [<em>blocked</em>](index.html#sec5sec3) and [<em>warp-striped</em>](index.html#sec5sec3) arrangements
+ * - Scattering ranked items to a [<em>blocked arrangement</em>](index.html#sec5sec3)
+ * - Scattering ranked items to a [<em>striped arrangement</em>](index.html#sec5sec3)
  *
  * \par A Simple Example
  * \blockcollective{BlockExchange}
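
To make the blocked/striped transposition in the list above concrete, here is a minimal single-threaded C++ model of the data movement. It is only a sketch: the real BlockExchange is a cooperative device-side collective, whereas this code loops over "threads" on the host and uses a plain vector as a stand-in for shared memory. The type `ThreadItems` and the functions `blocked_to_striped` and `striped_to_blocked` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// regs[t][k] models item slot k held by thread t.
using ThreadItems = std::vector<std::vector<int>>;

// Blocked: thread t, slot k holds linear item t * ipt + k.
// Striped: thread t, slot k holds linear item k * threads + t.
inline ThreadItems blocked_to_striped(const ThreadItems& blocked) {
    const std::size_t threads = blocked.size();
    assert(threads > 0);
    const std::size_t ipt = blocked[0].size();

    // Write every slot to its linear position (stand-in for shared memory).
    std::vector<int> staging(threads * ipt);
    for (std::size_t t = 0; t < threads; ++t)
        for (std::size_t k = 0; k < ipt; ++k)
            staging[t * ipt + k] = blocked[t][k];

    // Read the same buffer back with the striped index mapping.
    ThreadItems striped(threads, std::vector<int>(ipt));
    for (std::size_t t = 0; t < threads; ++t)
        for (std::size_t k = 0; k < ipt; ++k)
            striped[t][k] = staging[k * threads + t];
    return striped;
}

// Inverse exchange: write with the striped mapping, read with the blocked one.
inline ThreadItems striped_to_blocked(const ThreadItems& striped) {
    const std::size_t threads = striped.size();
    assert(threads > 0);
    const std::size_t ipt = striped[0].size();

    std::vector<int> staging(threads * ipt);
    for (std::size_t t = 0; t < threads; ++t)
        for (std::size_t k = 0; k < ipt; ++k)
            staging[k * threads + t] = striped[t][k];

    ThreadItems blocked(threads, std::vector<int>(ipt));
    for (std::size_t t = 0; t < threads; ++t)
        for (std::size_t k = 0; k < ipt; ++k)
            blocked[t][k] = staging[t * ipt + k];
    return blocked;
}
```

For 4 threads with 2 items each, a blocked input of `{0,1} {2,3} {4,5} {6,7}` becomes `{0,4} {1,5} {2,6} {3,7}` in the striped arrangement, and the inverse function restores the original.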

cub/block/block_histogram.cuh

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill. All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:

cub/block/block_load.cuh

Lines changed: 16 additions & 16 deletions
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill. All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -441,7 +441,7 @@ enum BlockLoadAlgorithm
 /**
  * \par Overview
  *
- * A [<em>blocked arrangement</em>](index.html#sec4sec3) of data is read
+ * A [<em>blocked arrangement</em>](index.html#sec5sec3) of data is read
  * directly from memory. The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub>
  * reads the <em>i</em><sup>th</sup> segment of consecutive elements.
  *
@@ -454,7 +454,7 @@ enum BlockLoadAlgorithm
 /**
  * \par Overview
  *
- * A [<em>blocked arrangement</em>](index.html#sec4sec3) of data is read directly
+ * A [<em>blocked arrangement</em>](index.html#sec5sec3) of data is read directly
  * from memory using CUDA's built-in vectorized loads as a coalescing optimization.
  * The thread block reads items in a parallel "raking" fashion: thread<sub><em>i</em></sub> uses vector loads to
  * read the <em>i</em><sup>th</sup> segment of consecutive elements.
@@ -476,13 +476,13 @@ enum BlockLoadAlgorithm
 /**
  * \par Overview
  *
- * A [<em>striped arrangement</em>](index.html#sec4sec3) of data is read
+ * A [<em>striped arrangement</em>](index.html#sec5sec3) of data is read
  * directly from memory and then is locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec4sec3). The thread block
+ * [<em>blocked arrangement</em>](index.html#sec5sec3). The thread block
  * reads items in a parallel "strip-mining" fashion:
  * thread<sub><em>i</em></sub> reads items having stride \p BLOCK_THREADS
  * between them. cub::BlockExchange is then used to locally reorder the items
- * into a [<em>blocked arrangement</em>](index.html#sec4sec3).
+ * into a [<em>blocked arrangement</em>](index.html#sec5sec3).
  *
  * \par Performance Considerations
  * - The utilization of memory transactions (coalescing) remains high regardless
@@ -496,13 +496,13 @@ enum BlockLoadAlgorithm
 /**
  * \par Overview
  *
- * A [<em>warp-striped arrangement</em>](index.html#sec4sec3) of data is read
+ * A [<em>warp-striped arrangement</em>](index.html#sec5sec3) of data is read
  * directly from memory and then is locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec4sec3). Each warp reads its own
+ * [<em>blocked arrangement</em>](index.html#sec5sec3). Each warp reads its own
  * contiguous segment in a parallel "strip-mining" fashion: lane<sub><em>i</em></sub>
  * reads items having stride \p WARP_THREADS between them. cub::BlockExchange
  * is then used to locally reorder the items into a
- * [<em>blocked arrangement</em>](index.html#sec4sec3).
+ * [<em>blocked arrangement</em>](index.html#sec5sec3).
  *
  * \par Usage Considerations
  * - BLOCK_THREADS must be a multiple of WARP_THREADS
@@ -518,7 +518,7 @@ enum BlockLoadAlgorithm


 /**
- * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec4sec3) across a CUDA thread block. ![](block_load_logo.png)
+ * \brief The BlockLoad class provides [<em>collective</em>](index.html#sec0) data movement methods for loading a linear segment of items from memory into a [<em>blocked arrangement</em>](index.html#sec5sec3) across a CUDA thread block. ![](block_load_logo.png)
  * \ingroup BlockModule
  * \ingroup UtilIo
  *
@@ -533,17 +533,17 @@ enum BlockLoadAlgorithm
  * to implement different cub::BlockLoadAlgorithm strategies. This facilitates different
  * performance policies for different architectures, data types, granularity sizes, etc.
  * - BlockLoad can be optionally specialized by different data movement strategies:
- * -# <b>cub::BLOCK_LOAD_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec4sec3)
+ * -# <b>cub::BLOCK_LOAD_DIRECT</b>. A [<em>blocked arrangement</em>](index.html#sec5sec3)
  * of data is read directly from memory. [More...](\ref cub::BlockLoadAlgorithm)
- * -# <b>cub::BLOCK_LOAD_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec4sec3)
+ * -# <b>cub::BLOCK_LOAD_VECTORIZE</b>. A [<em>blocked arrangement</em>](index.html#sec5sec3)
  * of data is read directly from memory using CUDA's built-in vectorized loads as a
  * coalescing optimization. [More...](\ref cub::BlockLoadAlgorithm)
- * -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>. A [<em>striped arrangement</em>](index.html#sec4sec3)
+ * -# <b>cub::BLOCK_LOAD_TRANSPOSE</b>. A [<em>striped arrangement</em>](index.html#sec5sec3)
  * of data is read directly from memory and is then locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec4sec3). [More...](\ref cub::BlockLoadAlgorithm)
- * -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>. A [<em>warp-striped arrangement</em>](index.html#sec4sec3)
+ * [<em>blocked arrangement</em>](index.html#sec5sec3). [More...](\ref cub::BlockLoadAlgorithm)
+ * -# <b>cub::BLOCK_LOAD_WARP_TRANSPOSE</b>. A [<em>warp-striped arrangement</em>](index.html#sec5sec3)
  * of data is read directly from memory and is then locally transposed into a
- * [<em>blocked arrangement</em>](index.html#sec4sec3). [More...](\ref cub::BlockLoadAlgorithm)
+ * [<em>blocked arrangement</em>](index.html#sec5sec3). [More...](\ref cub::BlockLoadAlgorithm)
  *
  * \par A Simple Example
  * \blockcollective{BlockLoad}
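
The comments in this file describe three memory access patterns: "raking" reads of consecutive segments, "strip-mining" with stride BLOCK_THREADS, and per-warp strip-mining with stride WARP_THREADS. The C++ sketch below writes out the linear offset each thread would touch at a given step under those descriptions, which can help when reasoning about coalescing. It is an illustration derived from the comment text, not code from CUB; the function names are hypothetical, and `step` is assumed to count a thread's successive reads starting from 0.

```cpp
#include <cassert>
#include <cstddef>

// Raking (direct load into a blocked arrangement): thread i reads the i-th
// segment of consecutive elements, so neighbouring threads are
// `items_per_thread` elements apart at every step.
inline std::size_t direct_offset(std::size_t thread, std::size_t step,
                                 std::size_t items_per_thread) {
    assert(step < items_per_thread);
    return thread * items_per_thread + step;
}

// Strip-mining (striped arrangement): a thread's reads are `block_threads`
// apart, so neighbouring threads touch adjacent elements at every step.
inline std::size_t striped_offset(std::size_t thread, std::size_t step,
                                  std::size_t block_threads) {
    assert(thread < block_threads);
    return step * block_threads + thread;
}

// Warp-striped: each warp owns a contiguous segment of
// warp_threads * items_per_thread elements; within it, a lane's reads are
// `warp_threads` apart.
inline std::size_t warp_striped_offset(std::size_t thread, std::size_t step,
                                       std::size_t items_per_thread,
                                       std::size_t warp_threads) {
    assert(warp_threads > 0 && step < items_per_thread);
    const std::size_t warp = thread / warp_threads;
    const std::size_t lane = thread % warp_threads;
    return warp * warp_threads * items_per_thread + step * warp_threads + lane;
}
```

Taking 128 threads, 4 items per thread, and 32-thread warps as example parameters: at step 0, threads 0 and 1 read offsets 0 and 4 under the direct pattern but offsets 0 and 1 under the striped pattern, which is why the striped variants keep neighbouring threads on neighbouring addresses.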

cub/block/block_radix_rank.cuh

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 /******************************************************************************
  * Copyright (c) 2011, Duane Merrill. All rights reserved.
- * Copyright (c) 2011-2013, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2011-2014, NVIDIA CORPORATION. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -63,7 +63,7 @@ namespace cub {
  *
  * \par Usage Considerations
  * - Keys must be in a form suitable for radix ranking (i.e., unsigned bits).
- * - Assumes a [<em>blocked arrangement</em>](index.html#sec4sec3) of elements across threads
+ * - Assumes a [<em>blocked arrangement</em>](index.html#sec5sec3) of elements across threads
  * - \smemreuse{BlockRadixRank::TempStorage}
  *
  * \par Performance Considerations
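
The usage note above requires keys "in a form suitable for radix ranking (i.e., unsigned bits)". One common way to obtain such keys from signed integers and IEEE-754 floats is a bit transformation that makes unsigned comparison agree with the original ordering. The C++ sketch below shows that general technique under the assumption of 32-bit two's-complement integers and 32-bit floats; the function name `to_radix_key` is hypothetical, and nothing here is taken from CUB's own conversion code.

```cpp
#include <cstdint>
#include <cstring>

// Signed integer: flipping the sign bit moves negative values below
// non-negative ones under unsigned comparison.
inline std::uint32_t to_radix_key(std::int32_t value) {
    return static_cast<std::uint32_t>(value) ^ 0x80000000u;
}

// IEEE-754 float: negative values have all bits flipped (reversing their
// order), non-negative values have only the sign bit flipped.
inline std::uint32_t to_radix_key(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined bit copy
    const std::uint32_t mask = (bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return bits ^ mask;
}
```

After this transformation, comparing the returned keys as unsigned integers gives the same order as comparing the original values, for example `to_radix_key(-1.0f) < to_radix_key(0.0f)`. NaN handling is left out of the sketch.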
