
Commit 1be7c3f

Merge branch 'master' into nail-down-dependencies
2 parents a0de3ac + a16c6e1 commit 1be7c3f

16 files changed: +1871 / -1789 lines

CMakeLists.txt

Lines changed: 9 additions & 0 deletions
@@ -55,15 +55,24 @@ foreach(cuda_arch ${sm})
 list(APPEND cuda_arch_list ${cuda_arch})
 message(STATUS "Assign GPU architecture (sm=${cuda_arch})")
 endforeach()
+
 list(LENGTH cuda_arch_list cuda_arch_list_length)
 if(cuda_arch_list_length EQUAL 0)
 list(APPEND cuda_arch_list "80")
 message(STATUS "Assign default GPU architecture sm=80")
 endif()
+
+if (CMAKE_BUILD_TYPE STREQUAL "Debug")
+add_compile_definitions(CUDA_ERROR_CHECK)
+set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo")
+endif()
+
 foreach(cuda_arch ${cuda_arch_list})
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode arch=compute_${cuda_arch},code=sm_${cuda_arch}")
 endforeach()
 
+message(CMAKE_CUDA_FLAGS="${CMAKE_CUDA_FLAGS}")
+
 include_directories(
 ${PROJECT_SOURCE_DIR}/include
 ${PROJECT_SOURCE_DIR}/third_party/libcudacxx/include
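The net effect of this CMakeLists.txt hunk on `CMAKE_CUDA_FLAGS` can be modeled in a few lines of plain C++. This is only a reading aid, not part of the build: `CudaBuildConfig` and `assemble_cuda_flags` are hypothetical names, and the `definitions` vector stands in for what `add_compile_definitions` records. The sketch follows the order of the hunk: the sm_80 fallback for an empty architecture list, then the Debug-only `CUDA_ERROR_CHECK` definition and `-lineinfo` flag, then one `-gencode` pair per architecture.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the CUDA flag logic in the CMakeLists.txt hunk.
// CMake does not run this code; it only mirrors the string concatenation.
struct CudaBuildConfig {
  std::string cuda_flags;                // models CMAKE_CUDA_FLAGS
  std::vector<std::string> definitions;  // models add_compile_definitions()
};

CudaBuildConfig assemble_cuda_flags(std::vector<std::string> arch_list,
                                    const std::string& build_type,
                                    std::string cuda_flags = "") {
  CudaBuildConfig cfg;

  // An empty architecture list falls back to the default sm=80.
  if (arch_list.empty()) {
    arch_list.push_back("80");
  }

  // Debug builds (exact, case-sensitive match, like STREQUAL) enable
  // CUDA_ERROR_CHECK and keep source-line information for profilers.
  if (build_type == "Debug") {
    cfg.definitions.push_back("CUDA_ERROR_CHECK");
    cuda_flags += " -lineinfo";
  }

  // One -gencode pair per requested architecture, appended after -lineinfo.
  for (const auto& arch : arch_list) {
    assert(!arch.empty());
    cuda_flags += " -gencode arch=compute_" + arch + ",code=sm_" + arch;
  }

  cfg.cuda_flags = cuda_flags;
  return cfg;
}
```

For example, an empty architecture list in a Debug build yields ` -lineinfo -gencode arch=compute_80,code=sm_80` plus the `CUDA_ERROR_CHECK` definition; the leading space mirrors the `"${CMAKE_CUDA_FLAGS} ..."` concatenation when the variable starts out empty.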

README.md

Lines changed: 55 additions & 30 deletions
@@ -12,7 +12,7 @@ The key capability of HierarchicalKV is to store key-value (feature-embedding) o
 
 You can also use the library for generic key-value storage.
 
-## Benefits of HierarchicalKV
+## Benefits
 
 When building large recommender systems, machine learning (ML) engineers face the following challenges:
 
@@ -29,23 +29,48 @@ HierarchicalKV alleviates these challenges and helps the machine learning engine
 The strategies are implemented by CUDA kernels.
 - Operates at a high working-status load factor that is close to 1.0.
 
+
+## Key ideas
+
+- Buckets are locally ordered
+- Store keys and values separately
+- Store all the keys in HBM
+- Build-in and customizable eviction strategy
+
 HierarchicalKV makes NVIDIA GPUs more suitable for training large and super-large models of ***search, recommendations, and advertising***.
 The library simplifies the common challenges to building, evaluating, and serving sophisticated recommenders models.
 
 ## API Documentation
 
-The main classes and structs are below, and it's recommended to read the comments in the source code directly:
+The main classes and structs are below, but reading the comments in the source code is recommended:
 
-- [`class HashTable`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L101)
-- [`class EvictStrategy`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L106)
-- [`struct HashTableOptions`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L34)
-- [`Struct HashTable::Vector`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L106)
+- [`class HashTable`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L151)
+- [`class EvictStrategy`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L52)
+- [`struct HashTableOptions`](https://github.com/NVIDIA-Merlin/HierarchicalKV/blob/master/include/merlin_hashtable.cuh#L60)
 
 For regular API doc, please refer to [API Docs](https://nvidia-merlin.github.io/HierarchicalKV/master/api/index.html)
 
+## API Maturity Matrix
+
+`Industrial verified` means the API has been well-tested and verified in at least one real-world scenario.
+
+| Name | Description | Function |
+|:---------------------|:----------------------------------------------------------------------------------------------------------------------|:--------------------|
+| __insert_or_assign__ | Insert or assign for the specified keys. If the target bucket is full, overwrite the key with minimum score in it. | Well-tested |
+| __insert_and_evict__ | Insert new keys. If the target bucket is full, the keys with minimum score will be evicted for placement the new key. | Industrial verified |
+| __find_or_insert__ | Search for the specified keys. If missing, insert it. | Well-tested |
+| __assign__ | Update for each key and ignore the missed one. | Well-tested |
+| __accum_or_assign__ | Search and update for each key. If found, add value as a delta to the old value. If missing, update it directly. | Well-tested |
+| __find_or_insert\*__ | Search for the specified keys and return the pointers of values. If missing, insert it. | Well-tested |
+| __find__ | Search for the specified keys. | Industrial verified |
+| __find\*__ | Search and return the pointers of values, thread-unsafe but with high performance. | Well-tested |
+| __export_batch__ | Exports a certain number of the key-value-score tuples. | Industrial verified |
+| __export_batch_if__ | Exports a certain number of the key-value-score tuples which match specific conditions. | Industrial verified |
+| __warmup__ | Move the hot key-values from HMEM to HBM | June 15, 2023 |
+
 ## Usage restrictions
 
-- The `key_type` and `meta_type` must be `uint64_t`.
+- The `key_type` and `score_type` must be `uint64_t`.
 - The keys of `0xFFFFFFFFFFFFFFFC`, `0xFFFFFFFFFFFFFFFD`, `0xFFFFFFFFFFFFFFFE`, and `0xFFFFFFFFFFFFFFFF` are reserved for internal using.
 
 ## Contributors
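The maturity matrix added in this hunk describes a bucket-level eviction rule: when the target bucket is full, the key with the minimum score is overwritten (`insert_or_assign`) or evicted (`insert_and_evict`). The toy below models a single bucket on the host to make that rule concrete, and also encodes the usage restriction that keys from `0xFFFFFFFFFFFFFFFC` upward are reserved. It is a hypothetical sketch, not HierarchicalKV code: `ToyBucket`, `Entry`, `upsert`, and `is_reserved_key` are invented names, and the linear scan ignores hashing, separate key/value storage, concurrency, and HBM/HMEM placement. The real kernels may also apply extra rules (for example, how a new key's score is compared with the bucket minimum) that this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical: the four reserved keys are 0xFF..FC through 0xFF..FF.
constexpr uint64_t kFirstReservedKey = 0xFFFFFFFFFFFFFFFCULL;

inline bool is_reserved_key(uint64_t key) { return key >= kFirstReservedKey; }

struct Entry {
  uint64_t key;
  uint64_t score;
  float value;
};

// Hypothetical single-bucket model of "evict the minimum score when full".
class ToyBucket {
 public:
  explicit ToyBucket(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Assigns if the key exists, inserts if there is room, otherwise replaces
  // the minimum-score entry and returns it as the evicted entry.
  std::optional<Entry> upsert(uint64_t key, float value, uint64_t score) {
    assert(!is_reserved_key(key));  // reserved keys are for internal use
    for (auto& e : slots_) {
      if (e.key == key) {
        e.value = value;
        e.score = score;
        return std::nullopt;
      }
    }
    if (slots_.size() < capacity_) {
      slots_.push_back({key, score, value});
      return std::nullopt;
    }
    std::size_t victim = 0;
    for (std::size_t i = 1; i < slots_.size(); ++i) {
      if (slots_[i].score < slots_[victim].score) victim = i;
    }
    Entry evicted = slots_[victim];
    slots_[victim] = {key, score, value};
    return evicted;
  }

  std::optional<float> find(uint64_t key) const {
    for (const auto& e : slots_) {
      if (e.key == key) return e.value;
    }
    return std::nullopt;
  }

  std::size_t size() const { return slots_.size(); }

 private:
  std::size_t capacity_;
  std::vector<Entry> slots_;
};
```

With a two-slot bucket holding keys 1 (score 10) and 2 (score 5), upserting key 3 with score 7 evicts key 2, because it has the lowest score in the bucket.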
@@ -97,46 +122,46 @@ Your environment must meet the following requirements:
 * Key Type = uint64_t
 * Value Type = float32 * {dim}
 * Key-Values per OP = 1048576
-* Hit rate = 0.60
+* `λ`: load factor
 * `find*` means the `find` API that directly returns the addresses of values.
+* `find_or_insert*` means the `find_or_insert` API that directly returns the addresses of values.
 * ***Throughput Unit: Billion-KV/second***
 
-### On pure HBM mode:
+### On pure HBM mode:
 
 * dim = 4, capacity = 64 Million-KV, HBM = 32 GB, HMEM = 0 GB
 
-| load_factor | insert_or_assign | find | find_or_insert | assign | find* | insert_and_evict |
-|------------:|-----------------:|-------:|---------------:|-------:|-------:|-----------------:|
-| 0.50 | 1.402 | 2.958 | 1.743 | 1.954 | 3.632 | 1.178 |
-| 0.75 | 1.072 | 1.629 | 0.617 | 0.914 | 1.851 | 0.906 |
-| 1.00 | 0.352 | 0.826 | 0.342 | 0.552 | 0.895 | 0.303 |
+| λ | insert_or_assign | find | find_or_insert | assign | find* | find_or_insert* | insert_and_evict |
+|-----:|-----------------:|------:|---------------:|-------:|------:|----------------:|-----------------:|
+| 0.50 | 1.397 | 2.923 | 1.724 | 1.945 | 3.609 | 1.756 | 1.158 |
+| 0.75 | 1.062 | 1.607 | 0.615 | 0.910 | 1.836 | 1.175 | 0.900 |
+| 1.00 | 0.352 | 0.826 | 0.342 | 0.551 | 0.894 | 0.357 | 0.302 |
 
 * dim = 64, capacity = 64 Million-KV, HBM = 16 GB, HMEM = 0 GB
 
-| load_factor | insert_or_assign | find | find_or_insert | assign | find* | insert_and_evict |
-|------------:|-----------------:|-------:|---------------:|-------:|-------:|-----------------:|
-| 0.50 | 0.925 | 1.584 | 0.890 | 1.128 | 3.645 | 0.795 |
-| 0.75 | 0.665 | 1.115 | 0.541 | 0.834 | 1.849 | 0.569 |
-| 1.00 | 0.323 | 0.640 | 0.314 | 0.512 | 0.896 | 0.179 |
+| λ | insert_or_assign | find | find_or_insert | assign | find* | find_or_insert* | insert_and_evict |
+|-----:|-----------------:|------:|---------------:|-------:|------:|----------------:|-----------------:|
+| 0.50 | 0.924 | 1.587 | 0.888 | 1.125 | 3.628 | 1.756 | 0.789 |
+| 0.75 | 0.662 | 1.115 | 0.540 | 0.833 | 1.844 | 1.177 | 0.566 |
+| 1.00 | 0.323 | 0.642 | 0.314 | 0.512 | 0.897 | 0.358 | 0.177 |
 
 ### On HBM+HMEM hybrid mode:
 
 * dim = 64, capacity = 128 Million-KV, HBM = 16 GB, HMEM = 16 GB
 
-| load_factor | insert_or_assign | find | find_or_insert | assign | find* |
-|------------:|-----------------:|-------:|---------------:|-------:|-------:|
-| 0.50 | 0.121 | 0.149 | 0.120 | 0.147 | 3.397 |
-| 0.75 | 0.116 | 0.145 | 0.115 | 0.143 | 1.800 |
-| 1.00 | 0.087 | 0.125 | 0.087 | 0.114 | 0.883 |
+| λ | insert_or_assign | find | find_or_insert | assign | find* | find_or_insert* |
+|-----:|-----------------:|-------:|---------------:|-------:|-------:|----------------:|
+| 0.50 | 0.122 | 0.149 | 0.120 | 0.148 | 3.414 | 1.690 |
+| 0.75 | 0.117 | 0.145 | 0.115 | 0.143 | 1.808 | 1.161 |
+| 1.00 | 0.088 | 0.125 | 0.087 | 0.114 | 0.884 | 0.355 |
 
 * dim = 64, capacity = 1024 Million-KV, HBM = 56 GB, HMEM = 200 GB
 
-| load_factor | insert_or_assign | find | find_or_insert | assign | find* |
-|------------:|-----------------:|-------:|---------------:|-------:|-------:|
-| 0.50 | 0.036 | 0.054 | 0.035 | 0.045 | 2.809 |
-| 0.75 | 0.035 | 0.055 | 0.034 | 0.047 | 1.930 |
-| 1.00 | 0.034 | 0.051 | 0.031 | 0.047 | 0.855 |
-
+| λ | insert_or_assign | find | find_or_insert | assign | find* | find_or_insert* |
+|-----:|-----------------:|-------:|---------------:|-------:|-------:|----------------:|
+| 0.50 | 0.037 | 0.053 | 0.034 | 0.050 | 2.822 | 1.715 |
+| 0.75 | 0.036 | 0.053 | 0.033 | 0.049 | 1.920 | 1.082 |
+| 1.00 | 0.032 | 0.049 | 0.030 | 0.044 | 0.855 | 0.351 |
 
 ### Support and Feedback:
 
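Two quantities can be checked directly against the benchmark setup in this hunk. First, for the three dim = 64 configurations, the HBM + HMEM budget equals the value footprint capacity × dim × sizeof(float): 16, 32, and 256 GB. That match holds only under the assumption that "Million-KV" means 2^20 entries and "GB" means GiB, and it is an observation from the numbers rather than a sizing rule stated in the README (keys and scores, which are kept in HBM, are not counted). Second, a throughput figure converts to per-call time: with 1048576 key-values per op, the dim = 4, λ = 0.50 `find` result of 2.923 Billion-KV/second corresponds to roughly 359 µs per call. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for this sketch: "Million-KV" = 2^20 entries, "GB" = GiB (2^30 bytes).
constexpr uint64_t kMi = 1ULL << 20;
constexpr uint64_t kGiB = 1ULL << 30;

// Bytes for the values alone: Value Type = float32 * {dim}.
constexpr uint64_t value_bytes(uint64_t capacity, uint64_t dim) {
  return capacity * dim * sizeof(float);
}

// Wall time of one batched call in microseconds, given the table's
// throughput unit (Billion-KV/second).
double batch_time_us(double kv_per_op, double billion_kv_per_s) {
  assert(billion_kv_per_s > 0.0);
  return kv_per_op / (billion_kv_per_s * 1e9) * 1e6;
}
```

For instance, `value_bytes(1024 * kMi, 64)` is 2^38 bytes, which is 256 GiB and equals the 56 GB HBM + 200 GB HMEM configuration.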