Unexpected degradation of FPR when `pattern_bits=4` #682

I am encountering unexpected behavior when using `cuco::bloom_filter` with `pattern_bits = 4`. The false positive rate (FPR) degrades far more than expected when going from `pattern_bits = 8` to `pattern_bits = 4` while keeping the 'load factor' (i.e., the fraction of bits set in the filter) constant. The issue may be related to how the bit pattern is selected.

The following code demonstrates the issue:  

```cpp
#include <cuco/bloom_filter.cuh>
#include <iostream>
#include <thrust/count.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// 'Blocked' filter policy with 8B blocks
using policy_t = cuco::default_filter_policy<cuco::xxhash_64<uint32_t>, uint64_t, 1>;
using bf_t =
  cuco::bloom_filter<uint32_t, cuco::extent<std::size_t>, cuda::thread_scope_device, policy_t>;
constexpr size_t bits_per_block   = 64;
constexpr uint32_t pattern_bits_A = 4;
constexpr uint32_t pattern_bits_B = 8;
constexpr size_t bits_per_key_A   = 2 * pattern_bits_A;
constexpr size_t bits_per_key_B   = 2 * pattern_bits_B;

int main()
{
  // Initialize non-overlapping build and probe key sets
  thrust::device_vector<uint32_t> build_keys(1U << 20U);
  thrust::device_vector<uint32_t> probe_keys(1U << 25U);
  thrust::device_vector<bool> flags_A(1U << 25U, false);
  thrust::device_vector<bool> flags_B(1U << 25U, false);
  thrust::sequence(build_keys.begin(), build_keys.end(), 0, 2);
  thrust::sequence(probe_keys.begin(), probe_keys.end(), 1, 2);

  // Specify pattern bits for the policy
  policy_t policy_A(pattern_bits_A);
  bf_t filter_A(cuda::ceil_div(bits_per_key_A * build_keys.size(), bits_per_block), {}, policy_A);
  filter_A.add(build_keys.begin(), build_keys.end());
  filter_A.contains(probe_keys.begin(), probe_keys.end(), flags_A.begin());
  size_t fps_A = thrust::count(flags_A.begin(), flags_A.end(), true);
  double fpr_A = 100.0 * fps_A / flags_A.size();
  std::cout << "FPR A: " << fpr_A << "\n";

  policy_t policy_B(pattern_bits_B);
  bf_t filter_B(cuda::ceil_div(bits_per_key_B * build_keys.size(), bits_per_block), {}, policy_B);
  filter_B.add(build_keys.begin(), build_keys.end());
  filter_B.contains(probe_keys.begin(), probe_keys.end(), flags_B.begin());
  size_t fps_B = thrust::count(flags_B.begin(), flags_B.end(), true);
  double fpr_B = 100.0 * fps_B / flags_B.size();
  std::cout << "FPR B: " << fpr_B << "\n";

  return 0;
}
```

**Observed Behavior:**  
```
FPR A: 16.9311
FPR B: 0.611573
```

**Expected Behavior:**  
I expected the FPR to increase more gradually as `pattern_bits` (and with it the filter size) decreases. This configuration, 8-byte blocks with 4 bits set per key, is common (e.g., [arrow/acero](https://github.com/apache/arrow/blob/main/cpp/src/arrow/acero/bloom_filter.h)), and I would not expect it to produce such a high FPR at a 'load factor' of `0.5`.

**Environment:**  
- **Cuco version:** 0.0.1
- **CUDA version:** 12.2
- **Compiler:** gcc 11.4.0
- **GPU:** L4
- **OS:** Ubuntu

I would appreciate any insight into what might be causing this, or into whether I'm missing something. Thanks!
