Unexpected degradation of FPR when pattern_bits=4 #682

@kevkrist

I am encountering unexpected behavior when using cuco::bloom_filter with pattern_bits = 4. The false positive rate (FPR) degrades far more sharply than expected when moving from pattern_bits = 8 to pattern_bits = 4 while holding the 'load factor' (i.e., the fraction of bits set in the filter) constant. The issue may be related to how the bit pattern is selected.

The following code demonstrates the issue:

#include <cuco/bloom_filter.cuh>
#include <cuda/cmath>  // cuda::ceil_div
#include <thrust/count.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

#include <cstddef>
#include <cstdint>
#include <iostream>

// 'Blocked' filter policy with 8B blocks
using policy_t = cuco::default_filter_policy<cuco::xxhash_64<uint32_t>, uint64_t, 1>;
using bf_t =
  cuco::bloom_filter<uint32_t, cuco::extent<std::size_t>, cuda::thread_scope_device, policy_t>;
constexpr size_t bits_per_block   = 64;
constexpr uint32_t pattern_bits_A = 4;
constexpr uint32_t pattern_bits_B = 8;
constexpr size_t bits_per_key_A   = 2 * pattern_bits_A;
constexpr size_t bits_per_key_B   = 2 * pattern_bits_B;

int main()
{
  // Initialize non-overlapping build and probe key sets
  thrust::device_vector<uint32_t> build_keys(1U << 20U);
  thrust::device_vector<uint32_t> probe_keys(1U << 25U);
  thrust::device_vector<bool> flags_A(1U << 25U, false);
  thrust::device_vector<bool> flags_B(1U << 25U, false);
  thrust::sequence(build_keys.begin(), build_keys.end(), 0, 2);
  thrust::sequence(probe_keys.begin(), probe_keys.end(), 1, 2);

  // Specify pattern bits for the policy
  policy_t policy_A(pattern_bits_A);
  bf_t filter_A(cuda::ceil_div(bits_per_key_A * build_keys.size(), bits_per_block), {}, policy_A);
  filter_A.add(build_keys.begin(), build_keys.end());
  filter_A.contains(probe_keys.begin(), probe_keys.end(), flags_A.begin());
  size_t fps_A = thrust::count(flags_A.begin(), flags_A.end(), true);
  double fpr_A = 100.0 * fps_A / flags_A.size();  // FPR in percent
  std::cout << "FPR A: " << fpr_A << "\n";

  policy_t policy_B(pattern_bits_B);
  bf_t filter_B(cuda::ceil_div(bits_per_key_B * build_keys.size(), bits_per_block), {}, policy_B);
  filter_B.add(build_keys.begin(), build_keys.end());
  filter_B.contains(probe_keys.begin(), probe_keys.end(), flags_B.begin());
  size_t fps_B = thrust::count(flags_B.begin(), flags_B.end(), true);
  double fpr_B = 100.0 * fps_B / flags_B.size();  // FPR in percent
  std::cout << "FPR B: " << fpr_B << "\n";

  return 0;
}

Observed Behavior (FPR in percent):

FPR A: 16.9311
FPR B: 0.611573

Expected Behavior:
The FPR should increase more smoothly as pattern_bits and the filter size decrease. This configuration (8B blocks with 4 bits set per key) is common, e.g. in arrow/acero, and is not expected to produce such a high FPR at a 'load factor' of 0.5.
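
For reference, here is a rough host-only sketch of the FPR I would expect from an idealized blocked Bloom filter. It is not cuco's implementation: it assumes each key picks one block uniformly at random and then sets pattern_bits independent, uniformly random bits in that block (repeats allowed), so the number of keys per block is approximately Poisson distributed. The names poisson_pmf and blocked_bloom_fpr are my own helpers, and max_keys is an arbitrary truncation point for the sum.

```cpp
#include <cmath>
#include <cstddef>

// Probability that a Poisson(lambda) variable equals j (lambda > 0),
// evaluated in log space to avoid overflowing j!.
inline double poisson_pmf(std::size_t j, double lambda)
{
  double const jd = static_cast<double>(j);
  return std::exp(-lambda + jd * std::log(lambda) - std::lgamma(jd + 1.0));
}

// Expected false positive rate (as a fraction, not percent) of an IDEALIZED
// blocked Bloom filter. Assumptions (not taken from cuco):
//   - each key maps to one block chosen uniformly at random
//   - each key sets `pattern_bits` independent uniform bits in that block
//   - the number of keys per block is Poisson with mean `keys_per_block`
inline double blocked_bloom_fpr(double keys_per_block,
                                unsigned pattern_bits,
                                unsigned bits_per_block = 64,
                                std::size_t max_keys    = 256)
{
  // Probability that one bit insertion misses a given bit position
  double const miss = 1.0 - 1.0 / static_cast<double>(bits_per_block);
  double fpr        = 0.0;
  for (std::size_t j = 0; j <= max_keys; ++j) {
    // Expected fraction of bits set in a block that holds j keys
    double const fill = 1.0 - std::pow(miss, static_cast<double>(j * pattern_bits));
    // A probe is a false positive if all of its pattern bits are already set
    fpr += poisson_pmf(j, keys_per_block) * std::pow(fill, static_cast<double>(pattern_bits));
  }
  return fpr;
}
```

For the two configurations above, keys_per_block is 64 / 8 = 8 with pattern_bits = 4 and 64 / 16 = 4 with pattern_bits = 8, i.e. blocked_bloom_fpr(8.0, 4) and blocked_bloom_fpr(4.0, 8); multiply by 100 to compare with the percentages above. Under these assumptions the model suggests a rate of a few percent for the 4-bit case, well below the observed 16.9, while the 8-bit case lands in the same ballpark as the observed value.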

Environment:

  • Cuco version: 0.0.1
  • CUDA version: 12.2
  • Compiler: gcc 11.4.0
  • GPU: L4
  • OS: Ubuntu

I would appreciate any insight into what might be causing this, or into what I might be missing. Thanks!
