@enp1s0 commented Nov 12, 2025

This PR:

  1. Introduces E5M2 (an 8-bit floating-point format) as the internal shared memory storage type to improve performance
  2. Adds support for PQ_LEN = 8 to achieve a higher compression ratio

E5M2 as the shared memory (smem) data type

Storing data in shared memory in a lower-precision type such as E5M2 reduces its footprint, helps reduce shared memory bank conflicts, and can improve throughput.
Since the quantization error from VQ+PQ is typically larger than the representation error of E5M2, the impact on search recall is expected to be negligible.

TODO: performance improvement figure


Support for PQ_LEN=8

The current cuVS implementation supports only PQ_LEN = 2 (4 bits per vector element) and PQ_LEN = 4 (2 bits per vector element).
This PR adds support for PQ_LEN = 8 to enable a higher compression ratio (1 bit per vector element).

TODO

  • Reduce unnecessary compute_distance_vpq instances

@enp1s0 self-assigned this Nov 12, 2025
@enp1s0 added the feature request, improvement, and non-breaking labels and removed the improvement label Nov 12, 2025

Labels: feature request, non-breaking
