RFC: Proposal to Update vecs Python Client to Include Latest pgvector Functionalities #93

@Muhtasham

Summary

This RFC proposes adding support for the latest pgvector features to the vecs Python client: new vector types (halfvec, sparsevec), additional indexing options (bit vectors, L1 distance with HNSW), and new vector functions (binary_quantize, hamming_distance, jaccard_distance, l2_normalize, subvector).

Rationale

Recent pgvector releases add new vector types, more indexing options, and new functions, none of which are currently exposed by the vecs client. Bringing vecs to feature parity would give users more compact storage (halfvec, sparsevec, bit), additional distance metrics (L1, Hamming, Jaccard), and more vector operations, which in turn supports a broader range of use cases.

Design

Proposed Additions

  1. Vector Types:

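    • halfvec: Half-precision (16-bit float) vectors, which use roughly half the storage of the single-precision vector type at the cost of reduced precision.
    • sparsevec: Sparse vectors that store only non-zero values (with their indices) to reduce memory usage for mostly-zero data.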
  2. Indexing Enhancements:

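    • bit Type Indexing: Add support for indexing binary vectors stored as the PostgreSQL bit type, using Hamming distance (bit_hamming_ops, operator <~>) or Jaccard distance (bit_jaccard_ops, operator <%>).
    • L1 Distance with HNSW: Add support for L1 (taxicab) distance in HNSW indexes for similarity search (vector_l1_ops, operator <+>).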
  3. New Functions:

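    • binary_quantize: Converts a vector into a bit vector, mapping each element to 1 if it is greater than zero and to 0 otherwise.
    • hamming_distance: Calculates the Hamming distance (number of differing bits) between two bit vectors.
    • jaccard_distance: Computes the Jaccard distance between two bit vectors.
    • l2_normalize: Normalizes a vector to unit length under the L2 (Euclidean) norm.
    • subvector: Extracts a contiguous subvector given a start position and an element count.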
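
To pin down what the functions listed above are expected to compute, here is a minimal pure-Python sketch of their semantics. It is not pgvector's or vecs's implementation: the signatures, the list-of-ints representation of bit vectors, and the edge-case handling (a zero vector in l2_normalize, all-zero inputs in jaccard_distance) are assumptions made for illustration. subvector is assumed to take a 1-based start index, following the usual SQL convention.

```python
import math
from typing import List, Sequence


def binary_quantize(vec: Sequence[float]) -> List[int]:
    """One bit per element: 1 if the element is > 0, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - |a AND b| / |a OR b| for two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    intersection = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    if union == 0:
        # Assumption: treat two all-zero vectors as maximally distant.
        return 1.0
    return 1.0 - intersection / union


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        # Assumption: a zero vector is returned unchanged.
        return list(vec)
    return [x / norm for x in vec]


def subvector(vec: Sequence[float], start: int, count: int) -> List[float]:
    """Return `count` elements beginning at 1-based position `start`."""
    if start < 1 or count < 1:
        raise ValueError("start and count must be positive")
    return list(vec[start - 1 : start - 1 + count])
```

A typical pipeline would combine these: binary_quantize an embedding, use hamming_distance on the resulting bits for a cheap first-pass ranking, then re-rank the top candidates with the original full-precision vectors.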

Examples

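For instance, creating a halfvec vector with the proposed API (halfvec is not exported by vecs today; the name and constructor shown here are part of this proposal):

# Proposed API, not yet available in vecs
from vecs import halfvec
vec = halfvec([1.0, 2.0, 3.0])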
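
As a rough illustration of what the new types involve on the client side, the sketch below converts a dense Python list into pgvector's sparsevec text format ({index:value,...}/dimensions, with 1-based indices) and estimates per-value storage using the figures given in pgvector's documentation. The helper names are hypothetical and are not part of vecs.

```python
from typing import Sequence


def to_sparsevec_literal(dense: Sequence[float]) -> str:
    """Build a sparsevec text literal, keeping only non-zero elements."""
    pairs = [f"{i}:{x}" for i, x in enumerate(dense, start=1) if x != 0]
    return "{" + ",".join(pairs) + "}/" + str(len(dense))


def vector_bytes(dims: int) -> int:
    """Storage for a single-precision vector: 4 bytes per element + 8."""
    return 4 * dims + 8


def halfvec_bytes(dims: int) -> int:
    """Storage for a half-precision vector: 2 bytes per element + 8."""
    return 2 * dims + 8


def sparsevec_bytes(non_zero: int) -> int:
    """Storage for a sparse vector: 8 bytes per non-zero element + 16."""
    return 8 * non_zero + 16


print(to_sparsevec_literal([1.0, 0.0, 2.0, 0.0, 3.0]))  # {1:1.0,3:2.0,5:3.0}/5
print(vector_bytes(1536), halfvec_bytes(1536))          # 6152 3080
```

For a 1536-dimensional embedding, the estimate shows halfvec taking about half the space of vector, which is the main storage argument for supporting it in the client.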
