rushikeshg25/cool-vec

Cool Vector DB

Overview

A high-performance, minimalist vector database implementation featuring custom binary storage, memory-mapped I/O, and HNSW-based approximate nearest neighbor search. This project is built with FastAPI for the API layer and PostgreSQL for metadata and user management.

Key Features

  • Custom binary vector storage with 36-byte header and O(1) random access.
  • Memory-mapped (mmap) vector retrieval, so vectors are read straight from the mapped file without an intermediate read buffer.
  • HNSW (Hierarchical Navigable Small World) indexing for fast approximate kNN search.
  • JWT-based authentication for secure multi-user access.
  • Google Gemini embedding integration with fallback to deterministic local embedders.
  • Automated API testing via a comprehensive Postman collection.

Architecture

The system is divided into three primary layers:

  1. API Layer: FastAPI handles routing, JWT authentication, and request/response validation.
  2. Storage Layer: A custom binary store handles the raw vector data (vectors.bin) and search indexing (hnsw.index).
  3. Metadata Layer: PostgreSQL stores user profiles, preferences, and text metadata associated with each vector ID.

Storage Specification (vectors.bin)

Vectors are stored in a flat binary file that begins with a fixed 36-byte header.

Header Anatomy

Offset  Size  Field     Type    Description
0-7     8 B   magic     bytes   b'COOLVEC\x00'
8-11    4 B   version   uint32  1
12-19   8 B   n_vecs    uint64  Current vector count
20-27   8 B   dim       uint64  Vector dimensionality (e.g., 3072)
28-31   4 B   dtype     uint32  0 (float32)
32-35   4 B   capacity  uint32  Maximum vector capacity
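
The header layout above can be expressed with Python's struct module. The following is a minimal sketch, not the project's actual code: the little-endian byte order and the names HEADER_FORMAT, pack_header, and unpack_header are assumptions made for illustration.

```python
import struct

# Field order follows the header table: magic, version, n_vecs, dim, dtype, capacity.
# "<" selects little-endian with no alignment padding (byte order is an assumption).
HEADER_FORMAT = "<8sIQQII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 + 4 + 8 + 8 + 4 + 4 = 36
MAGIC = b"COOLVEC\x00"


def pack_header(n_vecs, dim, capacity, version=1, dtype=0):
    """Serialize the header fields into exactly 36 bytes."""
    return struct.pack(HEADER_FORMAT, MAGIC, version, n_vecs, dim, dtype, capacity)


def unpack_header(buf):
    """Parse the first 36 bytes of a buffer and validate the magic bytes."""
    magic, version, n_vecs, dim, dtype, capacity = struct.unpack_from(HEADER_FORMAT, buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes: not a vectors.bin file")
    return {
        "version": version,
        "n_vecs": n_vecs,
        "dim": dim,
        "dtype": dtype,
        "capacity": capacity,
    }
```

The explicit "<" prefix matters here: with native alignment, struct would pad the uint64 at offset 12 up to offset 16, and the header would no longer be 36 bytes.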

Vector Data Region

Starting at offset 36, vectors are stored contiguously without padding. The byte offset for vector i is calculated as: 36 + i * dim * 4.
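
The offset formula can be checked with a short script. This is a sketch under stated assumptions: the helper names (vector_offset, read_vector, demo) are hypothetical, float32 values are assumed to be little-endian, and the demo writes a zeroed placeholder header rather than a valid one.

```python
import mmap
import os
import struct
import tempfile

HEADER_SIZE = 36   # bytes, from the header specification
FLOAT32_SIZE = 4   # bytes per component (dtype 0 = float32)


def vector_offset(i, dim):
    """Byte offset of vector i: 36 + i * dim * 4."""
    return HEADER_SIZE + i * dim * FLOAT32_SIZE


def read_vector(path, i, dim):
    """Map the file read-only and decode vector i without scanning earlier vectors."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Little-endian float32 is an assumption, not stated in the spec.
            return struct.unpack_from(f"<{dim}f", mm, vector_offset(i, dim))


def demo():
    """Write a placeholder header plus two 3-dim vectors, then read vector 1 back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        with open(path, "wb") as f:
            f.write(b"\x00" * HEADER_SIZE)
            f.write(struct.pack("<6f", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0))
        return read_vector(path, 1, dim=3)
```

Because every vector has the same size, the lookup is a single multiplication and addition regardless of how many vectors the file holds. Note that struct.unpack_from copies the values into a Python tuple; an array view over the mapped buffer would be needed to avoid that copy.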

Getting Started

Prerequisites

  • Python 3.10+
  • PostgreSQL
  • Google API Key (optional, for Gemini embeddings)

Installation

# Clone the repository
git clone <repository-url>
cd vec-db-v0

# Install dependencies
make install

Configuration

Create a .env file in the root directory:

DB_URI=postgresql://user:password@localhost:5432/mydb
GOOGLE_API_KEY=your_google_api_key
SECRET_KEY=your_jwt_secret_key
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
VEC_DB_PATH=./data
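
For illustration, the variables above could be read into a typed settings object as follows. This is a sketch only: the Settings class and load_settings function are hypothetical, it assumes the .env values have already been exported into the process environment, and treating DB_URI and SECRET_KEY as required (with the other example values as defaults) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    db_uri: str
    secret_key: str
    algorithm: str
    access_token_expire_minutes: int
    vec_db_path: str
    google_api_key: Optional[str]  # optional: only needed for Gemini embeddings


def load_settings(env=None):
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    # Assumption: these two have no safe default and must be provided.
    missing = [key for key in ("DB_URI", "SECRET_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        db_uri=env["DB_URI"],
        secret_key=env["SECRET_KEY"],
        algorithm=env.get("ALGORITHM", "HS256"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        vec_db_path=env.get("VEC_DB_PATH", "./data"),
        google_api_key=env.get("GOOGLE_API_KEY") or None,
    )
```

Failing fast on a missing DB_URI or SECRET_KEY surfaces configuration mistakes at startup rather than on the first request.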

Initialization

make init

Development Scripts

The included Makefile provides shortcuts for common operations:

  • make dev: Start the FastAPI server with hot-reloading on 127.0.0.1:8000.
  • make start: Start the server for production deployment.
  • make init: Create required PostgreSQL tables.
  • make test: Run unit and integration tests for the storage layer.
  • make clean: Delete local vector data (./data directory).
  • make migrate: Reset the metadata schema (destructive).

API Endpoints

Authentication

POST /user/signup

Creates a new user account.

  • Body: {"username": "...", "password": "..."}

POST /user/login

Authenticates a user and returns a JWT.

  • Body: username=...&password=... (Form Data)
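
The two endpoints expect different body encodings: JSON for signup, form data for login. The sketch below builds both requests with the standard library; the helper names and the BASE_URL default (taken from the make dev address) are assumptions, and the shape of the login response is not documented here, so it is not parsed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # assumed: the address used by `make dev`


def signup_request(username, password, base_url=BASE_URL):
    """POST /user/signup with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{base_url}/user/signup",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login_request(username, password, base_url=BASE_URL):
    """POST /user/login with a form-encoded body."""
    body = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(
        f"{base_url}/user/login",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Either request can be sent with urllib.request.urlopen(request) once the server is running.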

Vector Operations

POST /vec/add

Embeds text and stores the resulting vector.

  • Headers: Authorization: Bearer <token>
  • Body: {"text": "Your document content here"}

POST /vec/similarity

Finds the top-k most similar vectors to a specified vector ID.

  • Headers: Authorization: Bearer <token>
  • Query Params: vec_id=(int), k=(int, default 5)
  • Response: Sorted list of results with IDs, distances, and text.
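
Since HNSW search is approximate, an exact brute-force search is a useful baseline for checking its results on small datasets. The sketch below is not the project's search code: the function names are hypothetical, Euclidean distance is assumed as the metric, and excluding the query vector from its own results is also an assumption.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_top_k(vectors, vec_id, k=5):
    """Return the k nearest (id, distance) pairs to vectors[vec_id], closest first.

    `vectors` maps vector ID -> list of floats. Every stored vector is compared,
    so the cost is O(n * dim) per query.
    """
    query = vectors[vec_id]
    candidates = ((l2(query, vec), other_id)
                  for other_id, vec in vectors.items()
                  if other_id != vec_id)
    return [(other_id, dist) for dist, other_id in heapq.nsmallest(k, candidates)]
```

Comparing this output against the IDs returned by /vec/similarity gives a rough measure of recall for the approximate index.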

POST /vec/distance

Calculates the exact distance between two vector IDs using the user's preferred metric.

  • Headers: Authorization: Bearer <token>
  • Query Params: id1=(int), id2=(int)
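
The available metrics are not listed in this document. As an illustration only, the sketch below assumes two common choices, Euclidean and cosine distance, selected by a hypothetical metric name.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: sqrt(sum((a_i - b_i)^2))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Assumed metric names; the real preference values may differ.
METRICS = {"euclidean": euclidean_distance, "cosine": cosine_distance}


def distance(a, b, metric="cosine"):
    """Compute the exact distance between two vectors with the chosen metric."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return METRICS[metric](a, b)
```

Unlike the similarity endpoint, this computation touches only the two requested vectors, so it is exact and does not involve the HNSW index.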

DELETE /vec/{vec_id}

Removes a vector's metadata. Note: the vector's slot in binary storage is currently left in place and treated as inactive because its metadata link has been removed.

Examples

Adding a Vector

curl -X POST "http://127.0.0.1:8000/vec/add" \
     -H "Authorization: Bearer <your_token>" \
     -H "Content-Type: application/json" \
     -d '{"text": "The quick brown fox jumps over the lazy dog."}'

Similarity Search

curl -X POST "http://127.0.0.1:8000/vec/similarity?vec_id=0&k=3" \
     -H "Authorization: Bearer <your_token>"
