Cool Vector DB

Overview

A high-performance, minimalist vector database implementation featuring custom binary storage, memory-mapped I/O, and HNSW-based approximate nearest neighbor search. This project is built with FastAPI for the API layer and PostgreSQL for metadata and user management.

Key Features

Custom binary vector storage with 36-byte header and O(1) random access.
Memory-mapped (mmap) vector retrieval for zero-copy performance.
HNSW (Hierarchical Navigable Small World) indexing for fast approximate kNN search.
JWT-based authentication for secure multi-user access.
Google Gemini embedding integration with fallback to deterministic local embedders.
Automated API testing via a comprehensive Postman collection.
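
The README does not describe how the fallback embedder works. As a rough illustration of what "deterministic" can mean here, the sketch below derives a unit-length vector from a SHA-256 hash of the text. The function name `deterministic_embedding`, the hashing scheme, and the small default dimension are all assumptions made for this example, not the project's actual code.

```python
import hashlib
import math
import struct


def deterministic_embedding(text, dim=8):
    """Hypothetical fallback embedder: same text always yields the same vector."""
    values = []
    counter = 0
    while len(values) < dim:
        # Hash the text together with a counter to get as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Each 32-byte digest yields eight unsigned 32-bit words, mapped to [-1, 1].
        for (word,) in struct.iter_unpack("<I", digest):
            values.append(word / 0xFFFFFFFF * 2.0 - 1.0)
        counter += 1
    values = values[:dim]
    # Normalize to unit length so cosine-style metrics behave sensibly.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Vectors produced this way are reproducible across runs, which is convenient for tests, but they carry no semantic meaning: similar texts do not get similar vectors.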

Architecture

The system is divided into three primary layers:

API Layer: FastAPI handles routing, JWT authentication, and request/response validation.
Storage Layer: A custom binary store handles the raw vector data (vectors.bin) and search indexing (hnsw.index).
Metadata Layer: PostgreSQL stores user profiles, preferences, and text metadata associated with each vector ID.

Storage Specification (`vectors.bin`)

Vectors are stored in a flat binary file that begins with a fixed 36-byte header.

Header Anatomy

Offset	Size	Field	Type	Description
0-7	8 B	magic	bytes	`b'COOLVEC\x00'`
8-11	4 B	version	uint32	`1`
12-19	8 B	n_vecs	uint64	Current vector count
20-27	8 B	dim	uint64	Vector dimensionality (e.g., 3072)
28-31	4 B	dtype	uint32	`0` (float32)
32-35	4 B	capacity	uint32	Maximum vector capacity
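
The header layout above maps directly onto a `struct` format string. The following is a minimal sketch, assuming little-endian byte order with no padding (the README does not state the byte order); the helper names `pack_header` and `unpack_header` are hypothetical.

```python
import struct

# Assumption: little-endian, unpadded ("<"). Field order follows the table:
# magic (8s), version (I), n_vecs (Q), dim (Q), dtype (I), capacity (I).
HEADER_FORMAT = "<8sIQQII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 + 4 + 8 + 8 + 4 + 4 = 36
MAGIC = b"COOLVEC\x00"


def pack_header(n_vecs, dim, capacity, version=1, dtype=0):
    """Serialize the header fields into 36 bytes."""
    return struct.pack(HEADER_FORMAT, MAGIC, version, n_vecs, dim, dtype, capacity)


def unpack_header(buf):
    """Parse the first 36 bytes of a buffer and validate the magic value."""
    magic, version, n_vecs, dim, dtype, capacity = struct.unpack_from(
        HEADER_FORMAT, buf, 0
    )
    if magic != MAGIC:
        raise ValueError("bad magic: not a vectors.bin file")
    return {
        "version": version,
        "n_vecs": n_vecs,
        "dim": dim,
        "dtype": dtype,
        "capacity": capacity,
    }
```

For example, `unpack_header(pack_header(n_vecs=0, dim=3072, capacity=1000))` round-trips the fields and confirms that the sizes in the table add up to 36 bytes.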

Vector Data Region

Starting at offset 36, vectors are stored contiguously without padding. The byte offset of vector i is `36 + i * dim * 4`, where 4 is the size in bytes of one float32 component.
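
Because every vector has the same size, the offset formula gives O(1) random access. The sketch below shows the arithmetic and one way to read a single vector through `mmap`. It assumes little-endian float32 values, and the function names are hypothetical rather than taken from the project.

```python
import mmap
import struct

HEADER_SIZE = 36   # fixed header length from the specification
FLOAT32_SIZE = 4   # bytes per float32 component


def vector_offset(i, dim):
    """Byte offset of vector i: 36 + i * dim * 4."""
    return HEADER_SIZE + i * dim * FLOAT32_SIZE


def read_vector(path, i, dim):
    """Read vector i from the file via a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Assumption: little-endian floats. unpack_from copies the values
            # into a Python tuple, so this sketch is not itself zero-copy.
            return struct.unpack_from(f"<{dim}f", mm, vector_offset(i, dim))
```

With `dim = 3072`, vector 2 starts at byte `36 + 2 * 3072 * 4 = 24612`.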

Getting Started

Prerequisites

Python 3.10+
PostgreSQL
Google API Key (optional, for Gemini embeddings)

Installation

# Clone the repository
git clone <repository-url>
cd vec-db-v0

# Install dependencies
make install

Configuration

Create a .env file in the root directory:

DB_URI=postgresql://user:password@localhost:5432/mydb
GOOGLE_API_KEY=your_google_api_key
SECRET_KEY=your_jwt_secret_key
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
VEC_DB_PATH=./data
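
One way an application might read these variables is sketched below. It assumes the values have already been exported into the process environment (for example by a dotenv loader). The `Settings` class, the `load_settings` function, and the choice of which variables are required or defaulted are assumptions for illustration only.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    db_uri: str
    secret_key: str
    algorithm: str
    access_token_expire_minutes: int
    vec_db_path: str
    google_api_key: Optional[str]  # optional per the prerequisites


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables."""
    return Settings(
        # Assumption: these two have no safe default, so a missing key raises KeyError.
        db_uri=env["DB_URI"],
        secret_key=env["SECRET_KEY"],
        # Assumption: defaults mirror the sample .env values shown above.
        algorithm=env.get("ALGORITHM", "HS256"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        vec_db_path=env.get("VEC_DB_PATH", "./data"),
        google_api_key=env.get("GOOGLE_API_KEY") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.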

Initialization

make init

Development Scripts

The included Makefile provides shortcuts for common operations:

make dev: Start the FastAPI server with hot-reloading on 127.0.0.1:8000.
make start: Start the server for production deployment.
make init: Create required PostgreSQL tables.
make test: Run unit and integration tests for the storage layer.
make clean: Delete local vector data (./data directory).
make migrate: Reset the metadata schema (destructive).

API Endpoints

Authentication

POST `/user/signup`

Creates a new user account.

Body: {"username": "...", "password": "..."}

POST `/user/login`

Authenticates a user and returns a JWT.

Body: username=...&password=... (Form Data)
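
Note that login takes form data while signup takes JSON. A minimal sketch of building the login request with the standard library follows. The base URL is the `make dev` address, the helper name is hypothetical, and the field holding the token in the response is not documented in this README, so the sketch stops at constructing the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # address used by `make dev`


def build_login_request(username, password):
    """Build a form-encoded POST request for /user/login."""
    body = urlencode({"username": username, "password": password}).encode("ascii")
    return Request(
        f"{BASE_URL}/user/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The request can then be sent with `urllib.request.urlopen(build_login_request("alice", "s3cret"))` against a running server.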

Vector Operations

POST `/vec/add`

Embeds text and stores the resulting vector.

Headers: Authorization: Bearer <token>
Body: {"text": "Your document content here"}

POST `/vec/similarity`

Finds the top-k vectors most similar to the vector with the specified ID.

Headers: Authorization: Bearer <token>
Query Params: vec_id=(int), k=(int, default 5)
Response: Sorted list of results with IDs, distances, and text.

POST `/vec/distance`

Calculates the exact distance between the vectors with the two given IDs, using the user's preferred metric.

Headers: Authorization: Bearer <token>
Query Params: id1=(int), id2=(int)
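
The README does not list which metrics a user can choose. As an illustration of an exact, metric-dispatched distance computation, the sketch below implements Euclidean and cosine distance; the metric names and the `distance` function are assumptions, not the project's API.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical metric names; the real preference values may differ.
METRICS = {"euclidean": euclidean_distance, "cosine": cosine_distance}


def distance(a, b, metric="cosine"):
    """Compute the exact distance between a and b with the chosen metric."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return METRICS[metric](a, b)
```

Unlike the HNSW search, this computes the distance directly from the two stored vectors, so the result is exact rather than approximate.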

DELETE `/vec/{vec_id}`

Removes a vector's metadata. Note: the vector's slot in binary storage is currently treated as inactive simply by removing its metadata link.

Examples

Adding a Vector

curl -X POST "http://127.0.0.1:8000/vec/add" \
     -H "Authorization: Bearer <your_token>" \
     -H "Content-Type: application/json" \
     -d '{"text": "The quick brown fox jumps over the lazy dog."}'

Similarity Search

curl -X POST "http://127.0.0.1:8000/vec/similarity?vec_id=0&k=3" \
     -H "Authorization: Bearer <your_token>"
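
The same two calls can be made from Python with only the standard library. This is a sketch under a few assumptions: the helper names are hypothetical, the server is running at the `make dev` address, and responses are assumed to be JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # address used by `make dev`


def add_vector_request(token, text):
    """Build the POST /vec/add request with a JSON body."""
    return Request(
        f"{BASE_URL}/vec/add",
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def similarity_request(token, vec_id, k=5):
    """Build the POST /vec/similarity request; parameters go in the query string."""
    query = urlencode({"vec_id": vec_id, "k": k})
    return Request(
        f"{BASE_URL}/vec/similarity?{query}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request):
    """Send a request and decode the response body (assumed to be JSON)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running server and a valid token, `send(similarity_request(token, vec_id=0, k=3))` corresponds to the curl similarity example above.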
