feat(rag): add pgvector vector store backend by Avi-47 · Pull Request #1285 · mofa-org/mofa

Avi-47 · 2026-03-16T04:28:56Z

Overview

This PR adds a new pgvector-backed VectorStore implementation for the MoFA RAG pipeline. The implementation uses PostgreSQL with the pgvector extension to provide persistent vector storage with similarity search capabilities.

Related Work

This PR is part of a series of improvements to MoFA's RAG pipeline.

Related issue:

[GSoC 2026] feat: Enhanced RAG Pipeline with Advanced Retrieval Strategies (#305)

Related PRs:

feat(rag): add BM25 sparse retrieval module (#1266)
feat(rag): add hybrid dense + sparse retrieval pipeline (#1283)

Together these changes extend MoFA's RAG system with:

sparse retrieval
hybrid search
additional vector store backends

Motivation

The MoFA framework currently supports Qdrant as a vector store backend. Adding pgvector support lets users store vectors in PostgreSQL, a widely adopted database that many organizations already run. This enables:

Hybrid storage: Store both structured data and vectors in a single PostgreSQL instance
Existing infrastructure: Leverage existing PostgreSQL deployments without deploying additional services
Simpler operations: Reduce the number of moving parts in the system

Architecture

The implementation follows the same design pattern as the existing QdrantVectorStore:

┌─────────────────────────────────────────────────────────────┐
│                    mofa-foundation                          │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  rag/storage/pgvector.rs (PgVectorStore)            │    │
│  │  - implements VectorStore trait                     │    │
│  │  - uses tokio-postgres client                       │    │
│  │  - uses pgvector crate for vector type              │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│                    mofa-kernel                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  rag/vector_store.rs (VectorStore trait)            │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Key Components

PgVectorConfig: Configuration struct for connection string, table name, and vector dimensions
PgVectorStore: Main store type; implements the VectorStore trait
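
To make the builder-style configuration concrete, here is a minimal sketch of what a config type of this shape could look like. The name PgVectorConfigSketch, its field names, and the default of create_table = false are assumptions for illustration; only the new(...) arguments and with_create_table(...) call shape come from the usage shown in this PR.

```rust
/// Hypothetical stand-in for PgVectorConfig; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct PgVectorConfigSketch {
    pub connection_string: String,
    pub table_name: String,
    pub dimensions: usize,
    pub create_table: bool,
}

impl PgVectorConfigSketch {
    /// Mirrors the three-argument constructor used in the PR's examples.
    pub fn new(
        connection_string: impl Into<String>,
        table_name: impl Into<String>,
        dimensions: usize,
    ) -> Self {
        Self {
            connection_string: connection_string.into(),
            table_name: table_name.into(),
            dimensions,
            // Assumed default: do not create the table unless asked to.
            create_table: false,
        }
    }

    /// Builder-style toggle, consumed and returned by value.
    pub fn with_create_table(mut self, create_table: bool) -> Self {
        self.create_table = create_table;
        self
    }
}

fn main() {
    let config = PgVectorConfigSketch::new(
        "host=localhost port=5432 user=postgres dbname=vectordb",
        "my_embeddings",
        1536,
    )
    .with_create_table(true);
    println!("{config:?}");
}
```

Taking self by value in with_create_table keeps the call chainable directly off new(...), which matches how the config is constructed in the usage section.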

Database Schema:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE rag_embeddings (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(n) NOT NULL,
    metadata TEXT DEFAULT '{}',
    created_at TIMESTAMP DEFAULT NOW()
);
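
Since the table name and the vector dimension n both come from configuration, the DDL has to be rendered at runtime. The sketch below shows one way that could be done; create_table_sql and is_safe_identifier are hypothetical helpers, not functions from this PR. SQL identifiers cannot be bound as query parameters, so the sketch validates the table name before interpolating it.

```rust
/// True if `name` is a plain identifier: ASCII letters, digits, and
/// underscores, not starting with a digit.
fn is_safe_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Renders the CREATE TABLE statement from the schema above for a given
/// table name and embedding dimension. Hypothetical helper.
fn create_table_sql(table: &str, dimensions: usize) -> Result<String, String> {
    if !is_safe_identifier(table) {
        return Err(format!("invalid table name: {table:?}"));
    }
    if dimensions == 0 {
        return Err("vector dimensions must be greater than zero".to_string());
    }
    // `{{}}` is an escaped literal `{}` inside format!.
    Ok(format!(
        "CREATE TABLE {table} (\n\
         \x20   id TEXT PRIMARY KEY,\n\
         \x20   content TEXT NOT NULL,\n\
         \x20   embedding vector({dimensions}) NOT NULL,\n\
         \x20   metadata TEXT DEFAULT '{{}}',\n\
         \x20   created_at TIMESTAMP DEFAULT NOW()\n\
         );"
    ))
}

fn main() {
    match create_table_sql("rag_embeddings", 1536) {
        Ok(sql) => println!("{sql}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```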

Implementation Details

Connection: Uses tokio-postgres Client directly
Upsert: Uses PostgreSQL's ON CONFLICT for upsert semantics
Search: Uses Euclidean (L2) distance via the <-> operator
Index Management: Drops any existing index on table creation so that searches run as exact scans, which ensures correct results with small datasets. For production use with larger datasets, an IVFFlat index can be created manually.
ID Handling: Uses TEXT for IDs directly
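
The statements these bullets imply can be sketched as plain SQL strings. The exact SQL in pgvector.rs is not shown in this description, so the column lists, the index name, and the helper names below are assumptions; the ON CONFLICT clause, the <-> operator, and the ivfflat / vector_l2_ops index syntax are standard PostgreSQL and pgvector. The table name is assumed to be validated by the caller.

```rust
/// Upsert: insert a row, or overwrite it when the id already exists.
fn upsert_sql(table: &str) -> String {
    format!(
        "INSERT INTO {table} (id, content, embedding, metadata) \
         VALUES ($1, $2, $3, $4) \
         ON CONFLICT (id) DO UPDATE SET \
         content = EXCLUDED.content, \
         embedding = EXCLUDED.embedding, \
         metadata = EXCLUDED.metadata"
    )
}

/// Search: order rows by L2 distance to the query vector ($1), nearest first.
fn search_sql(table: &str) -> String {
    format!(
        "SELECT id, content, metadata, embedding <-> $1 AS distance \
         FROM {table} ORDER BY embedding <-> $1 LIMIT $2"
    )
}

/// Optional approximate index for larger tables; `lists` is a tuning knob.
fn ivfflat_index_sql(table: &str, lists: u32) -> String {
    format!(
        "CREATE INDEX IF NOT EXISTS {table}_embedding_ivfflat_idx \
         ON {table} USING ivfflat (embedding vector_l2_ops) \
         WITH (lists = {lists})"
    )
}

fn main() {
    let table = "rag_embeddings"; // assumed to be validated already
    println!("{}\n", upsert_sql(table));
    println!("{}\n", search_sql(table));
    println!("{}", ivfflat_index_sql(table, 100));
}
```

The index operator class has to match the search operator: vector_l2_ops pairs with <->, so an index built with a different operator class would not be used for these queries.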

Usage

Enable the feature in Cargo.toml:

[dependencies]
mofa-foundation = { version = "0.1", features = ["pgvector"] }

Example usage:

use mofa_foundation::rag::{PgVectorConfig, PgVectorStore, VectorStore, DocumentChunk};

let config = PgVectorConfig::new(
    "host=localhost port=5432 user=postgres password=secret dbname=vectordb",
    "my_embeddings",  // table name
    1536,             // vector dimensions
).with_create_table(true);

let mut store = PgVectorStore::new(config).await?;

// Add documents
let chunk = DocumentChunk::new("doc-1", "Hello world", vec![0.1; 1536]);
store.upsert(chunk).await?;

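// Search (placeholder query vector; use a real query embedding of the same dimension)
let query_embedding = vec![0.1; 1536];
let results = store.search(&query_embedding, 5, None).await?;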

Testing Instructions

1. Start PostgreSQL with pgvector

Using Docker:

docker run -d --name pgvector -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=rag -p 5432:5432 pgvector/pgvector:pg16

Or using Docker Compose:

services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: vectordb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

2. Run integration tests (requires database)

# First ensure PostgreSQL with pgvector is running
# Then run the integration tests
cargo test -p mofa-foundation --features pgvector --test pgvector_store_test -- --ignored

Test Results

All pgvector tests passed locally.


cargo test -p mofa-foundation --features pgvector --test pgvector_store_test -- --ignored

Output:


running 5 tests
test tests::test_pgvector_similarity_metric ... ok
test tests::test_pgvector_delete ... ok
test tests::test_pgvector_insert_and_search ... ok
test tests::test_pgvector_clear ... ok
test tests::test_pgvector_batch_insert ... ok

test result: ok. 5 passed; 0 failed

3. Run a simple similarity query test

Create a test script:

use mofa_foundation::rag::{PgVectorConfig, PgVectorStore, VectorStore, DocumentChunk};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = PgVectorConfig::new(
        "host=localhost port=5432 user=postgres password=postgres dbname=vectordb",
        "test_embeddings",
        3,
    ).with_create_table(true);

    let mut store = PgVectorStore::new(config).await?;

    // Insert test documents
    let doc1 = DocumentChunk::new("doc-1", "The quick brown fox", vec![0.1, 0.2, 0.3]);
    let doc2 = DocumentChunk::new("doc-2", "A fast dog", vec![0.2, 0.3, 0.4]);
    store.upsert(doc1).await?;
    store.upsert(doc2).await?;

    // Search
    let query = vec![0.15, 0.25, 0.35];
    let results = store.search(&query, 2, None).await?;

    for result in results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }

    Ok(())
}
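
As a sanity check on what to expect from this script, the L2 distances for its three vectors can be worked out by hand or with a few lines of standalone Rust. This is only a sketch of the arithmetic behind the <-> operator; l2_distance is a hypothetical helper, and how the store maps distance to result.score is not specified in this description.

```rust
/// Euclidean (L2) distance between two vectors of equal length.
fn l2_distance(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn main() {
    // Same vectors as in the test script above.
    let query = [0.15, 0.25, 0.35];
    let doc1 = [0.1, 0.2, 0.3];
    let doc2 = [0.2, 0.3, 0.4];

    println!("doc-1: {:.4}", l2_distance(&query, &doc1));
    println!("doc-2: {:.4}", l2_distance(&query, &doc2));
}
```

Each component of the query differs from both documents by 0.05, so both distances come out to sqrt(3 × 0.05²) ≈ 0.0866. With these inputs the two results tie, and their relative order in the output should not be read as meaningful; shifting one query component (for example 0.35 to 0.31) would make doc-1 the clear nearest neighbor.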

Dependencies Added

tokio-postgres = "0.7" - Async PostgreSQL client
pgvector = "0.2" - Rust types for pgvector

Files Changed

crates/mofa-foundation/Cargo.toml - Added pgvector feature and dependencies
crates/mofa-foundation/src/rag/mod.rs - Added storage module export
crates/mofa-foundation/src/rag/storage/pgvector.rs - New implementation
crates/mofa-foundation/tests/pgvector_store_test.rs - New test file

Backward Compatibility

This change is fully backward compatible. The pgvector feature is optional and disabled by default. Existing code using other vector stores (Qdrant, InMemory) is unaffected.

Related Work

Part of Task #27 – RAG Pipeline improvements.

Future work may include:

Hybrid dense + sparse retrieval
Additional vector store backends
Metadata filtering

Avi-47 · 2026-03-16T09:40:00Z

Hi @lijingrs @yangrudan,
This PR adds a pgvector backend implementing the VectorStore trait for the RAG pipeline. It is feature-gated (pgvector) and follows the same pattern as the existing Qdrant implementation.
CI checks are passing and integration tests are included.
Happy to adjust anything if there are preferred design patterns for vector store backends.
Would appreciate your review whenever you have time. Thanks!

Avi-47 marked this pull request as ready for review March 16, 2026 04:29

Avi-47 force-pushed the feat/rag-pgvector-backend branch 4 times, most recently from 0aa9d4f to 215cbc0 March 16, 2026 08:01

feat(rag): add pgvector vector store backend

9c02962

Avi-47 force-pushed the feat/rag-pgvector-backend branch from 215cbc0 to 9c02962 March 16, 2026 08:38

Avi-47 wants to merge 1 commit into mofa-org:main from Avi-47:feat/rag-pgvector-backend
