
Replace placeholder embeddings with real vector search #9

@BunsDev

Description


Problem

lib/opentrust/semantic.ts currently generates "embeddings" with a simple character-code hashing function. These vectors reflect which characters a text contains, not what it means, so semantically similar texts do not end up near each other. The sqlite-vec dependency is installed but underused, because the vectors it stores carry no semantic signal.
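To illustrate the problem, here is a minimal sketch of a character-code hashing embedder. It is hypothetical: the actual function in lib/opentrust/semantic.ts may hash differently, but any scheme built only from character codes has the same weakness. Anagrams collide, while synonyms with different spellings land far apart.

```typescript
// Hypothetical stand-in for a character-code "embedding";
// NOT the actual code in lib/opentrust/semantic.ts.
function hashEmbed(text: string, dim = 64): number[] {
  const vec = new Array<number>(dim).fill(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    // Each character increments one bucket chosen by its character code.
    vec[lower.charCodeAt(i) % dim] += 1;
  }
  return vec;
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Anagrams get identical vectors even though their meanings differ...
console.log(cosine(hashEmbed("listen"), hashEmbed("silent"))); // 1
// ...while synonyms share few characters and score low.
console.log(cosine(hashEmbed("car"), hashEmbed("automobile"))); // ≈ 0.167
```

A real embedding model should score the second pair well above an unrelated pair, which a character-level scheme cannot do.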

Proposed Solution

Integrate a real embedding model for semantic search:

Option A: Local ONNX Model (Preferred)

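  • Use a small model such as all-MiniLM-L6-v2 (384-dimensional output) via onnxruntime-node
  • ~80 MB model file; runs locally with no API dependency
  • Maintains the local-first philosophy
  • Needs a matching tokenizer, since onnxruntime-node runs the model but does not tokenize text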
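Sentence-transformers models such as all-MiniLM-L6-v2 output one vector per token; the sentence embedding is conventionally obtained by mean-pooling those token vectors over the attention mask and then L2-normalizing the result. Below is a minimal sketch of that post-processing step, assuming the model's token output (commonly named last_hidden_state, shape [1, seqLen, 384], though the name depends on how the model was exported) is read as the flat row-major typed array that onnxruntime-node exposes on a tensor's data property. The function name is hypothetical, and tokenization is out of scope.

```typescript
// Mean-pool token embeddings into one sentence vector, then L2-normalize.
// tokenEmbeddings: flat [seqLen * hiddenDim] values for a single sequence.
// attentionMask: 1 for real tokens, 0 for padding (int64 masks arrive as bigint).
function meanPoolNormalize(
  tokenEmbeddings: ArrayLike<number>,
  attentionMask: ArrayLike<number | bigint>,
  hiddenDim: number,
): Float32Array {
  const seqLen = attentionMask.length;
  if (tokenEmbeddings.length !== seqLen * hiddenDim) {
    throw new Error(
      `expected ${seqLen * hiddenDim} values, got ${tokenEmbeddings.length}`,
    );
  }
  const pooled = new Float32Array(hiddenDim);
  let count = 0;
  for (let t = 0; t < seqLen; t++) {
    if (Number(attentionMask[t]) === 0) continue; // skip padding tokens
    count++;
    for (let d = 0; d < hiddenDim; d++) {
      pooled[d] += tokenEmbeddings[t * hiddenDim + d];
    }
  }
  if (count === 0) return pooled; // no real tokens: return the zero vector
  let norm = 0;
  for (let d = 0; d < hiddenDim; d++) {
    pooled[d] /= count; // mean over unmasked tokens
    norm += pooled[d] * pooled[d];
  }
  norm = Math.sqrt(norm);
  if (norm > 0) {
    for (let d = 0; d < hiddenDim; d++) pooled[d] /= norm;
  }
  return pooled;
}
```

Because the output has unit length, cosine similarity between two such vectors reduces to a dot product, which keeps ranking cheap.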

Option B: API-Based Embeddings

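  • Use an embeddings API such as OpenAI, Cohere, or Voyage AI (Anthropic does not offer its own embedding model and points users to Voyage AI)
  • Smaller bundle, but adds an external dependency and network latency
  • Requires API key configuration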
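For Option B, a thin provider interface would keep the rest of semantic.ts independent of the chosen service. Below is a sketch against OpenAI's /v1/embeddings endpoint (request body with model and input, vectors returned in data[i].embedding). The Embedder interface, FetchLike type and function names are hypothetical, and text-embedding-3-small (1536 dimensions by default) is only an example model. The HTTP client is injected so the parsing logic can be tested without network access; other providers use different request and response shapes and would need their own adapters.

```typescript
// Hypothetical provider-agnostic interface for semantic.ts.
interface Embedder {
  readonly dimension: number;
  embed(texts: string[]): Promise<Float32Array[]>;
}

// Minimal fetch-compatible signature; Node 18+'s global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Extract vectors from an OpenAI embeddings response, ordered by index,
// rejecting any vector whose length differs from the expected dimension.
function parseOpenAIEmbeddings(json: unknown, dimension: number): Float32Array[] {
  const data = (json as { data?: { index: number; embedding: number[] }[] } | null)?.data;
  if (!Array.isArray(data)) throw new Error("response has no data array");
  return data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map(({ embedding }) => {
      if (embedding.length !== dimension) {
        throw new Error(`expected ${dimension} dims, got ${embedding.length}`);
      }
      return Float32Array.from(embedding);
    });
}

function createOpenAIEmbedder(opts: {
  apiKey: string;
  fetchFn: FetchLike;
  model?: string;
  dimension?: number;
}): Embedder {
  const model = opts.model ?? "text-embedding-3-small";
  const dimension = opts.dimension ?? 1536;
  return {
    dimension,
    async embed(texts) {
      const res = await opts.fetchFn("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${opts.apiKey}`,
        },
        // "dimensions" shortens the output; only text-embedding-3 and later support it.
        body: JSON.stringify({
          model,
          input: texts,
          ...(opts.dimension ? { dimensions: opts.dimension } : {}),
        }),
      });
      if (!res.ok) throw new Error(`embeddings request failed: HTTP ${res.status}`);
      return parseOpenAIEmbeddings(await res.json(), dimension);
    },
  };
}
```

In Node 18+ the real client would be created with fetchFn: fetch and an API key read from configuration.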

Acceptance Criteria

  • embedText() produces semantically meaningful vectors
  • Similar content produces similar vectors (verified with test cases)
  • searchSemanticFallback() returns relevant results ranked by similarity
  • Embedding dimension matches the sqlite-vec column configuration
  • scripts/index-semantic.ts rebuilds the index with real embeddings
  • Performance: embedding a chunk takes < 100 ms locally
  • Setup instructions are documented for the chosen approach
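Several of these criteria can be checked with small helpers. Below is a minimal sketch, assuming chunks are held in memory as { id, vector } pairs: a brute-force cosine ranking of the kind a fallback search might use, and a generator that derives the sqlite-vec vec0 table definition from the embedder's dimension so the two cannot drift apart. The names rankBySimilarity and vecTableSql, the table name chunk_vectors and the column name embedding are all hypothetical, and this is not the existing searchSemanticFallback() implementation.

```typescript
// Cosine similarity; for unit-length vectors this equals the dot product.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface Chunk {
  id: string;
  vector: Float32Array;
}

// Brute-force ranking: score every chunk, highest similarity first.
function rankBySimilarity(
  query: Float32Array,
  chunks: Chunk[],
  limit = 5,
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Build the vec0 virtual-table DDL from the embedder's dimension.
function vecTableSql(table: string, dimension: number): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(table)) {
    throw new Error(`invalid table name: ${table}`);
  }
  if (Math.floor(dimension) !== dimension || dimension <= 0) {
    throw new Error(`invalid dimension: ${dimension}`);
  }
  return `CREATE VIRTUAL TABLE IF NOT EXISTS ${table} USING vec0(embedding float[${dimension}])`;
}
```

For the "similar content" criterion, a test could embed a sentence, a paraphrase of it and an unrelated sentence with the real model, then assert that the paraphrase ranks first. Exact scores vary by model, so asserting order is likely more robust than asserting fixed thresholds.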

🤖 Generated with Claude Code

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), performance (Speed and resource optimization)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
