feat(semantic): OpenAI-compatible remote embedding adapter #324

@esengine

Description

Driven by discussion #310 — local Ollama on a CPU-only VM is too slow for first-time indexing, and pointing OLLAMA_URL at a remote box still requires running Ollama somewhere. A real third-party endpoint is the next step.

Why

Current state (src/index/semantic/embedding.ts):

  • Hardcoded to Ollama's /api/embeddings shape.
  • OLLAMA_URL env var supports a remote Ollama daemon but not non-Ollama providers.
  • probeOllama is similarly Ollama-specific.

Gemini, OpenAI, vLLM, llama.cpp's server, LM Studio, and most third-party providers all expose POST /v1/embeddings with the same request/response shape. One adapter covers them all.
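For reference, a sketch of that shared wire shape in TypeScript. This is not reasonix code; the field names follow the OpenAI embeddings API that the providers above mirror, and the helper names are hypothetical:

```typescript
// Request/response shape for POST {base}/embeddings (base already ends in /v1).
interface EmbeddingsRequest {
  model: string;
  input: string[];
}
interface EmbeddingsResponse {
  data: { index: number; embedding: number[] }[];
}

// Serialize the request body; same JSON works for OpenAI, Gemini's
// OpenAI-compat layer, vLLM, llama.cpp's server, and LM Studio.
function buildEmbeddingsBody(model: string, texts: string[]): string {
  const req: EmbeddingsRequest = { model, input: texts };
  return JSON.stringify(req);
}

// Recover vectors in input order; providers may return `data` unsorted,
// so sort by the spec's `index` field rather than trusting array order.
function extractVectors(res: EmbeddingsResponse): number[][] {
  return [...res.data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
}
```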

Scope

One adapter, OpenAI-compatible. No per-vendor code paths.

  • New env vars (or config keys):
    • REASONIX_EMBED_PROVIDER — ollama (default) | openai-compat
    • REASONIX_EMBED_BASE_URL — full base URL when provider is openai-compat (e.g. https://generativelanguage.googleapis.com/v1beta/openai, https://api.openai.com/v1, http://localhost:8000/v1)
    • REASONIX_EMBED_API_KEY — bearer token, sent as Authorization: Bearer <key>
    • REASONIX_EMBED_MODEL already exists — reused as the model name passed to the API
  • Ollama remains the default. Zero behavior change for existing users.
  • Adapter dispatch hidden behind the existing embed() / embedAll() / probe*() surface; callers see one API.
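A minimal sketch of how that dispatch could resolve, assuming hypothetical helper names (the real surface stays the existing embed()/embedAll()/probe*() functions):

```typescript
type Provider = "ollama" | "openai-compat";

// Resolve the provider from env, defaulting to ollama so existing
// setups see zero behavior change. Validation is eager: a bad value
// or a missing base URL fails before any network call is made.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  const p = env.REASONIX_EMBED_PROVIDER ?? "ollama";
  if (p !== "ollama" && p !== "openai-compat") {
    throw new Error(`unknown REASONIX_EMBED_PROVIDER "${p}" (expected ollama | openai-compat)`);
  }
  if (p === "openai-compat" && !env.REASONIX_EMBED_BASE_URL) {
    throw new Error("REASONIX_EMBED_BASE_URL is required when REASONIX_EMBED_PROVIDER=openai-compat");
  }
  return p;
}
```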

Constraints

  • Vector dimensions per store. Different models output different dims (Ollama nomic-embed-text = 768, OpenAI text-embedding-3-small = 1536, Gemini text-embedding-004 = 768). The vector store must refuse mixing embeddings from different model identities — index header records { provider, model, dim } and rejects appends that don't match. Caller can rebuild the index when switching providers.
  • No "free tier" framing in docs. Document that an OpenAI-compatible endpoint can be Gemini / OpenAI / etc., without promising free. Free tiers move on someone else's schedule.
  • Errors stay actionable. ECONNREFUSED keeps Ollama install hint when provider is Ollama; OpenAI-compat path needs its own "check your API key / base URL" hint on 401 / 404.
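The identity check from the first constraint could look roughly like this; the header shape matches the { provider, model, dim } triple above, the function name is hypothetical:

```typescript
// Identity recorded in the index header when the index is first built.
interface IndexHeader {
  provider: string;
  model: string;
  dim: number;
}

// Refuse appends whose embedding identity or dimension doesn't match
// the header; the error tells the user to rebuild rather than silently
// mixing vector spaces.
function assertCompatible(header: IndexHeader, provider: string, model: string, vector: number[]): void {
  if (header.provider !== provider || header.model !== model) {
    throw new Error(
      `index was built with ${header.provider}/${header.model}; ` +
        `refusing to append ${provider}/${model}. Rebuild the index after switching providers.`
    );
  }
  if (vector.length !== header.dim) {
    throw new Error(`expected ${header.dim}-dim vector, got ${vector.length}`);
  }
}
```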

Non-goals

  • Per-vendor adapters (Gemini-specific, OpenAI-specific). One OpenAI-compat shape, period.
  • Auto-migrating an existing Ollama-built index to a new provider. User rebuilds.
  • Embedding-cache schema changes. Cache key already includes the model name; extending to include provider is straightforward but separable.

Acceptance

  • REASONIX_EMBED_PROVIDER=openai-compat + REASONIX_EMBED_BASE_URL + REASONIX_EMBED_API_KEY produces a working embedder that can build an index via reasonix semantic build.
  • Default behavior (no env vars set) is byte-identical to today.
  • Vector store rejects appending vectors from a different { provider, model } than the index was built with.
  • Test fixture: a fake OpenAI-compat endpoint (vitest vi.fn-driven fetch) drives the adapter through one happy-path embed and one 401 error.
  • README / docs include one short paragraph on the env vars + a single example pointing at api.openai.com/v1 (no "free" framing).
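The fixture in the fourth bullet could behave roughly like this dependency-free sketch; a real test would wrap it in vitest's vi.fn() and hand it to the adapter as its fetch, returning Promises and Response objects rather than plain values. All names here are hypothetical:

```typescript
type FakeResponse = { status: number; body: unknown };

// Stand-in for an OpenAI-compatible endpoint: 401 on a bad bearer token
// (driving the "check your API key / base URL" hint), otherwise one
// fixed-dim vector per input, echoing the spec's per-item `index`.
function fakeOpenAICompatEndpoint(validKey: string) {
  return (init: { headers: Record<string, string>; body: string }): FakeResponse => {
    if (init.headers["Authorization"] !== `Bearer ${validKey}`) {
      return { status: 401, body: { error: { message: "Incorrect API key provided" } } };
    }
    const req = JSON.parse(init.body) as { model: string; input: string[] };
    return {
      status: 200,
      body: { data: req.input.map((_, i) => ({ index: i, embedding: [0.1, 0.2, 0.3] })) },
    };
  };
}
```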
