Driven by discussion #310 — local Ollama on a CPU-only VM is too slow for first-time indexing, and pointing `OLLAMA_URL` at a remote box still requires running Ollama somewhere. A real third-party endpoint is the next step.
## Why
Current state (`src/index/semantic/embedding.ts`):

- Hardcoded to Ollama's `/api/embeddings` shape.
- The `OLLAMA_URL` env var supports a remote Ollama daemon but not non-Ollama providers.
- `probeOllama` is similarly Ollama-specific.
Gemini, OpenAI, vLLM, llama.cpp's server, LM Studio, and most third-party providers all expose `POST /v1/embeddings` with the same request/response shape. One adapter covers them all.
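A minimal sketch of that shared shape, assuming the standard `/v1/embeddings` contract. The type names `EmbeddingsRequest` / `EmbeddingsResponse` are illustrative, not from the codebase; only the field names come from the public API:

```typescript
// Illustrative types for the OpenAI-compatible embeddings contract.
interface EmbeddingsRequest {
  model: string;
  input: string | string[];
}

interface EmbeddingsResponse {
  object: "list";
  data: { object: "embedding"; index: number; embedding: number[] }[];
  model: string;
}

// A response from any compatible server parses the same way:
const request: EmbeddingsRequest = { model: "text-embedding-3-small", input: "hello" };
const sample: EmbeddingsResponse = {
  object: "list",
  data: [{ object: "embedding", index: 0, embedding: [0.1, -0.2, 0.3] }],
  model: request.model,
};
const vector: number[] = sample.data[0].embedding;
```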
## Scope
One adapter, OpenAI-compatible. No per-vendor code paths.
- New env vars (or config keys):
  - `REASONIX_EMBED_PROVIDER` — `ollama` (default) | `openai-compat`
  - `REASONIX_EMBED_BASE_URL` — full base URL when provider is `openai-compat` (e.g. `https://generativelanguage.googleapis.com/v1beta/openai`, `https://api.openai.com/v1`, `http://localhost:8000/v1`)
  - `REASONIX_EMBED_API_KEY` — bearer token, sent as `Authorization: Bearer <key>`
  - `REASONIX_EMBED_MODEL` — already exists; reused as the model name passed to the API
- Ollama remains the default. Zero behavior change for existing users.
- Adapter dispatch hidden behind the existing `embed()` / `embedAll()` / `probe*()` surface; callers see one API.
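The dispatch above can be sketched as a single provider switch. This is a sketch under the proposal's env var names; `resolveProvider` is an illustrative helper, not existing code:

```typescript
// Illustrative provider resolution behind the existing embed() surface.
type Provider = "ollama" | "openai-compat";

function resolveProvider(env: Record<string, string | undefined>): Provider {
  const raw = env["REASONIX_EMBED_PROVIDER"];
  // Unset or "ollama" keeps today's behavior exactly.
  if (raw === undefined || raw === "ollama") return "ollama";
  if (raw === "openai-compat") return "openai-compat";
  throw new Error(`unknown REASONIX_EMBED_PROVIDER: ${raw}`);
}
```

Keeping the switch in one place means callers never branch on the provider themselves.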
## Constraints
- Vector dimensions per store. Different models output different dims (Ollama `nomic-embed-text` = 768, OpenAI `text-embedding-3-small` = 1536, Gemini `text-embedding-004` = 768). The vector store must refuse to mix embeddings from different model identities — the index header records `{ provider, model, dim }` and rejects appends that don't match. The caller can rebuild the index when switching providers.
- No "free tier" framing in docs. Document that an OpenAI-compatible endpoint can be Gemini / OpenAI / etc., without promising free. Free tiers move on someone else's schedule.
- Errors stay actionable. `ECONNREFUSED` keeps the Ollama install hint when the provider is Ollama; the OpenAI-compat path needs its own "check your API key / base URL" hint on 401 / 404.
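The identity check the vector store could enforce can be sketched as follows. `IndexHeader` and `assertCompatible` are illustrative names, not existing code:

```typescript
// Illustrative model-identity guard for the vector store.
interface IndexHeader {
  provider: string;
  model: string;
  dim: number;
}

function assertCompatible(header: IndexHeader, incoming: IndexHeader): void {
  if (
    header.provider !== incoming.provider ||
    header.model !== incoming.model ||
    header.dim !== incoming.dim
  ) {
    throw new Error(
      `index built with ${header.provider}/${header.model} (dim ${header.dim}); ` +
        `refusing vectors from ${incoming.provider}/${incoming.model} (dim ${incoming.dim}). ` +
        `Rebuild the index to switch providers.`,
    );
  }
}
```

Comparing all three fields catches the 768-vs-768 trap above: `nomic-embed-text` and `text-embedding-004` share a dimension but are not interchangeable.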
## Non-goals
- Per-vendor adapters (Gemini-specific, OpenAI-specific). One OpenAI-compat shape, period.
- Auto-migrating an existing Ollama-built index to a new provider. User rebuilds.
- Embedding-cache schema changes. Cache key already includes the model name; extending to include provider is straightforward but separable.
## Acceptance
- `REASONIX_EMBED_PROVIDER=openai-compat` + `REASONIX_EMBED_BASE_URL` + `REASONIX_EMBED_API_KEY` produces a working embedder that can build an index via `reasonix semantic build`.
- Default behavior (no env vars set) is byte-identical to today.
- Vector store rejects appending vectors from a different `{ provider, model }` than the index was built with.
- Test fixture: a fake OpenAI-compat endpoint (vitest `vi.fn`-driven fetch) drives the adapter through one happy-path embed and one 401 error.
- README / docs include one short paragraph on the env vars + a single example pointing at `api.openai.com/v1` (no "free" framing).
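The fixture in the acceptance list can be sketched as an injectable fetch. In the real suite this would be a vitest `vi.fn()`-backed mock; here it is a plain async function so the contract is visible without a test runner. `FetchLike` and `fakeFetch` are illustrative names:

```typescript
// Illustrative fake OpenAI-compat endpoint, driven through injected fetch.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ status: number; json: () => Promise<unknown> }>;

const fakeFetch: FetchLike = async (_url, init) => {
  // 401 path: missing or wrong bearer token.
  if (init?.headers?.["Authorization"] !== "Bearer test-key") {
    return {
      status: 401,
      json: async () => ({ error: { message: "invalid API key" } }),
    };
  }
  // Happy path: one embedding in the standard /v1/embeddings response shape.
  return {
    status: 200,
    json: async () => ({
      object: "list",
      data: [{ object: "embedding", index: 0, embedding: [0.1, 0.2, 0.3] }],
      model: "test-model",
    }),
  };
};
```

Injecting fetch (rather than stubbing the global) keeps the adapter trivially testable and makes the 401-hint path easy to exercise.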