feat(llm): chat-only OpenAI-compatible provider behind MCP_LLM_PROVIDER flag (PoC)#859
Create apps/api/src/services/llm/types.ts with:
- LLMProviderType discriminant (anthropic | openai-compatible)
- ChatMessage, LLMStreamEvent, LLMProvider interface (signatures only)
- OpenAISession: lightweight session without an SDK Query for the OpenAI path
No implementation; no existing file modified.
https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
Add to envSchema (validate.ts):
- MCP_LLM_PROVIDER: enum 'anthropic' | 'openai-compatible', default 'anthropic'
- MCP_LLM_BASE_URL, MCP_LLM_API_KEY, MCP_LLM_MODEL: optional
- MCP_LLM_PRICE_INPUT_PER_M_USD, MCP_LLM_PRICE_OUTPUT_PER_M_USD: default 0
- Cross-validation: MCP_LLM_BASE_URL is required when provider = openai-compatible
When the flag is absent or set to 'anthropic' (the default), Anthropic behavior is unchanged.
https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
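The cross-validation rule in this commit can be illustrated without Zod. The following is a minimal, dependency-free sketch, not the actual envSchema: the function name `parseLlmEnv`, the result shape, and the non-negative price check are assumptions made for illustration.

```typescript
// Hypothetical, dependency-free sketch of the MCP_LLM_* rules.
// The real code uses a Zod envSchema; names and shapes here are illustrative.
type LLMProviderType = "anthropic" | "openai-compatible";

interface LlmEnvConfig {
  MCP_LLM_PROVIDER: LLMProviderType;
  MCP_LLM_BASE_URL: string | undefined;
  MCP_LLM_API_KEY: string | undefined;
  MCP_LLM_MODEL: string | undefined;
  MCP_LLM_PRICE_INPUT_PER_M_USD: number;
  MCP_LLM_PRICE_OUTPUT_PER_M_USD: number;
}

type ParseResult =
  | { ok: true; config: LlmEnvConfig }
  | { ok: false; errors: string[] };

// Prices default to 0 when unset. Rejecting negatives is an assumption.
function parsePrice(raw: string | undefined, name: string, errors: string[]): number {
  if (raw === undefined || raw === "") return 0;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0) {
    errors.push(`${name} must be a non-negative number`);
    return 0;
  }
  return value;
}

function parseLlmEnv(env: Record<string, string | undefined>): ParseResult {
  const errors: string[] = [];

  // Absent flag means the default Anthropic path.
  const rawProvider = env["MCP_LLM_PROVIDER"] ?? "anthropic";
  let provider: LLMProviderType = "anthropic";
  if (rawProvider === "anthropic" || rawProvider === "openai-compatible") {
    provider = rawProvider;
  } else {
    errors.push("MCP_LLM_PROVIDER must be 'anthropic' or 'openai-compatible'");
  }

  // Cross-field rule: the base URL is only required on the new path.
  const baseUrl = env["MCP_LLM_BASE_URL"];
  if (provider === "openai-compatible" && !baseUrl) {
    errors.push("MCP_LLM_BASE_URL is required when MCP_LLM_PROVIDER=openai-compatible");
  }

  const priceIn = parsePrice(env["MCP_LLM_PRICE_INPUT_PER_M_USD"], "MCP_LLM_PRICE_INPUT_PER_M_USD", errors);
  const priceOut = parsePrice(env["MCP_LLM_PRICE_OUTPUT_PER_M_USD"], "MCP_LLM_PRICE_OUTPUT_PER_M_USD", errors);

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    config: {
      MCP_LLM_PROVIDER: provider,
      MCP_LLM_BASE_URL: baseUrl,
      MCP_LLM_API_KEY: env["MCP_LLM_API_KEY"],
      MCP_LLM_MODEL: env["MCP_LLM_MODEL"],
      MCP_LLM_PRICE_INPUT_PER_M_USD: priceIn,
      MCP_LLM_PRICE_OUTPUT_PER_M_USD: priceOut,
    },
  };
}
```

With an empty environment the sketch resolves to the Anthropic default, so existing deployments would see no change.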
historyBuilder.ts:
- buildMessagesFromHistory() loads ai_messages and maps them to ChatMessage[]
- ToolUseInHistoryError if tool_use/tool_result is present (explicit refusal, no silent filtering)
openaiCompatibleProvider.ts:
- Native Node fetch, manual SSE parsing, no openai dependency
- No tools/tool_choice field sent to vLLM
- Defensive detection of tool_calls in the response -> error event, immediate stop
- computeCostUsd() for best-effort cost tracking
- Note: vLLM has no equivalent of Anthropic prompt caching
No existing file modified.
https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
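For the best-effort cost tracking mentioned above, a per-million-token price calculation might look like the sketch below. The signature of the real computeCostUsd() is not shown in this thread, so the parameter shapes are assumptions; `toCostCents` is a hypothetical helper that applies the rounding formula quoted elsewhere in the thread.

```typescript
// Hypothetical shapes: the real computeCostUsd() signature is not shown in the PR thread.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

interface PricePerMillion {
  inputPerMUsd: number;  // e.g. MCP_LLM_PRICE_INPUT_PER_M_USD
  outputPerMUsd: number; // e.g. MCP_LLM_PRICE_OUTPUT_PER_M_USD
}

// Cost in USD: tokens are priced per million, input and output separately.
function computeCostUsd(usage: TokenUsage, price: PricePerMillion): number {
  const inputCost = (usage.promptTokens / 1_000_000) * price.inputPerMUsd;
  const outputCost = (usage.completionTokens / 1_000_000) * price.outputPerMUsd;
  return inputCost + outputCost;
}

// Cents with two decimal places, using the formula quoted in the thread:
// Math.round(costUsd * 100 * 100) / 100.
function toCostCents(costUsd: number): number {
  return Math.round(costUsd * 100 * 100) / 100;
}
```

With both prices left at their default of 0, the computed cost is always 0, which is why cost tracking and budget enforcement become no-ops until prices are configured.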
openaiSessionManager.ts:
- OpenAISessionManager: Map of lightweight sessions (no SDK Query)
- getOrCreate, tryTransitionToProcessing, startTurn, remove, TTL eviction
- runTurn: loads history, streams from vLLM, publishes events to eventBus, saves the assistant message, records cost, publishes done
- TTL constants copied from streamingSessionManager.ts (2h idle, 24h max)
aiCostTracker.ts:
- Adds recordOpenAIUsage() for best-effort OpenAI cost tracking
- Note: no prompt-caching equivalent on vLLM
ai.ts (route):
- Lazy singleton imports of OpenAISessionManager / OpenAICompatibleProvider
- OpenAI-compatible early return before the existing Anthropic block
- Anthropic path: identical, unchanged
https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
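The session bookkeeping described in this commit (a Map of lightweight sessions, a guarded transition into processing, and TTL eviction at 2h idle / 24h max age) could be sketched roughly as follows. This is an illustration under assumptions, not the PR's OpenAISessionManager: the class name, field names, `finishTurn`, and the injected clock are all hypothetical.

```typescript
// Illustrative sketch only; not the actual OpenAISessionManager.
type SessionState = "idle" | "processing";

interface LightSession {
  id: string;
  state: SessionState;
  createdAt: number;
  lastActivityAt: number;
}

const IDLE_TTL_MS = 2 * 60 * 60 * 1000; // 2h idle, as stated in the commit
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24h max age, as stated in the commit

class SketchSessionManager {
  private readonly sessions = new Map<string, LightSession>();
  private readonly now: () => number;

  // The clock is injectable so eviction can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  getOrCreate(id: string): LightSession {
    let session = this.sessions.get(id);
    if (!session) {
      const t = this.now();
      session = { id, state: "idle", createdAt: t, lastActivityAt: t };
      this.sessions.set(id, session);
    }
    return session;
  }

  // Returns false when the session is unknown or a turn is already running,
  // so two concurrent requests cannot start overlapping turns.
  tryTransitionToProcessing(id: string): boolean {
    const session = this.sessions.get(id);
    if (!session || session.state === "processing") return false;
    session.state = "processing";
    session.lastActivityAt = this.now();
    return true;
  }

  finishTurn(id: string): void {
    const session = this.sessions.get(id);
    if (!session) return;
    session.state = "idle";
    session.lastActivityAt = this.now();
  }

  remove(id: string): boolean {
    return this.sessions.delete(id);
  }

  // Evicts sessions past either limit and returns their ids.
  evictExpired(): string[] {
    const t = this.now();
    const evicted: string[] = [];
    for (const [id, s] of this.sessions) {
      if (t - s.lastActivityAt > IDLE_TTL_MS || t - s.createdAt > MAX_AGE_MS) {
        this.sessions.delete(id);
        evicted.push(id);
      }
    }
    return evicted;
  }
}
```

Because each turn is an independent HTTP stream, the session object only needs to carry state for concurrency guarding and eviction; there is no long-lived subprocess to tear down.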
openaiCompatibleProvider.ts:
- B2: reader.cancel() in the finally block to release the socket promptly
- B3: 6-minute fetch timeout via AbortSignal.any + clearTimeout on every early exit
- M1: split SSE on \r?\n\r?\n for robustness behind proxies
- M2: detect finish_reason === 'tool_calls' in addition to delta.tool_calls
openaiSessionManager.ts:
- M3: abort the previous controller before replacing it in startTurn
- M4: comment explaining why the Anthropic finally block is not copied
ai.ts:
- M5: explicit guard on MCP_LLM_BASE_URL instead of the non-null assertion
Note B1: totalCostCents is of type real (not integer) in the DB, and the formula Math.round(costUsd * 100 * 100) / 100 is identical to the one in recordUsageFromSdkResult, so no correction is needed.
https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
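Items M1 and M2 can be illustrated with a small buffer-draining parser. This is a sketch under assumptions rather than the provider's code: `drainSseBuffer` and `hasToolCall` are hypothetical names, and the chunk shape is limited to the fields the commit mentions (finish_reason and delta.tool_calls).

```typescript
// Hypothetical helper names; only the split regex and the two tool-call
// checks come from the commit message.
interface SseDrainResult {
  payloads: string[]; // complete "data:" payloads, one per SSE frame
  rest: string;       // trailing partial frame to prepend to the next chunk
}

function drainSseBuffer(buffer: string): SseDrainResult {
  // M1: accept both LF and CRLF frame separators, since proxies may rewrite them.
  const frames = buffer.split(/\r?\n\r?\n/);
  // The last element is either "" or an incomplete frame; keep it for later.
  const rest = frames.pop() ?? "";
  const payloads: string[] = [];
  for (const frame of frames) {
    const data = frame
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice("data:".length).trimStart())
      .join("\n");
    if (data !== "") payloads.push(data);
  }
  return { payloads, rest };
}

interface StreamChunk {
  choices?: Array<{
    finish_reason?: string | null;
    delta?: { tool_calls?: unknown };
  }>;
}

// M2: treat either signal as a tool call so the turn can stop with an error.
function hasToolCall(payload: string): boolean {
  if (payload === "[DONE]") return false;
  const chunk = JSON.parse(payload) as StreamChunk;
  return (chunk.choices ?? []).some(
    (choice) => choice.finish_reason === "tool_calls" || choice.delta?.tool_calls != null,
  );
}
```

A caller would append each decoded network chunk to `rest`, drain again, and stop the stream as soon as `hasToolCall` returns true for any payload.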
9 cases covered:
- empty history -> empty array
- user + assistant messages mapped correctly
- system messages silently ignored
- null assistant content (pure tool-use turn) ignored
- null user content ignored
- tool_use in history -> ToolUseInHistoryError
- tool_result in history -> ToolUseInHistoryError
- error message contains the sessionId
- message order preserved
https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
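The behavior these nine cases pin down can be sketched as a pure mapping function. The row shape below is an assumption (in particular, that tool_use and tool_result appear as role values), and `buildMessagesFromRows` is a hypothetical stand-in for the DB-backed buildMessagesFromHistory().

```typescript
// Hypothetical row shape for ai_messages; the real schema is not shown in the thread.
interface StoredMessage {
  role: string;
  content: string | null;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

class ToolUseInHistoryError extends Error {
  constructor(sessionId: string, role: string) {
    super(`Session ${sessionId} contains a ${role} message; the chat-only path refuses it`);
    this.name = "ToolUseInHistoryError";
  }
}

function buildMessagesFromRows(sessionId: string, rows: StoredMessage[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const row of rows) {
    const role = row.role;
    // Explicit refusal rather than silent filtering of tool traffic.
    if (role === "tool_use" || role === "tool_result") {
      throw new ToolUseInHistoryError(sessionId, role);
    }
    // System rows and null-content rows are skipped; input order is preserved.
    if ((role === "user" || role === "assistant") && row.content !== null) {
      messages.push({ role, content: row.content });
    }
  }
  return messages;
}
```

Throwing instead of filtering means a session that previously used tools on the Anthropic path fails loudly here, rather than continuing with a history that has holes in it.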
Local directory for unversioned dev scripts (smoke tests, etc.). Mentioned in the PR description so the maintainer can decide whether it should be versioned. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
The ai_sessions column remains Anthropic-oriented; the vLLM path rejects Claude ids. Requiring BASE_URL, MODEL and API_KEY at boot avoids silent degradation. Co-authored-by: Cursor <cursoragent@cursor.com>
Thanks for the PoC, @Emilien-Etadam. The scope is unusually clean for 1,000+ lines: pure-additive new directory, real test coverage on the non-trivial logic (historyBuilder + config validation), default Anthropic path genuinely untouched. The branch at
A few things I'd like addressed before this moves out of draft.
Blocker (even as draft, before this lands behind the flag):
The Stop button silently fails on the openai-compatible path.
const manager = getConfig().MCP_LLM_PROVIDER === 'openai-compatible'
  ? getOpenAISessionManager()
  : streamingSessionManager;
result = await manager.interrupt(sessionId);
Same shape for
Worth fixing in this PR:
Failed turns leave the per-session
Before un-flagging in production (not blockers for landing the draft):
CLAUDE.md compliance: clean. Once the Stop-button routing is fixed I'm comfortable with this landing as a flag-gated experimental path. Happy to keep iterating with you on the un-flag prerequisites. |
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…tarts Co-authored-by: Cursor <cursoragent@cursor.com>
@Emilien-Etadam — read through this carefully and it's a well-built PoC. Strong defensive design: tool-call rejection covers both Things that read as correct:
Worth flagging (small):
Re: your open questions:
One unverified thing I can't speak to:
Thanks for the PoC — solid foundation if it goes in. |
Stop no longer yields an error event or surfaces as internal error; non-abort stream failures still emit a typed error.
Classify combined abort by timeout vs caller signal so 6m stalls error while Stop still ends the stream without a provider error event.
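One way to make this distinction is to inspect the two source signals separately once the combined abort fires. The sketch below assumes that approach; the names `classifyAbort` and `shouldEmitErrorEvent`, and the choice to let a user Stop win when both signals have fired, are illustrative assumptions rather than the PR's implementation.

```typescript
// Minimal structural type so the sketch works with real AbortSignal objects
// as well as plain test doubles.
interface AbortFlag {
  readonly aborted: boolean;
}

type AbortCause = "user-stop" | "timeout" | "not-aborted";

// Called from the catch block of a streaming fetch whose signal combines
// the caller's signal and a timeout signal.
function classifyAbort(callerSignal: AbortFlag, timeoutSignal: AbortFlag): AbortCause {
  // Assumption: if both fired, the user's Stop takes precedence.
  if (callerSignal.aborted) return "user-stop";
  if (timeoutSignal.aborted) return "timeout";
  return "not-aborted";
}

// Stop ends the stream quietly; a stall past the timeout and any non-abort
// failure still surface as a typed error event.
function shouldEmitErrorEvent(cause: AbortCause): boolean {
  return cause !== "user-stop";
}
```

In real code the two arguments would be the `signal` properties of the caller's AbortController and of the controller driven by the timeout timer.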
Rely on Zod-validated config instead of falling back to an empty bearer token. Co-authored-by: Cursor <cursoragent@cursor.com>
Document that array-shaped delta.content from some backends is ignored today. Co-authored-by: Cursor <cursoragent@cursor.com>
Move the smoke test out of ignored .devscripts into the api package sources for repeatable runs in review. Co-authored-by: Cursor <cursoragent@cursor.com>
Nested withDbAccessContext was skipping set_config while the auth middleware’s ALS was still active, breaking RLS on ai_cost_usage. Mirror StreamingSessionManager. Co-authored-by: Cursor <cursoragent@cursor.com>
Close now evicts in-memory state from OpenAISessionManager when using the OpenAI-compatible provider, matching the Anthropic path and interrupt routing. Co-authored-by: Cursor <cursoragent@cursor.com>
Attach a rejection handler mirroring StreamingSessionManager’s processorPromise .catch so stray failures surface in Sentry and logs. Co-authored-by: Cursor <cursoragent@cursor.com>
runTurn reads committed history after runOutsideDbContext; pending user rows were invisible — pass sanitized content separately like Anthropic pushMessage. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Pushed the fixes: all four blockers plus the Stop/timeout cleanup are done. @bdunncompany, thanks for the careful pass; your nits are all addressed. No rush on review.
@Emilien-Etadam thanks for the careful follow-up. Walked the 14 new commits, they map cleanly to the asks: Blockers — all 4 addressed:
Robustness — Stop vs timeout distinction ( The two new RLS fixes are the most valuable adds from this round, and the fact that they only surfaced once you tested with
Nits — all four landed: API key non-null assertion, multipart drop comment, smoke script versioned path, zero-price startup warning. LGTM from a fellow contributor. Leaving the merge decision to @ToddHebebrand; flagging that the "known limitations" list in the description is honest about what's deferred (provider unit tests, mid-stream retry, BASE_URL allowlist, idle eviction DB drift) so reviewers can size the follow-up issues separately from the merge gate. No rush on my end either — this is solid groundwork for #505 phase 1. |
Commits the approved 2026-06-07 brainstorm spec for per-process resource drill-down on the device Performance screen, and the living e2e coverage index that the e2e-coverage skill reads/appends per sweep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolves the validate.ts / validate.test.ts overlap: the MCP_LLM_* config block independently landed on main via LanternOps#864, so validate.ts takes main's version wholesale; validate.test.ts keeps both this PR's MCP_LLM config tests and main's newer test blocks.
getConfig() throws before validateConfig() runs, and route unit tests never validate config — so every guard added for the openai-compatible path threw 500s in ai route tests (latent since the PR ran as a draft with no CI). isOpenAICompatibleProvider() treats an unvalidated config as the default anthropic path; production validates at boot, so behavior there is unchanged.
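The tolerant guard described here might be implemented along these lines. This is a sketch under assumptions: the commit does not show how isOpenAICompatibleProvider() detects an unvalidated config, so the try/catch and the module-level `validatedConfig` variable are illustrative stand-ins.

```typescript
// Stand-ins for the real config module; only the function names
// getConfig, validateConfig and isOpenAICompatibleProvider come from the thread.
interface AppConfig {
  MCP_LLM_PROVIDER: "anthropic" | "openai-compatible";
}

let validatedConfig: AppConfig | null = null;

function validateConfig(raw: AppConfig): void {
  validatedConfig = raw;
}

// Mirrors the behavior described in the commit: throws until validation has run.
function getConfig(): AppConfig {
  if (validatedConfig === null) {
    throw new Error("Config accessed before validateConfig()");
  }
  return validatedConfig;
}

// Tolerant guard: an unvalidated config is treated as the default anthropic path,
// so route unit tests that never validate config keep exercising that path.
function isOpenAICompatibleProvider(): boolean {
  try {
    return getConfig().MCP_LLM_PROVIDER === "openai-compatible";
  } catch {
    return false;
  }
}
```

Since production validates config at boot, the catch branch would only be reached in contexts such as unit tests, which matches the commit's claim that production behavior is unchanged.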
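@Emilien-Etadam, first, an apology. You addressed every blocker on May 24, bdunncompany confirmed the fixes commit-by-commit the same day, and then this sat for 18 days with the ball squarely in my court. That's on me: my review queue kept bucketing it as an author-blocked draft, and the "merge decision to Todd" handoff in the comments never overrode the draft flag. I've fixed that triage rule so a thread handoff outranks draft status.
Final review: nothing new owed on the code. A fresh pass against today's `main` (which has moved a lot since your base: AI tool site-scope enforcement, PAM tool-action flows, M365 agent) came back clean. Those features gate tool execution, which this path doesn't do by design, and `recordOpenAIUsage` still matches the current cost/billing chokepoints exactly.
Merge prep I pushed to your branch (hope you don't mind; I wanted to spare you another round after the wait): marked ready-for-review and pushed two commits. `8311931` merges `main` in. The one real conflict was self-inflicted on our side: the `MCP_LLM_*` config block landed on `main` independently via #864 (`f891253e`), so `apps/api/src/config/validate.ts` takes main's version wholesale and `validate.test.ts` keeps both your config tests and main's newer blocks. `22784d1` fixes one latent bug the first-ever CI run on this PR surfaced (drafts here never ran CI, so nobody could have seen it): the provider guards call `getConfig()` inside route handlers, which throws before `validateConfig()` runs. Route unit tests never validate config, so the pause/approve/reject guards 500'd eleven ai-route tests. The fix is a tolerant `isOpenAICompatibleProvider()` that treats an unvalidated config as the default anthropic path; production validates at boot, so no behavior change there. 173 tests green locally plus a clean typecheck on the merged tree.
Follow-up scope stays as the honest known-limitations list in your description (provider unit tests, mid-stream retry, BASE_URL allowlist, idle-eviction DB drift), plus one my re-review added: single-turn budget overshoot. Usage is recorded after the turn and `checkBudget` only gates the next turn's preflight, so one openai-compatible turn can exceed the remaining budget. All acceptable for an off-by-default flagged path; we'll size them as issues when this graduates from PoC.
Merging now that CI is green. The two RLS catches you found by testing with the `breeze_app` role active were exactly the kind of pre-merge find this review process exists for: genuinely good work, and a model follow-up round. Sorry again for the silence.
Status: merged (admin-squash) once CI completes; follow-up issues to be filed at graduation time.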
No apology needed, thanks for the merge prep and the CI fix.
Happy to pick up the follow-ups at graduation; glad to discuss the phase-2 tool-calling sequencing sooner whenever suits you.
On 12 June 2026 at 00:58:44 GMT+02:00, Todd Hebebrand ***@***.***> wrote:
…ToddHebebrand left a comment (LanternOps/breeze#859)
@Emilien-Etadam — first, an apology. You addressed every blocker on May 24, bdunncompany confirmed the fixes commit-by-commit the same day, and then this sat for 18 days with the ball squarely in my court. That's on me: my review queue kept bucketing it as an author-blocked draft, and the "merge decision to Todd" handoff in the comments never overrode the draft flag. I've fixed that triage rule so a thread handoff outranks draft status.
**Final review:** nothing new owed on the code. A fresh pass against today's `main` (which has moved a lot since your base — AI tool site-scope enforcement, PAM tool-action flows, M365 agent) came back clean: those features gate *tool execution*, which this path doesn't do by design, and `recordOpenAIUsage` still matches the current cost/billing chokepoints exactly.
**Merge prep I pushed to your branch** (hope you don't mind — wanted to spare you another round after the wait): marked ready-for-review and pushed two commits. `8311931` merges `main` in — the one real conflict was self-inflicted on our side: the `MCP_LLM_*` config block landed on `main` independently via #864 (`f891253e`), so `apps/api/src/config/validate.ts` takes main's version wholesale and `validate.test.ts` keeps both your config tests and main's newer blocks. `22784d1` fixes one latent bug the first-ever CI run on this PR surfaced (drafts here never ran CI, so nobody could have seen it): the provider guards call `getConfig()` inside route handlers, which throws before `validateConfig()` runs — route unit tests never validate config, so the pause/approve/reject guards 500'd eleven ai-route tests. The fix is a tolerant `isOpenAICompatibleProvider()` that treats an unvalidated config as the default anthropic path; production validates at boot, so no behavior change there. 173 tests green locally plus a clean typecheck on the merged tree.
**Follow-up scope** stays as the honest known-limitations list in your description (provider unit tests, mid-stream retry, BASE_URL allowlist, idle-eviction DB drift), plus one my re-review added: single-turn budget overshoot — usage is recorded after the turn and `checkBudget` only gates the next turn's preflight, so one openai-compatible turn can exceed the remaining budget. All acceptable for an off-by-default flagged path; we'll size them as issues when this graduates from PoC.
Merging now that CI is green. The two RLS catches you found by testing with the `breeze_app` role active were exactly the kind of pre-merge find this review process exists for — genuinely good work, and a model follow-up round. Sorry again for the silence.
**Status:** merged (admin-squash) once CI completes; follow-up issues to be filed at graduation time.
Draft PoC for the alternative LLM backend from #505. Adds a feature-flagged,
chat-only, OpenAI-compatible provider path (target: local vLLM) alongside the
existing Anthropic path.
Scope as agreed: Anthropic stays the source of truth, the new path is
best-effort with no SLA, off by default, tool-calling out of scope here.
What it does
MCP_LLM_PROVIDER=anthropic (default) | openai-compatible. Absent or anthropic leaves behavior unchanged.
apps/api/src/services/llm/:
openaiCompatibleProvider.ts: native fetch + manual SSE, no openai dependency. Sends no
tools; if the model returns tool_calls anyway, yields an error and stops (no silent fallback).
openaiSessionManager.ts: no long-lived subprocess. Each turn is an independent HTTP stream; sessions persist only for eventBus + TTL eviction.
historyBuilder.ts: rebuilds context from ai_messages (no SDK resume equivalent). Refuses sessions containing tool-use messages rather than
silently dropping context.
openai-compatible, MCP_LLM_BASE_URL, MCP_LLM_MODEL, MCP_LLM_API_KEY are required or the server refuses to start.
prompt-caching equivalent on vLLM (noted in code).
Untouched (intentional)
Tool-calling / MCP surface, Script Builder, Helper flow,
aiGuardrails, aiTools, risk tiers, streamingSessionManager, and the Anthropic flow.
Testing
Local vLLM serving Qwen3, Node 22, Postgres 17, non-Docker, with RLS enforced
(breeze_app role active).
explicit Zod error.
(history correctly carried across turns).
local model in the existing chat UI.
historyBuilder (9), validate rules (4). Pass.
Changes since review
Thanks for the detailed reviews. All blockers are addressed; the path now runs
end-to-end against a local vLLM with RLS enforced, validated across multi-turn
conversations.
Blockers (maintainer)
(was hardcoded to streamingSessionManager).
this path instead of a misleading 404.
the single source of that event.
(success, 5xx, abort), not on upstream refusals like ToolUseInHistoryError.
Billing stays success-only. turnCount is the anti-loop guard; token billing
is a separate metric.
Robustness
error; a genuine 6-minute timeout is now distinguished from a user Stop and
does surface as an error.
(same routing fix as Stop).
Anthropic processorPromise.catch.
Additional fixes from end-to-end testing with RLS active
request transaction context and failed the WITH CHECK policy. Fixed by
starting the async turn outside the request ALS via runOutsideDbContextSafe,
mirroring the Anthropic path. (This was masked while developing without the
breeze_app role.)
runTurn rebuilt history from the DB but the just-inserted user row was still
uncommitted and invisible, so vLLM rejected the turn ("No user query found").
Fixed by passing the sanitized user message in memory to runTurn (parallel to
Anthropic's pushMessage), rather than relying on a re-read. historyBuilder
documents the invariant.
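The in-memory handoff described above can be pictured as follows. This is a minimal sketch assuming the committed history never contains the pending user row at the time of the call (the invariant the description says historyBuilder documents); `buildTurnMessages` is a hypothetical name, not a function from the PR.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// committedHistory: what a DB re-read can see (excludes the uncommitted user row).
// pendingUserContent: the sanitized message for this turn, passed in memory.
function buildTurnMessages(
  committedHistory: readonly ChatMessage[],
  pendingUserContent: string,
): ChatMessage[] {
  // Append rather than re-read, so the request always ends with a user message.
  return [...committedHistory, { role: "user", content: pendingUserContent }];
}
```

On a first turn the committed history is empty, so the request body would contain exactly one user message instead of none.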
Review nits addressed
removing the empty-bearer fallback.
it today).
replayed.
vars are 0 (cost tracking / budget enforcement are no-ops otherwise).
Known limitations / before un-flag
Deferred deliberately, not blockers for landing behind the flag:
detection, timeout/abort). historyBuilder and config validation are covered.
and the user retries. This matches the Anthropic path at the Breeze level
(streamingSessionManager has no retry/reconnect loop either; an error ends
the turn). One caveat: the Claude SDK may retry inside its own subprocess,
whereas the raw fetch to vLLM does not, so the underlying resilience can
differ even though the Breeze-level behavior is the same.
needed for local vLLM). A production deploy would want https + an allowlist
or RFC-1918 blocking under NODE_ENV=production. Left for maintainer judgment.
chat-only path the model emits tool calls as plain text. Expected at this
scope; this is the boundary phase 2 (tool-calling) would address.
status='active' in DB; only the 24h max-age path writes status='expired'.
This is the same in both managers (StreamingSessionManager and the new one
share the structure and the 2h/24h constants), so it is existing behavior
rather than something specific to this path. Flagged for awareness.
branches on the flag with no edits to that path, and I verified the shared
patterns it relies on (runOutsideDbContextSafe, pushMessage,
processorPromise.catch) by reading the code, but I couldn't confirm the
Anthropic flow end-to-end at runtime.
pnpm run build needs NODE_OPTIONS=--max-old-space-size for the DTS step on low-RAM machines. Pre-existing, unrelated.
Open questions
still holds the Claude default, so the UI would show a Claude id while the
turn actually runs on the local model. Two options: keep the silent override
(current, minimal diff) and add a UI surface later, or persist MCP_LLM_MODEL
into ai_sessions.model at session creation on this path so the stored value
reflects what actually ran. Happy to do the latter if you prefer it.
sequence tool-calling on this path: a function-calling adapter over the
current tool surface, or a provider-neutral surface?