feat(llm): chat-only OpenAI-compatible provider behind MCP_LLM_PROVIDER flag (PoC) by Emilien-Etadam · Pull Request #859 · LanternOps/breeze

Emilien-Etadam · 2026-05-23T13:42:06Z

Draft PoC for the alternative LLM backend from #505. Adds a feature-flagged,
chat-only, OpenAI-compatible provider path (target: local vLLM) alongside the
existing Anthropic path.

Scope as agreed: Anthropic stays the source of truth, the new path is
best-effort with no SLA, off by default, tool-calling out of scope here.

What it does

MCP_LLM_PROVIDER=anthropic (default) | openai-compatible. Absent or
anthropic leaves behavior unchanged.
New code isolated in apps/api/src/services/llm/:
- openaiCompatibleProvider.ts: native fetch + manual SSE, no openai
  dependency. Sends no tools; if the model returns tool_calls anyway,
  yields an error and stops (no silent fallback).
- openaiSessionManager.ts: no long-lived subprocess. Each turn is an
  independent HTTP stream; sessions persist only for eventBus + TTL eviction.
- historyBuilder.ts: rebuilds context from ai_messages (no SDK resume
  equivalent). Refuses sessions containing tool-use messages rather than
  silently dropping context.
Config: when the flag is openai-compatible, MCP_LLM_BASE_URL,
MCP_LLM_MODEL, MCP_LLM_API_KEY are required or the server refuses to
start.
Cost tracking: best-effort, tokens to USD via declared pricing. No
prompt-caching equivalent on vLLM (noted in code).

Untouched (intentional)

Tool-calling / MCP surface, Script Builder, Helper flow, aiGuardrails,
aiTools, risk tiers, streamingSessionManager, and the Anthropic flow.

Testing

Local vLLM serving Qwen3, Node 22, Postgres 17, non-Docker, with RLS enforced
(breeze_app role active).

Provider smoke test (streaming, usage, clean termination): pass.
Boot with flag + config: clean. Boot with flag, missing config: refused with
explicit Zod error.
End-to-end over the HTTP API: valid SSE stream from vLLM, multi-turn
(history correctly carried across turns).
End-to-end via web UI: a non-tool prompt streams a clean answer from the
local model in the existing chat UI.
Stop, and session delete, verified mid-stream.
Unit tests: historyBuilder (9), validate rules (4). Pass.

Changes since review

Thanks for the detailed reviews. All blockers are addressed; the path now runs
end-to-end against a local vLLM with RLS enforced, validated across multi-turn
conversations.

Blockers (maintainer)

Stop button now routes to the correct manager on the openai-compatible path
(was hardcoded to streamingSessionManager).
pause / approve-plan / abort-plan return 501 NOT_SUPPORTED_ON_PROVIDER on
this path instead of a misleading 404.
Removed the dead message_start yield in the provider; the session manager is
the single source of that event.
turnCount now increments in a finally for any turn that reached the provider
(success, 5xx, abort), not on upstream refusals like ToolUseInHistoryError.
Billing stays success-only. turnCount is the anti-loop guard; token billing
is a separate metric.

Robustness

User abort (Stop) ends the stream cleanly instead of surfacing as an internal
error; a genuine 6-minute timeout is now distinguished from a user Stop and
does surface as an error.
DELETE /sessions/:id evicts in-memory state from the openai session manager
(same routing fix as Stop).
Background runTurn now has a rejection handler (Sentry + log), mirroring the
Anthropic processorPromise.catch.

Additional fixes from end-to-end testing with RLS active

RLS write failure on ai_cost_usage. Post-stream writes inherited the
request transaction context and failed the WITH CHECK policy. Fixed by
starting the async turn outside the request ALS via runOutsideDbContextSafe,
mirroring the Anthropic path. (This was masked while developing without the
breeze_app role.)
Transactional isolation on the current user message. After the above fix,
runTurn rebuilt history from the DB but the just-inserted user row was still
uncommitted and invisible, so vLLM rejected the turn ("No user query found").
Fixed by passing the sanitized user message in memory to runTurn (parallel to
Anthropic's pushMessage), rather than relying on a re-read. historyBuilder
documents the invariant.

Review nits addressed

MCP_LLM_API_KEY now uses a non-null assertion (Zod already guarantees it),
removing the empty-bearer fallback.
Comment added noting multipart delta.content is dropped (vLLM doesn't emit
it today).
Smoke script moved out of .devscripts into a versioned path so it can be
replayed.
Startup warning when the provider is openai-compatible and both price env
vars are 0 (cost tracking / budget enforcement are no-ops otherwise).

Known limitations / before un-flag

Deferred deliberately, not blockers for landing behind the flag:

No unit tests yet for the provider/session manager (SSE parser, tool_call
detection, timeout/abort). historyBuilder and config validation are covered.
No mid-stream network retry: if vLLM restarts during a turn, that turn fails
and the user retries. This matches the Anthropic path at the Breeze level
(streamingSessionManager has no retry/reconnect loop either; an error ends
the turn). One caveat: the Claude SDK may retry inside its own subprocess,
whereas the raw fetch to vLLM does not, so the underlying resilience can
differ even though the Breeze-level behavior is the same.
MCP_LLM_BASE_URL accepts any URL (incl. private IPs / localhost, which is
needed for local vLLM). A production deploy would want https + an allowlist
or RFC-1918 blocking under NODE_ENV=production. Left for maintainer judgment.
Cost tracking is a no-op when prices are 0 (now warned at startup).
Tool-call leakage: the system prompt still advertises tools, so on this
chat-only path the model emits tool calls as plain text. Expected at this
scope; this is the boundary phase 2 (tool-calling) would address.
Session eviction by idle (2h) removes the in-memory session but leaves
status='active' in DB; only the 24h max-age path writes status='expired'.
This is the same in both managers (StreamingSessionManager and the new one
share the structure and the 2h/24h constants), so it is existing behavior
rather than something specific to this path. Flagged for awareness.
Anthropic path not exercised at runtime in my env (no credits). The handler
branches on the flag with no edits to that path, and I verified the shared
patterns it relies on (runOutsideDbContextSafe, pushMessage,
processorPromise.catch) by reading the code, but I couldn't confirm the
Anthropic flow end-to-end at runtime.
pnpm run build needs NODE_OPTIONS=--max-old-space-size for the DTS step
on low-RAM machines. Pre-existing, unrelated.

Open questions

On this path the provider always uses MCP_LLM_MODEL, but ai_sessions.model
still holds the Claude default, so the UI would show a Claude id while the
turn actually runs on the local model. Two options: keep the silent override
(current, minimal diff) and add a UI surface later, or persist MCP_LLM_MODEL
into ai_sessions.model at session creation on this path so the stored value
reflects what actually ran. Happy to do the latter if you prefer it.
Chat-only is a validation milestone. Assuming this lands, how do you want to
sequence tool-calling on this path: a function-calling adapter over the
current tool surface, or a provider-neutral surface?

Cree apps/api/src/services/llm/types.ts avec : - LLMProviderType discriminant (anthropic | openai-compatible) - ChatMessage, LLMStreamEvent, LLMProvider interface (signature seulement) - OpenAISession : session legere sans SDK Query pour le chemin OpenAI Aucune implementation, aucun fichier existant modifie. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

Ajoute dans envSchema (validate.ts) : - MCP_LLM_PROVIDER : enum 'anthropic' | 'openai-compatible', defaut 'anthropic' - MCP_LLM_BASE_URL, MCP_LLM_API_KEY, MCP_LLM_MODEL : optionnels - MCP_LLM_PRICE_INPUT_PER_M_USD, MCP_LLM_PRICE_OUTPUT_PER_M_USD : defaut 0 - Validation croisee : MCP_LLM_BASE_URL requis si provider = openai-compatible Le flag est absent ou 'anthropic' par defaut : comportement Anthropic inchange. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

historyBuilder.ts : - buildMessagesFromHistory() charge ai_messages et mappe en ChatMessage[] - ToolUseInHistoryError si tool_use/tool_result present (refus explicite, pas de filtrage silencieux) openaiCompatibleProvider.ts : - fetch natif Node, parsing SSE manuel, pas de dep openai - Aucun champ tools/tool_choice envoye a vLLM - Detection defensive tool_calls dans la reponse -> error event, arret immediat - computeCostUsd() pour le cost tracking best-effort - Note : vLLM n'a pas d'equivalent au prompt caching Anthropic Aucun fichier existant modifie. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

openaiSessionManager.ts : - OpenAISessionManager : Map de sessions legeres (pas de SDK Query) - getOrCreate, tryTransitionToProcessing, startTurn, remove, eviction TTL - runTurn : charge historique, stream vLLM, publie events eventBus, sauve message assistant, enregistre cout, publie done - Constantes TTL copiees de streamingSessionManager.ts (2h idle, 24h max) aiCostTracker.ts : - Ajout recordOpenAIUsage() pour le cost tracking best-effort OpenAI - Note : pas d'equivalent prompt caching sur vLLM ai.ts (route) : - Imports lazy singleton OpenAISessionManager / OpenAICompatibleProvider - Early-return OpenAI-compatible avant le bloc Anthropic existant - Chemin Anthropic : identique, inchange https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

openaiCompatibleProvider.ts : - B2 : reader.cancel() dans le finally pour liberer le socket promptement - B3 : timeout fetch 6 min via AbortSignal.any + clearTimeout a chaque sortie anticipee - M1 : split SSE sur \r?\n\r?\n pour robustesse proxy - M2 : detection finish_reason === 'tool_calls' en plus de delta.tool_calls openaiSessionManager.ts : - M3 : abort de l'ancien controller avant remplacement dans startTurn - M4 : commentaire explicitant pourquoi le finally Anthropic n'est pas copie ai.ts : - M5 : guard explicite sur MCP_LLM_BASE_URL au lieu du non-null assertion Note B1 : totalCostCents est de type real (pas integer) en DB, la formule Math.round(costUsd * 100 * 100) / 100 est identique a recordUsageFromSdkResult, aucune correction necessaire. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

9 cas couverts : - historique vide -> tableau vide - messages user+assistant mappes correctement - messages system ignores silencieusement - assistant null (tour pure tool-use) ignore - user null ignore - tool_use dans historique -> ToolUseInHistoryError - tool_result dans historique -> ToolUseInHistoryError - message d'erreur contient le sessionId - ordre des messages preserve https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

Repertoire local pour scripts de dev non versionnés (smoke tests, etc.). Mentionne dans la description du PR pour que le mainteneur tranche sur le versionning eventuel. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

La colonne ai_sessions reste orientée Anthropic ; le chemin vLLM refuse les ids Claude. Exiger BASE_URL, MODEL et API_KEY au boot évite une dégradation silencieuse. Co-authored-by: Cursor <cursoragent@cursor.com>

ToddHebebrand · 2026-05-23T16:22:36Z

Thanks for the PoC, @Emilien-Etadam — scope is unusually clean for 1,000+ lines: pure-additive new directory, real test coverage on the non-trivial logic (historyBuilder + config validation), default Anthropic path genuinely untouched. The branch at apps/api/src/routes/ai.ts:340 triggers only when MCP_LLM_PROVIDER === 'openai-compatible' and the Zod default keeps absent-env behavior identical, so the regression risk you flagged is low.

A few things I'd like addressed before this moves out of draft.

Blocker (even as draft, before this lands behind the flag):

The Stop button silently fails on the openai-compatible path. apps/api/src/routes/ai.ts:493-530 hard-codes streamingSessionManager.interrupt(sessionId). When the flag is on, the in-flight HTTP turn lives on OpenAISessionManager and is never aborted — the route returns 409 "Session not found" and the user's "Stop" click no-ops. OpenAISessionManager.interrupt() already exists at apps/api/src/services/llm/openaiSessionManager.ts:222-231 and abort-controllers correctly. The fix is a one-line provider switch:

const manager = getConfig().MCP_LLM_PROVIDER === 'openai-compatible'
  ? getOpenAISessionManager()
  : streamingSessionManager;
result = await manager.interrupt(sessionId);

Same shape for /sessions/:id/pause (line 592), /sessions/:id/approve-plan (line 639), /sessions/:id/abort-plan (line 702) — those tool/plan flows aren't supported on the openai path at all, so at minimum return a clear 501 NOT_SUPPORTED_ON_PROVIDER instead of the misleading "Session not active in memory".

Worth fixing in this PR:

LLMStreamEvent.message_start shape disagrees with AiStreamEvent. apps/api/src/services/llm/types.ts:35-38 declares { type: 'message_start' } (no messageId) while the shared AiStreamEvent requires { type, messageId }. The session manager publishes its own message_start with messageId at openaiSessionManager.ts:189 and silently swallows the provider's bare event at line 217 (case 'message_start': break;). It works today, but the provider's own message_start yield at openaiCompatibleProvider.ts:666 is dead code. Either delete that yield or align LLMStreamEvent to require messageId.

Failed turns leave the per-session turnCount un-incremented (openaiSessionManager.ts:223). The if (!hadError && assistantText) guard means tool-call rejections, HTTP 5xx, aborts — all skip both billing and turnCount. For local vLLM (the PoC target) harmless; for a paid endpoint, this allows free retries of malformed prompts and maxTurns can't backstop a stuck loop. Move turnCount increment to a finally, gate only the DB row insert on success.

Before un-flagging in production (not blockers for landing the draft):

No tests for the provider or session manager. The SSE parser, tool_call detection, and timeout/abort plumbing are the riskiest pieces and are entirely untested. HistoryBuilder + validate.ts coverage is well-written, that's not the gap.
MCP_LLM_BASE_URL SSRF surface. z.string().url() accepts any URL including private IPs and 127.0.0.1. Operator-supplied, so risk is bounded, but a production deploy should require https + an allowlist (or at minimum block RFC-1918 + link-local when NODE_ENV=production).
Mid-stream budget enforcement. Anthropic path uses the SDK to abort when maxBudgetUsd is exceeded mid-turn; openai path reads maxBudgetUsd at routes/ai.ts:337 then drops it. An SSE client disconnect mid-stream (no .remove() call) leaves the fetch streaming until idle eviction (2h), burning budget the whole time.
Smoke script placement — answering your open question: drop it under apps/api/src/services/llm/__scripts__/openai-smoke.ts (or similar) so future reviewers can replay it. Don't gitignore it.
dbSession.model is read at routes/ai.ts:378 then ignored in favor of MCP_LLM_MODEL. Probably worth a UI surface ("Session created with X is running on Y") before this is user-visible.

CLAUDE.md compliance: clean. withDbAccessContext is correctly used for the new queries, files are well under the 500-line guideline, no new tables / no RLS implications.

Once the Stop-button routing is fixed I'm comfortable with this landing as a flag-gated experimental path. Happy to keep iterating with you on the un-flag prerequisites.

Co-authored-by: Cursor <cursoragent@cursor.com>

…tarts Co-authored-by: Cursor <cursoragent@cursor.com>

bdunncompany · 2026-05-23T18:50:08Z

@Emilien-Etadam — read through this carefully and it's a well-built PoC. Strong defensive design: tool-call rejection covers both delta.tool_calls and finish_reason === 'tool_calls', SSE parser tolerates \r\n\r\n AND \n\n separators, AbortSignal.any combines caller + timeout (Node 22 OK per .github/workflows/ci.yml NODE_VERSION='22'), and the ToolUseInHistoryError design refuses to silently lose context. Honest limitations section in the PR body. I'm not a maintainer, just leaving a fellow-contributor review for whatever it's worth:

Things that read as correct:

Provider isolation in apps/api/src/routes/ai.ts branches cleanly on MCP_LLM_PROVIDER === 'openai-compatible'; the Anthropic path is untouched.
Zod cross-field validation correctly demands BASE_URL + MODEL + API_KEY together when the flag flips, refuses to boot otherwise.
Stream cancellation in finally block + reader.releaseLock() releases the socket on early break.
6-min timeout matches the Anthropic path.
Lazy singleton for OpenAISessionManager avoids paying construction cost on the default-Anthropic path.

Worth flagging (small):

computeCostUsd returns 0 when prices default to 0, which is the default per the Zod schema. That makes budget enforcement on the cost-tracker path silently a no-op on this provider unless the operator explicitly sets the price env vars. Probably worth a startup-time log warning when MCP_LLM_PROVIDER === 'openai-compatible' AND both prices are 0, so the operator doesn't get surprised by an unbounded burn.
apiKey: cfg.MCP_LLM_API_KEY ?? '' in the route adapter. The Zod cross-field validation already requires the API key when the flag is set, so the ?? '' fallback is dead code AND mildly dangerous — if validation is somehow bypassed, an empty bearer (Authorization: Bearer ) goes out. Safer to use a non-null assertion (cfg.MCP_LLM_API_KEY!) so type-system enforcement matches runtime enforcement.
typeof choice.delta?.content === 'string' — some OpenAI-compat backends return content as an array (multipart). vLLM doesn't today, but worth a one-line comment saying // multipart content is currently silently dropped — vLLM doesn't emit it; revisit if a future backend does.
No retry on transient network error. If the vLLM container restarts mid-stream the session breaks with no retry. Acceptable for PoC; worth noting in the limitations section if it's not already.

Re: your open questions:

.devscripts/openai-smoke.ts location: I'd commit it under scripts/dev/ or similar — gitignored dev helpers tend to bit-rot. Reviewers want to be able to repro your smoke test without recreating the file from the PR description.
Tool-calling sequencing: the provider-neutral surface seems right long-term, but a function-calling adapter over the current tool surface gets you to a working tool path faster. If you want to keep the Anthropic flow as the source of truth, the adapter is closer to "best effort no SLA" than a parallel tool engine would be.

One unverified thing I can't speak to:

I haven't tested the Anthropic path either — same constraint as you, no credits handy. The diff doesn't touch the Anthropic branch, so absent runtime evidence the static analysis says it's unaffected.

Thanks for the PoC — solid foundation if it goes in.

Stop no longer yields an error event or surfaces as internal error; non-abort stream failures still emit a typed error.

Classify combined abort by timeout vs caller signal so 6m stalls error while Stop still ends the stream without a provider error event.

Rely on Zod-validated config instead of falling back to an empty bearer token. Co-authored-by: Cursor <cursoragent@cursor.com>

Document that array-shaped delta.content from some backends is ignored today. Co-authored-by: Cursor <cursoragent@cursor.com>

Move the smoke test out of ignored .devscripts into the api package sources for repeatable runs in review. Co-authored-by: Cursor <cursoragent@cursor.com>

Nested withDbAccessContext was skipping set_config while the auth middleware’s ALS was still active, breaking RLS on ai_cost_usage. Mirror StreamingSessionManager. Co-authored-by: Cursor <cursoragent@cursor.com>

Close now evicts in-memory state from OpenAISessionManager when using the OpenAI-compatible provider, matching the Anthropic path and interrupt routing. Co-authored-by: Cursor <cursoragent@cursor.com>

Attach a rejection handler mirroring StreamingSessionManager’s processorPromise .catch so stray failures surface in Sentry and logs. Co-authored-by: Cursor <cursoragent@cursor.com>

runTurn reads committed history after runOutsideDbContext; pending user rows were invisible — pass sanitized content separately like Anthropic pushMessage. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Emilien-Etadam · 2026-05-24T07:46:29Z

Pushed the fixes — all four blockers plus the Stop/timeout cleanup are done,
and testing with the breeze_app role active surfaced two RLS/isolation issues
that are now fixed too. Full breakdown and the open questions are in the
updated description above.

@bdunncompany thanks for the careful pass — your nits are all addressed
(API key assertion, multipart comment, smoke script versioned, zero-price
startup warning).

No rush on review.

bdunncompany · 2026-05-24T10:11:20Z

@Emilien-Etadam thanks for the careful follow-up. Walked the 14 new commits, they map cleanly to the asks:

Blockers — all 4 addressed:

d267c7d8 Stop → correct manager
24198c87 plan/pause/abort → 501 NOT_SUPPORTED_ON_PROVIDER
a9f534c1 dead message_start yield removed
dd39576d turnCount finally-block

Robustness — Stop vs timeout distinction (8c1fd79b + d3db1107), DELETE session routing (c0a4273c), and background runTurn rejection handler (b695423a) all look right.

The two new RLS fixes are the most valuable adds from this round, and the fact that they only surfaced once you tested with breeze_app active is exactly the kind of pre-merge catch that a flagged scope is for:

5d1e643b — moving the async turn outside the request ALS so post-stream writes to ai_cost_usage don't inherit the request transaction context. Mirrors the Anthropic path's runOutsideDbContextSafe pattern.
4be5b465 — passing the sanitized user message in-memory so the just-inserted-still-uncommitted user row isn't required for history rebuild. The invariant comment in historyBuilder is the right shape.

Nits — all four landed: API key non-null assertion, multipart drop comment, smoke script versioned path, zero-price startup warning.

LGTM from a fellow contributor. Leaving the merge decision to @ToddHebebrand; flagging that the "known limitations" list in the description is honest about what's deferred (provider unit tests, mid-stream retry, BASE_URL allowlist, idle eviction DB drift) so reviewers can size the follow-up issues separately from the merge gate.

No rush on my end either — this is solid groundwork for #505 phase 1.

Commits the approved 2026-06-07 brainstorm spec for per-process resource drill-down on the device Performance screen, and the living e2e coverage index that the e2e-coverage skill reads/appends per sweep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolves the validate.ts / validate.test.ts overlap: the MCP_LLM_* config block independently landed on main via LanternOps#864, so validate.ts takes main's version wholesale; validate.test.ts keeps both this PR's MCP_LLM config tests and main's newer test blocks.

getConfig() throws before validateConfig() runs, and route unit tests never validate config — so every guard added for the openai-compatible path threw 500s in ai route tests (latent since the PR ran as a draft with no CI). isOpenAICompatibleProvider() treats an unvalidated config as the default anthropic path; production validates at boot, so behavior there is unchanged.

ToddHebebrand · 2026-06-11T22:58:22Z

@Emilien-Etadam — first, an apology. You addressed every blocker on May 24, bdunncompany confirmed the fixes commit-by-commit the same day, and then this sat for 18 days with the ball squarely in my court. That's on me: my review queue kept bucketing it as an author-blocked draft, and the "merge decision to Todd" handoff in the comments never overrode the draft flag. I've fixed that triage rule so a thread handoff outranks draft status.

Final review: nothing new owed on the code. A fresh pass against today's main (which has moved a lot since your base — AI tool site-scope enforcement, PAM tool-action flows, M365 agent) came back clean: those features gate tool execution, which this path doesn't do by design, and recordOpenAIUsage still matches the current cost/billing chokepoints exactly.

Merge prep I pushed to your branch (hope you don't mind — wanted to spare you another round after the wait): marked ready-for-review and pushed two commits. 8311931 merges main in — the one real conflict was self-inflicted on our side: the MCP_LLM_* config block landed on main independently via #864 (f891253e), so apps/api/src/config/validate.ts takes main's version wholesale and validate.test.ts keeps both your config tests and main's newer blocks. 22784d1 fixes one latent bug the first-ever CI run on this PR surfaced (drafts here never ran CI, so nobody could have seen it): the provider guards call getConfig() inside route handlers, which throws before validateConfig() runs — route unit tests never validate config, so the pause/approve/reject guards 500'd eleven ai-route tests. The fix is a tolerant isOpenAICompatibleProvider() that treats an unvalidated config as the default anthropic path; production validates at boot, so no behavior change there. 173 tests green locally plus a clean typecheck on the merged tree.

Follow-up scope stays as the honest known-limitations list in your description (provider unit tests, mid-stream retry, BASE_URL allowlist, idle-eviction DB drift), plus one my re-review added: single-turn budget overshoot — usage is recorded after the turn and checkBudget only gates the next turn's preflight, so one openai-compatible turn can exceed the remaining budget. All acceptable for an off-by-default flagged path; we'll size them as issues when this graduates from PoC.

Merging now that CI is green. The two RLS catches you found by testing with the breeze_app role active were exactly the kind of pre-merge find this review process exists for — genuinely good work, and a model follow-up round. Sorry again for the silence.

Status: merged (admin-squash) once CI completes; follow-up issues to be filed at graduation time.

Emilien-Etadam · 2026-06-12T03:46:45Z

No apology needed, thanks for the merge prep and the CI fix. Happy to pick up the follow-ups at graduation; glad to discuss the phase-2 tool-calling sequencing sooner whenever suits you. Le 12 juin 2026 00:58:44 GMT+02:00, Todd Hebebrand ***@***.***> a écrit :

…

ToddHebebrand left a comment (LanternOps/breeze#859) @Emilien-Etadam — first, an apology. You addressed every blocker on May 24, bdunncompany confirmed the fixes commit-by-commit the same day, and then this sat for 18 days with the ball squarely in my court. That's on me: my review queue kept bucketing it as an author-blocked draft, and the "merge decision to Todd" handoff in the comments never overrode the draft flag. I've fixed that triage rule so a thread handoff outranks draft status. **Final review:** nothing new owed on the code. A fresh pass against today's `main` (which has moved a lot since your base — AI tool site-scope enforcement, PAM tool-action flows, M365 agent) came back clean: those features gate *tool execution*, which this path doesn't do by design, and `recordOpenAIUsage` still matches the current cost/billing chokepoints exactly. **Merge prep I pushed to your branch** (hope you don't mind — wanted to spare you another round after the wait): marked ready-for-review and pushed two commits. `8311931` merges `main` in — the one real conflict was self-inflicted on our side: the `MCP_LLM_*` config block landed on `main` independently via #864 (`f891253e`), so `apps/api/src/config/validate.ts` takes main's version wholesale and `validate.test.ts` keeps both your config tests and main's newer blocks. `22784d1` fixes one latent bug the first-ever CI run on this PR surfaced (drafts here never ran CI, so nobody could have seen it): the provider guards call `getConfig()` inside route handlers, which throws before `validateConfig()` runs — route unit tests never validate config, so the pause/approve/reject guards 500'd eleven ai-route tests. The fix is a tolerant `isOpenAICompatibleProvider()` that treats an unvalidated config as the default anthropic path; production validates at boot, so no behavior change there. 173 tests green locally plus a clean typecheck on the merged tree. **Follow-up scope** stays as the honest known-limitations list in your description (provider unit tests, mid-stream retry, BASE_URL allowlist, idle-eviction DB drift), plus one my re-review added: single-turn budget overshoot — usage is recorded after the turn and `checkBudget` only gates the next turn's preflight, so one openai-compatible turn can exceed the remaining budget. All acceptable for an off-by-default flagged path; we'll size them as issues when this graduates from PoC. Merging now that CI is green. The two RLS catches you found by testing with the `breeze_app` role active were exactly the kind of pre-merge find this review process exists for — genuinely good work, and a model follow-up round. Sorry again for the silence. **Status:** merged (admin-squash) once CI completes; follow-up issues to be filed at graduation time. -- Reply to this email directly or view it on GitHub: #859 (comment) You are receiving this because you were mentioned. Message ID: ***@***.***>

claude and others added 8 commits May 18, 2026 07:54

chore: ajoute .devscripts/ au gitignore

821431b

Repertoire local pour scripts de dev non versionnés (smoke tests, etc.). Mentionne dans la description du PR pour que le mainteneur tranche sur le versionning eventuel. https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45

fix(llm): router les tours openai-compatible vers MCP_LLM_MODEL

ad3b41a

La colonne ai_sessions reste orientée Anthropic ; le chemin vLLM refuse les ids Claude. Exiger BASE_URL, MODEL et API_KEY au boot évite une dégradation silencieuse. Co-authored-by: Cursor <cursoragent@cursor.com>

Your Name and others added 4 commits May 23, 2026 16:37

fix(llm): route interrupt/stop to openai-compatible session manager

d267c7d

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): return 501 for plan/pause flows on openai-compatible

24198c8

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): drop redundant provider message_start yield

a9f534c

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): always increment OpenAI-compat turnCount after chatStream s…

dd39576

…tarts Co-authored-by: Cursor <cursoragent@cursor.com>

Your Name and others added 10 commits May 23, 2026 20:27

fix(llm): treat openai-compat user abort as clean stream end

8c1fd79

Stop no longer yields an error event or surfaces as internal error; non-abort stream failures still emit a typed error.

fix(llm): surface openai-compat timeout distinct from user Stop

d3db110

Classify combined abort by timeout vs caller signal so 6m stalls error while Stop still ends the stream without a provider error event.

fix(llm): assert MCP_LLM_API_KEY presence instead of empty fallback

d95ba17

Rely on Zod-validated config instead of falling back to an empty bearer token. Co-authored-by: Cursor <cursoragent@cursor.com>

chore(llm): note multipart content is dropped on openai-compat path

765ce64

Document that array-shaped delta.content from some backends is ignored today. Co-authored-by: Cursor <cursoragent@cursor.com>

chore(llm): commit openai-compat smoke script under versioned path

568f929

Move the smoke test out of ignored .devscripts into the api package sources for repeatable runs in review. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): run openai-compatible turns outside HTTP request ALS.

5d1e643

Nested withDbAccessContext was skipping set_config while the auth middleware’s ALS was still active, breaking RLS on ai_cost_usage. Mirror StreamingSessionManager. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): route session delete to openai-compatible manager

c0a4273

Close now evicts in-memory state from OpenAISessionManager when using the OpenAI-compatible provider, matching the Anthropic path and interrupt routing. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): log unhandled errors from background runTurn

b695423

Attach a rejection handler mirroring StreamingSessionManager’s processorPromise .catch so stray failures surface in Sentry and logs. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(llm): append current OpenAI-compat user message in-memory for vLLM

4be5b46

runTurn reads committed history after runOutsideDbContext; pending user rows were invisible — pass sanitized content separately like Anthropic pushMessage. Co-authored-by: Cursor <cursoragent@cursor.com>

chore(llm): warn once when openai-compatible has zero MCP prices

c4d0735

Co-authored-by: Cursor <cursoragent@cursor.com>

ToddHebebrand and others added 2 commits June 11, 2026 16:07

chore: gitignore .impeccable/ plugin artifacts

19bcf50

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ToddHebebrand marked this pull request as ready for review June 11, 2026 22:28

ToddHebebrand merged commit 814f422 into LanternOps:main Jun 11, 2026
31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(llm): chat-only OpenAI-compatible provider behind MCP_LLM_PROVIDER flag (PoC)#859

feat(llm): chat-only OpenAI-compatible provider behind MCP_LLM_PROVIDER flag (PoC)#859
ToddHebebrand merged 26 commits into
LanternOps:mainfrom
Emilien-Etadam:claude/poc-alternative-llm-YrlE8

Emilien-Etadam commented May 23, 2026 •

edited

Loading

Uh oh!

ToddHebebrand commented May 23, 2026

Uh oh!

bdunncompany commented May 23, 2026

Uh oh!

Emilien-Etadam commented May 24, 2026

Uh oh!

bdunncompany commented May 24, 2026

Uh oh!

Uh oh!

ToddHebebrand commented Jun 11, 2026

Uh oh!

Emilien-Etadam commented Jun 12, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Emilien-Etadam commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What it does

Untouched (intentional)

Testing

Changes since review

Blockers (maintainer)

Robustness

Additional fixes from end-to-end testing with RLS active

Review nits addressed

Known limitations / before un-flag

Open questions

Uh oh!

ToddHebebrand commented May 23, 2026

Uh oh!

bdunncompany commented May 23, 2026

Uh oh!

Emilien-Etadam commented May 24, 2026

Uh oh!

bdunncompany commented May 24, 2026

Uh oh!

Uh oh!

ToddHebebrand commented Jun 11, 2026

Uh oh!

Emilien-Etadam commented Jun 12, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Emilien-Etadam commented May 23, 2026 •

edited

Loading