Skip to content

feat(llm): chat-only OpenAI-compatible provider behind MCP_LLM_PROVIDER flag (PoC)#859

Merged
ToddHebebrand merged 26 commits into
LanternOps:mainfrom
Emilien-Etadam:claude/poc-alternative-llm-YrlE8
Jun 11, 2026
Merged

feat(llm): chat-only OpenAI-compatible provider behind MCP_LLM_PROVIDER flag (PoC)#859
ToddHebebrand merged 26 commits into
LanternOps:mainfrom
Emilien-Etadam:claude/poc-alternative-llm-YrlE8

Conversation

@Emilien-Etadam

@Emilien-Etadam Emilien-Etadam commented May 23, 2026

Copy link
Copy Markdown
Contributor

Draft PoC for the alternative LLM backend from #505. Adds a feature-flagged,
chat-only, OpenAI-compatible provider path (target: local vLLM) alongside the
existing Anthropic path.

Scope as agreed: Anthropic stays the source of truth, the new path is
best-effort with no SLA, off by default, tool-calling out of scope here.

What it does

  • MCP_LLM_PROVIDER=anthropic (default) | openai-compatible. Absent or
    anthropic leaves behavior unchanged.
  • New code isolated in apps/api/src/services/llm/:
    • openaiCompatibleProvider.ts: native fetch + manual SSE, no openai
      dependency. Sends no tools; if the model returns tool_calls anyway,
      yields an error and stops (no silent fallback).
    • openaiSessionManager.ts: no long-lived subprocess. Each turn is an
      independent HTTP stream; sessions persist only for eventBus + TTL eviction.
    • historyBuilder.ts: rebuilds context from ai_messages (no SDK resume
      equivalent). Refuses sessions containing tool-use messages rather than
      silently dropping context.
  • Config: when the flag is openai-compatible, MCP_LLM_BASE_URL,
    MCP_LLM_MODEL, MCP_LLM_API_KEY are required or the server refuses to
    start.
  • Cost tracking: best-effort, tokens to USD via declared pricing. No
    prompt-caching equivalent on vLLM (noted in code).

Untouched (intentional)

Tool-calling / MCP surface, Script Builder, Helper flow, aiGuardrails,
aiTools, risk tiers, streamingSessionManager, and the Anthropic flow.

Testing

Local vLLM serving Qwen3, Node 22, Postgres 17, non-Docker, with RLS enforced
(breeze_app role active).

  • Provider smoke test (streaming, usage, clean termination): pass.
  • Boot with flag + config: clean. Boot with flag, missing config: refused with
    explicit Zod error.
  • End-to-end over the HTTP API: valid SSE stream from vLLM, multi-turn
    (history correctly carried across turns).
  • End-to-end via web UI: a non-tool prompt streams a clean answer from the
    local model in the existing chat UI.
  • Stop, and session delete, verified mid-stream.
  • Unit tests: historyBuilder (9), validate rules (4). Pass.

Changes since review

Thanks for the detailed reviews. All blockers are addressed; the path now runs
end-to-end against a local vLLM with RLS enforced, validated across multi-turn
conversations.

Blockers (maintainer)

  • Stop button now routes to the correct manager on the openai-compatible path
    (was hardcoded to streamingSessionManager).
  • pause / approve-plan / abort-plan return 501 NOT_SUPPORTED_ON_PROVIDER on
    this path instead of a misleading 404.
  • Removed the dead message_start yield in the provider; the session manager is
    the single source of that event.
  • turnCount now increments in a finally for any turn that reached the provider
    (success, 5xx, abort), not on upstream refusals like ToolUseInHistoryError.
    Billing stays success-only. turnCount is the anti-loop guard; token billing
    is a separate metric.

Robustness

  • User abort (Stop) ends the stream cleanly instead of surfacing as an internal
    error; a genuine 6-minute timeout is now distinguished from a user Stop and
    does surface as an error.
  • DELETE /sessions/:id evicts in-memory state from the openai session manager
    (same routing fix as Stop).
  • Background runTurn now has a rejection handler (Sentry + log), mirroring the
    Anthropic processorPromise.catch.

Additional fixes from end-to-end testing with RLS active

  • RLS write failure on ai_cost_usage. Post-stream writes inherited the
    request transaction context and failed the WITH CHECK policy. Fixed by
    starting the async turn outside the request ALS via runOutsideDbContextSafe,
    mirroring the Anthropic path. (This was masked while developing without the
    breeze_app role.)
  • Transactional isolation on the current user message. After the above fix,
    runTurn rebuilt history from the DB but the just-inserted user row was still
    uncommitted and invisible, so vLLM rejected the turn ("No user query found").
    Fixed by passing the sanitized user message in memory to runTurn (parallel to
    Anthropic's pushMessage), rather than relying on a re-read. historyBuilder
    documents the invariant.

Review nits addressed

  • MCP_LLM_API_KEY now uses a non-null assertion (Zod already guarantees it),
    removing the empty-bearer fallback.
  • Comment added noting multipart delta.content is dropped (vLLM doesn't emit
    it today).
  • Smoke script moved out of .devscripts into a versioned path so it can be
    replayed.
  • Startup warning when the provider is openai-compatible and both price env
    vars are 0 (cost tracking / budget enforcement are no-ops otherwise).

Known limitations / before un-flag

Deferred deliberately, not blockers for landing behind the flag:

  • No unit tests yet for the provider/session manager (SSE parser, tool_call
    detection, timeout/abort). historyBuilder and config validation are covered.
  • No mid-stream network retry: if vLLM restarts during a turn, that turn fails
    and the user retries. This matches the Anthropic path at the Breeze level
    (streamingSessionManager has no retry/reconnect loop either; an error ends
    the turn). One caveat: the Claude SDK may retry inside its own subprocess,
    whereas the raw fetch to vLLM does not, so the underlying resilience can
    differ even though the Breeze-level behavior is the same.
  • MCP_LLM_BASE_URL accepts any URL (incl. private IPs / localhost, which is
    needed for local vLLM). A production deploy would want https + an allowlist
    or RFC-1918 blocking under NODE_ENV=production. Left for maintainer judgment.
  • Cost tracking is a no-op when prices are 0 (now warned at startup).
  • Tool-call leakage: the system prompt still advertises tools, so on this
    chat-only path the model emits tool calls as plain text. Expected at this
    scope; this is the boundary phase 2 (tool-calling) would address.
  • Session eviction by idle (2h) removes the in-memory session but leaves
    status='active' in DB; only the 24h max-age path writes status='expired'.
    This is the same in both managers (StreamingSessionManager and the new one
    share the structure and the 2h/24h constants), so it is existing behavior
    rather than something specific to this path. Flagged for awareness.
  • Anthropic path not exercised at runtime in my env (no credits). The handler
    branches on the flag with no edits to that path, and I verified the shared
    patterns it relies on (runOutsideDbContextSafe, pushMessage,
    processorPromise.catch) by reading the code, but I couldn't confirm the
    Anthropic flow end-to-end at runtime.
  • pnpm run build needs NODE_OPTIONS=--max-old-space-size for the DTS step
    on low-RAM machines. Pre-existing, unrelated.

Open questions

  • On this path the provider always uses MCP_LLM_MODEL, but ai_sessions.model
    still holds the Claude default, so the UI would show a Claude id while the
    turn actually runs on the local model. Two options: keep the silent override
    (current, minimal diff) and add a UI surface later, or persist MCP_LLM_MODEL
    into ai_sessions.model at session creation on this path so the stored value
    reflects what actually ran. Happy to do the latter if you prefer it.
  • Chat-only is a validation milestone. Assuming this lands, how do you want to
    sequence tool-calling on this path: a function-calling adapter over the
    current tool surface, or a provider-neutral surface?

claude and others added 8 commits May 18, 2026 07:54
Cree apps/api/src/services/llm/types.ts avec :
- LLMProviderType discriminant (anthropic | openai-compatible)
- ChatMessage, LLMStreamEvent, LLMProvider interface (signature seulement)
- OpenAISession : session legere sans SDK Query pour le chemin OpenAI

Aucune implementation, aucun fichier existant modifie.

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
Ajoute dans envSchema (validate.ts) :
- MCP_LLM_PROVIDER : enum 'anthropic' | 'openai-compatible', defaut 'anthropic'
- MCP_LLM_BASE_URL, MCP_LLM_API_KEY, MCP_LLM_MODEL : optionnels
- MCP_LLM_PRICE_INPUT_PER_M_USD, MCP_LLM_PRICE_OUTPUT_PER_M_USD : defaut 0
- Validation croisee : MCP_LLM_BASE_URL requis si provider = openai-compatible

Le flag est absent ou 'anthropic' par defaut : comportement Anthropic inchange.

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
historyBuilder.ts :
- buildMessagesFromHistory() charge ai_messages et mappe en ChatMessage[]
- ToolUseInHistoryError si tool_use/tool_result present (refus explicite,
  pas de filtrage silencieux)

openaiCompatibleProvider.ts :
- fetch natif Node, parsing SSE manuel, pas de dep openai
- Aucun champ tools/tool_choice envoye a vLLM
- Detection defensive tool_calls dans la reponse -> error event, arret immediat
- computeCostUsd() pour le cost tracking best-effort
- Note : vLLM n'a pas d'equivalent au prompt caching Anthropic

Aucun fichier existant modifie.

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
openaiSessionManager.ts :
- OpenAISessionManager : Map de sessions legeres (pas de SDK Query)
- getOrCreate, tryTransitionToProcessing, startTurn, remove, eviction TTL
- runTurn : charge historique, stream vLLM, publie events eventBus, sauve
  message assistant, enregistre cout, publie done
- Constantes TTL copiees de streamingSessionManager.ts (2h idle, 24h max)

aiCostTracker.ts :
- Ajout recordOpenAIUsage() pour le cost tracking best-effort OpenAI
- Note : pas d'equivalent prompt caching sur vLLM

ai.ts (route) :
- Imports lazy singleton OpenAISessionManager / OpenAICompatibleProvider
- Early-return OpenAI-compatible avant le bloc Anthropic existant
- Chemin Anthropic : identique, inchange

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
openaiCompatibleProvider.ts :
- B2 : reader.cancel() dans le finally pour liberer le socket promptement
- B3 : timeout fetch 6 min via AbortSignal.any + clearTimeout a chaque
  sortie anticipee
- M1 : split SSE sur \r?\n\r?\n pour robustesse proxy
- M2 : detection finish_reason === 'tool_calls' en plus de delta.tool_calls

openaiSessionManager.ts :
- M3 : abort de l'ancien controller avant remplacement dans startTurn
- M4 : commentaire explicitant pourquoi le finally Anthropic n'est pas copie

ai.ts :
- M5 : guard explicite sur MCP_LLM_BASE_URL au lieu du non-null assertion

Note B1 : totalCostCents est de type real (pas integer) en DB, la formule
Math.round(costUsd * 100 * 100) / 100 est identique a recordUsageFromSdkResult,
aucune correction necessaire.

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
9 cas couverts :
- historique vide -> tableau vide
- messages user+assistant mappes correctement
- messages system ignores silencieusement
- assistant null (tour pure tool-use) ignore
- user null ignore
- tool_use dans historique -> ToolUseInHistoryError
- tool_result dans historique -> ToolUseInHistoryError
- message d'erreur contient le sessionId
- ordre des messages preserve

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
Repertoire local pour scripts de dev non versionnés (smoke tests, etc.).
Mentionne dans la description du PR pour que le mainteneur tranche sur
le versionning eventuel.

https://claude.ai/code/session_01NTTxhJSrW37HRxwX4DYi45
La colonne ai_sessions reste orientée Anthropic ; le chemin vLLM refuse les ids Claude.
Exiger BASE_URL, MODEL et API_KEY au boot évite une dégradation silencieuse.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ToddHebebrand

Copy link
Copy Markdown
Collaborator

Thanks for the PoC, @Emilien-Etadam — scope is unusually clean for 1,000+ lines: pure-additive new directory, real test coverage on the non-trivial logic (historyBuilder + config validation), default Anthropic path genuinely untouched. The branch at apps/api/src/routes/ai.ts:340 triggers only when MCP_LLM_PROVIDER === 'openai-compatible' and the Zod default keeps absent-env behavior identical, so the regression risk you flagged is low.

A few things I'd like addressed before this moves out of draft.

Blocker (even as draft, before this lands behind the flag):

The Stop button silently fails on the openai-compatible path. apps/api/src/routes/ai.ts:493-530 hard-codes streamingSessionManager.interrupt(sessionId). When the flag is on, the in-flight HTTP turn lives on OpenAISessionManager and is never aborted — the route returns 409 "Session not found" and the user's "Stop" click no-ops. OpenAISessionManager.interrupt() already exists at apps/api/src/services/llm/openaiSessionManager.ts:222-231 and abort-controllers correctly. The fix is a one-line provider switch:

const manager = getConfig().MCP_LLM_PROVIDER === 'openai-compatible'
  ? getOpenAISessionManager()
  : streamingSessionManager;
result = await manager.interrupt(sessionId);

Same shape for /sessions/:id/pause (line 592), /sessions/:id/approve-plan (line 639), /sessions/:id/abort-plan (line 702) — those tool/plan flows aren't supported on the openai path at all, so at minimum return a clear 501 NOT_SUPPORTED_ON_PROVIDER instead of the misleading "Session not active in memory".

Worth fixing in this PR:

LLMStreamEvent.message_start shape disagrees with AiStreamEvent. apps/api/src/services/llm/types.ts:35-38 declares { type: 'message_start' } (no messageId) while the shared AiStreamEvent requires { type, messageId }. The session manager publishes its own message_start with messageId at openaiSessionManager.ts:189 and silently swallows the provider's bare event at line 217 (case 'message_start': break;). It works today, but the provider's own message_start yield at openaiCompatibleProvider.ts:666 is dead code. Either delete that yield or align LLMStreamEvent to require messageId.

Failed turns leave the per-session turnCount un-incremented (openaiSessionManager.ts:223). The if (!hadError && assistantText) guard means tool-call rejections, HTTP 5xx, aborts — all skip both billing and turnCount. For local vLLM (the PoC target) harmless; for a paid endpoint, this allows free retries of malformed prompts and maxTurns can't backstop a stuck loop. Move turnCount increment to a finally, gate only the DB row insert on success.

Before un-flagging in production (not blockers for landing the draft):

  • No tests for the provider or session manager. The SSE parser, tool_call detection, and timeout/abort plumbing are the riskiest pieces and are entirely untested. HistoryBuilder + validate.ts coverage is well-written, that's not the gap.
  • MCP_LLM_BASE_URL SSRF surface. z.string().url() accepts any URL including private IPs and 127.0.0.1. Operator-supplied, so risk is bounded, but a production deploy should require https + an allowlist (or at minimum block RFC-1918 + link-local when NODE_ENV=production).
  • Mid-stream budget enforcement. Anthropic path uses the SDK to abort when maxBudgetUsd is exceeded mid-turn; openai path reads maxBudgetUsd at routes/ai.ts:337 then drops it. An SSE client disconnect mid-stream (no .remove() call) leaves the fetch streaming until idle eviction (2h), burning budget the whole time.
  • Smoke script placement — answering your open question: drop it under apps/api/src/services/llm/__scripts__/openai-smoke.ts (or similar) so future reviewers can replay it. Don't gitignore it.
  • dbSession.model is read at routes/ai.ts:378 then ignored in favor of MCP_LLM_MODEL. Probably worth a UI surface ("Session created with X is running on Y") before this is user-visible.

CLAUDE.md compliance: clean. withDbAccessContext is correctly used for the new queries, files are well under the 500-line guideline, no new tables / no RLS implications.

Once the Stop-button routing is fixed I'm comfortable with this landing as a flag-gated experimental path. Happy to keep iterating with you on the un-flag prerequisites.

Your Name and others added 4 commits May 23, 2026 16:37
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…tarts

Co-authored-by: Cursor <cursoragent@cursor.com>
@bdunncompany

Copy link
Copy Markdown
Collaborator

@Emilien-Etadam — read through this carefully and it's a well-built PoC. Strong defensive design: tool-call rejection covers both delta.tool_calls and finish_reason === 'tool_calls', SSE parser tolerates \r\n\r\n AND \n\n separators, AbortSignal.any combines caller + timeout (Node 22 OK per .github/workflows/ci.yml NODE_VERSION='22'), and the ToolUseInHistoryError design refuses to silently lose context. Honest limitations section in the PR body. I'm not a maintainer, just leaving a fellow-contributor review for whatever it's worth:

Things that read as correct:

  • Provider isolation in apps/api/src/routes/ai.ts branches cleanly on MCP_LLM_PROVIDER === 'openai-compatible'; the Anthropic path is untouched.
  • Zod cross-field validation correctly demands BASE_URL + MODEL + API_KEY together when the flag flips, refuses to boot otherwise.
  • Stream cancellation in finally block + reader.releaseLock() releases the socket on early break.
  • 6-min timeout matches the Anthropic path.
  • Lazy singleton for OpenAISessionManager avoids paying construction cost on the default-Anthropic path.

Worth flagging (small):

  1. computeCostUsd returns 0 when prices default to 0, which is the default per the Zod schema. That makes budget enforcement on the cost-tracker path silently a no-op on this provider unless the operator explicitly sets the price env vars. Probably worth a startup-time log warning when MCP_LLM_PROVIDER === 'openai-compatible' AND both prices are 0, so the operator doesn't get surprised by an unbounded burn.

  2. apiKey: cfg.MCP_LLM_API_KEY ?? '' in the route adapter. The Zod cross-field validation already requires the API key when the flag is set, so the ?? '' fallback is dead code AND mildly dangerous — if validation is somehow bypassed, an empty bearer (Authorization: Bearer ) goes out. Safer to use a non-null assertion (cfg.MCP_LLM_API_KEY!) so type-system enforcement matches runtime enforcement.

  3. typeof choice.delta?.content === 'string' — some OpenAI-compat backends return content as an array (multipart). vLLM doesn't today, but worth a one-line comment saying // multipart content is currently silently dropped — vLLM doesn't emit it; revisit if a future backend does.

  4. No retry on transient network error. If the vLLM container restarts mid-stream the session breaks with no retry. Acceptable for PoC; worth noting in the limitations section if it's not already.

Re: your open questions:

  • .devscripts/openai-smoke.ts location: I'd commit it under scripts/dev/ or similar — gitignored dev helpers tend to bit-rot. Reviewers want to be able to repro your smoke test without recreating the file from the PR description.
  • Tool-calling sequencing: the provider-neutral surface seems right long-term, but a function-calling adapter over the current tool surface gets you to a working tool path faster. If you want to keep the Anthropic flow as the source of truth, the adapter is closer to "best effort no SLA" than a parallel tool engine would be.

One unverified thing I can't speak to:

  • I haven't tested the Anthropic path either — same constraint as you, no credits handy. The diff doesn't touch the Anthropic branch, so absent runtime evidence the static analysis says it's unaffected.

Thanks for the PoC — solid foundation if it goes in.

Your Name and others added 10 commits May 23, 2026 20:27
Stop no longer yields an error event or surfaces as internal error;
non-abort stream failures still emit a typed error.
Classify combined abort by timeout vs caller signal so 6m stalls error
while Stop still ends the stream without a provider error event.
Rely on Zod-validated config instead of falling back to an empty bearer token.

Co-authored-by: Cursor <cursoragent@cursor.com>
Document that array-shaped delta.content from some backends is ignored today.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move the smoke test out of ignored .devscripts into the api package sources
for repeatable runs in review.

Co-authored-by: Cursor <cursoragent@cursor.com>
Nested withDbAccessContext was skipping set_config while the auth middleware’s
ALS was still active, breaking RLS on ai_cost_usage. Mirror StreamingSessionManager.

Co-authored-by: Cursor <cursoragent@cursor.com>
Close now evicts in-memory state from OpenAISessionManager when using the
OpenAI-compatible provider, matching the Anthropic path and interrupt routing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Attach a rejection handler mirroring StreamingSessionManager’s processorPromise
.catch so stray failures surface in Sentry and logs.

Co-authored-by: Cursor <cursoragent@cursor.com>
runTurn reads committed history after runOutsideDbContext; pending user rows
were invisible — pass sanitized content separately like Anthropic pushMessage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@Emilien-Etadam

Copy link
Copy Markdown
Contributor Author

Pushed the fixes — all four blockers plus the Stop/timeout cleanup are done,
and testing with the breeze_app role active surfaced two RLS/isolation issues
that are now fixed too. Full breakdown and the open questions are in the
updated description above.

@bdunncompany thanks for the careful pass — your nits are all addressed
(API key assertion, multipart comment, smoke script versioned, zero-price
startup warning).

No rush on review.

@bdunncompany

Copy link
Copy Markdown
Collaborator

@Emilien-Etadam thanks for the careful follow-up. Walked the 14 new commits, they map cleanly to the asks:

Blockers — all 4 addressed:

  • d267c7d8 Stop → correct manager
  • 24198c87 plan/pause/abort → 501 NOT_SUPPORTED_ON_PROVIDER
  • a9f534c1 dead message_start yield removed
  • dd39576d turnCount finally-block

Robustness — Stop vs timeout distinction (8c1fd79b + d3db1107), DELETE session routing (c0a4273c), and background runTurn rejection handler (b695423a) all look right.

The two new RLS fixes are the most valuable adds from this round, and the fact that they only surfaced once you tested with breeze_app active is exactly the kind of pre-merge catch that a flagged scope is for:

  • 5d1e643b — moving the async turn outside the request ALS so post-stream writes to ai_cost_usage don't inherit the request transaction context. Mirrors the Anthropic path's runOutsideDbContextSafe pattern.
  • 4be5b465 — passing the sanitized user message in-memory so the just-inserted-still-uncommitted user row isn't required for history rebuild. The invariant comment in historyBuilder is the right shape.

Nits — all four landed: API key non-null assertion, multipart drop comment, smoke script versioned path, zero-price startup warning.

LGTM from a fellow contributor. Leaving the merge decision to @ToddHebebrand; flagging that the "known limitations" list in the description is honest about what's deferred (provider unit tests, mid-stream retry, BASE_URL allowlist, idle eviction DB drift) so reviewers can size the follow-up issues separately from the merge gate.

No rush on my end either — this is solid groundwork for #505 phase 1.

ToddHebebrand and others added 2 commits June 11, 2026 16:07
Commits the approved 2026-06-07 brainstorm spec for per-process resource
drill-down on the device Performance screen, and the living e2e coverage
index that the e2e-coverage skill reads/appends per sweep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ToddHebebrand ToddHebebrand marked this pull request as ready for review June 11, 2026 22:28
Resolves the validate.ts / validate.test.ts overlap: the MCP_LLM_* config
block independently landed on main via LanternOps#864, so validate.ts takes main's
version wholesale; validate.test.ts keeps both this PR's MCP_LLM config
tests and main's newer test blocks.
getConfig() throws before validateConfig() runs, and route unit tests never
validate config — so every guard added for the openai-compatible path threw
500s in ai route tests (latent since the PR ran as a draft with no CI).
isOpenAICompatibleProvider() treats an unvalidated config as the default
anthropic path; production validates at boot, so behavior there is unchanged.
@ToddHebebrand ToddHebebrand merged commit 814f422 into LanternOps:main Jun 11, 2026
31 checks passed
@ToddHebebrand

Copy link
Copy Markdown
Collaborator

@Emilien-Etadam — first, an apology. You addressed every blocker on May 24, bdunncompany confirmed the fixes commit-by-commit the same day, and then this sat for 18 days with the ball squarely in my court. That's on me: my review queue kept bucketing it as an author-blocked draft, and the "merge decision to Todd" handoff in the comments never overrode the draft flag. I've fixed that triage rule so a thread handoff outranks draft status.

Final review: nothing new owed on the code. A fresh pass against today's main (which has moved a lot since your base — AI tool site-scope enforcement, PAM tool-action flows, M365 agent) came back clean: those features gate tool execution, which this path doesn't do by design, and recordOpenAIUsage still matches the current cost/billing chokepoints exactly.

Merge prep I pushed to your branch (hope you don't mind — wanted to spare you another round after the wait): marked ready-for-review and pushed two commits. 8311931 merges main in — the one real conflict was self-inflicted on our side: the MCP_LLM_* config block landed on main independently via #864 (f891253e), so apps/api/src/config/validate.ts takes main's version wholesale and validate.test.ts keeps both your config tests and main's newer blocks. 22784d1 fixes one latent bug the first-ever CI run on this PR surfaced (drafts here never ran CI, so nobody could have seen it): the provider guards call getConfig() inside route handlers, which throws before validateConfig() runs — route unit tests never validate config, so the pause/approve/reject guards 500'd eleven ai-route tests. The fix is a tolerant isOpenAICompatibleProvider() that treats an unvalidated config as the default anthropic path; production validates at boot, so no behavior change there. 173 tests green locally plus a clean typecheck on the merged tree.

Follow-up scope stays as the honest known-limitations list in your description (provider unit tests, mid-stream retry, BASE_URL allowlist, idle-eviction DB drift), plus one my re-review added: single-turn budget overshoot — usage is recorded after the turn and checkBudget only gates the next turn's preflight, so one openai-compatible turn can exceed the remaining budget. All acceptable for an off-by-default flagged path; we'll size them as issues when this graduates from PoC.

Merging now that CI is green. The two RLS catches you found by testing with the breeze_app role active were exactly the kind of pre-merge find this review process exists for — genuinely good work, and a model follow-up round. Sorry again for the silence.

Status: merged (admin-squash) once CI completes; follow-up issues to be filed at graduation time.

@Emilien-Etadam

Emilien-Etadam commented Jun 12, 2026 via email

Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants