fix(openai-compat): move tool schemas to system prompt to eliminate per-turn latency spikes (#43)
Merged
Enderfga merged 2 commits into Enderfga:main on Apr 16, 2026
Conversation
Currently when a /v1/chat/completions request includes `tools`, the proxy
prepends a `<available_tools>` block to EVERY user message. For callers
with many tools (e.g. OpenClaw gateway routing 90+ MCP tools), this block
can be 50+ KB and is sent on every turn.
This causes a reproducible pattern of 30-50s latency spikes every ~4 calls
against otherwise-warm sessions. Isolated via 4-layer bisection:
| Layer | Latency (12 calls) | Spikes |
|---------------------------------------------|-----------|-------|
| Raw `session-send` (tiny message) | 1.8–5.2 s | 0/12 |
| Raw `session-send` (54 KB message) | 5–45 s | 3/10 |
| Direct `/v1/chat/completions`, 3 tools | 2.4–6.4 s | 0/12 |
| Direct `/v1/chat/completions`, 93 tools | repro'd | ~1/4 |
The 54 KB tool block is the trigger; the CLI subprocess hits periodic
slow paths (likely Anthropic prompt-cache miss + full re-tokenization)
when every user message carries it.
Fix: when `engine === 'claude'` and tools are provided, embed the
<available_tools> block in the session system prompt at session-create
time (via `--system-prompt`). User messages then stay small and stable,
letting Anthropic's prompt cache reliably hit the tool block.
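The system-prompt construction can be sketched as follows (a minimal illustration; `buildSessionSystemPrompt` is a helper name from this PR's refactor commit, but its signature, the `ToolDef` shape, and the exact `<available_tools>` formatting are assumptions):

```typescript
// Sketch: fold the tool block into the system prompt once, at session-create
// time, instead of prefixing it to every user message. The shapes and the
// <available_tools> formatting here are illustrative, not the exact code.
interface ToolDef {
  name: string;
  description?: string;
}

function buildSessionSystemPrompt(
  callerSystemPrompt: string,
  tools: ToolDef[],
): string {
  if (tools.length === 0) return callerSystemPrompt;
  const toolsBlock =
    "<available_tools>\n" +
    tools.map((t) => `${t.name}: ${t.description ?? ""}`).join("\n") +
    "\n</available_tools>";
  // Sent once via `--system-prompt` at session create; user messages stay small.
  return [callerSystemPrompt, toolsBlock].filter(Boolean).join("\n\n");
}
```

With this shape, the per-turn message contains only the user's text, so the large block sits in the stable prompt prefix where Anthropic's cache can hit it.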
This required one supporting change:
- `resolveSessionKey` now fingerprints the tool list (tool names +
description prefixes) alongside the system prompt. A caller swapping
tool lists mid-conversation now lands in a new session instead of
reusing a stale one whose system prompt was baked with the old tools.
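The fingerprint described above might look like this sketch (the hash choice, prefix length, and key layout are assumptions; only "tool names + description prefixes" comes from the PR):

```typescript
import { createHash } from "node:crypto";

interface ToolDef {
  name: string;
  description?: string;
}

// Sketch: stable fingerprint over tool names + description prefixes.
// The 64-char prefix and sha256 truncation are illustrative choices.
function fingerprintTools(tools: ToolDef[]): string {
  const material = tools
    .map((t) => `${t.name}:${(t.description ?? "").slice(0, 64)}`)
    .join("|");
  return createHash("sha256").update(material).digest("hex").slice(0, 12);
}

// Used alongside system prompt + model when resolving the session key,
// so a changed tool list lands in a fresh session.
function resolveSessionKey(
  model: string,
  systemPrompt: string,
  tools: ToolDef[],
): string {
  const promptHash = createHash("sha256")
    .update(systemPrompt)
    .digest("hex")
    .slice(0, 12);
  return `${model}::${promptHash}::${fingerprintTools(tools)}`;
}
```

Hashing only names plus a short description prefix keeps the key cheap to compute while still changing whenever the tool list meaningfully changes.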
Behavior for non-claude engines (Codex, Gemini, Cursor) is unchanged —
they still receive the tool block per turn because their CLIs are
spawned fresh per message with no persistent system prompt.
Behavior change — opt-out for callers who mutate tool lists mid-session:
Set `OPENAI_COMPAT_TOOLS_PER_MESSAGE=1` to restore the pre-fix behavior
of injecting the tool block into every user message and keying sessions
only by system prompt + model. Use this if you have callers that
dynamically change their tool list within a single conversation and rely
on continuing history across tool changes. The default (env unset) uses
the new system-prompt injection and eliminates latency spikes.
Measured impact (OpenClaw gateway + nora-oc agent, 93 HA/anomaly-rules
tools, 15-call bench, Anthropic streaming):
before: 7-10s warm, 40-50s spike every ~4 calls
after (default): 3.0-3.9s warm, 0 spikes in 30+ calls
after (OPENAI_COMPAT_TOOLS_PER_MESSAGE=1): matches pre-fix behavior (spikes restored)
Tests: all 410 existing tests pass. Four new tests verify:
- resolveSessionKey produces distinct keys when same system prompt has
different tool lists (default mode)
- resolveSessionKey is deterministic for identical tool lists
- OPENAI_COMPAT_TOOLS_PER_MESSAGE=1 collapses all tool variants to one
session key (legacy opt-out mode)
- isToolsPerMessageModeEnabled() correctly parses env values
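The env parsing checked by the last test can be sketched as (the accepted values `1`/`true`/`yes`, case-insensitive and trimmed, are from the PR's test notes; the implementation itself is an assumption):

```typescript
// Sketch: opt-out flag parsing. Accepts "1", "true", "yes"
// (case-insensitive, trimmed) as enabled; anything else is disabled.
function isToolsPerMessageModeEnabled(
  env: Record<string, string | undefined>,
): boolean {
  const raw = env.OPENAI_COMPAT_TOOLS_PER_MESSAGE;
  if (raw === undefined) return false;
  return ["1", "true", "yes"].includes(raw.trim().toLowerCase());
}
```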
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…add tests

- Extract `noToolsSystemPrompt(location)` factory to eliminate near-duplicate system prompt strings that differed by one phrase and would drift over time
- Extract `buildSessionSystemPrompt()` as an exported, testable helper that encapsulates the default vs legacy system prompt construction logic
- Simplify the session create block in handleChatCompletion to a single call
- Add tests for noToolsSystemPrompt, buildSessionSystemPrompt (default + legacy modes, with and without caller system prompt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
When a `/v1/chat/completions` request includes `tools`, the proxy currently prepends an `<available_tools>` block to every user message. For callers with many tools (e.g. OpenClaw gateway routing 90+ MCP tools from Home Assistant), this block can be 50+ KB and is sent on every turn.

This causes a reproducible pattern of 30–50 second latency spikes every ~4 calls against otherwise-warm sessions. The fix moves the tool block into the session system prompt at session-create time, so user messages stay small and Anthropic's prompt cache can reliably hit the tool definitions.
A fully backward-compatible opt-out env var (`OPENAI_COMPAT_TOOLS_PER_MESSAGE=1`) preserves the pre-fix behavior for callers who need to mutate their tool list within a single session.

Repro / Bisection
Using OpenClaw gateway (openclaw/openclaw) routed through `claude-code-skill serve` on port 18796, with an agent exposing 93 MCP tools (home-assistant + anomaly-rules) on claude-sonnet-4-6.

Layer bisection (independent reproduction):

- raw `claude-code-skill session-send` (tiny message)
- raw `session-send` with a 54 KB user message
- direct `/v1/chat/completions` POST, 3-tool payload
- direct `/v1/chat/completions` POST, 93-tool payload

The trigger is the 54 KB tool block, not the number of sessions or proxy versus direct invocation. A raw `session-send` with a tiny message never spikes; the same CLI with a 54 KB user message does. The proxy adds one such block to every user turn.

MITM trace of a spike
Captured with a small HTTP MITM between OpenClaw gateway and the proxy: `firstTimings=[9, 30010, 44531]` shows the second SSE chunk arriving at precisely 30 010 ms — that's the `setInterval(..., 30_000)` keepalive comment at `openai-compat.ts:562`, firing because the CLI produced zero output for 30 s. The real content chunk only arrives at 44.5 s, when the CLI finally responds.

After the fix (same config, 15-call bench, default mode)
Warm gateway time: 3.0–4.6 s, median ~3.3 s. Zero spikes across 30+ calls in two independent 15-call runs.
With opt-out (`OPENAI_COMPAT_TOOLS_PER_MESSAGE=1`)

Legacy behavior is faithfully restored: warm latency and the periodic spikes match the pre-fix measurements.
The fix
1. Move `<available_tools>` from user message → session system prompt (default)

Before (`openai-compat.ts`, around lines 571–576), the tool block was prepended to every user message; after the fix, it is added once in the session-create block (around line 519). (Gated by `isToolsPerMessageModeEnabled()` so legacy callers can opt out.)

2. Include a tool fingerprint in `resolveSessionKey` (default)

Without this, two callers with the same system prompt but different tool lists would share a session whose system prompt was baked with the first caller's tools. The fingerprint is a short stable hash of `toolName + descriptionPrefix` joined across the tool array, so tool changes spawn a new session.

Behavior change (explicit)
Default mode (env var unset) — new behavior: a caller sending `tools=[X]` then `tools=[X,Y]` in the same conversation now gets two separate sessions, losing conversation history across the tool change. This is a change from prior behavior, where the same session was reused and the new tool list was silently re-injected per turn.

Legacy mode (`OPENAI_COMPAT_TOOLS_PER_MESSAGE=1`) — pre-fix behavior: sessions keyed only by system prompt + model, with the tool block injected into every user message.

Behavior preserved
- `<tool_calls>` response parsing unchanged — the model still sees `<available_tools>` and emits `<tool_calls>` tags the same way.
- The `appendSystemPrompt` path — unchanged.
- `bufferedText` parsing — unchanged.
- `X-Session-Reset` / `isNewConversation` ephemeral session cleanup — unchanged.

Tests added
Four new tests in `src/__tests__/openai-compat.test.ts`:

- `resolveSessionKey` (default mode): distinct keys when the same system prompt has different tool lists
- `resolveSessionKey` (default mode): deterministic for identical tool lists
- `resolveSessionKey` (opt-out mode): `OPENAI_COMPAT_TOOLS_PER_MESSAGE=1` collapses all tool variants to one session key
- `isToolsPerMessageModeEnabled`: parses `1`, `true`, `yes` (case-insensitive, trimmed) as enabled; everything else as disabled

Test plan
- `npm test` — all 414 tests pass (4 new, 410 existing)
- `npm run build` — clean TypeScript build
- Deployed `/opt/homebrew/lib/node_modules/@enderfga/openclaw-claude-code/dist/src/openai-compat.js` and restarted the `claude-code-skill serve` process
- Verified live that user messages stay small (`userMessage.length=36` instead of ~54 000)

🤖 Generated with Claude Code