fix: improve prompt cache compatibility for custom Codex endpoints #113
jujinqian162 wants to merge 3 commits into MadAppGang:main from
Conversation
Attach a stable prompt_cache_key to Codex Responses payloads and send Codex-compatible session headers so prompt caching can hit consistently. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sanitize Anthropic request payloads before routing so volatile billing-header cch values do not perturb prompt cache keys, and cover the sanitizer with focused tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c73b416ebe
```ts
function sanitizeValue(value: unknown): unknown {
  if (typeof value === "string") return sanitizeString(value);
  if (Array.isArray(value)) return value.map((item) => sanitizeValue(item));
  // ... (remainder of the function not shown in this diff excerpt)
}
```
Restrict billing-header rewriting to non-user text
`stripVolatileCacheBusters` recursively sanitizes every string in the request, and `sanitizeString` rewrites any string that merely contains `x-anthropic-billing-header`. In `/v1/messages` and `/v1/messages/count_tokens`, this can alter user prompt content (for example, debugging prompts that include `x-anthropic-billing-header: ...; cch=...`) before forwarding upstream, which is a request-corruption regression for those inputs. Sanitization should target only the injected billing-header metadata location rather than arbitrary message text.
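One way to apply this suggestion is to scope the scrub to the injected metadata field instead of walking every string in the request. A minimal sketch, where the payload shape, field names, and regex are assumptions for illustration rather than the PR's actual code:

```typescript
// Hypothetical payload shape for illustration only.
interface AnthropicPayload {
  system?: string;                               // injected metadata text
  messages: { role: string; content: string }[]; // user-authored content
}

// Remove volatile cch=... entries, but only from the injected `system`
// field; user message content passes through untouched.
function stripVolatileCacheBusters(payload: AnthropicPayload): AnthropicPayload {
  const scrub = (s: string) => s.replace(/;?\s*cch=[^;\s]+/g, "");
  return {
    ...payload,
    ...(payload.system !== undefined ? { system: scrub(payload.system) } : {}),
  };
}
```

With this shape, a user prompt that happens to contain `cch=...` is forwarded verbatim, while the injected metadata is still normalized.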
Restrict billing-header rewriting to injected system text so user-authored request content is preserved while volatile cache-buster fields are still normalized. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
When using a third-party Codex-compatible endpoint via an overridden OpenAI Codex base URL, prompt cache
hit rates were much lower than when talking to Codex directly.
This change improves cache compatibility for that setup by:

- attaching a stable `prompt_cache_key` to Codex Responses API requests
- sending Codex-compatible session headers upstream
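The first point can be sketched as follows. Deriving the key from a session id is an assumption made for illustration; the PR's actual derivation may differ:

```typescript
import { createHash } from "node:crypto";

// Attach a deterministic prompt_cache_key so that repeated requests from
// the same session produce identical payloads (key derivation assumed).
function withPromptCacheKey(
  body: Record<string, unknown>,
  sessionId: string,
): Record<string, unknown> {
  const key = createHash("sha256").update(sessionId).digest("hex").slice(0, 32);
  return { ...body, prompt_cache_key: key };
}
```

Because the key depends only on the session id, two requests in the same session carry the same `prompt_cache_key`, which is what lets the upstream cache match them.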
Background
During investigation, one source of instability was request metadata that could vary across otherwise
equivalent requests.
In particular, Claude Code injects an `x-anthropic-billing-header` string that may include:

- a volatile `cch=...` field
- a more detailed `cc_version` value than the upstream service appears to need for cache reuse

Those values are useful for billing and client attribution, but they can also make semantically identical
requests look different at the payload level. For third-party Codex-compatible backends, that reduces the
chance of matching the cache behavior seen when using Codex directly.
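To see why this matters, hash two payloads that differ only in the volatile `cch` value. This is a toy fingerprint for demonstration, not the upstream cache's real keying:

```typescript
import { createHash } from "node:crypto";

const fingerprint = (payload: unknown): string =>
  createHash("sha256").update(JSON.stringify(payload)).digest("hex");

// Two semantically identical requests whose injected metadata differs
// only in the volatile cch value.
const first = { system: "x-anthropic-billing-header: cc_version=1.2; cch=aaaa", prompt: "hi" };
const second = { system: "x-anthropic-billing-header: cc_version=1.2; cch=bbbb", prompt: "hi" };

// Their fingerprints differ even though the user-visible request is the same,
// so a cache keyed on the payload cannot match them.
const differ = fingerprint(first) !== fingerprint(second);
```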
What changed
To make request fingerprints more stable:

- `cch=...` entries are removed from routed Anthropic payloads
- `cc_version` values are normalized to a shorter stable form instead of preserving extra patch/build suffixes
- a stable `prompt_cache_key` is attached to Codex Responses requests

Together, these changes make repeated requests look closer to the direct Codex path and reduce avoidable
cache misses caused by transport-level request variation.
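The first two normalizations can be sketched in a single helper. The regexes and the exact "shorter stable form" are assumptions; the PR's sanitizer defines the real rules:

```typescript
// Drop volatile cch=... entries and trim cc_version down to major.minor,
// discarding patch/build suffixes (normalization rules assumed).
function normalizeBillingHeader(header: string): string {
  return header
    .replace(/;?\s*cch=[^;\s]+/g, "")
    .replace(/cc_version=(\d+\.\d+)(?:\.[\w.-]+)?/g, "cc_version=$1");
}
```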
Testing

- tests cover `prompt_cache_key` and Codex-compatible headers
- `bun test packages/cli/src/providers/transport/openai.test.ts packages/cli/src/request-sanitizer.test.ts`