fix(api-proxy): write placeholder token-usage record when usage extraction fails #5251
Closed
lpcox wants to merge 2 commits into
Replaces `sleep 3` with proper health check loops that wait up to 30 seconds for proxies to be ready before running tests.

Root cause: Squid/Envoy containers were not fully initialized before agent containers tried to connect, causing spurious test failures.

Changes:
- Squid runc test: wait for proxy port 3128 to respond
- Squid gVisor test: wait for proxy port 3128 to respond
- Envoy runc test: wait for admin /ready endpoint
- Envoy gVisor test: wait for admin /ready endpoint
- Squid perf test: wait for proxy port 3128 to respond
- Envoy perf test: wait for admin /ready endpoint

Each health check:
- Retries for up to 30 seconds
- Uses lightweight curl container for network checks
- Shows container logs on failure
- Exits with error if proxy fails to start

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
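The readiness loop described in this commit can be sketched as a small polling helper. This is an illustration in JavaScript using `fetch`, not the workflow's actual shell/curl-container code: the `waitForReady` name, the `squid`/`envoy` hostnames, and Envoy admin port 9901 are assumptions; only port 3128, the `/ready` endpoint, and the 30-second budget come from the commit message.

```javascript
// Hypothetical sketch: poll a readiness probe until it succeeds or the deadline passes.
// Resolves true once probe() resolves truthy, false if timeoutMs elapses first.
async function waitForReady(probe, { timeoutMs = 30_000, intervalMs = 1_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await probe()) return true;
    } catch {
      // Connection refused or reset: the proxy is not listening yet, so retry.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Example probes (assumed hostnames; Node 18+ provides a global fetch).
// Squid: any HTTP response on port 3128 means the port is accepting connections.
const squidProbe = () => fetch("http://squid:3128/").then(() => true);
// Envoy: the admin /ready endpoint answers 200 once the server is live
// (9901 is an assumed admin port).
const envoyProbe = () => fetch("http://envoy:9901/ready").then((res) => res.ok);
```

On a `false` result the caller would print the container logs and exit non-zero, mirroring the failure behavior listed in the commit message.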
A successful (2xx) LLM completion response from which no usage payload could be extracted previously produced NO line in `token-usage.jsonl` at all: `finalizeHttpTracking()` early-returned on `!normalizeUsage(usage)`. This made the request invisible to every downstream consumer (step summaries, OTEL fan-out, AI-credit aggregation) and is the root cause of the regression where Copilot calls stopped appearing in token-usage logs (see github/gh-aw#40085).

Now, for completion-style endpoints (chat/completions, responses, messages, generateContent, ...), a placeholder record is written with zeroed token counts and `usage_missing: true` so the request stays observable and the extraction gap is diagnosable, while clearly marking the counts as non-measured. Non-completion traffic (e.g. /models, health checks) is still skipped to avoid noise. Behavior is unchanged when usage is present.

- Add `looksLikeCompletionRequest()` path heuristic to token-parsers
- Add `writeMissingUsageRecord()` and gate the no-usage branch on it
- Document `usage_missing` in schemas/token-usage.schema.json
- Add unit tests for the new helper and placeholder-record behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
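A minimal sketch of the gated no-usage branch this commit describes. The names `looksLikeCompletionRequest`, `writeMissingUsageRecord`, `finalizeHttpTracking`, and `normalizeUsage` come from the commit message, but their signatures, the path patterns, the record fields other than `usage_missing`, the `write` sink, and the returned status strings are hypothetical; the real api-proxy code may differ.

```javascript
// Hypothetical sketch of the gated no-usage branch; names follow the PR,
// but signatures, path patterns, and record fields are assumptions.

// Assumed completion-style path endings (chat/completions, responses,
// messages, and Gemini-style :generateContent / :streamGenerateContent).
const COMPLETION_PATH_PATTERNS = [
  /\/chat\/completions$/,
  /\/responses$/,
  /\/messages$/,
  /:(stream)?generateContent$/i,
];

function looksLikeCompletionRequest(path) {
  const pathname = String(path ?? "").split("?")[0]; // drop any query string
  return COMPLETION_PATH_PATTERNS.some((re) => re.test(pathname));
}

// Stand-in for the real normalizeUsage(): returns null unless the payload
// carries at least one numeric token count (OpenAI- or Anthropic-style keys).
function normalizeUsage(usage) {
  if (!usage || typeof usage !== "object") return null;
  const input = usage.prompt_tokens ?? usage.input_tokens;
  const output = usage.completion_tokens ?? usage.output_tokens;
  if (typeof input !== "number" && typeof output !== "number") return null;
  return { input_tokens: input ?? 0, output_tokens: output ?? 0 };
}

function writeMissingUsageRecord(write, { requestId, path, status }) {
  write({
    request_id: requestId,
    path,
    status,
    input_tokens: 0, // zeroed placeholders, not measurements
    output_tokens: 0,
    usage_missing: true, // tells consumers the counts were not measured
  });
}

function finalizeHttpTracking(write, { requestId, path, status, usage }) {
  const normalized = normalizeUsage(usage);
  if (normalized) {
    // Unchanged path: usage present, write the measured record.
    write({ request_id: requestId, path, status, ...normalized });
    return "recorded";
  }
  // Before the fix, this branch was an early return, so no row was written.
  const ok = status >= 200 && status < 300;
  if (ok && looksLikeCompletionRequest(path)) {
    writeMissingUsageRecord(write, { requestId, path, status });
    return "placeholder";
  }
  return "skipped"; // e.g. /models or health checks: still no row
}
```

Passing the sink in as `write` keeps the sketch testable without touching the filesystem; in the proxy, the sink would presumably append one JSON line to `token-usage.jsonl`.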
✅ Coverage Check Passed
Overall Coverage
📁 Per-file Coverage Changes (1 file)
Copilot stopped reviewing on behalf of lpcox due to an error (June 18, 2026 15:26).
Problem
A successful (2xx) LLM completion response from which no usage payload could be extracted previously produced no line at all in `token-usage.jsonl`. The hard gate is in `finalizeHttpTracking()`, which early-returns on `!normalizeUsage(usage)`. When this fires, the request becomes invisible to every downstream consumer: step summaries, OTEL fan-out, and AI-credit aggregation. This is the producer-side root cause of the regression where Copilot calls stopped appearing in token-usage logs, as called out in github/gh-aw#40085.
I verified the parsers/injection are correct for all documented payload shapes (chat-completions with `stream_options.include_usage`, Responses API `response.completed`/`response.done`, non-streaming JSON). The only no-usage case is a streaming chat-completions response whose final usage chunk is absent. Whatever the exact upstream trigger, though, the proxy silently dropping the entire request is the brittle behavior worth fixing.
Fix
For completion-style endpoints (`chat/completions`, `responses`, `messages`, `:generateContent`, …), when a 2xx response yields no usage, write a placeholder record with zeroed token counts and `usage_missing: true`:
- The request stays observable and the extraction gap is diagnosable (`request_id`, request counts, OTEL spans).
- The `usage_missing` flag clearly marks the counts as non-measured, so consumers don't treat them as a genuine zero-cost call.
- Non-completion traffic (`/models`, health checks) is still skipped to avoid noise.
Changes
- `token-parsers.js`: add `looksLikeCompletionRequest()` path heuristic.
- `token-tracker-http.js`: add `writeMissingUsageRecord()`; gate the no-usage branch on it.
- `schemas/token-usage.schema.json`: document the `usage_missing` field.
- Tests: unit tests for the new helper and placeholder-record behavior (… `onSpanEnd` still fires).
Testing
`cd containers/api-proxy && npm test` → 1247 passed (54 suites), +20 new tests.
Relationship to github/gh-aw#40085
That PR hardens the consumer (honor `ai_credits_this_response`, dedup across both log paths, fix the `.jsonl`/`.json` artifact extension). This PR is the producer-side fix it asks for: the api-proxy will now always emit a row per completion request, even when usage extraction fails, so a request can no longer silently vanish from the log.
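On the consumer side, a hypothetical aggregator might count placeholder rows as requests while keeping them out of token totals. The `input_tokens`/`output_tokens` field names and deduplicating by `request_id` are assumptions for illustration, not the gh-aw implementation.

```javascript
// Hypothetical consumer: aggregate token-usage.jsonl rows without treating
// placeholder rows (usage_missing: true) as genuine zero-token calls.
function summarizeTokenUsage(jsonlText) {
  const seen = new Set();
  const summary = {
    requests: 0,
    measuredRequests: 0,
    missingUsage: 0,
    inputTokens: 0,
    outputTokens: 0,
  };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // tolerate blank lines
    const row = JSON.parse(line);
    // Assumed dedup key: the same request may be logged by more than one path.
    if (row.request_id !== undefined) {
      if (seen.has(row.request_id)) continue;
      seen.add(row.request_id);
    }
    summary.requests += 1;
    if (row.usage_missing === true) {
      summary.missingUsage += 1; // visible, but excluded from token totals
      continue;
    }
    summary.measuredRequests += 1;
    summary.inputTokens += row.input_tokens ?? 0;
    summary.outputTokens += row.output_tokens ?? 0;
  }
  return summary;
}
```

Reporting `missingUsage` separately lets a step summary say "N requests with unmeasured usage" instead of silently undercounting.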