Skip to content

Fix AIC extraction and computation from token usage JSONL files#40085

Closed
Copilot wants to merge 3 commits into
mainfrom
copilot/fix-ai-credits-extraction
Closed

Fix AIC extraction and computation from token usage JSONL files#40085
Copilot wants to merge 3 commits into
mainfrom
copilot/fix-ai-credits-extraction

Conversation

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

AIC values were silently reporting as 0 in workflow footers and OTEL telemetry due to several extraction bugs, compounded by an api-proxy v0.27.4 regression that stopped writing token-usage.jsonl for Copilot API calls.

Fixes

  • Honor ai_credits_this_response (parse_mcp_gateway_log.cjsparseTokenUsageJsonl): api-proxy v0.27.2 emits this explicit per-request AIC field; we were ignoring it and recomputing from tokens instead. Now uses it directly when present.

  • Check both token-usage.jsonl paths (parse_mcp_gateway_log.cjswriteStepSummaryWithTokenUsage): Previously only read the primary firewall path; now reads both primary and audit paths with request_id-based deduplication, matching the strategy already used by parse_token_usage.cjs.

  • Write agent_usage.jsonl alongside .json (parse_token_usage.cjs): send_otlp_span.cjs reads agent_usage.json; the usage artifact collector copies agent_usage.jsonl. Extension mismatch meant usage data never landed in artifacts.

  • Recognize ai_credits_this_response in daily aggregation (daily_aic_workflow_helpers.cjssumAICFromUsageJSONLFiles): Added the per-response field alongside the existing ai_credits / aiCredits / aic aliases.

The root cause — api-proxy v0.27.4 no longer writing token-usage.jsonl for Copilot calls — requires an upstream fix. These changes make the AWF side as resilient as possible with the available data.


pr-sous-chef: update branch requested from run https://github.com/github/gh-aw/actions/runs/27780036106

Generated by 👨‍🍳 PR Sous Chef ·

Copilot AI and others added 2 commits June 18, 2026 14:57
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan June 18, 2026 14:59
@pelikhan pelikhan marked this pull request as ready for review June 18, 2026 15:05
Copilot AI review requested due to automatic review settings June 18, 2026 15:05
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #40085 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold).

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Ready to approve

The changes are cohesive, align with the stated root cause/resilience goals, and are backed by targeted unit tests for the new AIC extraction and deduplication behavior.

Note: this review does not count toward required approvals for merging.

Pull request overview

This PR fixes AI Credits (AIC) reporting by making token-usage parsing resilient to api-proxy output differences, preferring proxy-emitted per-request AIC when available, and ensuring the expected usage artifacts are written for downstream collectors/telemetry.

Changes:

  • Prefer ai_credits_this_response when present and compute totals by summing per-entry AIC to keep per-request and aggregated views consistent.
  • Read token usage from both primary and audit token-usage.jsonl paths with request_id-based deduplication when exporting GH_AW_AIC.
  • Write agent_usage.jsonl alongside agent_usage.json, with tests covering the new path and artifact emission behavior.
File summaries
File Description
actions/setup/js/parse_token_usage.cjs Writes agent_usage.jsonl in addition to agent_usage.json so artifact collectors can find usage data consistently.
actions/setup/js/parse_token_usage.test.cjs Adds/updates tests to validate the new AGENT_USAGE_JSONL_PATH constant and that the .jsonl file is written.
actions/setup/js/parse_mcp_gateway_log.cjs Fixes AIC extraction by honoring ai_credits_this_response, sums totals from per-entry AIC, and reads/dedupes token usage across primary + audit paths.
actions/setup/js/parse_mcp_gateway_log.test.cjs Adds tests for audit-path fallback, cross-path deduplication, and explicit per-response AIC handling.
actions/setup/js/daily_aic_workflow_helpers.cjs Updates daily aggregation to recognize ai_credits_this_response as an explicit AIC source before falling back to token-based computation.

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 0

Note

Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions Bot mentioned this pull request Jun 18, 2026
@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 76/100 — Acceptable

Analyzed 7 test(s) across 2 JavaScript test files: 6 design, 1 implementation, 0 guideline violations. Test inflation detected (both files exceed the 2:1 lines-added ratio).

📊 Metrics & Test Classification (7 tests analyzed)
Metric Value
New/modified tests analyzed 7
✅ Design tests (behavioral contracts) 6 (86%)
⚠️ Implementation tests (low value) 1 (14%)
Tests with error/edge cases 5 (71%)
Duplicate test clusters 0
Test inflation detected YES — parse_mcp_gateway_log.test.cjs (+198 lines vs +66 prod = 3.0×); parse_token_usage.test.cjs (+20 lines vs +5 prod = 4.0×)
🚨 Coding-guideline violations 0
Test File Classification Issues Detected
exports GH_AW_AIC from audit path when primary token-usage.jsonl is absent parse_mcp_gateway_log.test.cjs:482 ✅ Design
deduplicates entries shared between primary and audit token-usage.jsonl paths parse_mcp_gateway_log.test.cjs:558 ✅ Design
uses ai_credits_this_response field when present instead of computing from tokens parse_mcp_gateway_log.test.cjs:1881 ✅ Design
sums explicit ai_credits_this_response across multiple entries parse_mcp_gateway_log.test.cjs:1904 ✅ Design
falls back to computed AIC when ai_credits_this_response is absent parse_mcp_gateway_log.test.cjs:1918 ✅ Design
AGENT_USAGE_JSONL_PATH points to agent_usage.jsonl parse_token_usage.test.cjs:56 ⚠️ Implementation Tests a constant string value only
writes agent_usage.json... (extended) parse_token_usage.test.cjs:261 ✅ Design

JavaScript: 7 (*.test.cjs); Go: 0.

⚠️ Flagged Tests — Requires Review (1 issue)

AGENT_USAGE_JSONL_PATH points to agent_usage.jsonl (parse_token_usage.test.cjs:56) — ⚠️ Implementation: asserts that a path constant equals a hardcoded string ("/tmp/gh-aw/agent_usage.jsonl"). This tests an internal configuration detail rather than observable behavior. Suggested improvement: this pattern follows an existing convention in the file (e.g. the AGENT_USAGE_PATH test), so it is low risk; consider whether a higher-value test verifying the file is actually written to that path would be more meaningful.

Test inflation note: both test files show > 2:1 lines-added ratio vs production. This is expected here — the PR adds significant behavioral coverage for new features (audit-path fallback, deduplication, explicit AIC field priority) that require integration-style tests with real temp files and mocked GitHub Actions runtime. No action required.

Verdict

Check passed. 14% implementation tests (threshold: 30%). The five new design tests cover behavioral contracts clearly: audit-path fallback, deduplication by request_id, explicit-field priority over computed AIC, multi-entry summation, and fallback to computed AIC. Mocking targets (fs, global.core) are legitimate external I/O and GitHub Actions runtime. The one implementation test (constant path check) is a minor, conventional pattern already present in this file.

🧪 Test quality analysis by Test Quality Sentinel ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 76/100. Test quality is acceptable — 14% of new tests are implementation tests (threshold: 30%).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose and /tdd — requesting changes on two correctness issues.

📋 Key Themes & Highlights

Key Issues

  • Priority inversion (daily_aic_workflow_helpers.cjs): ai_credits_this_response is checked last in the alias chain, so legacy ai_credits wins when both fields are present. This silently defeats the fix for daily aggregation (inline comment flagged at line 218).
  • > 0 guard discards explicit zero (both files): proxy-emitted ai_credits_this_response: 0 falls through to token-computed AIC instead of being honoured as an authoritative zero-cost assertion.
  • Missing test coverage for sumAICFromUsageJSONLFiles: the new ai_credits_this_response path in the daily helpers is not exercised by any test.

Positive Highlights

  • ✅ The per-entry AIC refactor in parseTokenUsageJsonl is clean — moving from model-aggregate-first to per-entry-first is the right semantic model and handles mixed explicit/computed entries correctly.
  • ✅ Strong test surface on parse_mcp_gateway_log.test.cjs: audit-only, both-paths-dedup, explicit-field, and fallback cases are all covered.
  • ✅ Dual-path read with request_id deduplication mirrors the strategy already used by parse_token_usage.cjs — good consistency.
  • ✅ Writing agent_usage.jsonl alongside .json closes a clean data-loss path with minimal risk.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

continue;
}
// Prefer proxy-emitted per-request AIC over locally computed when present.
const explicitPerRequest = getNumericAliasField(usage, parsed, ["ai_credits_this_response"]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The inline comment says "Prefer proxy-emitted per-request AIC over locally computed" but ai_credits_this_response is checked last — after ai_credits, aiCredits, and aic. If a record contains both ai_credits and ai_credits_this_response (api-proxy v0.27.2 emits both), the older ai_credits field wins here, defeating the fix for daily aggregation.

💡 Suggested ordering fix

Move ai_credits_this_response to be checked before the legacy aliases, consistent with how parseTokenUsageJsonl in parse_mcp_gateway_log.cjs already treats it as the highest-priority field:

// Highest fidelity: proxy-emitted per-request AIC.
const explicitPerRequest = getNumericAliasField(usage, parsed, ["ai_credits_this_response"]);
if (explicitPerRequest > 0) {
  total += explicitPerRequest;
  continue;
}
// Legacy aggregate aliases (lower priority).
const explicitAICredits = getNumericAliasField(usage, parsed, ["ai_credits", "aiCredits"]);

This keeps daily totals consistent with the step-summary values already reported by parseTokenUsageJsonl.

const durationMs = entry.duration_ms || 0;
// When the proxy emits an explicit per-request AIC value, prefer it over
// the locally-computed value so that proxy-side pricing updates take effect.
const explicitDeltaAIC = typeof entry.ai_credits_this_response === "number" && entry.ai_credits_this_response > 0 ? entry.ai_credits_this_response : null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The > 0 guard silently discards an explicit ai_credits_this_response: 0 emitted by the proxy, falling through to locally-computed AIC instead. A value of 0 could be a legitimate proxy assertion (e.g., a cached or free-tier call), and overriding it with a computed cost produces an incorrect result.

💡 Suggested fix

Drop the > 0 guard — treat any numeric value from the proxy as authoritative, including zero:

const explicitDeltaAIC = typeof entry.ai_credits_this_response === "number"
  ? entry.ai_credits_this_response
  : null;

The ?? computed fallback already handles null correctly, so this change only affects the explicit-zero case.

}
// Prefer proxy-emitted per-request AIC over locally computed when present.
const explicitPerRequest = getNumericAliasField(usage, parsed, ["ai_credits_this_response"]);
if (explicitPerRequest > 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] Same > 0 guard issue as in parse_mcp_gateway_log.cjs: an explicit ai_credits_this_response: 0 from the proxy is silently discarded and falls through to token-based computation. Should use a presence check (!== undefined && !== null) rather than a magnitude check.

total += explicitAIC;
continue;
}
// Prefer proxy-emitted per-request AIC over locally computed when present.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The new ai_credits_this_response branch in sumAICFromUsageJSONLFiles has no test coverage. The existing test in check_daily_aic_workflow_guardrail.test.cjs exercises ai_credits, aiCredits, and aic aliases but not this new field — so a future regression here would go undetected.

💡 Suggested test addition (check_daily_aic_workflow_guardrail.test.cjs)
it("sums ai_credits_this_response when present", () => {
  const tmpDir = fs.mkdtempSync(...);
  const filePath = path.join(tmpDir, "usage.jsonl");
  fs.writeFileSync(
    filePath,
    [
      JSON.stringify({ ai_credits_this_response: 3.5 }),
      JSON.stringify({ ai_credits_this_response: 5.25 }),
    ].join("\n")
  );
  expect(exports.sumAICFromUsageJSONLFiles([filePath])).toBeCloseTo(8.75);
});

// parse_token_usage.cjs. The pattern covers standard JSON escaping (\\, \") but not
// exotic cases; api-proxy request IDs are UUIDs so this is sufficient in practice.
const idMatch = line.match(/"request_id"\s*:\s*"((?:\\.|[^"\\])*)"/);
const dedupeKey = idMatch ? `request_id:${idMatch[1]}` : line;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] When request_id is absent, the full raw line is used as the dedupe key. If the proxy serializes the same logical record to the primary and audit paths with different JSON field ordering (which is valid), both copies will have different keys and both will be counted — producing a doubled AIC total.

The comment acknowledges "api-proxy request IDs are UUIDs so this is sufficient in practice", which is reasonable given current behavior. Worth adding an explicit // TODO noting the field-order risk so it's easy to find if double-counting is ever reported.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: AIC extraction fixes — non-blocking observations

The overall approach is sound: honoring ai_credits_this_response, deduplicating across both token-usage paths, and writing agent_usage.jsonl alongside .json all address real gaps. Four non-blocking issues worth addressing before this pattern propagates.

📋 Issue summary
# File Severity Issue
1 parse_mcp_gateway_log.cjs:210 Medium Audit path processed first — "first-seen wins" dedup makes audit data take priority over the primary path, contradicting the "fallback" description
2 parse_mcp_gateway_log.cjs:91 Medium ai_credits_this_response > 0 guard treats zero-cost responses as absent, falling back to computed AIC and overcounting
3 daily_aic_workflow_helpers.cjs:218 Medium Comment says "prefer over locally computed" but ai_credits_this_response is actually only checked after both ai_credits/aiCredits and aic return zero; inconsistent with how parse_mcp_gateway_log.cjs handles the same field
4 parse_token_usage.cjs:177 Low Non-atomic double write to .json + .jsonl; a failure on the second write leaves the two files silently diverged

🔎 Code quality review by PR Code Quality Reviewer

// Read from both the primary path and the audit path, deduplicating by request_id.
// The audit path may contain additional entries when the primary path is absent or
// partially written (e.g. the proxy was restarted mid-run).
const paths = [TOKEN_USAGE_AUDIT_PATH, TOKEN_USAGE_PATH];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dedup gives audit-path data priority over the primary path — first-seen wins, so audit entries shadow primary entries when the same request_id appears in both.

💡 Details
const paths = [TOKEN_USAGE_AUDIT_PATH, TOKEN_USAGE_PATH];

The docstring calls TOKEN_USAGE_AUDIT_PATH a fallback (checked

// Read from both the primary path and the audit path, deduplicating by request_id.
// The audit path may contain additional entries when the primary path is absent or
// partially written (e.g. the proxy was restarted mid-run).
const paths = [TOKEN_USAGE_AUDIT_PATH, TOKEN_USAGE_PATH];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dedup gives audit-path data priority over the primary path — first-seen wins, so the audit copy shadows the primary copy when the same request_id appears in both files.

💡 Details
const paths = [TOKEN_USAGE_AUDIT_PATH, TOKEN_USAGE_PATH];

The file-list docstring calls TOKEN_USAGE_AUDIT_PATH "checked as fallback", but the dedup loop processes it first. The first occurrence of a request_id wins, so if both files carry the same entry the audit version is the one kept.

In normal operation both files should be byte-identical for the same request, so there is no practical difference. But if the primary path ever has a corrected/re-emitted record for a request_id that already appeared in the audit log (e.g. a retry with updated credits), the primary version is silently discarded.

Either:

  1. Swap the order to [TOKEN_USAGE_PATH, TOKEN_USAGE_AUDIT_PATH] so the primary is authoritative and the audit is the true fallback, or
  2. Keep the current order but update the comment to explain why audit wins (e.g. "audit is written first and is more complete when the primary is partially flushed").

parse_token_usage.cjs uses the same [AUDIT, PRIMARY] order, so if this is intentional the rationale should be documented in both files.

const durationMs = entry.duration_ms || 0;
// When the proxy emits an explicit per-request AIC value, prefer it over
// the locally-computed value so that proxy-side pricing updates take effect.
const explicitDeltaAIC = typeof entry.ai_credits_this_response === "number" && entry.ai_credits_this_response > 0 ? entry.ai_credits_this_response : null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ai_credits_this_response: 0 is treated as absent, causing AIC to be overcounted for zero-cost responses.

💡 Details
const explicitDeltaAIC =
  typeof entry.ai_credits_this_response === "number" && entry.ai_credits_this_response > 0
    ? entry.ai_credits_this_response
    : null;

The > 0 guard cannot distinguish "field not present" from "field is present and explicitly zero". When a proxy emits "ai_credits_this_response": 0 (e.g. a fully-cached, free-tier, or failed request), explicitDeltaAIC is null and the fallback computeInferenceAIC runs, returning a positive value based on token counts. The proxy's explicit assertion that the request cost nothing is ignored and AIC is overcounted.

Fix: separate presence detection from value filtering:

const rawExplicit = entry.ai_credits_this_response;
const explicitDeltaAIC =
  typeof rawExplicit === "number" ? rawExplicit : null;

Then in the aggregation pass:

entry.deltaAIC = entry.explicitDeltaAIC !== null ? entry.explicitDeltaAIC : computed;

The same > 0 pattern appears in daily_aic_workflow_helpers.cjs line 219 for the same field and should be fixed there too.

continue;
}
// Prefer proxy-emitted per-request AIC over locally computed when present.
const explicitPerRequest = getNumericAliasField(usage, parsed, ["ai_credits_this_response"]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading comment: ai_credits_this_response is not preferred over locally-computed — it is only checked when ALL legacy explicit fields are absent.

💡 Details

The comment on line 217 says "Prefer proxy-emitted per-request AIC over locally computed when present", which implies ai_credits_this_response takes priority. It doesn't. The actual priority chain is:

  1. ai_credits / aiCredits — if > 0, use it and continue
  2. aic — if > 0, use it and continue
  3. ai_credits_this_responseonly reached if the two checks above both returned 0
  4. computeInferenceAIC fallback

If a transitional proxy record happens to emit both ai_credits and ai_credits_this_response, the legacy field wins and the more-authoritative per-request value from ai_credits_this_response is silently discarded. This is inconsistent with how parse_mcp_gateway_log.cjs handles the same data (it checks ai_credits_this_response directly and ignores ai_credits entirely).

If the intent is backward compatibility (old proxy → ai_credits, new proxy → ai_credits_this_response), the comment should say so. If ai_credits_this_response should truly take precedence over ai_credits, move it before the legacy checks.

};
fs.writeFileSync(AGENT_USAGE_PATH, JSON.stringify(agentUsage) + "\n");
// Also write the .jsonl path so the usage artifact collector can find it.
fs.writeFileSync(AGENT_USAGE_JSONL_PATH, JSON.stringify(agentUsage) + "\n");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-atomic double write: if the second writeFileSync fails, .json and .jsonl diverge silently.

💡 Details
fs.writeFileSync(AGENT_USAGE_PATH, JSON.stringify(agentUsage) + "\n");
// Also write the .jsonl path so the usage artifact collector can find it.
fs.writeFileSync(AGENT_USAGE_JSONL_PATH, JSON.stringify(agentUsage) + "\n");

If the second write throws (disk full, missing parent dir, permission error), agent_usage.json contains the current run's data but agent_usage.jsonl is either absent or holds stale data from a prior run. Both the OTEL span sender (send_otlp_span.cjs reads .json) and the usage artifact collector (reads .jsonl) will now see different data with no log entry explaining why.

At minimum, wrap the second write in a try/catch and emit coreObj.warning on failure so the discrepancy is visible in the workflow log. If both files must stay in sync, write to a temp path and fs.renameSync both atomically (rename is atomic on POSIX for same-filesystem moves).

@github-actions

Copy link
Copy Markdown
Contributor

@copilot review all comments and address the unresolved review feedback.
After that, please summarize any remaining blockers and next steps.

Generated by 👨‍🍳 PR Sous Chef ·

@github-actions

Copy link
Copy Markdown
Contributor

\n@copilot review all comments and address unresolved review feedback.\n\npr-sous-chef: fix the unresolved AIC review threads, then push an update branch.

Generated by 👨‍🍳 PR Sous Chef ·

@pelikhan pelikhan closed this Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants