feat: token-receipt accuracy rework + 2026-05-09 follow-ups by kingdoooo · Pull Request #2 · Hchen1218/token-receipt

kingdoooo · 2026-05-09T06:54:50Z

Summary

This PR bundles two waves of work that accumulated on the Kent/kingdoooo fork over the past week. Numbers and commands below are all pulled from CHANGELOG.md entries ## 2026-05-08 and ## 2026-05-09.

Wave 1 — Accuracy & UX rework (2026-05-08, 18 commits)

Fixes a 50-70× USD under-reporting bug and adds Bedrock / aggregator / PARTIAL / scope / hide-fields / silent-write features. Key outcomes:

`TOTAL` now includes cache read and cache write tokens (previously undercounted on cache-heavy sessions).
`USD ESTIMATE` bills cached input and cache write at their own rates (previously dropped them).
`PROVIDER`/`MODEL` auto-detect AWS Bedrock from `CLAUDE_CODE_USE_BEDROCK=1` or `.anthropic.*[1m]` model strings.
New `--cache-ttl {auto,5m,1h}` flag for Anthropic cache-write rate selection.
New `--hide-fields supplier,model,context,price-mapping,price-date,rate-note` for surgical row removal.
New `--scope today` and `--scope session-all` for Claude Code cross-jsonl aggregation.
New `PARTIAL` pricing status — rendered as `USD ESTIMATE*` + `PRICE: PARTIAL` when the rate table is incomplete.
`--write` is now fully silent; `--write --write-html` prints exactly two `wrote to:` lines.
Pricing, Bedrock normalization, and Claude aggregation live in new single-purpose modules under `token_receipt/`.
Test scaffold (`tests/`): unittest discover + a comprehensive `scripts/validate_receipt.py` smoke script.

Full rationale and decision log in `docs/superpowers/specs/2026-05-08-token-receipt-accuracy-design.md` + `docs/superpowers/plans/2026-05-08-token-receipt-accuracy/`.

Wave 2 — 2026-05-09 follow-ups (3 commits)

Three narrow fixes surfaced during per-part code review that were out of scope for any specific Wave 1 part:

FU-1 — Bedrock `claude-N-M-` normalization (`fix(bedrock)`)
APAC Bedrock Haiku 3.5 (`apac.anthropic.claude-3-5-haiku`) was normalizing to `claude-3-5-haiku`, which missed the `claude-3.5-haiku` entry in `references/pricing.json`. Users saw `PROVIDER: AWS BEDROCK` but `USD ESTIMATE: UNMAPPED`. Added `_MAJOR_MINOR_SLUG_RE` as a second branch after the existing `_MAJOR_MINOR_DASH_RE`; the original regex (which handles `claude--N-M` for e.g. `claude-opus-4-7 → claude-opus-4.7`) is left untouched.

FU-2 — Widen Claude Code session-id env lookup (`fix(data)`)
Claude Code actually exports `CLAUDE_CODE_SESSION_ID`, not `CLAUDE_SESSION_ID`. `runtime_claude_session_id` now checks the new name first and falls back to the legacy one. `--scope session`/`session-all` used to fail with "sessionId not set" inside real Claude Code sessions; now works. `Optional[str]` return contract preserved.

FU-3 — Scrub leaky env vars in in-process tests (`refactor(tests)`)
Claude Code on Bedrock dev shells export `ENABLE_PROMPT_CACHING_1H_BEDROCK` / `CLAUDE_CODE_USE_BEDROCK` / `ANTHROPIC_MODEL` / `ANTHROPIC_SMALL_FAST_MODEL`, which leak into `estimate_cost` (flips TTL resolver) and `resolve_provider_and_model` (force-rewrites provider). Subprocess-based tests already scrubbed these; the in-process `EstimateCostTest` and `ResolveSnapshotWiresInBedrockTest` did not. Extracted `LEAKY_ENV_VARS` into `token_receipt/_envguard.py` (imported by tests and `scripts/validate_receipt.py`), added `setUp` that rebuilds `os.environ` via `patch.dict(..., clear=True)`, plus a guard test pinning the 4 known leakers.

Full spec: `docs/superpowers/specs/2026-05-09-followups-after-accuracy-rework.md`.

Test plan

Clean-env test suite: `env -i PATH="$PATH" HOME="$HOME" python3 -m unittest discover -s tests -v` → 70/70 OK
Dev-shell test suite (with 4 Bedrock leakers exported): `python3 -m unittest discover -s tests -v` → 70/70 OK
Clean-env smoke: `env -i PATH="$PATH" HOME="$HOME" python3 scripts/validate_receipt.py` → `token-receipt validation passed`
Dev-shell smoke: `python3 scripts/validate_receipt.py` → `token-receipt validation passed`
FU-1 auto-detect Haiku 3.5 sanity: `env -i ... CLAUDE_CODE_USE_BEDROCK=1 ANTHROPIC_MODEL=apac.anthropic.claude-3-5-haiku python3 scripts/token_receipt.py --agent-tool claude-code --input-tokens 1000 --output-tokens 1000 --width 48` renders `AWS BEDROCK` / `claude-3.5-haiku` / `$0.004800`
Upstream `--chat-reply` (merged in during rebase) still works end-to-end

Notes for reviewers

Branch was rebased onto `Hchen1218/main` (the 3 upstream commits `6f5d353` / `507e75b` / `fa443ce` are not present in the diff; they land as the rebase base). Three files had conflicts that were resolved as additive merges: `token_receipt/cli.py` (our `--cache-ttl`/`--hide-fields`/silent-write + upstream `--chat-reply`), `SKILL.md` (file:// note merged into the `--chat-reply` section), and `scripts/validate_receipt.py` (LEAKY_ENV_VARS import alongside new codex-thread imports).
All three FUs are surgical: FU-1 is 5 LOC + 3 tests; FU-2 is 8 LOC + 3 tests; FU-3 is ~20 LOC + 1 guard test + 2 setUp blocks + 6-site import swap.
The 18 Wave 1 commits form a complete narrative when reviewed with `git log --reverse`. Specs and plans are checked in under `docs/superpowers/` for traceability — those are 100% docs, safe to skim or skip.

🤖 Generated with Claude Code

Covers all 11 reported issues: token total math, cache-aware USD estimation, 1h TTL support, Bedrock auto-detection, Claude-projects jsonl aggregation with dedup, render gating, CLI flags, and HTML link fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove the min(cached, input) clamp that was under-reporting sessions. Each bucket (input, cache-read, cache-write, output) is now billed at its own rate. Add resolve_cache_write_split to route cache-write tokens between 5m and 1h TTL via CLI override, per-message snapshot fields, the ENABLE_PROMPT_CACHING_1H_BEDROCK env flag, or the 5m default. Introduce compute_total summing all billable buckets, plus PARTIAL status (with partial_reasons) when required rates are missing from pricing.json; 1h falls back to 2 * input_rate per Anthropic docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The deepseek case had cached_input_tokens (500k) less than input_tokens (1M). The old clamp billed the 500k overlap as cached and only the remaining 500k at the input rate; the new math bills the full 1M at input AND the 500k at cached (per-bucket independent). Expected amount goes from $0.364000 to $0.434000. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…silent --write

…--write Adds six smoke-test cases to scripts/validate_receipt.py covering the Part 06/05/08/09 behaviors, scrubs 4 leaky env vars in run_script (same list as tests/test_cli_flags.py::_run), fixes a manual-mode TOTAL regression in load_manual_snapshot (leaving total_tokens=0 so cli.main's compute_total covers all four buckets), and prepends the 2026-05-08 CHANGELOG block.

Lock in the three open choices from the 2026-05-09 brainstorm: - FU-2: take Option A, preserve Optional[str] contract, new tests/test_data.py. - FU-3: LEAKY_ENV_VARS lives in token_receipt/_envguard.py; guard test required. - Execution order: FU-3 first, then FU-2, then FU-1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Claude Code on Bedrock dev shells export ENABLE_PROMPT_CACHING_1H_BEDROCK, CLAUDE_CODE_USE_BEDROCK, ANTHROPIC_MODEL, ANTHROPIC_SMALL_FAST_MODEL, which leak into estimate_cost (1h TTL branch) and resolve_provider_and_model (force-rewrite to aws bedrock). The subprocess scrubs in tests/test_cli_flags and scripts/validate_receipt already handled this; the in-process unit tests in tests/test_pricing_math and tests/test_bedrock did not. Add token_receipt/_envguard.py holding LEAKY_ENV_VARS, import it from both subprocess scrub sites (replacing two duplicated literals) and from setUp in EstimateCostTest and ResolveSnapshotWiresInBedrockTest. Adds a guard test that pins the four known leakers so a future trim can't silently regress the dev-shell matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…SION_ID Claude Code actually exports CLAUDE_CODE_SESSION_ID (confirmed with `env | grep SESSION` in a live Claude Code shell); the resolver was reading CLAUDE_SESSION_ID only, so --scope session/session-all raised "sessionId not set" even when a real session was live. runtime_claude_session_id now checks CLAUDE_CODE_SESSION_ID first and falls back to CLAUDE_SESSION_ID (which scripts/validate_receipt.py continues to exercise via its fixture). Optional[str] return type is preserved so _load_claude_aggregate's `if not session_id:` guard stays valid. Also updates the SystemExit message in _load_claude_aggregate to name both env vars. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

_MAJOR_MINOR_DASH_RE only matched claude-<word>-N-M (e.g. claude-opus-4-7 → claude-opus-4.7), not claude-N-M-<word>. APAC Bedrock Haiku 3.5 came out as claude-3-5-haiku, which find_price missed — the receipt showed PROVIDER: AWS BEDROCK but USD ESTIMATE: UNMAPPED even though the pricing row was already present (Part 03 added cache_write_1h_per_million for it). Adds _MAJOR_MINOR_SLUG_RE as a second branch after the existing dash-RE. The first branch early-returns so the two shapes stay independent, keeping the existing normalization tests untouched. Tests cover the APAC-only case, all four region prefixes, and an end-to-end check that find_price resolves the normalized model against DEFAULT_PRICING. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The original DoD Gate 6 used `--model apac.anthropic.claude-3-5-haiku`, but that path hits the intentional `_finalize` contract where explicit `--model` is preserved verbatim (pinned by tests/test_bedrock.py:108 and scripts/validate_receipt.py:820). FU-1's normalization fix runs before `_finalize`, so the auto-detect path (Bedrock model from env, no explicit `--model`) demonstrates the fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace 11 occurrences of a hard-coded /Users/... absolute path in the followups plan's shell examples with `cd <repo-root>` and plain `git ...` commands. The absolute paths were artifacts of drafting the plan for a subagent to copy-paste verbatim; they leaked developer-local path information once the branch opened as a public PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kingdoooo and others added 22 commits May 9, 2026 12:10

docs: add token-receipt accuracy implementation plan

dd88398

test: add unittest scaffold

6efa26c

refactor: extract pricing.py from data.py (behavior-preserving)

fdbf1f1

feat(pricing): add 1h cache-write rate to anthropic entries

9cdf2a7

feat(models): add cache-write TTL split and aggregator diagnostics

4359d96

feat(bedrock): normalize provider and model for Claude Code on Bedrock

f3d3945

feat(claude): add projects-jsonl aggregator with dedup and TTL split

8b6aa25

feat(cli): add --cache-ttl, --hide-fields, today/session-all scopes, …

12ea748

…silent --write

feat(render): gate CONTEXT USED, honor --hide-fields, render PARTIAL

b1812cc

fix(skill): use file:// scheme for HTML link so chat clients open it

71c9bcc

docs(plan): implementation plan for 2026-05-09 followups PR

ae401f1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(changelog): add 2026-05-09 block for follow-up PR

9e849d9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: token-receipt accuracy rework + 2026-05-09 follow-ups#2

feat: token-receipt accuracy rework + 2026-05-09 follow-ups#2
kingdoooo wants to merge 22 commits intoHchen1218:mainfrom
kingdoooo:feat/2026-05-09-followups

kingdoooo commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kingdoooo commented May 9, 2026

Summary

Wave 1 — Accuracy & UX rework (2026-05-08, 18 commits)

Wave 2 — 2026-05-09 follow-ups (3 commits)

Test plan

Notes for reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant