
Claude cost/token totals ~3x too high due to per-file deduplication missing subagent duplicates #646

@enzonaute

Description


parseClaudeFile() tracks seen messageId:requestId pairs in a local seenKeys set that resets for every file. This works fine for deduplicating streaming chunks within a single JSONL, but it doesn't catch duplicates across files.

Claude Code writes subagent messages to both the parent session JSONL and a separate file under <session>/subagents/. The entries share the same messageId:requestId and identical usage values — they're the same API call logged twice.
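A minimal sketch of why per-file deduplication misses the cross-file copy (names other than `seenKeys` are illustrative, not taken from the actual CodexBar source):

```swift
import Foundation

// Toy usage entry: the same API call can appear in two JSONL files.
struct Entry {
    let messageId: String
    let requestId: String
    let tokens: Int
}

// Per-file dedup: the seen-set is recreated for every file, so a
// duplicate living in a *different* file is never caught.
func totalPerFileDedup(files: [[Entry]]) -> Int {
    var total = 0
    for file in files {
        var seenKeys = Set<String>()   // resets per file
        for e in file where seenKeys.insert("\(e.messageId):\(e.requestId)").inserted {
            total += e.tokens
        }
    }
    return total
}

// Global dedup: one seen-set for the whole scan pass.
func totalGlobalDedup(files: [[Entry]]) -> Int {
    var total = 0
    var seenKeys = Set<String>()       // persists across files
    for file in files {
        for e in file where seenKeys.insert("\(e.messageId):\(e.requestId)").inserted {
            total += e.tokens
        }
    }
    return total
}

// The same call logged in the parent session file and again under
// <session>/subagents/: identical messageId:requestId and usage.
let shared = Entry(messageId: "msg_1", requestId: "req_1", tokens: 100)
let parentFile   = [shared, Entry(messageId: "msg_2", requestId: "req_2", tokens: 50)]
let subagentFile = [shared]
// Per-file dedup double-counts the shared entry; global dedup
// counts it exactly once.
```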

Because seenKeys is per-file, both copies get counted. On my machine (30 days, 538 JSONL files):

per-file dedup:  7.98B tokens  (current)
global dedup:    2.59B tokens  (correct — matches ccusage output)

That's a 3.08x overcount, and the reported cost is inflated proportionally.

Quick breakdown of the duplicate structure:

entries only in parent files:    4,879
entries only in subagent files:  4,833  (unique, must be counted)
entries in both:                 5,127  (duplicates, should be counted once)

Note: Claude Code's own /stats command had a related issue, fixed in v2.1.89 ("Fixed /stats undercounting tokens by excluding subagent usage").

The fix is small — pass the seenKeys set through ClaudeScanState so it persists across files within a scan pass.
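A sketch of that shape (everything beyond the names `seenKeys` and `ClaudeScanState` is assumed, not copied from the codebase): the set lives in the scan state and is threaded through each per-file parse, so a key seen in an earlier file still suppresses its copy in a later one.

```swift
import Foundation

// Hypothetical scan state carried across all files of one scan pass.
struct ClaudeScanState {
    var seenKeys = Set<String>()   // persists for the whole pass
    var totalTokens = 0
}

struct UsageEntry {
    let messageId: String
    let requestId: String
    let tokens: Int
}

// Sketch of parseClaudeFile: dedup against the shared state instead
// of a set local to this call.
func parseClaudeFile(_ entries: [UsageEntry], state: inout ClaudeScanState) {
    for e in entries {
        let key = "\(e.messageId):\(e.requestId)"
        // insert(_:).inserted is false for a key already seen,
        // whether in this file or any earlier one.
        guard state.seenKeys.insert(key).inserted else { continue }
        state.totalTokens += e.tokens
    }
}
```

With this, parsing a parent session file and then its subagent file counts the shared entry once.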

PR: https://github.com/steipete/CodexBar/pull/647/changes
