
Claude cost/token totals ~3x too high due to per-file deduplication missing subagent duplicates #646

@enzonaute

Description


parseClaudeFile() tracks seen messageId:requestId pairs in a local seenKeys set that resets for every file. This works fine for deduplicating streaming chunks within a single JSONL, but it doesn't catch duplicates across files.

Claude Code writes subagent messages to both the parent session JSONL and a separate file under <session>/subagents/. The entries share the same messageId:requestId and identical usage values — they're the same API call logged twice.
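A minimal sketch of why per-file deduplication misses the cross-file copy (names other than `seenKeys` are illustrative, not taken from the actual CodexBar source):

```swift
import Foundation

// Toy usage entry: the same API call can appear in two JSONL files.
struct Entry {
    let messageId: String
    let requestId: String
    let tokens: Int
}

// Per-file dedup: the seen-set is recreated for every file, so a
// duplicate living in a *different* file is never caught.
func totalPerFileDedup(files: [[Entry]]) -> Int {
    var total = 0
    for file in files {
        var seenKeys = Set<String>()   // resets per file
        for e in file where seenKeys.insert("\(e.messageId):\(e.requestId)").inserted {
            total += e.tokens
        }
    }
    return total
}

// Global dedup: one seen-set for the whole scan pass.
func totalGlobalDedup(files: [[Entry]]) -> Int {
    var total = 0
    var seenKeys = Set<String>()       // persists across files
    for file in files {
        for e in file where seenKeys.insert("\(e.messageId):\(e.requestId)").inserted {
            total += e.tokens
        }
    }
    return total
}

// The same call logged in the parent session file and again under
// <session>/subagents/: identical messageId:requestId and usage.
let shared = Entry(messageId: "msg_1", requestId: "req_1", tokens: 100)
let parentFile   = [shared, Entry(messageId: "msg_2", requestId: "req_2", tokens: 50)]
let subagentFile = [shared]
// Per-file dedup double-counts the shared entry; global dedup
// counts it exactly once.
```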

Because seenKeys is per-file, both copies get counted. On my machine (30 days, 538 JSONL files):

per-file dedup:  7.98B tokens  (current)
global dedup:    2.59B tokens  (correct — matches ccusage output)

That's a 3.08x overcount, and the reported cost is inflated proportionally.

Quick breakdown of the duplicate structure:

entries only in parent files:    4,879
entries only in subagent files:  4,833  (unique, must be counted)
entries in both:                 5,127  (duplicates, should be counted once)

Note: Claude Code's own /stats command had a related issue, fixed in v2.1.89 ("Fixed /stats undercounting tokens by excluding subagent usage").

The fix is small — pass the seenKeys set through ClaudeScanState so it persists across files within a scan pass.
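A sketch of that shape (everything beyond the names `seenKeys` and `ClaudeScanState` is assumed, not copied from the codebase): the set lives in the scan state and is threaded through each per-file parse, so a key seen in an earlier file still suppresses its copy in a later one.

```swift
import Foundation

// Hypothetical scan state carried across all files of one scan pass.
struct ClaudeScanState {
    var seenKeys = Set<String>()   // persists for the whole pass
    var totalTokens = 0
}

struct UsageEntry {
    let messageId: String
    let requestId: String
    let tokens: Int
}

// Sketch of parseClaudeFile: dedup against the shared state instead
// of a set local to this call.
func parseClaudeFile(_ entries: [UsageEntry], state: inout ClaudeScanState) {
    for e in entries {
        let key = "\(e.messageId):\(e.requestId)"
        // insert(_:).inserted is false for a key already seen,
        // whether in this file or any earlier one.
        guard state.seenKeys.insert(key).inserted else { continue }
        state.totalTokens += e.tokens
    }
}
```

With this, parsing a parent session file and then its subagent file counts the shared entry once.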

PR: https://github.com/steipete/CodexBar/pull/647/changes
