Documented Claude Code hook script reports estimated token counts instead of real API usage #2718
Description
The hook script documented in the Claude Code integration guide sends generation observations to Langfuse without passing `usage_details`, so Langfuse falls back to its internal tokenizer to estimate token counts rather than recording the real values from the Anthropic API.
Root cause
Claude Code writes full API response metadata into the transcript .jsonl file at ~/.claude/projects/<session>/, including exact token counts per turn. Each assistant message entry in that file contains a usage block:
```json
"usage": {
  "input_tokens": 3,
  "cache_creation_input_tokens": 1809,
  "cache_read_input_tokens": 14885,
  "output_tokens": 165
}
```

The hook script reads this .jsonl file line by line and parses each message object. For each assistant message it extracts `message.role`, `message.content`, `message.model`, and `message.id`, but never reads `message.usage`. That field is silently skipped on every iteration, so the real token counts never reach Langfuse.
With no usage_details passed to the generation observation, Langfuse falls back to inference. Per the token & cost tracking documentation:
"If either usage or cost are not ingested, Langfuse will attempt to infer the missing values based on the `model` parameter of the generation at the time of ingestion."
Notably, the same documentation warns in the tokenizer table:
"According to Anthropic, their tokenizer is not accurate for Claude 3 models. If possible, send us the tokens from their API response."
Impact
The estimated total is drastically understated. cache_read_input_tokens and cache_creation_input_tokens are completely invisible. Users have essentially no visibility into their actual token consumption.
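The sample usage block from the root-cause section shows the scale of the problem. This is a back-of-the-envelope illustration, not Langfuse's exact estimator, but it makes the point: the cache fields are response metadata, not message text, so no text-based tokenizer can ever recover them.

```python
# Sample per-turn usage block from the transcript (root-cause section above).
usage = {
    "input_tokens": 3,
    "cache_creation_input_tokens": 1809,
    "cache_read_input_tokens": 14885,
    "output_tokens": 165,
}

# Real total billed by the Anthropic API for this turn.
real_total = sum(usage.values())

# Tokens invisible to any tokenizer estimate over the message text.
invisible = usage["cache_read_input_tokens"] + usage["cache_creation_input_tokens"]

print(real_total, invisible)  # 16862 16694 -- ~99% of the turn is cache traffic
```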
Screenshots
Same prompt, same project, same model (claude-sonnet-4-6). Cache distribution differs slightly between sessions due to Claude's internal caching state; this is expected behavior and does not affect the validity of the comparison.
Without fix: Langfuse falls back to tokenizer estimation. Cache tokens are invisible and the total is drastically understated:
With fix: Real API usage data read from the transcript. Cache breakdown is fully visible and the total reflects actual consumption:
Proposed fix
Add a get_usage() helper that reads the usage field from the transcript's assistant messages and pass the result as usage_details to the generation observation. This follows the pattern already documented in the Langfuse Python SDK docs for Anthropic:
```python
from typing import Any, Dict, List

def get_usage(assistant_msgs: List[Dict[str, Any]]) -> Dict[str, int]:
    input_tokens = 0
    output_tokens = 0
    cache_read = 0
    cache_creation = 0
    for msg in assistant_msgs:
        m = msg.get("message", {})
        usage = m.get("usage", {})
        input_tokens += usage.get("input_tokens", 0)
        output_tokens += usage.get("output_tokens", 0)
        cache_read += usage.get("cache_read_input_tokens", 0)
        cache_creation += usage.get("cache_creation_input_tokens", 0)
    return {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read_input_tokens": cache_read,
        "cache_creation_input_tokens": cache_creation,
    }
```

Then in `emit_turn`:
```python
with langfuse.start_as_current_observation(
    name="Claude Response",
    as_type="generation",
    model=model,
    input={"role": "user", "content": user_text},
    output={"role": "assistant", "content": assistant_text},
    usage_details=get_usage(turn.assistant_msgs),
    metadata={...},
):
    pass
```

Environment
- Claude Code version: 2.1.83
- Langfuse Python SDK: 4.0.1
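As a final sanity check, running the proposed `get_usage()` helper over the sample usage block from the root-cause section yields the expected breakdown. The helper is restated compactly here so the snippet runs standalone; the sample message list is illustrative:

```python
from typing import Any, Dict, List

def get_usage(assistant_msgs: List[Dict[str, Any]]) -> Dict[str, int]:
    # Same aggregation as the proposed fix, condensed into a totals dict.
    totals = {"input": 0, "output": 0,
              "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0}
    for msg in assistant_msgs:
        usage = msg.get("message", {}).get("usage", {})
        totals["input"] += usage.get("input_tokens", 0)
        totals["output"] += usage.get("output_tokens", 0)
        totals["cache_read_input_tokens"] += usage.get("cache_read_input_tokens", 0)
        totals["cache_creation_input_tokens"] += usage.get("cache_creation_input_tokens", 0)
    return totals

# One assistant turn carrying the sample usage block from above.
msgs = [{"message": {"usage": {"input_tokens": 3,
                               "cache_creation_input_tokens": 1809,
                               "cache_read_input_tokens": 14885,
                               "output_tokens": 165}}}]
print(get_usage(msgs))
```

With these values passed as `usage_details`, Langfuse records the real API counts and skips tokenizer inference entirely.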