
Documented Claude Code hook script reports estimated token counts instead of real API usage #2718

@dfrojas

Description


The documented hook script in the Claude Code hook integration sends generation observations to Langfuse without passing usage_details, causing Langfuse to fall back to its internal tokenizer to estimate token counts rather than reporting the real values from the Anthropic API.


Root cause

Claude Code writes full API response metadata into the transcript .jsonl file at ~/.claude/projects/<session>/, including exact token counts per turn. Each assistant message entry in that file contains a usage block:

"usage": {
  "input_tokens": 3,
  "cache_creation_input_tokens": 1809,
  "cache_read_input_tokens": 14885,
  "output_tokens": 165
}

The hook script reads this .jsonl file line by line and parses each message object. For each assistant message it extracts message.role, message.content, message.model, and message.id, but never reads message.usage. That field is silently skipped on every iteration, so the real token counts never reach Langfuse.
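Since the usage block is already present on every assistant entry, capturing it requires only one extra field lookup while iterating the transcript. A minimal sketch of that loop (the function name and path handling are illustrative, not taken from the hook script):

```python
import json
from pathlib import Path


def read_usage_from_transcript(path: str) -> dict:
    """Sum per-turn token counts from a Claude Code transcript .jsonl file.

    Illustrative sketch: assumes each assistant entry carries a
    message.usage block, as shown in the excerpt above.
    """
    totals = {
        "input_tokens": 0,
        "output_tokens": 0,
        "cache_read_input_tokens": 0,
        "cache_creation_input_tokens": 0,
    }
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        message = entry.get("message", {})
        if message.get("role") != "assistant":
            continue
        # This is the field the documented hook script never reads.
        usage = message.get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```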

With no usage_details passed to the generation observation, Langfuse falls back to inference. Per the token & cost tracking documentation:

"If either usage or cost are not ingested, Langfuse will attempt to infer the missing values based on the model parameter of the generation at the time of ingestion."

Notably, the same documentation warns in the tokenizer table:

"According to Anthropic, their tokenizer is not accurate for Claude 3 models. If possible, send us the tokens from their API response."


Impact

The estimated total is drastically understated. cache_read_input_tokens and cache_creation_input_tokens are completely invisible. Users have essentially no visibility into their actual token consumption.


Screenshots

Same prompt, same project, same model (claude-sonnet-4-6). Cache distribution differs slightly between sessions due to Claude's internal caching state; this is expected behavior and does not affect the validity of the comparison.

Without fix: Langfuse falls back to tokenizer estimation. Cache tokens are invisible and the total is drastically understated:

[screenshot]

With fix: Real API usage data read from the transcript. Cache breakdown is fully visible and the total reflects actual consumption:

[screenshot]

Proposed fix

Add a get_usage() helper that reads the usage field from the transcript's assistant messages and pass the result as usage_details to the generation observation. This follows the pattern already documented in the Langfuse Python SDK docs for Anthropic:

from typing import Any, Dict, List

def get_usage(assistant_msgs: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum the real token counts across the turn's assistant messages."""
    input_tokens = 0
    output_tokens = 0
    cache_read = 0
    cache_creation = 0
    for msg in assistant_msgs:
        m = msg.get("message", {})
        usage = m.get("usage", {})
        input_tokens += usage.get("input_tokens", 0)
        output_tokens += usage.get("output_tokens", 0)
        cache_read += usage.get("cache_read_input_tokens", 0)
        cache_creation += usage.get("cache_creation_input_tokens", 0)
    return {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read_input_tokens": cache_read,
        "cache_creation_input_tokens": cache_creation,
    }

Then in emit_turn:

with langfuse.start_as_current_observation(
    name="Claude Response",
    as_type="generation",
    model=model,
    input={"role": "user", "content": user_text},
    output={"role": "assistant", "content": assistant_text},
    usage_details=get_usage(turn.assistant_msgs),
    metadata={...},
):
    pass
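As a sanity check that the aggregation behaves as intended, the helper can be exercised standalone; the snippet below restates it compactly and feeds it two sample turns with invented token counts:

```python
from typing import Any, Dict, List


def get_usage(assistant_msgs: List[Dict[str, Any]]) -> Dict[str, int]:
    """Compact restatement of the proposed helper for demonstration."""
    totals = {"input": 0, "output": 0,
              "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0}
    key_map = {"input": "input_tokens", "output": "output_tokens",
               "cache_read_input_tokens": "cache_read_input_tokens",
               "cache_creation_input_tokens": "cache_creation_input_tokens"}
    for msg in assistant_msgs:
        usage = msg.get("message", {}).get("usage", {})
        for out_key, in_key in key_map.items():
            totals[out_key] += usage.get(in_key, 0)
    return totals


# Two sample turns; the token counts here are invented for illustration.
sample = [
    {"message": {"usage": {"input_tokens": 3, "output_tokens": 165,
                           "cache_read_input_tokens": 14885,
                           "cache_creation_input_tokens": 1809}}},
    {"message": {"usage": {"input_tokens": 5, "output_tokens": 40,
                           "cache_read_input_tokens": 1000}}},
]
print(get_usage(sample))
# {'input': 8, 'output': 205, 'cache_read_input_tokens': 15885,
#  'cache_creation_input_tokens': 1809}
```

Missing keys (like the absent cache_creation_input_tokens in the second turn) default to 0, so partial usage blocks do not break the total.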

Environment

  • Claude Code version: 2.1.83
  • Langfuse Python SDK: 4.0.1
