Documented Claude Code hook script reports estimated token counts instead of real API usage #2718
Description
The hook script documented in the Claude Code integration guide sends generation observations to Langfuse without passing `usage_details`, so Langfuse falls back to its internal tokenizer to estimate token counts rather than recording the real values from the Anthropic API.
Root cause
Claude Code writes full API response metadata into the transcript .jsonl file at ~/.claude/projects/<session>/, including exact token counts per turn. Each assistant message entry in that file contains a usage block:
```json
"usage": {
  "input_tokens": 3,
  "cache_creation_input_tokens": 1809,
  "cache_read_input_tokens": 14885,
  "output_tokens": 165
}
```

The hook script reads this .jsonl file line by line and parses each message object. For each assistant message it extracts `message.role`, `message.content`, `message.model`, and `message.id`, but never reads `message.usage`. That field is silently skipped on every iteration, so the real token counts never reach Langfuse.
With no usage_details passed to the generation observation, Langfuse falls back to inference. Per the token & cost tracking documentation:
"If either usage or cost are not ingested, Langfuse will attempt to infer the missing values based on the `model` parameter of the generation at the time of ingestion."
Notably, the same documentation warns in the tokenizer table:
"According to Anthropic, their tokenizer is not accurate for Claude 3 models. If possible, send us the tokens from their API response."
Impact
The estimated total is drastically understated. cache_read_input_tokens and cache_creation_input_tokens are completely invisible. Users have essentially no visibility into their actual token consumption.
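The sample usage block from the root-cause section shows the scale of the problem. This is a back-of-the-envelope illustration, not Langfuse's exact estimator, but it makes the point: the cache fields are response metadata, not message text, so no text-based tokenizer can ever recover them.

```python
# Sample per-turn usage block from the transcript (root-cause section above).
usage = {
    "input_tokens": 3,
    "cache_creation_input_tokens": 1809,
    "cache_read_input_tokens": 14885,
    "output_tokens": 165,
}

# Real total billed by the Anthropic API for this turn.
real_total = sum(usage.values())

# Tokens invisible to any tokenizer estimate over the message text.
invisible = usage["cache_read_input_tokens"] + usage["cache_creation_input_tokens"]

print(real_total, invisible)  # 16862 16694 -- ~99% of the turn is cache traffic
```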
Screenshots
Same prompt, same project, same model (claude-sonnet-4-6). Cache distribution differs slightly between sessions due to Claude's internal caching state; this is expected behavior and does not affect the validity of the comparison.
Without fix: Langfuse falls back to tokenizer estimation. Cache tokens are invisible and the total is drastically understated:
With fix: Real API usage data read from the transcript. Cache breakdown is fully visible and the total reflects actual consumption:
Proposed fix
Add a get_usage() helper that reads the usage field from the transcript's assistant messages and pass the result as usage_details to the generation observation. This follows the pattern already documented in the Langfuse Python SDK docs for Anthropic:
```python
from typing import Any, Dict, List

def get_usage(assistant_msgs: List[Dict[str, Any]]) -> Dict[str, int]:
    input_tokens = 0
    output_tokens = 0
    cache_read = 0
    cache_creation = 0
    for msg in assistant_msgs:
        m = msg.get("message", {})
        usage = m.get("usage", {})
        input_tokens += usage.get("input_tokens", 0)
        output_tokens += usage.get("output_tokens", 0)
        cache_read += usage.get("cache_read_input_tokens", 0)
        cache_creation += usage.get("cache_creation_input_tokens", 0)
    return {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read_input_tokens": cache_read,
        "cache_creation_input_tokens": cache_creation,
    }
```

Then in `emit_turn`:
```python
with langfuse.start_as_current_observation(
    name="Claude Response",
    as_type="generation",
    model=model,
    input={"role": "user", "content": user_text},
    output={"role": "assistant", "content": assistant_text},
    usage_details=get_usage(turn.assistant_msgs),
    metadata={...},
):
    pass
```

Environment
- Claude Code version: 2.1.83
- Langfuse Python SDK: 4.0.1
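As a final sanity check, running the proposed `get_usage()` helper over the sample usage block from the root-cause section yields the expected breakdown. The helper is restated compactly here so the snippet runs standalone; the sample message list is illustrative:

```python
from typing import Any, Dict, List

def get_usage(assistant_msgs: List[Dict[str, Any]]) -> Dict[str, int]:
    # Same aggregation as the proposed fix, condensed into a totals dict.
    totals = {"input": 0, "output": 0,
              "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0}
    for msg in assistant_msgs:
        usage = msg.get("message", {}).get("usage", {})
        totals["input"] += usage.get("input_tokens", 0)
        totals["output"] += usage.get("output_tokens", 0)
        totals["cache_read_input_tokens"] += usage.get("cache_read_input_tokens", 0)
        totals["cache_creation_input_tokens"] += usage.get("cache_creation_input_tokens", 0)
    return totals

# One assistant turn carrying the sample usage block from above.
msgs = [{"message": {"usage": {"input_tokens": 3,
                               "cache_creation_input_tokens": 1809,
                               "cache_read_input_tokens": 14885,
                               "output_tokens": 165}}}]
print(get_usage(msgs))
```

With these values passed as `usage_details`, Langfuse records the real API counts and skips tokenizer inference entirely.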