Closed
Labels
bug (Something isn't working)
Description
When using streaming mode, the ChatLiteLLM class doesn't provide usage metadata (token counts), making it impossible to track costs during streaming operations. Non-streaming mode works correctly and includes usage metadata.
Steps to Reproduce
from langchain_litellm import ChatLiteLLM
from langchain_core.messages import HumanMessage
# Test streaming (the broken case)
llm = ChatLiteLLM(model="gpt-4o", streaming=True)
message = [HumanMessage(content="Say hello")]
# Collect all chunks and check for usage metadata
chunks = list(llm.stream(message))
usage_found = any(hasattr(chunk, 'usage_metadata') and chunk.usage_metadata for chunk in chunks)
print(f"Streaming usage metadata found: {usage_found}") # Returns False
# Test non-streaming (the working case)
llm.streaming = False
result = llm.invoke(message)
print(f"Non-streaming usage metadata: {result.usage_metadata is not None}")  # Returns True
Expected Behavior
Both streaming and non-streaming should provide usage metadata:
Streaming usage metadata found: True
Non-streaming usage metadata: True
Actual Behavior
Streaming usage metadata found: False
Non-streaming usage metadata: True
Environment
- langchain-litellm: 0.2.1 (latest)
- Python: 3.11+
- Provider: OpenAI (also affects Anthropic and others)
- OS: Any
Impact
- Impossible to track streaming costs in real-time
- Inconsistent API behavior between streaming/non-streaming modes
- Prevents usage-based billing and monitoring in production applications
Root Cause
- Missing stream_options={"include_usage": True} parameter in the underlying completion call
- Usage data not being extracted from streaming chunks
- Usage metadata not attached to message chunks
This affects all streaming operations across different LLM providers and is critical for production cost monitoring.
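A minimal sketch of the extraction step the fix would need, assuming OpenAI-style streams where, once stream_options={"include_usage": True} is set, the provider emits a final chunk whose usage field carries the token counts. The chunk dicts and the extract_usage_metadata helper below are illustrative stand-ins, not the actual langchain-litellm internals:

```python
# Sketch: pull usage metadata out of a finished stream of chunks.
# Assumption: with stream_options={"include_usage": True}, exactly one
# (typically the last) chunk carries a non-empty "usage" object.

def extract_usage_metadata(chunks):
    """Return a LangChain-style usage_metadata dict from streamed chunks,
    or None if no chunk carried usage information."""
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage:
            return {
                "input_tokens": usage.get("prompt_tokens", 0),
                "output_tokens": usage.get("completion_tokens", 0),
                "total_tokens": usage.get("total_tokens", 0),
            }
    return None

# Simulated stream: content deltas first, then a usage-only final chunk.
stream = [
    {"choices": [{"delta": {"content": "Hello"}}], "usage": None},
    {"choices": [{"delta": {"content": "!"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 9,
                              "completion_tokens": 2,
                              "total_tokens": 11}},
]

print(extract_usage_metadata(stream))
# → {'input_tokens': 9, 'output_tokens': 2, 'total_tokens': 11}
```

In a real fix this dict would be attached as usage_metadata on the final AIMessageChunk yielded by the stream, mirroring what the non-streaming path already returns.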