🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support #22

imranarshad · 2025-07-07T17:37:58Z

🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support

📋 Summary

This PR fixes 6 critical bugs in the langchain-litellm package that affect streaming, usage tracking, and advanced AI model features. The primary fix addresses Issue #20 - missing usage metadata in streaming responses.

🔥 Critical Bugs Fixed

1. Missing Usage Metadata in Streaming Responses (Issue #20)

Problem: Streaming responses don't include token usage metadata
Impact: Impossible to track costs during streaming operations
Fix: Added stream_options={"include_usage": True} and usage extraction logic

2. Missing reasoning_content Support for Thinking Models

Problem: reasoning_content lost for o1, Claude thinking, Gemini reasoning models
Impact: Loss of valuable reasoning insights users pay premium for
Fix: Added reasoning_content handling in _convert_dict_to_message

3. Streaming Crashes with Dictionary Deltas

Problem: AttributeError when LiteLLM returns dict deltas instead of Delta objects
Impact: Unexpected crashes during streaming
Fix: Added robust type checking and handling for both formats

4. Tool Call Processing Failures

Problem: KeyError when providers return different tool call formats
Impact: Tool calling failures with certain providers
Fix: Added defensive programming with fallbacks

5. Incomplete Usage Metadata

Problem: Missing cache tokens and reasoning tokens in usage details
Impact: Incomplete cost tracking and debugging info
Fix: Enhanced _create_usage_metadata to extract advanced details

6. Async Streaming Reliability

Problem: Incorrect async completion call pattern
Impact: Potential async streaming failures
Fix: Corrected async streaming method call

🧪 Testing

Before Fix (Broken):

from langchain_litellm import ChatLiteLLM
from langchain_core.messages import HumanMessage

llm = ChatLiteLLM(model="gpt-4o", streaming=True)
chunks = list(llm.stream([HumanMessage(content="Hello")]))
usage_found = any(hasattr(chunk, 'usage_metadata') and chunk.usage_metadata for chunk in chunks)
print(f"Streaming usage metadata found: {usage_found}")  # False ❌

After Fix (Working):

from langchain_litellm import ChatLiteLLM
from langchain_core.messages import HumanMessage

llm = ChatLiteLLM(model="gpt-4o", streaming=True)
chunks = list(llm.stream([HumanMessage(content="Hello")]))
usage_found = any(hasattr(chunk, 'usage_metadata') and chunk.usage_metadata for chunk in chunks)
print(f"Streaming usage metadata found: {usage_found}")  # True ✅

# Get usage details
final_chunk = chunks[-1]
print(f"Input tokens: {final_chunk.usage_metadata.input_tokens}")    # 12
print(f"Output tokens: {final_chunk.usage_metadata.output_tokens}")  # 5
print(f"Total tokens: {final_chunk.usage_metadata.total_tokens}")    # 17

Advanced Model Testing:

# Testing reasoning_content support
llm = ChatLiteLLM(model="vertex_ai/gemini-2.5-flash")
result = llm.invoke("What is 2+2?", thinking={"type": "enabled", "budget_tokens": 1024})
print(f"Reasoning content available: {bool(result.additional_kwargs.get('reasoning_content'))}")  # True ✅
print(f"Reasoning tokens: {result.usage_metadata.output_token_details.get('reasoning', 0)}")  # 457

📊 Impact Assessment

Feature	Before	After	Status
Streaming Usage Metadata	❌ Missing	✅ Working	Fixed
Reasoning Content (o1, Claude, Gemini)	❌ Lost	✅ Preserved	Fixed
Streaming Stability	❌ Crashes	✅ Robust	Fixed
Tool Call Compatibility	❌ Failures	✅ Reliable	Fixed
Advanced Usage Details	❌ Basic	✅ Complete	Enhanced
Async Streaming	❌ Unreliable	✅ Stable	Fixed

🔧 Technical Details

Key Changes in `langchain_litellm/chat_models/litellm.py`:

Enhanced _default_params:

# Add stream_options when streaming is enabled
if self.streaming:
    params["stream_options"] = {"include_usage": True}

Fixed _stream method:

# Extract and attach usage metadata from chunks
if "usage" in chunk and chunk["usage"]:
    usage_metadata = _create_usage_metadata(chunk["usage"])
    message_chunk.usage_metadata = usage_metadata

Added reasoning_content support:

# Add reasoning_content support for thinking-enabled models
if _dict.get("reasoning_content"):
    additional_kwargs["reasoning_content"] = _dict["reasoning_content"]

Robust delta handling:

# Handle both Delta objects and dicts
if isinstance(delta, dict):
    role = delta.get("role")
    content = delta.get("content") or ""
    # ... handle dict format
else:
    role = delta.role
    content = delta.content or ""
    # ... handle Delta object format

Enhanced usage metadata:

# Extract advanced usage details
if "cache_read_input_tokens" in token_usage:
    input_token_details["cache_read"] = token_usage["cache_read_input_tokens"]

# Reasoning tokens for o1 models, Claude thinking, etc.
completion_tokens_details = token_usage.get("completion_tokens_details", {})
if completion_tokens_details and "reasoning_tokens" in completion_tokens_details:
    output_token_details["reasoning"] = completion_tokens_details["reasoning_tokens"]

🚀 Benefits

✅ Production Ready: Real-time cost tracking for streaming applications
✅ Advanced AI Support: Full support for o1, Claude thinking, Gemini reasoning
✅ Provider Compatibility: Robust handling across OpenAI, Anthropic, Google, etc.
✅ Backward Compatible: No breaking changes to existing code
✅ Comprehensive: Fixes multiple related issues in one PR

🔗 Related Issues

Fixes Bug: Streaming responses missing usage metadata for cost tracking #20: Streaming responses missing usage metadata
Addresses reasoning_content support for thinking models
Improves streaming reliability and provider compatibility
Enhances usage tracking for cost optimization

📦 Commits

5201401: feat: Add comprehensive streaming usage metadata support
af41ab9: feat: add reasoning_content support for thinking-enabled models

Ready for Review 🎉 This PR addresses fundamental issues that affect production usage tracking and advanced AI model features. All fixes maintain backward compatibility while adding essential missing functionality.

Akshay-Dongare · 2025-07-11T11:05:35Z

BUGS_FIXED.md

Please remove this file

Akshay-Dongare · 2025-07-11T11:05:52Z

GITHUB_ISSUES_TEMPLATES.md

Please remove this file

Akshay-Dongare · 2025-07-11T11:06:32Z

ISSUE_20_UPDATE.md

Please remove this file

Akshay-Dongare · 2025-07-11T11:06:39Z

PR_DESCRIPTION.md

Please remove this file

Akshay-Dongare · 2025-07-11T11:07:23Z

test_streaming_bug.py

Please remove this file

@Akshay-Dongare

- Remove BUGS_FIXED.md - Remove GITHUB_ISSUES_TEMPLATES.md - Remove ISSUE_20_UPDATE.md - Remove PR_DESCRIPTION.md - Remove test_streaming_bug.py Addresses feedback from @Akshay-Dongare in PR Akshay-Dongare#22

- Fix streaming delta conversion to handle both Delta objects and dicts - Add stream_options for usage tracking in streaming responses - Extract and attach usage metadata to streaming message chunks - Support advanced usage fields (cache tokens, reasoning tokens) - Add comprehensive unit and integration tests - Maintain 100% backward compatibility Fixes streaming usage metadata issues and enables cost optimization features. Addresses core functionality gaps in streaming token usage tracking.

- Add cache_creation support via cache_creation_input_tokens - Add audio token support for both input and output - Ensures complete compliance with OpenAI usage metadata schema - Supports multimodal models with audio token tracking

- Fix existing tests to use dictionary access (UsageMetadata is TypedDict) - Add comprehensive tests for new cache and audio token fields - Test cache_creation_input_tokens support - Test audio_input_tokens and audio_output_tokens support - Add complete schema test matching OpenAI format - All usage metadata tests now pass (7/7) Validates the complete token usage metadata schema: - input_tokens, output_tokens, total_tokens - input_token_details: cache_read, cache_creation, audio - output_token_details: audio, reasoning

This test demonstrates that LangChain's _normalize_messages function now correctly preserves LiteLLM's official multimodal format instead of incorrectly transforming it. The test verifies that: - LiteLLM format: {'type': 'file', 'file': {'file_data': '...'}} is preserved - OpenAI format: {'type': 'image_url', 'image_url': {'url': '...'}} works correctly - Vertex format: with 'format' key is preserved Fixes the KeyError: 'file' that occurred when LangChain transformed LiteLLM's format to OpenAI's format, breaking LiteLLM compatibility.

@Akshay-Dongare

- Remove BUGS_FIXED.md - Remove GITHUB_ISSUES_TEMPLATES.md - Remove ISSUE_20_UPDATE.md - Remove PR_DESCRIPTION.md - Remove test_streaming_bug.py Addresses feedback from @Akshay-Dongare in PR Akshay-Dongare#22

imranarshad mentioned this pull request Jul 7, 2025

Bug: Streaming responses missing usage metadata for cost tracking #20

Open

Akshay-Dongare self-requested a review July 11, 2025 10:54

Akshay-Dongare self-assigned this Jul 11, 2025

Akshay-Dongare requested changes Jul 11, 2025

View reviewed changes

imranarshad added 5 commits September 22, 2025 06:45

Remove documentation files as requested in PR review

3d6d675

- Remove BUGS_FIXED.md - Remove GITHUB_ISSUES_TEMPLATES.md - Remove ISSUE_20_UPDATE.md - Remove PR_DESCRIPTION.md - Remove test_streaming_bug.py Addresses feedback from @Akshay-Dongare in PR Akshay-Dongare#22

imranarshad force-pushed the feat/streaming-usage-metadata branch from c0de4c6 to 3d6d675 Compare September 22, 2025 01:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support #22

🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support #22

Uh oh!

imranarshad commented Jul 7, 2025

Uh oh!

Akshay-Dongare Jul 11, 2025

Uh oh!

Akshay-Dongare Jul 11, 2025

Uh oh!

Akshay-Dongare Jul 11, 2025

Uh oh!

Akshay-Dongare Jul 11, 2025

Uh oh!

Akshay-Dongare Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support #22

Are you sure you want to change the base?

🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support #22

Uh oh!

Conversation

imranarshad commented Jul 7, 2025

🐛 Fix: Critical Streaming Usage Metadata & Advanced Model Support

📋 Summary

🔥 Critical Bugs Fixed

1. Missing Usage Metadata in Streaming Responses (Issue #20)

2. Missing reasoning_content Support for Thinking Models

3. Streaming Crashes with Dictionary Deltas

4. Tool Call Processing Failures

5. Incomplete Usage Metadata

6. Async Streaming Reliability

🧪 Testing

Before Fix (Broken):

After Fix (Working):

Advanced Model Testing:

📊 Impact Assessment

🔧 Technical Details

Key Changes in langchain_litellm/chat_models/litellm.py:

🚀 Benefits

🔗 Related Issues

📦 Commits

Uh oh!

Akshay-Dongare Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Akshay-Dongare Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Akshay-Dongare Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Akshay-Dongare Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Akshay-Dongare Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Key Changes in `langchain_litellm/chat_models/litellm.py`: