
[Bug]: OpenRouter streaming doesn't return 'cost' and 'is_byok' #11626


Description


What happened?

When querying the OpenRouter API directly with "usage": {"include": true}, we see the usage come back like this:

Usage object: {'prompt_tokens': 28, 'completion_tokens': 38, 'total_tokens': 66, 'cost': 7.1e-05, 'is_byok': False, 'prompt_tokens_details': {'cached_tokens': 0}, 'cost_details': {'upstream_inference_cost': None}, 'completion_tokens_details': {'reasoning_tokens': 0}}
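
For reference, here is roughly how that direct query can be made (a minimal sketch with requests; the endpoint and payload follow OpenRouter's usage-accounting docs, and the model/key are placeholders):

import os
import requests

# Sketch: direct OpenRouter chat completion with usage accounting enabled.
# Python True serializes to JSON true on the wire.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "usage": {"include": True},
    },
    timeout=60,
)
print("Usage object:", resp.json()["usage"])  # includes 'cost' and 'is_byok'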

Yet when querying OpenRouter through LiteLLM with stream=True and stream_options={"include_usage": True}, this is the usage object we get back:

Usage information found:
Raw usage object: Usage(completion_tokens=34, prompt_tokens=13, total_tokens=47, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
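
For comparison, here is a sketch of streaming against OpenRouter directly to inspect the raw final-chunk usage (OpenAI SDK with a base_url override; the extra_body usage flag and the reliance on model_dump() keeping provider-specific extras are my assumptions, not verified here):

import os
from openai import OpenAI

# Sketch: stream straight from OpenRouter, requesting usage both via the
# standard stream_options flag and OpenRouter's own usage flag.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
stream = client.chat.completions.create(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"usage": {"include": True}},  # assumption: OpenRouter usage flag
)
for chunk in stream:
    if chunk.usage is not None:
        # model_dump() should surface provider extras like 'cost' and
        # 'is_byok' that the typed attributes don't expose
        print(chunk.usage.model_dump())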

Script to reproduce on the latest main branch:


import os
import sys
import litellm

# Set your OpenRouter API key
os.environ["OPENROUTER_API_KEY"] = "insert_key_here"

# Define a simple message
messages = [{"role": "user", "content": "Hello, how are you?"}]

# Enable verbose logging
os.environ["LITELLM_VERBOSE"] = "1"

def test_streaming_with_final_check():
    print("\\n\\nTesting model: openai/gpt-3.5-turbo (stream=True with final check)")
    print("-" * 50)

    response_stream = litellm.completion(
        model="openrouter/openai/gpt-3.5-turbo",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True}
    )

    # Iterate through the stream and collect chunks
    chunks = []
    final_usage = None
    for chunk in response_stream:
        chunks.append(chunk)
        # The final usage object is expected in the last chunk
        if hasattr(chunk, "usage") and chunk.usage is not None:
            final_usage = chunk.usage

    print("Streaming finished.")

    # Checking the captured final_usage simulates what LiteLLM exposes
    # internally once the stream ends; the *last* chunk should carry the
    # complete usage.

    print("\\nChecking for usage information in the final captured usage object:")
    if final_usage:
        print("Usage information found:")
        print(f"Raw usage object: {final_usage}")
        print(f"Prompt tokens: {final_usage.prompt_tokens}")
        print(f"Completion tokens: {final_usage.completion_tokens}")
        print(f"Total tokens: {final_usage.total_tokens}")

        # Check for cost information
        if hasattr(final_usage, "cost") and final_usage.cost is not None:
            print(f"Cost: {final_usage.cost}")
            assert final_usage.cost > 0
        else:
            print("No cost information found in the final usage object.")
            assert False, "Cost not found"

        # Check for is_byok
        if hasattr(final_usage, "is_byok") and final_usage.is_byok is not None:
            print(f"Is BYOK: {final_usage.is_byok}")
        else:
            print("No is_byok information found in the final usage object.")
            assert False, "is_byok not found"
    else:
        print("No usage information found in any chunk.")
        assert False, "Usage object not found"

try:
    test_streaming_with_final_check()
    print("\\nAll tests passed!")
except Exception as e:
    print(f"\\nTest failed: {e}")
    sys.exit(1)
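
As a partial workaround until this is fixed, the cost can be estimated locally with litellm.completion_cost (a sketch; this uses LiteLLM's pricing map rather than the authoritative 'cost' OpenRouter reports, and there is no equivalent for 'is_byok'):

import litellm

# Sketch of a workaround: estimate cost from prompt/completion text using
# LiteLLM's pricing map. Does not reflect BYOK or OpenRouter's actual bill.
estimated_cost = litellm.completion_cost(
    model="openrouter/openai/gpt-3.5-turbo",
    prompt="Hello, how are you?",
    completion="I'm doing well, thanks!",
)
print(f"Estimated cost: {estimated_cost}")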

Relevant log output

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.72.3

Twitter / LinkedIn details

https://www.linkedin.com/in/dagokogos/

Labels

bug (Something isn't working)