Description
What happened?
When querying the OpenRouter API directly with "usage": {"include": True}, the usage comes back like this:
Usage object: {'prompt_tokens': 28, 'completion_tokens': 38, 'total_tokens': 66, 'cost': 7.1e-05, 'is_byok': False, 'prompt_tokens_details': {'cached_tokens': 0}, 'cost_details': {'upstream_inference_cost': None}, 'completion_tokens_details': {'reasoning_tokens': 0}}
Yet when querying OpenRouter through LiteLLM with stream=True and stream_options={"include_usage": True}, this is the usage object we get back:
Usage information found: Raw usage object: Usage(completion_tokens=34, prompt_tokens=13, total_tokens=47, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
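For reference, here is a minimal sketch of the direct (non-streaming) OpenRouter call described above. The https://openrouter.ai/api/v1/chat/completions endpoint and the use of the requests library are assumptions, not part of the original report; the model, message, and usage flag mirror the reproduction script below.

import os
import requests

# Direct, non-streaming OpenRouter request with usage accounting enabled
# (sketch; endpoint URL assumed to be the standard OpenRouter chat endpoint).
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "usage": {"include": True},
    },
    timeout=60,
)
# The response body carries the full usage block, including cost and is_byok.
print("Usage object:", resp.json()["usage"])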
Script to reproduce on the latest main branch:
import os
import sys

import litellm

# Set your OpenRouter API key
os.environ["OPENROUTER_API_KEY"] = "insert_key_here"

# Define a simple message
messages = [{"role": "user", "content": "Hello, how are you?"}]

# Enable verbose logging
os.environ["LITELLM_VERBOSE"] = "1"


def test_streaming_with_final_check():
    print("\n\nTesting model: openai/gpt-3.5-turbo (stream=True with final check)")
    print("-" * 50)

    response_stream = litellm.completion(
        model="openrouter/openai/gpt-3.5-turbo",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},
    )

    # Iterate through the stream and collect chunks
    chunks = []
    final_usage = None
    for chunk in response_stream:
        chunks.append(chunk)
        # The final usage object is expected in the last chunk
        if hasattr(chunk, "usage") and chunk.usage is not None:
            final_usage = chunk.usage

    print("Streaming finished.")

    # Manually build the final response from chunks.
    # This simulates what happens internally after the stream ends.
    # The key is that the *last* chunk should contain the complete usage.
    # Let's check the final_usage object we captured.
    print("\nChecking for usage information in the final captured usage object:")
    if final_usage:
        print("Usage information found:")
        print(f"Raw usage object: {final_usage}")
        print(f"Prompt tokens: {final_usage.prompt_tokens}")
        print(f"Completion tokens: {final_usage.completion_tokens}")
        print(f"Total tokens: {final_usage.total_tokens}")

        # Check for cost information
        if hasattr(final_usage, "cost") and final_usage.cost is not None:
            print(f"Cost: {final_usage.cost}")
            assert final_usage.cost > 0
        else:
            print("No cost information found in the final usage object.")
            assert False, "Cost not found"

        # Check for is_byok
        if hasattr(final_usage, "is_byok") and final_usage.is_byok is not None:
            print(f"Is BYOK: {final_usage.is_byok}")
        else:
            print("No is_byok information found in the final usage object.")
            assert False, "is_byok not found"
    else:
        print("No usage information found in any chunk.")
        assert False, "Usage object not found"


try:
    test_streaming_with_final_check()
    print("\nAll tests passed!")
except Exception as e:
    print(f"\nTest failed: {e}")
    sys.exit(1)
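For comparison, here is a hedged sketch of the same request streamed directly against the OpenRouter REST API (bypassing LiteLLM) so the raw usage chunk can be inspected. The endpoint, the SSE parsing, and passing "usage": {"include": True} on a streaming request are assumptions carried over from the non-streaming call above, not something confirmed in this report.

import json
import os
import requests

# Stream the same request directly from OpenRouter and print the raw usage
# payload from the final chunk (sketch; endpoint and request shape assumed).
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "stream": True,
        "usage": {"include": True},
    },
    stream=True,
    timeout=60,
)
for line in resp.iter_lines():
    # Skip SSE keep-alive/comment lines and only parse "data: ..." events.
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    # The final chunk is expected to carry the usage block so it can be
    # compared field-by-field with what LiteLLM surfaces above.
    if chunk.get("usage"):
        print("Raw streamed usage:", chunk["usage"])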
Relevant log output
Are you a ML Ops Team?
No
What LiteLLM version are you on?
v1.72.3