Description
What happened?
When sending a tool output of type input_image to an OpenAI Responses-API model, the model interprets the image correctly when the call goes through litellm.responses().
However, when calling the same Responses API via:
litellm.completion(model="openai/responses/gpt-4.1", messages=input_list)
…the model fails to see the image and instead produces unrelated, hallucinated text.
This suggests litellm.completion() is incorrectly serializing or transforming the function_call_output item: it drops or corrupts the structured image content.
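For reference, this is the shape of the Responses-API function_call_output item that appears to get lost in translation (a minimal sketch; the call_id and base64 payload here are placeholders, not the real values from the repro below):

```python
# Minimal sketch of a Responses-API function_call_output item carrying a
# structured image. The call_id and data URI are placeholder values.
function_call_output = {
    "type": "function_call_output",
    "call_id": "call_example123",  # placeholder call id
    "output": [
        {
            "type": "input_image",
            "image_url": "data:image/png;base64,iVBORw0KGgo=",  # placeholder
        }
    ],
}

# litellm.responses() passes this structured image through to the model;
# litellm.completion(model="openai/responses/...") apparently does not.
print(function_call_output["output"][0]["type"])
```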
Repro steps
Minimal Python script to reproduce the bug:
import base64

import requests

import litellm


def fetch_image(image_url: str) -> str:
    """
    Fetch an image from a URL and convert it to a base64 data URI.
    """
    response = requests.get(image_url)
    response.raise_for_status()  # Raise an exception for bad status codes
    # Convert the image bytes to base64
    image_base64 = base64.b64encode(response.content).decode('utf-8')
    return f"data:image/png;base64,{image_base64}"
base64_str = fetch_image("https://awsmp-logos.s3.amazonaws.com/seller-xw5kijmvmzasy/c233c9ade2ccb5491072ae232c814942.png")

input_list = [
    {
        'role': 'user',
        'content': "Fetch the image in the url shown below. Don't answer me with the base64 code, just fetch the image: https://awsmp-logos.s3.amazonaws.com/seller-xw5kijmvmzasy/c233c9ade2ccb5491072ae232c814942.png",
    },
    {
        'arguments': '{"image_url":"https://awsmp-logos.s3.amazonaws.com/seller-xw5kijmvmzasy/c233c9ade2ccb5491072ae232c814942.png"}',
        'call_id': 'call_njInAzirflqokBVScx2PcR5J',
        'name': 'fetch_image',
        'type': 'function_call',
        'id': 'fc_06524f14427ece7200693953e7ce748193bf74d0bc34676e10',
        'status': 'completed',
    },
    {
        'type': 'function_call_output',
        'call_id': 'call_njInAzirflqokBVScx2PcR5J',
        'output': [{'type': 'input_image', 'image_url': base64_str}],
    },
    {'role': 'user', 'content': 'What is written in the image?'},
]
# response = litellm.responses(
#     model="gpt-4.1",
#     input=input_list,
# )
# response.dict()

response = litellm.completion(
    model="openai/responses/gpt-4.1",  # tells LiteLLM to call the model via the Responses API
    messages=input_list,
)
response.dict()
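As a local sanity check (no network, no API key), the encoding step inside fetch_image() can be verified to round-trip correctly, ruling out the data-URI construction itself as the culprit. This is a sketch using stand-in PNG bytes rather than the real logo:

```python
import base64


def to_data_uri(png_bytes: bytes) -> str:
    # Same encoding step as fetch_image() above, minus the network call.
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("utf-8")


sample = b"\x89PNG\r\n\x1a\n"  # PNG magic bytes as stand-in image data
uri = to_data_uri(sample)
assert uri.startswith("data:image/png;base64,")
# Round-trip: the base64 payload decodes back to the original bytes.
assert base64.b64decode(uri.split(",", 1)[1]) == sample
print("data URI OK")
```

Since the same base64_str works through litellm.responses(), the corruption has to happen inside completion()'s message transformation, not in the image encoding.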
Why this matters
Many users rely on LiteLLM as a drop-in router, including for OpenAI Agents SDK usage, where the LiteLLM integration always goes through the completion() method.
If completion() corrupts structured tool outputs (especially images), agents will misbehave or hallucinate.
Related OpenAI Agents SDK issue: openai/openai-agents-python#2163
Thank you for the help!
Relevant log output
# Log with completion()
'choices': [{'finish_reason': 'stop',
'index': 0,
'message': {'content': 'The image displays the word **"IICS"**, which stands for **Informatica Intelligent Cloud Services**. The logo is associated with Informatica, a company known for its data integration products.',
'role': 'assistant',
'tool_calls': None,
'function_call': None,
'provider_specific_fields': None}}],
# Log with responses()
'output': [{'id': 'msg_06524f14427ece72006939685b8b248193ba2b5f203ef20364',
'content': [{'annotations': [],
'text': 'The text in the image reads: **"LiteLLM"**.',
'type': 'output_text',
'logprobs': []}],
'role': 'assistant',
'status': 'completed',
'type': 'message'}],
Are you an ML Ops Team?
No
What LiteLLM version are you on ?
v1.77.3
Twitter / LinkedIn details
No response