Description
What happened?
Environment:
- LiteLLM Version: v1.80.5-stable
- Model: gemini/gemini-2.5-flash
- Provider: Google Gemini API
- OpenAI SDK compatibility mode
Description:
When using Gemini 2.5 Flash with multi-turn function calling (2+ rounds of tool calls), LiteLLM's conversion from OpenAI format to Gemini native format causes validation errors:
Please ensure that the number of function response parts is equal to the number of function call parts
Suspected root cause:
Gemini 2.5 Flash returns thought signatures in assistant responses, which LiteLLM correctly captures in two places:
- Tool call IDs, in the form call_xxx__thought__<signature>
- provider_specific_fields.thought_signature within tool_calls
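For illustration, an assistant message captured this way looks roughly like the following (the tool name, arguments, and signature value are made up; the exact field layout is an assumption based on the two points above):

# Illustrative shape only -- values are made up.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123__thought__CuYBAVSoXO",  # signature embedded in the ID
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        "provider_specific_fields": {"thought_signature": "CuYBAVSoXO"},
    }],
}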
However, when these messages are sent back in subsequent turns, LiteLLM re-extracts the thought signatures from the embedded tool_call IDs (via _get_thought_signature_from_tool()) and adds them as separate parts in the Gemini native format:
From LiteLLM source: litellm/llms/gemini/chat/transformation.py
parts.append({"thoughtSignature": thought_signature})
parts.append({"functionCall": {...}})
This creates 2 parts for each tool call in assistant messages, but only 1 part for each tool response, causing the validation error.
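For illustration, the converted request body then ends up roughly like this (roles and values are abbreviated and illustrative), and the part counts no longer line up:

contents = [
    {"role": "model", "parts": [
        {"thoughtSignature": "CuYBAVSoXO"},  # part 1: re-injected signature
        {"functionCall": {"name": "get_weather", "args": {"city": "Paris"}}},  # part 2
    ]},
    {"role": "user", "parts": [
        # only 1 part for the matching tool response
        {"functionResponse": {"name": "get_weather", "response": {"temp_c": 12}}},
    ]},
]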
Steps to Reproduce (a minimal Python sketch follows this list):
- Use Gemini 2.5 Flash with function calling enabled
- Make an initial tool call (works fine)
- Return tool response
- LLM makes another tool call in the same conversation
- Error occurs: validation fails due to mismatched parts count
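A minimal sketch of the failing flow with the Python SDK (the tool schema, prompt, and stubbed tool results are illustrative; assumes GEMINI_API_KEY is set):

import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Compare the weather in Paris and Tokyo."}]

# Round 1: the model issues a tool call -- this works.
resp = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages, tools=tools)
assistant_msg = resp.choices[0].message
messages.append(assistant_msg.model_dump())  # tool_call IDs may embed __thought__<signature>
for tc in assistant_msg.tool_calls:
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": '{"temp_c": 12}',  # illustrative stub result
    })

# Round 2: sending the history back triggers the
# "number of function response parts" validation error.
resp = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages, tools=tools)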
Expected Behavior:
Multi-turn function calling should not trigger a validation error from Gemini when the payload passed to LiteLLM is valid.
Actual Behavior:
Assistant messages get 2 parts per tool call (thought signature + function call), but tool response messages get 1 part (function response only), causing validation errors.
Relevant Code:
The issue stems from litellm/llms/gemini/chat/transformation.py:
- _get_thought_signature_from_tool() extracts signatures from tool_call IDs
- These signatures are added as separate parts even in historical messages
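As an assumption about the mechanism (the actual helper in transformation.py may differ), the signature appears to be round-tripped through the tool_call ID roughly like this:

# Speculative sketch only -- not the actual LiteLLM implementation.
def get_thought_signature_from_tool_call_id(tool_call_id: str):
    """Return the signature embedded after '__thought__' in a tool_call ID, if any."""
    marker = "__thought__"
    if marker in tool_call_id:
        return tool_call_id.split(marker, 1)[1]
    return None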
Temporary workaround:
Disable thinking mode for Gemini requests so no thought signature is present:
if (model.startsWith("gemini")) {
  // Disable Gemini thinking so responses carry no thought signatures
  request = {
    ...request,
    thinking: { type: "disabled", budget_tokens: 0 },
  };
}
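For a request built directly with the Python SDK, the equivalent would presumably look like the snippet below (reusing the variables from the reproduction sketch above; that LiteLLM accepts and forwards this thinking value for Gemini is an assumption carried over from the workaround):

# Python-side equivalent of the workaround above (assumption: litellm
# forwards this `thinking` value to Gemini so no thought signature is produced).
if model.startswith("gemini"):
    resp = litellm.completion(
        model=model,
        messages=messages,
        tools=tools,
        thinking={"type": "disabled", "budget_tokens": 0},
    )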
Relevant log output
Are you a ML Ops Team?
No
What LiteLLM version are you on?
v1.80.5-stable
Twitter / LinkedIn details
No response