Description
What happened?
Environment:
- LiteLLM Version: v1.80.5-stable
- Model: gemini/gemini-2.5-flash
- Provider: Google Gemini API
- OpenAI SDK compatibility mode
Description:
When using Gemini 2.5 Flash with multi-turn function calling (2+ rounds of tool calls), LiteLLM's conversion from OpenAI format to Gemini native format causes validation errors:
Please ensure that the number of function response parts is equal to the number of function call parts
Suspected root cause:
Gemini 2.5 Flash returns thought signatures in assistant responses, which LiteLLM correctly captures in two places:
- Tool call IDs, in the form call_xxx__thought__<signature>
- provider_specific_fields.thought_signature within tool_calls
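For illustration, an assistant message captured this way looks roughly like the following (the tool name, arguments, and signature value are made up; the exact field layout is an assumption based on the two points above):

# Illustrative shape only -- values are made up.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123__thought__CuYBAVSoXO",  # signature embedded in the ID
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        "provider_specific_fields": {"thought_signature": "CuYBAVSoXO"},
    }],
}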
However, when these messages are sent back in subsequent turns, LiteLLM re-extracts the thought signatures from the embedded tool_call IDs (via _get_thought_signature_from_tool()) and adds them as separate parts in the Gemini native format:
From LiteLLM source: litellm/llms/gemini/chat/transformation.py
parts.append({"thoughtSignature": thought_signature})
parts.append({"functionCall": {...}})
This creates 2 parts for each tool call in assistant messages, but only 1 part for each tool response, causing the validation error.
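For illustration, the converted request body then ends up roughly like this (roles and values are abbreviated and illustrative), and the part counts no longer line up:

contents = [
    {"role": "model", "parts": [
        {"thoughtSignature": "CuYBAVSoXO"},  # part 1: re-injected signature
        {"functionCall": {"name": "get_weather", "args": {"city": "Paris"}}},  # part 2
    ]},
    {"role": "user", "parts": [
        # only 1 part for the matching tool response
        {"functionResponse": {"name": "get_weather", "response": {"temp_c": 12}}},
    ]},
]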
Steps to Reproduce (a minimal Python sketch follows this list):
- Use Gemini 2.5 Flash with function calling enabled
- Make an initial tool call (works fine)
- Return tool response
- LLM makes another tool call in the same conversation
- Error occurs: validation fails due to mismatched parts count
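A minimal sketch of the failing flow with the Python SDK (the tool schema, prompt, and stubbed tool results are illustrative; assumes GEMINI_API_KEY is set):

import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Compare the weather in Paris and Tokyo."}]

# Round 1: the model issues a tool call -- this works.
resp = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages, tools=tools)
assistant_msg = resp.choices[0].message
messages.append(assistant_msg.model_dump())  # tool_call IDs may embed __thought__<signature>
for tc in assistant_msg.tool_calls:
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": '{"temp_c": 12}',  # illustrative stub result
    })

# Round 2: sending the history back triggers the
# "number of function response parts" validation error.
resp = litellm.completion(model="gemini/gemini-2.5-flash", messages=messages, tools=tools)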
Expected Behavior:
Multi-turn function calling should not trigger a validation error from Gemini when the payload passed to LiteLLM is valid.
Actual Behavior:
Assistant messages get 2 parts per tool call (thought signature + function call), but tool response messages get 1 part (function response only), causing validation errors.
Relevant Code:
The issue stems from litellm/llms/gemini/chat/transformation.py:
- _get_thought_signature_from_tool() extracts signatures from tool_call IDs
- These signatures are added as separate parts even in historical messages
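As an assumption about the mechanism (the actual helper in transformation.py may differ), the signature appears to be round-tripped through the tool_call ID roughly like this:

# Speculative sketch only -- not the actual LiteLLM implementation.
def get_thought_signature_from_tool_call_id(tool_call_id: str):
    """Return the signature embedded after '__thought__' in a tool_call ID, if any."""
    marker = "__thought__"
    if marker in tool_call_id:
        return tool_call_id.split(marker, 1)[1]
    return None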
Temporary workaround:
Disable thinking mode for Gemini requests so no thought signature is present:
if (model.startsWith("gemini")) {
  // Disable Gemini thinking so responses carry no thought signatures
  request = {
    ...request,
    thinking: { type: "disabled", budget_tokens: 0 },
  };
}
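For a request built directly with the Python SDK, the equivalent would presumably look like the snippet below (reusing the variables from the reproduction sketch above; that LiteLLM accepts and forwards this thinking value for Gemini is an assumption carried over from the workaround):

# Python-side equivalent of the workaround above (assumption: litellm
# forwards this `thinking` value to Gemini so no thought signature is produced).
if model.startswith("gemini"):
    resp = litellm.completion(
        model=model,
        messages=messages,
        tools=tools,
        thinking={"type": "disabled", "budget_tokens": 0},
    )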
Relevant log output
Are you a ML Ops Team?
No
What LiteLLM version are you on?
v1.80.5-stable
Twitter / LinkedIn details
No response