
fix: pass reasoning_content back in thinking mode to avoid HTTP 400 #324

Merged
ModerRAS merged 1 commit into master from fix/reasoning-content-thinking-mode
Apr 25, 2026

Conversation

@ModerRAS
Owner

@ModerRAS ModerRAS commented Apr 25, 2026

Summary

  • Fix HTTP 400 error (invalid_request_error) when using thinking mode models (e.g., Kimi-thinking-preview, QwQ)
  • The API requires reasoning_content to be passed back in subsequent requests; without it, subsequent calls fail

Changes

  • LlmContinuationSnapshot.cs: Added ReasoningContent field to SerializedChatMessage for snapshot persistence
  • OpenAIService.cs: Capture and restore reasoning_content for multi-turn conversations

Testing

  • Build passed

Summary by CodeRabbit

  • New Features
    • Added support for AI reasoning/thinking mode, enabling the system to capture and preserve advanced reasoning text from AI responses.
    • Reasoning data now persists across chat continuations, improving consistency when resuming conversations with specialized LLM modes.

For thinking mode models (e.g., Kimi-thinking-preview, QwQ), the API requires
the reasoning_content to be passed back in subsequent requests. Without this,
the API returns 'invalid_request_error: invalid_request_error'.

Changes:
- Add ReasoningContent field to SerializedChatMessage for snapshot persistence
- Capture reasoning_content during streaming updates
- Restore reasoning_content when deserializing history for API calls
- Use reflection to access OpenAI SDK internal properties

This ensures multi-turn conversations with thinking mode models work correctly.
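For reference, providers that enforce this expect the assistant turn in the follow-up request to carry the reasoning field alongside the normal content. A minimal illustrative payload (the message values here are made up; the field name `reasoning_content` matches what Moonshot/Kimi-style APIs use):

```json
{
  "model": "kimi-thinking-preview",
  "messages": [
    { "role": "user", "content": "What is 17 * 24?" },
    {
      "role": "assistant",
      "content": "17 * 24 = 408.",
      "reasoning_content": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."
    },
    { "role": "user", "content": "And divided by 6?" }
  ]
}
```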
@coderabbitai

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Walkthrough

The changes extend the serialized chat message model with a new optional field to store LLM thinking-mode reasoning content. The OpenAI service is updated to capture reasoning from streaming responses and propagate it through serialization and deserialization layers using reflection helpers.

Changes

Cohort / File(s) Summary
Message Model Extension
TelegramSearchBot.Common/Model/AI/LlmContinuationSnapshot.cs
Added optional ReasoningContent property to SerializedChatMessage to persist thinking-mode reasoning text.
OpenAI Service Streaming & Serialization
TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs
Extended native tool-calling streaming to capture reasoning updates from StreamingChatCompletionUpdate. Updated SerializeProviderHistory and DeserializeProviderHistory methods to handle reasoning data. Added private reflection-based helpers for extracting and restoring reasoning on assistant messages.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A curious thought, now captured whole,
The AI's whispers, reasoning's soul,
Through streams and serializations they flow,
Where thinking modes shine bright and glow!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically addresses the main fix: passing reasoning_content back in thinking mode to prevent HTTP 400 errors. It matches the core objective of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs (2)

1371-1463: ⚠️ Potential issue | 🔴 Critical

Reflection-based round-trip won't fix the HTTP 400 — OpenAI SDK 2.10.0 does not expose these properties.

The official OpenAI .NET SDK (version 2.10.0, as pinned in this repo) does not define public Reasoning or ReasoningContentUpdate properties on AssistantChatMessage or StreamingChatCompletionUpdate. As a result, this entire reflection-based approach is non-functional:

  1. GetStreamingReasoningContent will always return null since the properties don't exist — reasoningContentBuilder never accumulates anything.
  2. SetAssistantReasoningContent silently fails when the property check returns false. Even if a property of that name were found, the SDK's request serializer won't emit reasoning_content in outgoing JSON (it's not part of the Chat Completions API request schema).

The HTTP 400 in thinking-mode multi-turn calls remains unfixed. For providers requiring reasoning_content to be echoed back, use one of:

  • A custom PipelinePolicy on OpenAIClientOptions.Transport to mutate outgoing request JSON and inject reasoning_content onto assistant messages before send.
  • Call the endpoint directly with HttpClient for these models instead of using ChatClient.
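A rough sketch of the first option, assuming the System.ClientModel PipelinePolicy API that the OpenAI .NET SDK's client options expose. The JSON-rewriting logic and the reasoning lookup are illustrative, not the repository's actual code; where the captured reasoning comes from (here a hypothetical `reasoningLookup` delegate keyed by assistant content) is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json.Nodes;
using System.Threading.Tasks;
using System.ClientModel;
using System.ClientModel.Primitives;

// Illustrative sketch: rewrite outgoing chat.completions request JSON to inject
// reasoning_content onto assistant messages before the request is sent.
public sealed class ReasoningContentPolicy : PipelinePolicy
{
    // Hypothetical lookup from assistant message content to its captured reasoning.
    private readonly Func<string, string?> _reasoningLookup;

    public ReasoningContentPolicy(Func<string, string?> reasoningLookup)
        => _reasoningLookup = reasoningLookup;

    public override void Process(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        InjectReasoning(message);
        ProcessNext(message, pipeline, currentIndex);
    }

    public override async ValueTask ProcessAsync(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        InjectReasoning(message);
        await ProcessNextAsync(message, pipeline, currentIndex).ConfigureAwait(false);
    }

    private void InjectReasoning(PipelineMessage message)
    {
        if (message.Request.Content is null) return;

        using var buffer = new MemoryStream();
        message.Request.Content.WriteTo(buffer);
        var root = JsonNode.Parse(Encoding.UTF8.GetString(buffer.ToArray()));
        if (root?["messages"] is not JsonArray messages) return;

        foreach (var msg in messages)
        {
            if ((string?)msg?["role"] != "assistant") continue;
            var reasoning = _reasoningLookup((string?)msg?["content"] ?? "");
            if (reasoning is not null)
                msg!["reasoning_content"] = reasoning; // echoed back as-is, untrimmed
        }

        message.Request.Content = BinaryContent.Create(
            BinaryData.FromString(root!.ToJsonString()));
    }
}
```

Registration would go through the client options, e.g. `options.AddPolicy(new ReasoningContentPolicy(lookup), PipelinePosition.BeforeTransport)`, so the rewrite happens after the SDK serializes the request but before it is sent.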

Verify end-to-end (not just "build passes") on a thinking-mode model that:

  • Streaming reasoning is captured (reasoningContent non-empty at Line 1016).
  • The next request includes reasoning_content in the assistant message JSON.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1371 -
1463, The reflection-based getters/setters (GetAssistantReasoningContent,
GetStreamingReasoningContent, SetAssistantReasoningContent and the
DeserializeProviderHistory usage) won’t produce or serialize reasoning_content
with OpenAI .NET SDK v2.10.0; replace this approach by mutating outgoing request
JSON (injecting reasoning_content into assistant messages) via a custom
PipelinePolicy on OpenAIClientOptions.Transport or by sending requests directly
with HttpClient to the thinking-mode endpoint, and remove/stop relying on the
reflection helpers — then verify end-to-end that streaming reasoning is captured
and the subsequent request JSON includes reasoning_content in assistant
messages.

1244-1244: ⚠️ Potential issue | 🟠 Major

ResumeFromSnapshotAsync doesn't restore tool-call assistant messages from snapshot.

DeserializeProviderHistory only reconstructs plain text assistant messages using new AssistantChatMessage(msg.Content ?? ""). Snapshots taken mid tool-cycle contain AssistantChatMessage(chatToolCalls) entries whose tool-call structure is lost on resume. Tool result messages (ToolChatMessage with "tool" role) are also not serialized or restored. This re-introduces HTTP 400 errors once a snapshot is resumed against a thinking-mode provider that expects the full tool-calling context.

ReasoningContent is properly handled via SetAssistantReasoningContent(), so it survives snapshot restore.

If snapshot resume should support thinking-mode + tool-calling flows, SerializedChatMessage needs fields for tool-call structure (tool IDs, names, arguments) and support for "tool" role messages, not just Role/Content/ReasoningContent.
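One possible shape for that extension, assuming System.Text.Json-style serialization. Only `SerializedChatMessage` and `ReasoningContent` come from this PR; the tool-call field names are hypothetical:

```csharp
using System.Collections.Generic;

// Hypothetical serialized form of one assistant tool call (id, function, raw args).
public sealed class SerializedToolCall
{
    public string Id { get; set; } = "";
    public string FunctionName { get; set; } = "";
    public string ArgumentsJson { get; set; } = "";
}

// Sketch of an extended SerializedChatMessage that round-trips tool-calling turns.
public sealed class SerializedChatMessage
{
    public string Role { get; set; } = "";        // "system" | "user" | "assistant" | "tool"
    public string? Content { get; set; }
    public string? ReasoningContent { get; set; } // thinking-mode reasoning, stored as-is

    // Populated on assistant turns that requested tool calls.
    public List<SerializedToolCall>? ToolCalls { get; set; }

    // Populated on "tool" role messages so results can be matched to their call.
    public string? ToolCallId { get; set; }
}
```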

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` at line 1244,
ResumeFromSnapshotAsync currently calls DeserializeProviderHistory which
recreates only plain AssistantChatMessage instances and drops tool-call
structure and ToolChatMessage ("tool" role) entries from snapshots; update
SerializedChatMessage to include tool-call fields (tool id/name/args and any
serialized tool response) and extend DeserializeProviderHistory to reconstruct
AssistantChatMessage instances with tool-call metadata and ToolChatMessage
entries so thinking-mode providers get full context; ensure
ResumeFromSnapshotAsync uses the new deserialization and preserve
ReasoningContent via SetAssistantReasoningContent as before.
🧹 Nitpick comments (2)
TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs (2)

1375-1419: Cache PropertyInfo lookups; called per streaming chunk.

GetStreamingReasoningContent is invoked for every StreamingChatCompletionUpdate (potentially hundreds of times per response) and each call does two Type.GetProperty lookups. Cache the resolved PropertyInfo (or a typed Func<,>) in static readonly fields keyed by Type to avoid the repeated reflection cost on the hot streaming path. The same applies to the get/set helpers on AssistantChatMessage.

Also applies to: 1454-1463
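The caching pattern the comment suggests can be sketched with plain BCL types; the helper name and the property names in the usage comment are illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Resolve each (Type, property name) pair once and reuse the PropertyInfo on the
// hot streaming path, instead of calling Type.GetProperty per chunk.
static class ReflectionCache
{
    private static readonly ConcurrentDictionary<(Type Type, string Name), PropertyInfo?> Props = new();

    public static string? GetStringProperty(object target, string propertyName)
    {
        var prop = Props.GetOrAdd((target.GetType(), propertyName),
            key => key.Type.GetProperty(key.Name,
                BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance));
        return prop?.GetValue(target) as string;
    }
}

// e.g. inside the streaming loop, instead of reflecting on every update:
//   var reasoning = ReflectionCache.GetStringProperty(update, "ReasoningContentUpdate");
```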

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1375 -
1419, GetStreamingReasoningContent and GetAssistantReasoningContent are doing
Type.GetProperty on every call (hot path) — cache the resolved PropertyInfo or
compiled accessors to avoid repeated reflection. Add static readonly
ConcurrentDictionary<Type, PropertyInfo?> (or ConcurrentDictionary<Type,
Func<object, string?>> for typed getters) for the ReasoningContentUpdate and
Reasoning properties and use those caches in GetStreamingReasoningContent and
GetAssistantReasoningContent (look up by update.GetType() or
assistantMsg.GetType(), retrieve cached PropertyInfo/Func, and invoke it if
present). Apply the same caching pattern to the corresponding helper code
referenced around the other block (the get/set helpers at the 1454–1463 region)
so reflection is resolved once per Type instead of per streaming chunk.

1342-1369: Tool-call assistant messages serialize as empty Content.

For an AssistantChatMessage constructed from chatToolCalls (Line 1032), assistantMsg.Content is typically empty, so this serializer writes Content = "" and drops the ChatToolCall list entirely. Combined with the resume-path issue above, snapshots of mid tool-cycle states are lossy: on restore the model sees a no-op assistant turn followed by orphan tool messages — which most providers reject with 400 (tool message must follow assistant message containing tool_calls).

This isn't introduced by this PR, but the new ReasoningContent plumbing makes the gap more visible (reasoning is preserved, tool calls aren't). Worth scoping a follow-up to extend SerializedChatMessage with ToolCalls/ToolCallId so resume actually round-trips.

Want me to draft the extended SerializedChatMessage schema plus matching serialize/deserialize logic and open a follow-up issue?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1342 -
1369, SerializeProviderHistory currently builds SerializedChatMessage.Content
from AssistantChatMessage.Content, but AssistantChatMessage instances created
from chatToolCalls have empty Content and instead carry tool call data, so tool
calls are lost on serialization; update SerializedChatMessage to include tool
call metadata (e.g., ToolCalls and/or ToolCallId) and modify
SerializeProviderHistory to extract and populate those fields from
AssistantChatMessage (and preserve existing ReasoningContent via
GetAssistantReasoningContent) so mid-tool-cycle assistant turns round-trip
correctly; ensure the new fields are set when msg is AssistantChatMessage (and
include any ChatToolCall list from assistantMsg or related properties) and
adjust deserialization to restore assistant messages plus their tool_calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs`:
- Around line 1387-1390: The three static reflection helpers
(SetAssistantReasoningContent, GetAssistantReasoningContent,
GetStreamingReasoningContent) currently swallow all exceptions; change each
empty catch to catch Exception ex and emit a Debug/Trace-level log containing
the exception message/stack (e.g., Debug.WriteLine or
Trace.TraceInformation/TraceEvent) and make the logging occur only once by
guarding with a static bool flag per method so you don’t flood logs; keep the
methods static (no DI) and ensure the original behavior still returns null/false
after logging.
- Around line 1015-1016: The code trims reasoningContent which breaks provider
compatibility; change the assignment that uses
reasoningContentBuilder.ToString().Trim() so it uses the raw string
(reasoningContentBuilder.ToString()) without calling Trim(), leaving
responseText (responseText = contentBuilder.ToString().Trim()) unchanged; ensure
the variable reasoningContent (and any downstream use in OpenAIService.cs /
methods that send chat.completions) preserves leading/trailing whitespace
exactly as produced by reasoningContentBuilder.

---

Outside diff comments:
In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs`:
- Around line 1371-1463: The reflection-based getters/setters
(GetAssistantReasoningContent, GetStreamingReasoningContent,
SetAssistantReasoningContent and the DeserializeProviderHistory usage) won’t
produce or serialize reasoning_content with OpenAI .NET SDK v2.10.0; replace
this approach by mutating outgoing request JSON (injecting reasoning_content
into assistant messages) via a custom PipelinePolicy on
OpenAIClientOptions.Transport or by sending requests directly with HttpClient to
the thinking-mode endpoint, and remove/stop relying on the reflection helpers —
then verify end-to-end that streaming reasoning is captured and the subsequent
request JSON includes reasoning_content in assistant messages.
- Line 1244: ResumeFromSnapshotAsync currently calls DeserializeProviderHistory
which recreates only plain AssistantChatMessage instances and drops tool-call
structure and ToolChatMessage ("tool" role) entries from snapshots; update
SerializedChatMessage to include tool-call fields (tool id/name/args and any
serialized tool response) and extend DeserializeProviderHistory to reconstruct
AssistantChatMessage instances with tool-call metadata and ToolChatMessage
entries so thinking-mode providers get full context; ensure
ResumeFromSnapshotAsync uses the new deserialization and preserve
ReasoningContent via SetAssistantReasoningContent as before.

---

Nitpick comments:
In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs`:
- Around line 1375-1419: GetStreamingReasoningContent and
GetAssistantReasoningContent are doing Type.GetProperty on every call (hot path)
— cache the resolved PropertyInfo or compiled accessors to avoid repeated
reflection. Add static readonly ConcurrentDictionary<Type, PropertyInfo?> (or
ConcurrentDictionary<Type, Func<object, string?>> for typed getters) for the
ReasoningContentUpdate and Reasoning properties and use those caches in
GetStreamingReasoningContent and GetAssistantReasoningContent (look up by
update.GetType() or assistantMsg.GetType(), retrieve cached PropertyInfo/Func,
and invoke it if present). Apply the same caching pattern to the corresponding
helper code referenced around the other block (the get/set helpers at the
1454–1463 region) so reflection is resolved once per Type instead of per
streaming chunk.
- Around line 1342-1369: SerializeProviderHistory currently builds
SerializedChatMessage.Content from AssistantChatMessage.Content, but
AssistantChatMessage instances created from chatToolCalls have empty Content and
instead carry tool call data, so tool calls are lost on serialization; update
SerializedChatMessage to include tool call metadata (e.g., ToolCalls and/or
ToolCallId) and modify SerializeProviderHistory to extract and populate those
fields from AssistantChatMessage (and preserve existing ReasoningContent via
GetAssistantReasoningContent) so mid-tool-cycle assistant turns round-trip
correctly; ensure the new fields are set when msg is AssistantChatMessage (and
include any ChatToolCall list from assistantMsg or related properties) and
adjust deserialization to restore assistant messages plus their tool_calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: faa699ed-b425-41fa-8700-a054d6e82283

📥 Commits

Reviewing files that changed from the base of the PR and between 20acf24 and 4feb3c8.

📒 Files selected for processing (2)
  • TelegramSearchBot.Common/Model/AI/LlmContinuationSnapshot.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs

Comment on lines 1015 to +1016
string responseText = contentBuilder.ToString().Trim();
string reasoningContent = reasoningContentBuilder.ToString().Trim();


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does Kimi-thinking-preview or QwQ require the reasoning_content echoed back in subsequent chat.completions requests to match exactly (including leading/trailing whitespace) the streamed value, or is whitespace-trimmed content accepted?

💡 Result:

Kimi-thinking-preview (likely referring to Moonshot AI's Kimi K2 Thinking models like kimi-k2-thinking or kimi-k2.6 with thinking enabled) and QwQ (likely QwQ-32B-Preview, a reasoning model comparable to Kimi K2 Thinking) require the reasoning_content echoed back in subsequent chat.completions requests to match exactly the streamed value, including leading/trailing whitespace. Official Moonshot AI/Kimi docs explicitly state to "keep the reasoning_content from every historical assistant message in messages as-is. The simplest way is to append the assistant message returned from the previous API call directly back into messages." No sources indicate whitespace trimming is accepted; "as-is" implies exact preservation is necessary for proper preserved thinking and chain-of-thought continuity in multi-turn conversations. This applies to both official API and compatible implementations like vLLM, where field handling bugs highlight the need for precise input matching.


Remove Trim() on reasoningContent to preserve provider compatibility.

Kimi-thinking-preview and QwQ require the reasoning_content field to be echoed back in subsequent chat.completions requests with exact preservation, including leading/trailing whitespace. Official Moonshot AI documentation explicitly states to "keep the reasoning_content from every historical assistant message in messages as-is." The current Trim() operation violates this requirement and may cause validation or continuity issues in multi-turn conversations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1015 -
1016, The code trims reasoningContent which breaks provider compatibility;
change the assignment that uses reasoningContentBuilder.ToString().Trim() so it
uses the raw string (reasoningContentBuilder.ToString()) without calling Trim(),
leaving responseText (responseText = contentBuilder.ToString().Trim())
unchanged; ensure the variable reasoningContent (and any downstream use in
OpenAIService.cs / methods that send chat.completions) preserves
leading/trailing whitespace exactly as produced by reasoningContentBuilder.

Comment on lines +1387 to +1390
} catch {
// Reflection failed, return null
}
return null;

⚠️ Potential issue | 🟠 Major

Don't silently swallow reflection failures — at least log at Debug.

All three reflection helpers swallow every exception with empty catch { }. Combined with the concern above, that means if the SDK changes property names, or the property exists but isn't writable, or the cast fails, you'll get zero diagnostics — the feature will simply appear not to work in production. Capture the exception and log it at Debug/Trace level (and ideally only once via a static flag) so this is actually observable.

🛡️ Suggested change (illustrative, applied to SetAssistantReasoningContent)
-        private static void SetAssistantReasoningContent(AssistantChatMessage msg, string reasoningContent) {
-            try {
-                var prop = msg.GetType().GetProperty("Reasoning");
-                if (prop != null && prop.CanWrite) {
-                    prop.SetValue(msg, reasoningContent);
-                }
-            } catch {
-                // Reflection failed, ignore
-            }
-        }
+        private static int _reasoningReflectionWarned;
+        private static void SetAssistantReasoningContent(AssistantChatMessage msg, string reasoningContent) {
+            try {
+                var prop = msg.GetType().GetProperty("Reasoning");
+                if (prop != null && prop.CanWrite) {
+                    prop.SetValue(msg, reasoningContent);
+                } else if (System.Threading.Interlocked.Exchange(ref _reasoningReflectionWarned, 1) == 0) {
+                    System.Diagnostics.Debug.WriteLine(
+                        "AssistantChatMessage has no writable 'Reasoning' property; reasoning_content round-trip is a no-op.");
+                }
+            } catch (Exception ex) {
+                System.Diagnostics.Debug.WriteLine($"SetAssistantReasoningContent failed: {ex}");
+            }
+        }

(Same pattern applies to GetAssistantReasoningContent and GetStreamingReasoningContent. An injected ILogger would be even better, but these methods are static.)

Also applies to: 1415-1418, 1460-1462

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1387 -
1390, The three static reflection helpers (SetAssistantReasoningContent,
GetAssistantReasoningContent, GetStreamingReasoningContent) currently swallow
all exceptions; change each empty catch to catch Exception ex and emit a
Debug/Trace-level log containing the exception message/stack (e.g.,
Debug.WriteLine or Trace.TraceInformation/TraceEvent) and make the logging occur
only once by guarding with a static bool flag per method so you don’t flood
logs; keep the methods static (no DI) and ensure the original behavior still
returns null/false after logging.

@github-actions
Contributor

🔍 PR Check Report

📋 Check Overview

🧪 Test Results

Platform Status Details
Ubuntu 🟢 Success Tests passed, artifacts uploaded
Windows 🟢 Success Tests passed, artifacts uploaded

📊 Code Quality

  • ✅ Code formatting check
  • ✅ Security vulnerability scan
  • ✅ Dependency analysis
  • ✅ Code coverage collection

📁 Test Artifacts

  • Test result artifact count: 2
  • Code coverage uploaded to Codecov

🔗 Related Links


This report was generated automatically by GitHub Actions

@ModerRAS ModerRAS merged commit ee3d454 into master Apr 25, 2026
10 checks passed
@ModerRAS ModerRAS deleted the fix/reasoning-content-thinking-mode branch April 25, 2026 09:17
