Sjshi763/issue4363 fix: when an LLM reply itself contains JSON-like formatting, the message `content` field may be incorrectly serialized multiple times #4546
base: master
Conversation
Hey - I've found 2 issues, and left some high-level feedback:
- In the new multimodal handling in `_parse_openai_completion`, consider reusing any existing helpers for image/base64 parsing or MessageChain construction (if present elsewhere in the codebase) to avoid duplicating logic and to keep provider behavior consistent.
- The new `Json.to_dict` implementation always wraps the content as `{ "content": ... }`, which changes the shape of the serialized payload for dict inputs; if any consumers rely on the previous structure of `data`, you may want to preserve the original dict form when `data` was a dict, and only use the `content` wrapper for raw string cases.
- In `Reply.to_dict`, you hardcode the type string as `"reply"`; for consistency and easier refactoring later, consider using `self.type` (like other components) so changes to `ComponentType.Reply` propagate automatically.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the new multimodal handling in `_parse_openai_completion`, consider reusing any existing helpers for image/base64 parsing or MessageChain construction (if present elsewhere in the codebase) to avoid duplicating logic and to keep provider behavior consistent.
- The new `Json.to_dict` implementation always wraps the content as `{ "content": ... }`, which changes the shape of the serialized payload for dict inputs; if any consumers rely on the previous structure of `data`, you may want to preserve the original dict form when `data` was a dict, and only use the `content` wrapper for raw string cases.
- In `Reply.to_dict`, you hardcode the type string as `"reply"`; for consistency and easier refactoring later, consider using `self.type` (like other components) so changes to `ComponentType.Reply` propagate automatically.
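The `Json.to_dict` concern above can be sketched as follows. This is a minimal, hypothetical stand-in (the real component's fields and envelope may differ): it wraps only raw strings in a `{"content": ...}` envelope, preserves dict inputs as-is, and also demonstrates why double-encoding an already-serialized string corrupts the payload — the bug this PR targets.

```python
import json


class Json:
    """Minimal stand-in for a Json message component (hypothetical fields)."""

    def __init__(self, data):
        self.data = data  # may be a dict or an already-serialized string

    def to_dict(self):
        # Preserve the original structure when data is already a dict;
        # only wrap raw strings in a {"content": ...} envelope.
        if isinstance(self.data, dict):
            payload = self.data
        else:
            payload = {"content": self.data}
        return {"type": "json", "data": payload}


# A dict input keeps its original shape instead of being re-wrapped.
d = Json({"answer": 42}).to_dict()
assert d == {"type": "json", "data": {"answer": 42}}

# The double-serialization pitfall: dumping an already-dumped string
# produces a JSON string literal with escaped quotes, so one loads()
# round-trip still yields a string rather than the original dict.
once = json.dumps({"answer": 42})
twice = json.dumps(once)
assert json.loads(twice) == once                       # still a string
assert json.loads(json.loads(twice)) == {"answer": 42}  # needs two loads
```

Keeping dict payloads unwrapped (and serializing exactly once at the boundary) avoids consumers having to guess how many `json.loads` calls are needed.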
## Individual Comments
### Comment 1
<location> `astrbot/core/provider/sources/openai_source.py:253-262` </location>
<code_context>
+ # content can be either a plain string or a multimodal list
+ content = choice.message.content
+ # handle multimodal content returned as a list of parts
+ if isinstance(content, list):
+ reasoning_parts = []
+ mc = MessageChain()
+ for part in content:
+ if not isinstance(part, dict):
+ # fallback: append as plain text
+ mc.message(str(part))
+ continue
+ ptype = part.get("type")
+ if ptype == "text":
+ mc.message(part.get("text", ""))
+ elif ptype == "image_url":
+ image_field = part.get("image_url")
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Text parts in multimodal responses no longer strip <think> tags or populate reasoning_content, unlike the string path.
In the multimodal `list` branch, `ptype == "text"` just appends `part["text"]` as-is, while the string-only branch still extracts `<think>...</think>` into `reasoning_content` and strips it from the visible completion. This means any `<think>` in multimodal text parts will now be shown to users and handled inconsistently between the two paths. You likely want to run the same regex-based extraction on text parts here (and append matches to `reasoning_parts`) so multimodal and string responses are treated consistently.
Suggested implementation:
```python
# content can be either a plain string or a multimodal list
content = choice.message.content
# handle multimodal content returned as a list of parts
if isinstance(content, list):
reasoning_parts = []
mc = MessageChain()
for part in content:
if not isinstance(part, dict):
# fallback: append as plain text
mc.message(str(part))
continue
ptype = part.get("type")
if ptype == "text":
text = part.get("text", "") or ""
if text:
# extract <think>...</think> segments into reasoning_parts
try:
import re # local import in case module-level import is not present
think_matches = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
if think_matches:
reasoning_parts.extend(think_matches)
# strip all <think> blocks from the visible completion text
text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
except Exception:
# on any failure, fall back to using the raw text
pass
mc.message(text)
elif ptype == "image_url":
image_field = part.get("image_url")
url = None
if isinstance(image_field, dict):
url = image_field.get("url")
else:
url = image_field
if url:
# data:image/...;base64,xxx
if isinstance(url, str) and "base64," in url:
base64_data = url.split("base64,", 1)[1]
mc.base64_image(base64_data)
```
1. If there is already a shared regex or helper used in the string-only path for extracting `<think>` tags (e.g. a compiled pattern or utility function), replace the inline `re.findall`/`re.sub` with that shared mechanism for consistency.
2. If `re` is already imported at the module level, you can safely remove the `import re` inside the `try` block to avoid duplicate imports and rely on the module-level import instead.
3. Ensure that elsewhere in this function, `reasoning_parts` is joined and assigned to the same `reasoning_content` (or equivalent) field that is used in the string-only path so that callers receive a unified reasoning payload.
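Follow-up 1 above (reuse a shared, precompiled pattern instead of inline `re.findall`/`re.sub`) can be sketched like this; the pattern is assumed from the string-only path described in the comment, and the helper name is illustrative:

```python
import re

# Compiled once at module level, shared by both the string-only and
# multimodal text paths so <think> handling stays consistent.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_reasoning(text):
    """Split visible text from <think>...</think> reasoning segments.

    Returns (visible_text, reasoning) where reasoning is None when the
    input contains no <think> blocks.
    """
    matches = THINK_PATTERN.findall(text)
    reasoning = "\n".join(m.strip() for m in matches) if matches else None
    visible = THINK_PATTERN.sub("", text).strip()
    return visible, reasoning


visible, reasoning = extract_reasoning(
    '<think>user wants JSON</think>Here is the result: {"ok": true}'
)
# visible  -> 'Here is the result: {"ok": true}'
# reasoning -> 'user wants JSON'
```

Calling this one helper from both branches keeps multimodal text parts and plain-string completions in lockstep.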
</issue_to_address>
### Comment 2
<location> `astrbot/core/provider/sources/openai_source.py:250` </location>
<code_context>
- )
- completion_text = reasoning_pattern.sub("", completion_text).strip()
- llm_response.result_chain = MessageChain().message(completion_text)
+ # content can be either a plain string or a multimodal list
+ content = choice.message.content
+ # handle multimodal content returned as a list of parts
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the multimodal content, image handling, and `<think>` parsing into dedicated helper functions so `_parse_openai_completion` mainly orchestrates them.
You can keep the new functionality but reduce complexity by pulling the multimodal parsing and image/reasoning handling into small helpers. That makes `_parse_openai_completion` mostly orchestration and flattens the nesting.
For example, extract the multimodal handling:
```python
def _build_message_chain_from_content(self, content) -> tuple[MessageChain, str | None]:
mc = MessageChain()
reasoning_parts: list[str] = []
if isinstance(content, list):
for part in content:
if not isinstance(part, dict):
mc.message(str(part))
continue
ptype = part.get("type")
if ptype == "text":
mc.message(part.get("text", ""))
elif ptype == "image_url":
self._append_image_part(mc, part.get("image_url"))
elif ptype == "think":
think_val = part.get("think")
if think_val:
reasoning_parts.append(str(think_val))
else:
mc.message(json.dumps(part, ensure_ascii=False))
else:
# plain string, including <think> tag handling
completion_text, reasoning = self._extract_reasoning_from_text(str(content))
mc.message(completion_text)
if reasoning:
reasoning_parts.append(reasoning)
reasoning_content = None
if reasoning_parts:
reasoning_content = "\n".join(rp.strip() for rp in reasoning_parts)
return mc, reasoning_content
```
Factor image URL handling separately to avoid nested conditionals in the main logic:
```python
def _append_image_part(self, mc: MessageChain, image_field: Any) -> None:
url = image_field.get("url") if isinstance(image_field, dict) else image_field
if not url or not isinstance(url, str):
return
if "base64," in url:
mc.base64_image(url.split("base64,", 1)[1])
elif url.startswith("base64://"):
mc.base64_image(url.replace("base64://", ""))
else:
mc.url_image(url)
```
And a small helper for the `<think>`-tag case:
```python
def _extract_reasoning_from_text(self, text: str) -> tuple[str, str | None]:
reasoning_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
matches = reasoning_pattern.findall(text)
reasoning = None
if matches:
reasoning = "\n".join(match.strip() for match in matches)
text = reasoning_pattern.sub("", text).strip()
return text.strip(), reasoning
```
Then `_parse_openai_completion` becomes less branched:
```python
if choice.message.content is not None:
mc, reasoning_from_content = self._build_message_chain_from_content(
choice.message.content
)
if reasoning_from_content:
llm_response.reasoning_content = reasoning_from_content
llm_response.result_chain = mc
# still let _extract_reasoning_content override if present
llm_response.reasoning_content = self._extract_reasoning_content(completion)
```
This keeps all behaviors (multimodal, text, `<think>` tags, image formats, unknown parts) but moves the detailed branching out of the central parsing function.
</issue_to_address>
Whoa, some of this really does need fixing. I'll get to it the week after next, or a kind soul can beat me to it.
Modifications / 改动点
components.py: added handling logic for new message component types
openai_source.py: enhanced the OpenAI API error-handling mechanism
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
Summary by Sourcery
Summary by Sourcery
Handle multimodal OpenAI chat completion content and avoid incorrect JSON re-serialization in message components.
New Features:
Bug Fixes:
Enhancements: