Sjshi763/issue4363 fix: when an LLM reply itself contains JSON-like formatting, the message `content` field may be incorrectly serialized multiple times #4546
base: master
Conversation
Hey - I've found 2 issues, and left some high-level feedback:
- In the new multimodal handling in `_parse_openai_completion`, consider reusing any existing helpers for image/base64 parsing or MessageChain construction (if present elsewhere in the codebase) to avoid duplicating logic and to keep provider behavior consistent.
- The new `Json.to_dict` implementation always wraps the content as `{ "content": ... }`, which changes the shape of the serialized payload for dict inputs; if any consumers rely on the previous structure of `data`, you may want to preserve the original dict form when `data` was a dict, and only use the `content` wrapper for raw string cases.
- In `Reply.to_dict`, you hardcode the type string as `"reply"`; for consistency and easier refactoring later, consider using `self.type` (like other components) so changes to `ComponentType.Reply` propagate automatically.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the new multimodal handling in `_parse_openai_completion`, consider reusing any existing helpers for image/base64 parsing or MessageChain construction (if present elsewhere in the codebase) to avoid duplicating logic and to keep provider behavior consistent.
- The new `Json.to_dict` implementation always wraps the content as `{ "content": ... }`, which changes the shape of the serialized payload for dict inputs; if any consumers rely on the previous structure of `data`, you may want to preserve the original dict form when `data` was a dict, and only use the `content` wrapper for raw string cases.
- In `Reply.to_dict`, you hardcode the type string as `"reply"`; for consistency and easier refactoring later, consider using `self.type` (like other components) so changes to `ComponentType.Reply` propagate automatically.
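The `Json.to_dict` concern above can be sketched as follows. This is a minimal, hypothetical stand-in (the real component's fields and envelope may differ): it wraps only raw strings in a `{"content": ...}` envelope, preserves dict inputs as-is, and also demonstrates why double-encoding an already-serialized string corrupts the payload — the bug this PR targets.

```python
import json


class Json:
    """Minimal stand-in for a Json message component (hypothetical fields)."""

    def __init__(self, data):
        self.data = data  # may be a dict or an already-serialized string

    def to_dict(self):
        # Preserve the original structure when data is already a dict;
        # only wrap raw strings in a {"content": ...} envelope.
        if isinstance(self.data, dict):
            payload = self.data
        else:
            payload = {"content": self.data}
        return {"type": "json", "data": payload}


# A dict input keeps its original shape instead of being re-wrapped.
d = Json({"answer": 42}).to_dict()
assert d == {"type": "json", "data": {"answer": 42}}

# The double-serialization pitfall: dumping an already-dumped string
# produces a JSON string literal with escaped quotes, so one loads()
# round-trip still yields a string rather than the original dict.
once = json.dumps({"answer": 42})
twice = json.dumps(once)
assert json.loads(twice) == once                       # still a string
assert json.loads(json.loads(twice)) == {"answer": 42}  # needs two loads
```

Keeping dict payloads unwrapped (and serializing exactly once at the boundary) avoids consumers having to guess how many `json.loads` calls are needed.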
## Individual Comments
### Comment 1
<location> `astrbot/core/provider/sources/openai_source.py:253-262` </location>
<code_context>
+ # content can be either a plain string or a multimodal list
+ content = choice.message.content
+ # handle multimodal content returned as a list of parts
+ if isinstance(content, list):
+ reasoning_parts = []
+ mc = MessageChain()
+ for part in content:
+ if not isinstance(part, dict):
+ # fallback: append as plain text
+ mc.message(str(part))
+ continue
+ ptype = part.get("type")
+ if ptype == "text":
+ mc.message(part.get("text", ""))
+ elif ptype == "image_url":
+ image_field = part.get("image_url")
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Text parts in multimodal responses no longer strip <think> tags or populate reasoning_content, unlike the string path.
In the multimodal `list` branch, `ptype == "text"` just appends `part["text"]` as-is, while the string-only branch still extracts `<think>...</think>` into `reasoning_content` and strips it from the visible completion. This means any `<think>` in multimodal text parts will now be shown to users and handled inconsistently between the two paths. You likely want to run the same regex-based extraction on text parts here (and append matches to `reasoning_parts`) so multimodal and string responses are treated consistently.
Suggested implementation:
```python
# content can be either a plain string or a multimodal list
content = choice.message.content
# handle multimodal content returned as a list of parts
if isinstance(content, list):
reasoning_parts = []
mc = MessageChain()
for part in content:
if not isinstance(part, dict):
# fallback: append as plain text
mc.message(str(part))
continue
ptype = part.get("type")
if ptype == "text":
text = part.get("text", "") or ""
if text:
# extract <think>...</think> segments into reasoning_parts
try:
import re # local import in case module-level import is not present
think_matches = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
if think_matches:
reasoning_parts.extend(think_matches)
# strip all <think> blocks from the visible completion text
text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
except Exception:
# on any failure, fall back to using the raw text
pass
mc.message(text)
elif ptype == "image_url":
image_field = part.get("image_url")
url = None
if isinstance(image_field, dict):
url = image_field.get("url")
else:
url = image_field
if url:
# data:image/...;base64,xxx
if isinstance(url, str) and "base64," in url:
base64_data = url.split("base64,", 1)[1]
mc.base64_image(base64_data)
```
1. If there is already a shared regex or helper used in the string-only path for extracting `<think>` tags (e.g. a compiled pattern or utility function), replace the inline `re.findall`/`re.sub` with that shared mechanism for consistency.
2. If `re` is already imported at the module level, you can safely remove the `import re` inside the `try` block to avoid duplicate imports and rely on the module-level import instead.
3. Ensure that elsewhere in this function, `reasoning_parts` is joined and assigned to the same `reasoning_content` (or equivalent) field that is used in the string-only path so that callers receive a unified reasoning payload.
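Follow-up 1 above (reuse a shared, precompiled pattern instead of inline `re.findall`/`re.sub`) can be sketched like this; the pattern is assumed from the string-only path described in the comment, and the helper name is illustrative:

```python
import re

# Compiled once at module level, shared by both the string-only and
# multimodal text paths so <think> handling stays consistent.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_reasoning(text):
    """Split visible text from <think>...</think> reasoning segments.

    Returns (visible_text, reasoning) where reasoning is None when the
    input contains no <think> blocks.
    """
    matches = THINK_PATTERN.findall(text)
    reasoning = "\n".join(m.strip() for m in matches) if matches else None
    visible = THINK_PATTERN.sub("", text).strip()
    return visible, reasoning


visible, reasoning = extract_reasoning(
    '<think>user wants JSON</think>Here is the result: {"ok": true}'
)
# visible  -> 'Here is the result: {"ok": true}'
# reasoning -> 'user wants JSON'
```

Calling this one helper from both branches keeps multimodal text parts and plain-string completions in lockstep.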
</issue_to_address>
### Comment 2
<location> `astrbot/core/provider/sources/openai_source.py:250` </location>
<code_context>
- )
- completion_text = reasoning_pattern.sub("", completion_text).strip()
- llm_response.result_chain = MessageChain().message(completion_text)
+ # content can be either a plain string or a multimodal list
+ content = choice.message.content
+ # handle multimodal content returned as a list of parts
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the multimodal content, image handling, and `<think>` parsing into dedicated helper functions so `_parse_openai_completion` mainly orchestrates them.
You can keep the new functionality but reduce complexity by pulling the multimodal parsing and image/reasoning handling into small helpers. That makes `_parse_openai_completion` mostly orchestration and flattens the nesting.
For example, extract the multimodal handling:
```python
def _build_message_chain_from_content(self, content) -> tuple[MessageChain, str | None]:
mc = MessageChain()
reasoning_parts: list[str] = []
if isinstance(content, list):
for part in content:
if not isinstance(part, dict):
mc.message(str(part))
continue
ptype = part.get("type")
if ptype == "text":
mc.message(part.get("text", ""))
elif ptype == "image_url":
self._append_image_part(mc, part.get("image_url"))
elif ptype == "think":
think_val = part.get("think")
if think_val:
reasoning_parts.append(str(think_val))
else:
mc.message(json.dumps(part, ensure_ascii=False))
else:
# plain string, including <think> tag handling
completion_text, reasoning = self._extract_reasoning_from_text(str(content))
mc.message(completion_text)
if reasoning:
reasoning_parts.append(reasoning)
reasoning_content = None
if reasoning_parts:
reasoning_content = "\n".join(rp.strip() for rp in reasoning_parts)
return mc, reasoning_content
```
Factor image URL handling separately to avoid nested conditionals in the main logic:
```python
def _append_image_part(self, mc: MessageChain, image_field: Any) -> None:
url = image_field.get("url") if isinstance(image_field, dict) else image_field
if not url or not isinstance(url, str):
return
if "base64," in url:
mc.base64_image(url.split("base64,", 1)[1])
elif url.startswith("base64://"):
mc.base64_image(url.replace("base64://", ""))
else:
mc.url_image(url)
```
And a small helper for the `<think>`-tag case:
```python
def _extract_reasoning_from_text(self, text: str) -> tuple[str, str | None]:
reasoning_pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
matches = reasoning_pattern.findall(text)
reasoning = None
if matches:
reasoning = "\n".join(match.strip() for match in matches)
text = reasoning_pattern.sub("", text).strip()
return text.strip(), reasoning
```
Then `_parse_openai_completion` becomes less branched:
```python
if choice.message.content is not None:
mc, reasoning_from_content = self._build_message_chain_from_content(
choice.message.content
)
if reasoning_from_content:
llm_response.reasoning_content = reasoning_from_content
llm_response.result_chain = mc
# still let _extract_reasoning_content override if present
llm_response.reasoning_content = self._extract_reasoning_content(completion)
```
This keeps all behaviors (multimodal, text, `<think>` tags, image formats, unknown parts) but moves the detailed branching out of the central parsing function.
</issue_to_address>
Whoa, some of this really does need fixing. I'll get to it the week after next, or a kind soul can beat me to it.
Modifications / 改动点
components.py: added handling logic for new message component types
openai_source.py: enhanced the OpenAI API error-handling mechanism
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
Summary by Sourcery
Summary by Sourcery
Handle multimodal OpenAI chat completion content and avoid incorrect JSON re-serialization in message components.
New Features:
Bug Fixes:
Enhancements: