KBVsent (Contributor) commented Jan 12, 2026

When Anthropic is the LLM provider, processing quoted images in messages fails with the following error:

Error code: 400 - {'error': {'type': 'invalid_request_error', 'message': 'messages.0.content.1.image.source.base64.data: Image does not match the provided media type image/jpeg'}}

This happens because mimetypes.guess_type() infers the MIME type from the file extension, while save_temp_img() always saves downloaded images with a .jpg extension regardless of their actual format. When a PNG image is saved as xxx.jpg, the code reports image/jpeg to the Anthropic API, and validation fails.
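The mismatch can be reproduced without any network call. The snippet below is a minimal sketch, not code from the PR: the file name `xxx.jpg` comes from the description above, while `png_bytes` is an illustrative stand-in for a downloaded image.

```python
import mimetypes

# PNG signature followed by filler bytes: the content is PNG, whatever the name says.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8

# guess_type() looks only at the name, so a PNG stored as "xxx.jpg" is reported as JPEG.
guessed, _encoding = mimetypes.guess_type("xxx.jpg")
print(guessed)

# The leading bytes of the data still identify the real format.
looks_like_png = png_bytes.startswith(b"\x89PNG\r\n\x1a\n")
print(looks_like_png)
```

The extension-based guess and the actual content disagree, which is the condition the Anthropic API rejects.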

Modifications

Modified: anthropic_source.py

  • Removed mimetypes.guess_type() dependency
  • Added _detect_image_mime_type() method to detect MIME type from magic bytes (file signatures)
  • Updated resolve_image_url() to use correctly detected MIME type
  • This is NOT a breaking change.
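Magic-byte detection of this kind can be sketched as a small table lookup. This is an illustration under assumptions, not the PR's method: `detect_image_mime_type` and `_SIGNATURES` are hypothetical standalone names, and the `image/jpeg` default mirrors the fallback that the review comments below attribute to the submitted code.

```python
# Hypothetical standalone helper; the PR adds a method on the provider class instead.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def detect_image_mime_type(data: bytes, default: str = "image/jpeg") -> str:
    """Return a MIME type based on the leading bytes of `data`."""
    for signature, mime_type in _SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # WebP lives in a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # Assumed fallback for anything unrecognized.
    return default


print(detect_image_mime_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 4))
print(detect_image_mime_type(b"RIFF\x00\x00\x00\x00WEBPVP8 "))
```

Because only the first few bytes are inspected, the check behaves the same for file contents and for decoded base64 payloads.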

Screenshots or Test Results

Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 My changes do not introduce malicious code.


Summary by Sourcery

Ensure Anthropic image messages use the correct MIME type inferred from actual image data rather than file names.

Bug Fixes:

  • Fix incorrect image MIME type being sent to Anthropic by detecting type from image magic bytes for both file and base64 sources.

Enhancements:

  • Add utility to detect image MIME type from binary data and propagate it through image base64 encoding and message payload construction.

Copilot AI review requested due to automatic review settings January 12, 2026 03:51
@auto-assign auto-assign bot requested review from Fridemn and anka-afk January 12, 2026 03:51
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 12, 2026
@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Jan 12, 2026
sourcery-ai bot (Contributor) left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • In encode_image_bs64, the final return "", "image/jpeg" after the with open(...) block is effectively unreachable unless an exception occurs; consider wrapping the file read in a try/except so you can log and return a clear fallback when file access fails instead of relying on implicit behavior.
  • The broad except Exception around base64.b64decode in encode_image_bs64 could be narrowed to specific decoding-related exceptions (e.g., binascii.Error, ValueError) so that unrelated errors are not silently treated as a MIME-detection failure.
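Both points can be illustrated in one place. The following is a rough sketch under assumptions, not the merged implementation: `encode_image`, `sniff_mime`, and `FALLBACK_MIME` are hypothetical names, and `sniff_mime` is a deliberately tiny stand-in for `_detect_image_mime_type`. The file read is wrapped so that an access failure is logged and returns an explicit fallback, and the base64 branch catches only decoding errors.

```python
import base64
import binascii
import logging

logger = logging.getLogger(__name__)

FALLBACK_MIME = "image/jpeg"  # assumed default, matching the fallback discussed above


def sniff_mime(data: bytes) -> str:
    """Tiny stand-in for the PR's _detect_image_mime_type()."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return FALLBACK_MIME


def encode_image(image_url: str) -> tuple[str, str]:
    """Return (data URI, MIME type) for a base64:// URL or a local file path."""
    if image_url.startswith("base64://"):
        raw = image_url.removeprefix("base64://")
        try:
            mime = sniff_mime(base64.b64decode(raw))
        except (binascii.Error, ValueError):
            # Only decoding failures land here; other bugs still surface.
            mime = FALLBACK_MIME
        return f"data:{mime};base64,{raw}", mime
    try:
        with open(image_url, "rb") as f:
            data = f.read()
    except OSError as exc:
        # Explicit, logged fallback instead of an implicit trailing return.
        logger.warning("Could not read image %s: %s", image_url, exc)
        return "", FALLBACK_MIME
    mime = sniff_mime(data)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}", mime
```

`binascii.Error` is a subclass of `ValueError`, so listing both is redundant but harmless; it keeps the intent readable.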
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `encode_image_bs64`, the final `return "", "image/jpeg"` after the `with open(...)` block is effectively unreachable unless an exception occurs; consider wrapping the file read in a try/except so you can log and return a clear fallback when file access fails instead of relying on implicit behavior.
- The broad `except Exception` around `base64.b64decode` in `encode_image_bs64` could be narrowed to specific decoding-related exceptions (e.g., `binascii.Error`, `ValueError`) so that unrelated errors are not silently treated as a MIME-detection failure.

## Individual Comments

### Comment 1
<location> `astrbot/core/provider/sources/anthropic_source.py:460-470` </location>
<code_context>
         async for llm_response in self._query_stream(payloads, func_tool):
             yield llm_response

+    def _detect_image_mime_type(self, data: bytes) -> str:
+        """Detect the MIME type from the magic bytes of the image's binary data."""
+        if data[:8] == b"\x89PNG\r\n\x1a\n":
+            return "image/png"
+        if data[:2] == b"\xff\xd8":
+            return "image/jpeg"
+        if data[:6] in (b"GIF87a", b"GIF89a"):
+            return "image/gif"
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider using a more robust MIME detection strategy or at least documenting/handling non-supported formats explicitly.

This logic only recognizes PNG/JPEG/GIF/WEBP and treats everything else (e.g., BMP, TIFF, HEIC) as `image/jpeg`, which can break downstream consumers that try to decode non-JPEG data as JPEG. Consider either extending detection to the formats you expect, returning a generic type (e.g., `application/octet-stream`) for unknown signatures, or enforcing that callers only pass these four formats and validating accordingly.

```suggestion
    def _detect_image_mime_type(self, data: bytes) -> str:
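        """Detect the MIME type from the magic bytes of the image's binary data.

        Only PNG / JPEG / GIF / WEBP are currently supported.
        For unrecognized or unsupported formats, return `application/octet-stream` so that other formats are not mistaken for JPEG.
        """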
        if not data:
            return "application/octet-stream"

        # PNG
        if data[:8] == b"\x89PNG\r\n\x1a\n":
            return "image/png"

        # JPEG
        if data[:2] == b"\xff\xd8":
            return "image/jpeg"

        # GIF
        if data[:6] in (b"GIF87a", b"GIF89a"):
            return "image/gif"

        # WEBP (RIFF container + WEBP signature)
        if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
            return "image/webp"

        # Unknown or unsupported format: return a generic binary type to avoid misdecoding
        return "application/octet-stream"
```
</issue_to_address>

### Comment 2
<location> `astrbot/core/provider/sources/anthropic_source.py:554-559` </location>
<code_context>
-        """Convert the image to base64"""
+    async def encode_image_bs64(self, image_url: str) -> tuple[str, str]:
+        """Convert the image to base64 and detect its actual MIME type"""
         if image_url.startswith("base64://"):
-            return image_url.replace("base64://", "data:image/jpeg;base64,")
+            raw_base64 = image_url.replace("base64://", "")
+            try:
+                image_bytes = base64.b64decode(raw_base64)
+                mime_type = self._detect_image_mime_type(image_bytes)
+            except Exception:
+                mime_type = "image/jpeg"
+            return f"data:{mime_type};base64,{raw_base64}", mime_type
         with open(image_url, "rb") as f:
-            image_bs64 = base64.b64encode(f.read()).decode("utf-8")
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Base64 decode for MIME detection could use `validate=True` to catch malformed inputs more reliably.

Without `validate=True`, `base64.b64decode(raw_base64)` may silently accept malformed input and produce garbage bytes, which `_detect_image_mime_type` might then misclassify. Using `validate=True` would instead raise on invalid data and cleanly route those cases through your existing `except` branch for more predictable behavior.

```suggestion
            raw_base64 = image_url.replace("base64://", "")
            try:
                # Use validate=True so that invalid base64 data raises an exception
                image_bytes = base64.b64decode(raw_base64, validate=True)
                mime_type = self._detect_image_mime_type(image_bytes)
            except Exception:
                mime_type = "image/jpeg"
```
</issue_to_address>

### Comment 3
<location> `astrbot/core/provider/sources/anthropic_source.py:555-558` </location>
<code_context>
         if image_url.startswith("base64://"):
-            return image_url.replace("base64://", "data:image/jpeg;base64,")
+            raw_base64 = image_url.replace("base64://", "")
+            try:
+                image_bytes = base64.b64decode(raw_base64)
+                mime_type = self._detect_image_mime_type(image_bytes)
+            except Exception:
+                mime_type = "image/jpeg"
+            return f"data:{mime_type};base64,{raw_base64}", mime_type
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Catching a broad `Exception` here may hide specific decoding issues that could benefit from more targeted handling.

If the only expected failure is invalid base64, consider catching `binascii.Error` (and maybe `ValueError`) instead of `Exception`. That way you keep the fallback to `image/jpeg` but avoid masking unrelated bugs that might occur inside `_detect_image_mime_type` now or in the future.

Suggested implementation:

```python
            try:
                image_bytes = base64.b64decode(raw_base64)
                mime_type = self._detect_image_mime_type(image_bytes)
            except (binascii.Error, ValueError):
                # Fallback to a sensible default if the base64 is invalid or otherwise cannot be decoded
                mime_type = "image/jpeg"

```

You will also need to ensure that `binascii` is imported at the top of `astrbot/core/provider/sources/anthropic_source.py`, for example:

<<<<<<< SEARCH
import base64
=======
import base64
import binascii
>>>>>>> REPLACE

If the file already has a group of imports including `base64`, add `import binascii` alongside them using the existing import style.
</issue_to_address>


Comment on lines +554 to +559

```python
            raw_base64 = image_url.replace("base64://", "")
            try:
                image_bytes = base64.b64decode(raw_base64)
                mime_type = self._detect_image_mime_type(image_bytes)
            except Exception:
                mime_type = "image/jpeg"
```

suggestion (bug_risk): Base64 decode for MIME detection could use validate=True to catch malformed inputs more reliably.

Without validate=True, base64.b64decode(raw_base64) may silently accept malformed input and produce garbage bytes, which _detect_image_mime_type might then misclassify. Using validate=True would instead raise on invalid data and cleanly route those cases through your existing except branch for more predictable behavior.

Suggested change

```diff
             raw_base64 = image_url.replace("base64://", "")
             try:
-                image_bytes = base64.b64decode(raw_base64)
+                # Use validate=True so that invalid base64 data raises an exception
+                image_bytes = base64.b64decode(raw_base64, validate=True)
                 mime_type = self._detect_image_mime_type(image_bytes)
             except Exception:
                 mime_type = "image/jpeg"
```
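The difference between the two decoding modes can be seen with a short standalone check. This is only an illustration using the standard library; `clean` and `corrupted` are made-up sample values, not data from the PR.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
clean = base64.b64encode(PNG_SIGNATURE).decode("ascii")
# Inject two characters that are outside the base64 alphabet.
corrupted = clean[:4] + "!!" + clean[4:]

# Default mode: the stray characters are discarded and decoding quietly succeeds.
lenient_result = base64.b64decode(corrupted)
print(lenient_result == PNG_SIGNATURE)

# Strict mode: the same input is rejected with binascii.Error.
try:
    base64.b64decode(corrupted, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
print(strict_rejected)
```

With `validate=True`, malformed payloads reach the `except` branch instead of being sniffed as if they were intact image data.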

Copilot AI (Contributor) left a comment

Pull request overview

This pull request fixes an issue where the Anthropic API rejects images due to MIME type mismatches. The problem occurs when images are downloaded and saved with incorrect file extensions (e.g., PNG saved as .jpg), causing mimetypes.guess_type() to report the wrong MIME type based on the extension rather than actual content.

Changes:

  • Implemented magic byte-based MIME type detection to identify image formats from binary data rather than file extensions
  • Updated encode_image_bs64() to return both base64 data and detected MIME type as a tuple
  • Removed the dependency on mimetypes.guess_type()
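To show why returning the MIME type alongside the data matters, here is a hedged sketch of how such a tuple could feed an Anthropic image content block. The block shape (`type`, `source.type`, `source.media_type`, `source.data`) follows Anthropic's base64 image format, consistent with the `image.source.base64.data` path in the error message above; `build_image_block` is a hypothetical helper, and how the PR itself assembles the payload is not shown in this conversation.

```python
import base64


def build_image_block(data_uri: str, mime_type: str) -> dict:
    """Build an Anthropic-style image block from a data URI and its MIME type."""
    # Keep only the payload after the "data:<mime>;base64," prefix.
    _prefix, _sep, b64_data = data_uri.partition(";base64,")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": mime_type,  # must match the bytes actually sent
            "data": b64_data,
        },
    }


payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
block = build_image_block(f"data:image/png;base64,{payload}", "image/png")
print(block["source"]["media_type"])
```

Passing the detected type through explicitly avoids re-deriving it from a file name at the point where the request is built.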

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 12, 2026
Soulter (Member) commented Jan 12, 2026

helpful changes, thanks!

@Soulter Soulter merged commit eedb62a into AstrBotDevs:master Jan 12, 2026
11 of 12 checks passed