
feat(customs): Add display of OCR results in the UI log view. #4

Merged: kqcoxn merged 1 commit into MaaXYZ:main from gitlihang:main on Jun 4, 2026

Conversation

gitlihang (Contributor) commented Jun 4, 2026

Summary by Sourcery

Add a custom action and logging utility to display OCR recognition results in the UI log and optionally trigger follow-up actions based on those results.

New Features:

  • Introduce a returnOCR custom action that retrieves OCR results for a configured recognition task and logs them to the UI.
  • Support optional click actions driven by OCR results, either at a configured coordinate or at the recognized text region center.
  • Provide a logging utility that standardizes console and file logging output for custom scripts in this module.

Documentation:

  • Add README documentation describing how to configure tasks to output OCR results to the UI log with or without follow-up actions.

sourcery-ai (bot) left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • `returnOCR.py` does `from utils import logger` but then calls `logger.info/debug/warning` as if `logger` were a logger instance; since `utils/__init__.py` is empty, this `logger` will be the `utils.logger` module, so you likely want `from utils.logger import logger` (or to re-export it in `utils/__init__.py`).
  • `click_target` is used assuming a 4-element list (`[x, y, w, h]`), but there's no validation of its length or type before indexing, so it would be safer to assert or guard against malformed input to avoid `IndexError`/type issues.
  • When `action_key` is neither `"Click"` nor empty, the code currently just falls through silently; consider logging or handling unknown `action_key` values so configuration mistakes are easier to diagnose.
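The last two points could be addressed with small guard helpers along the following lines. This is only a sketch: the helper names `validate_box` and `is_known_action`, the `KNOWN_ACTIONS` set, and the logger name are hypothetical and do not appear in the PR, and the sketch assumes the `[x, y, w, h]` integer layout mentioned above.

```python
import logging
from typing import Any, Optional, Tuple

logger = logging.getLogger("returnOCR_sketch")  # hypothetical logger name

# Assumption: only an empty value and "Click" are handled by the action.
KNOWN_ACTIONS = {"", "Click"}


def validate_box(value: Any) -> Optional[Tuple[int, int, int, int]]:
    """Return the box as a 4-tuple of ints, or None if it is malformed."""
    if not isinstance(value, (list, tuple)) or len(value) != 4:
        logger.warning("click_target must have 4 elements, got %r", value)
        return None
    # bool is a subclass of int, so reject it explicitly.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in value):
        logger.warning("click_target must contain only integers, got %r", value)
        return None
    x, y, w, h = value
    return x, y, w, h


def is_known_action(action_key: Optional[str]) -> bool:
    """Log and report action_key values that would otherwise be ignored."""
    if (action_key or "") in KNOWN_ACTIONS:
        return True
    logger.warning("Unknown action_key %r; no follow-up action will run", action_key)
    return False


print(validate_box([10, 20, 30, 40]))  # (10, 20, 30, 40)
print(validate_box([10, 20]))          # None
print(is_known_action("Click"))        # True
print(is_known_action("Swipe"))        # False
```

Returning `None` or `False` instead of raising leaves it to the caller to decide whether a malformed configuration should fail the action or merely skip the click.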
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- `returnOCR.py` does `from utils import logger` but then calls `logger.info/debug/warning` as if `logger` were a logger instance; since `utils/__init__.py` is empty this `logger` will be the `utils.logger` module, so you likely want `from utils.logger import logger` (or to re-export it in `utils/__init__.py`).
- `click_target` is used assuming a 4-element list (`[x, y, w, h]`), but there's no validation of its length or type before indexing, so it would be safer to assert or guard against malformed input to avoid `IndexError`/type issues.
- When `action_key` is neither `"Click"` nor empty, the code currently just falls through silently; consider logging or handling unknown `action_key` values so configuration mistakes are easier to diagnose.

## Individual Comments

### Comment 1
<location path="Storage/customs/gitlihang/output-ocr-result-log/action/returnOCR.py" line_range="56-57" />
<code_context>
+                # Click the coordinates passed in via the parameters
+                if click_target:
+                    box = click_target
+                    center_x = box[0] + box[2] // 2
+                    center_y = box[1] + box[3] // 2
+                    logger.debug(f"点击位置: ({center_x}, {center_y})")
+                    context.tasker.controller.post_click(center_x, center_y).wait()
</code_context>
<issue_to_address>
**issue (bug_risk):** Center point calculation for `click_target` is likely incorrect for `[x1, y1, x2, y2]` boxes.

With `[x1, y1, x2, y2]` boxes, `center_x = box[0] + box[2] // 2` computes `x1 + x2 // 2`, not the midpoint between `x1` and `x2`. If you want the rectangle center, it should be something like `(box[0] + box[2]) // 2` and `(box[1] + box[3]) // 2` (or float division for subpixel precision); otherwise the click lands to the right of and below the true center, offset by roughly `x1 / 2` and `y1 / 2`.
</issue_to_address>

### Comment 2
<location path="Storage/customs/gitlihang/output-ocr-result-log/action/returnOCR.py" line_range="31" />
<code_context>
+    ) -> CustomAction.RunResult:
+        # logger.info("进入returnOCR")
+        # Parse the custom parameters and check whether they are empty
+        argv_dict: dict = json.loads(argv.custom_action_param)
+        if not argv_dict:
+            return CustomAction.RunResult(success=True)
</code_context>
<issue_to_address>
**issue:** Lack of error handling for invalid `custom_action_param` JSON may cause unhandled exceptions.

If `argv.custom_action_param` is missing, empty, or contains invalid JSON, `json.loads` will raise and terminate the action. Consider catching `json.JSONDecodeError` and either defaulting to an empty dict or returning a failed `RunResult` with a clear log message, depending on the desired behavior for bad input.
</issue_to_address>

### Comment 3
<location path="Storage/customs/gitlihang/output-ocr-result-log/README.md" line_range="86" />
<code_context>
+
+## 注意事项
+
+ - 注意的值"custom_action_param"
+
+
</code_context>
<issue_to_address>
**issue (typo):** The phrase "注意的值" (roughly "the value to note") is a bit unclear; consider rephrasing for clarity.

```suggestion
- 请注意配置项 `custom_action_param` 的取值
```
</issue_to_address>


Comment on lines +56 to +57
center_x = box[0] + box[2] // 2
center_y = box[1] + box[3] // 2


issue (bug_risk): Center point calculation for `click_target` is likely incorrect for `[x1, y1, x2, y2]` boxes.

With `[x1, y1, x2, y2]` boxes, `center_x = box[0] + box[2] // 2` computes `x1 + x2 // 2`, not the midpoint between `x1` and `x2`. If you want the rectangle center, it should be something like `(box[0] + box[2]) // 2` and `(box[1] + box[3]) // 2` (or float division for subpixel precision); otherwise the click lands to the right of and below the true center, offset by roughly `x1 / 2` and `y1 / 2`.
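The two box conventions can be compared with a small sketch. The helper names `center_xywh` and `center_xyxy` are hypothetical and not part of the PR, and the review does not settle which convention `click_target` actually uses: the overall feedback describes it as `[x, y, w, h]`, while this comment considers `[x1, y1, x2, y2]`.

```python
from typing import Sequence, Tuple


def center_xywh(box: Sequence[int]) -> Tuple[int, int]:
    """Center of a box given as [x, y, w, h] (top-left corner plus size)."""
    x, y, w, h = box
    return x + w // 2, y + h // 2


def center_xyxy(box: Sequence[int]) -> Tuple[int, int]:
    """Center of a box given as [x1, y1, x2, y2] (two opposite corners)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) // 2, (y1 + y2) // 2


# The same rectangle, 100 px wide and 40 px tall with its top-left corner
# at (200, 300), written in both conventions.
as_xywh = [200, 300, 100, 40]
as_xyxy = [200, 300, 300, 340]

print(center_xywh(as_xywh))  # (250, 320)
print(center_xyxy(as_xyxy))  # (250, 320)

# Applying the [x, y, w, h] formula to corner coordinates overshoots the rectangle.
print(center_xywh(as_xyxy))  # (350, 470)
```

If `click_target` is documented as `[x, y, w, h]`, the code quoted above already matches `center_xywh`; the fix is only needed when the configuration supplies corner coordinates.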

) -> CustomAction.RunResult:
# logger.info("进入returnOCR")
# Parse the custom parameters and check whether they are empty
argv_dict: dict = json.loads(argv.custom_action_param)


issue: Lack of error handling for invalid `custom_action_param` JSON may cause unhandled exceptions.

If `argv.custom_action_param` is missing, empty, or contains invalid JSON, `json.loads` will raise and terminate the action. Consider catching `json.JSONDecodeError` and either defaulting to an empty dict or returning a failed `RunResult` with a clear log message, depending on the desired behavior for bad input.
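One possible shape for that guard is sketched below using only the standard library. The function name `parse_action_param`, the logger name, and the `"action"` key in the sample input are hypothetical; the sketch treats missing or blank input as an empty dict and signals invalid input with `None`, which is just one of the two behaviors the comment allows.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("returnOCR_sketch")  # hypothetical logger name


def parse_action_param(raw: Optional[str]) -> Optional[dict]:
    """Parse a JSON parameter string.

    Returns a dict on success (empty when nothing was supplied) and None
    when the input is not valid JSON or is not a JSON object.
    """
    if raw is None or not raw.strip():
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.warning("custom_action_param is not valid JSON: %s", exc)
        return None
    if not isinstance(parsed, dict):
        logger.warning(
            "custom_action_param must be a JSON object, got %s",
            type(parsed).__name__,
        )
        return None
    return parsed


print(parse_action_param('{"action": "Click"}'))  # {'action': 'Click'}
print(parse_action_param(""))                     # {}
print(parse_action_param("{not json"))            # None
```

In the action itself, a `None` result could map to `CustomAction.RunResult(success=False)`, while an empty dict keeps the existing early return with `success=True`.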


## 注意事项

- 注意的值"custom_action_param"


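issue (typo): The phrase "注意的值" (roughly "the value to note") is a bit unclear; consider rephrasing for clarity.

Suggested change
Before: - 注意的值"custom_action_param"
After: - 请注意配置项 `custom_action_param` 的取值 ("Please note the value of the `custom_action_param` configuration item")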

kqcoxn merged commit a8e30d6 into MaaXYZ:main on Jun 4, 2026
2 of 3 checks passed