【third-party】Add Claude Code Skills for PP-OCRv5 and PaddleOCR-VL #17659
Aidenwu0209 wants to merge 1 commit into PaddlePaddle:main
Thanks for your contribution!
92fafc5 to 084c906
| Pillow>=10.0.0 | ||
|
|
||
| # PDF processing | ||
| PyMuPDF>=1.23.0 |
Suggest using pypdfium2 to avoid the license issue (PyMuPDF is copyleft).
| @@ -0,0 +1,7 @@ | |||
| # PaddleOCR-VL 1.5 Dependencies | |||
The comment here was not updated to match. Please check the whole project for anything else that was missed.
|
|
||
| **POST** `<PADDLEOCR_VL_API_URL>` | ||
|
|
||
| Where the URL is obtained from [Paddle AI Studio](https://paddleocr.com) (select VL model). |
Let's not mention Paddle AI Studio in our documentation; refer consistently to the PaddleOCR official website instead.
|
|
||
| Where `<ACCESS_TOKEN>` is the API token obtained from Paddle AI Studio. | ||
|
|
||
| ## Request Body |
This API description is incorrect and needs to be corrected against the API documentation. `parse_all` and similar are not parameters the API supports.
| ## Best Practices | ||
|
|
||
| 1. **Use URL for large files**: Prefer `file_url` over base64 for files >5MB | ||
| 2. **Handle timeouts**: VL processing can take 3-10 seconds per page | ||
| 3. **Retry on 503/504**: Use exponential backoff (up to 2 retries) | ||
| 4. **Never log tokens**: Keep credentials secure | ||
| 5. **Cache responses**: Results can be cached for 10 minutes |
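The retry guidance in the list above (exponential backoff on 503/504, at most 2 retries) can be sketched as follows. This is a minimal illustration, not the PR's implementation: `post_with_retry` and its parameters are hypothetical names, and `send` stands for whatever callable actually performs the request with its own timeout.

```python
import time
import urllib.error

# Statuses the best-practices list says are worth retrying.
RETRYABLE_STATUS = {503, 504}


def post_with_retry(send, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on 503/504 with exponential backoff.

    `send` is any zero-argument callable that performs the request and
    raises urllib.error.HTTPError on an HTTP error status (hypothetical
    interface, chosen so the retry logic can be exercised without a network).
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Give up on non-retryable statuses or once retries are exhausted.
            if err.code not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # Delays grow as base_delay, 2 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
```

Keeping the request itself behind `send` means the access token never has to pass through the retry helper, which makes it easier to keep credentials out of any logging added around retries.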
| key = key.strip() | ||
| # Skip old and new VL keys (will be overwritten) | ||
| if key not in [ | ||
| "VL_API_URL", |
Backward compatibility is not needed here. I suggest dropping support for VL_API_URL and VL_TOKEN outright, and also removing this comment: "# Skip old and new VL keys (will be overwritten)".
| if key not in [ | ||
| "VL_API_URL", | ||
| "VL_TOKEN", | ||
| "PADDLEOCR_VL_API_URL", |
Suggest renaming this so it is not tied to VL but matches the skill's name instead, for example by reflecting "document parsing".
| "VL_API_URL", | ||
| "VL_TOKEN", | ||
| "PADDLEOCR_VL_API_URL", | ||
| "PADDLEOCR_VL_ACCESS_TOKEN", |
The access token is usually the same across tasks, so I suggest simply calling it "PADDLEOCR_ACCESS_TOKEN".
| "VL_TOKEN", "" | ||
| ) | ||
|
|
||
| print("Please provide your PaddleOCR-VL API credentials:") |
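All documentation and code in the project need to be checked to confirm that PaddleOCR-VL, PP-OCRv5, or similar terms no longer appear on their own, and that only "document parsing" or "text recognition" is used.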
084c906 to a6fd5d2
| f.write("# ========================================\n") | ||
| f.write("# PaddleOCR Document Parsing Configuration\n") | ||
| f.write("# ========================================\n") | ||
| f.write(f"PADDLEOCR_PARSING_API_URL={api_url}\n") |
How about PADDLEOCR_DOC_PARSING_API_URL?
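For illustration, here is a rough sketch of how a configuration writer might look with the key names proposed in this review (`PADDLEOCR_DOC_PARSING_API_URL` and `PADDLEOCR_ACCESS_TOKEN`) and without the legacy `VL_*` keys. The function name `update_env_file` is hypothetical, and the names that were finally merged may differ.

```python
from pathlib import Path

# Key names follow the reviewers' suggestions in this thread; they are an
# assumption here, not a confirmed part of the merged code.
MANAGED_KEYS = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")


def update_env_file(path, api_url, access_token):
    """Rewrite a .env file, replacing only the managed keys."""
    path = Path(path)
    kept = []
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            key = line.split("=", 1)[0].strip()
            # Drop stale values of managed keys; keep every other line verbatim.
            if key not in MANAGED_KEYS:
                kept.append(line)
    kept.append(f"PADDLEOCR_DOC_PARSING_API_URL={api_url}")
    kept.append(f"PADDLEOCR_ACCESS_TOKEN={access_token}")
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
```

Because only the managed keys are filtered, unrelated settings and comments already in the file survive a reconfiguration.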
|
|
||
| ```json | ||
| { | ||
| "file_url": "https://example.com/document.pdf" |
The API description is still wrong. I suggest writing this part by hand rather than with AI coding tools, which are prone to hallucination.
skills/README_en.md
Outdated
|
|
||
| ## Overview | ||
|
|
||
| This directory provides two Claude Code skills for OCR text recognition and document parsing via Baidu AI Studio APIs. |
Suggest checking all files and replacing wording such as "Baidu AI Studio APIs" with "PaddleOCR official API".
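Several comments in this review ask for a project-wide check of wording. One way to run such a check is a small scan like the sketch below. The term list and the file suffixes are assumptions drawn from the comments above, and `find_discouraged_terms` is a hypothetical helper, not part of the PR.

```python
from pathlib import Path

# Terms the reviewers asked to stop using in user-facing text; extend as needed.
DISCOURAGED_TERMS = ("AI Studio", "PaddleOCR-VL", "PP-OCRv5")
# Only plain-text files are scanned (assumed set of suffixes).
TEXT_SUFFIXES = {".md", ".py", ".txt", ".json"}


def find_discouraged_terms(root, terms=DISCOURAGED_TERMS):
    """Return (relative path, line number, term) hits under root, in path order."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for term in terms:
                if term in line:
                    hits.append((path.relative_to(root).as_posix(), lineno, term))
    return hits
```

Running it over the `skills/` directory before each push would list every remaining occurrence with its file and line number.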
6cd19ac to 3e18a70
…Metrics
- Change typescript code blocks to json in output_schema.md
- Remove "using PP-OCRv5" / "using PaddleOCR-VL" from directory comments
- Delete unnecessary Quality Metrics section
- Fix _extract_text() to handle real API response (array of pages with markdown.text)
- Rewrite output_schema.md to match actual PaddleOCR-VL API response structure
- Fix provider_api.md response structure documentation
- Fix SKILL.md JSON examples and block labels to match real API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
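The commit message describes the response as an array of pages, each carrying `markdown.text`. A minimal sketch of text extraction under that assumption is shown below. The function name `extract_text` is hypothetical, and the real response may nest the page list under additional keys that this page does not show.

```python
def extract_text(pages):
    """Join the markdown text of each page in a parsed-document result.

    Assumes `pages` is a list of dicts shaped like {"markdown": {"text": ...}},
    as described in the commit message; anything else is skipped.
    """
    parts = []
    for page in pages or []:
        markdown = page.get("markdown") if isinstance(page, dict) else None
        text = markdown.get("text") if isinstance(markdown, dict) else None
        # Ignore pages with missing or empty text rather than failing.
        if isinstance(text, str) and text.strip():
            parts.append(text.strip())
    return "\n\n".join(parts)
```

Skipping malformed pages instead of raising keeps a partially parsed document usable, at the cost of hiding which pages were dropped.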
3e18a70 to f773f2c
This PR has been superseded by #17690 due to branch history issues. All review feedback has been addressed in the new PR.
fix #17658
Summary
Adds a skills/ directory providing Claude Code skill definitions for PP-OCRv5 (text extraction) and PaddleOCR-VL (document parsing) via Baidu AI Studio APIs. Complements the existing mcp_server/ AI tooling integration.
Changes
- skills/pp-ocrv5/ - PP-OCRv5 skill: CLI script, SKILL definition, API reference documentation
- skills/paddleocr-vl/ - PaddleOCR-VL skill: CLI script, SKILL definition, API reference documentation
- skills/README.md / skills/README_en.md - bilingual (Chinese and English) documentation
Relationship to MCP Server
The two are complementary: the MCP Server works with a variety of MCP clients, while Skills are optimized for Claude Code command-line interaction.
Testing
Checklist