
【third-party】Add Claude Code Skills for PP-OCRv5 and PaddleOCR-VL#17659

Closed
Aidenwu0209 wants to merge 1 commit into PaddlePaddle:main from Aidenwu0209:add-claude-code-skills

@Aidenwu0209 (Contributor)

fix #17658

Summary

Adds a skills/ directory providing Claude Code skill definitions for PP-OCRv5 (text extraction) and PaddleOCR-VL (document parsing) via Baidu AI Studio APIs. It complements the existing mcp_server/ AI tooling integration.

Changes

  • skills/pp-ocrv5/ - PP-OCRv5 skill: CLI scripts, SKILL definition, API reference docs
  • skills/paddleocr-vl/ - PaddleOCR-VL skill: CLI scripts, SKILL definition, API reference docs
  • skills/README.md / skills/README_en.md - bilingual (Chinese/English) documentation

Relationship to MCP Server

| Feature | MCP Server | Skills |
|---|---|---|
| Protocol | Model Context Protocol (MCP) | Claude Code Skill Protocol |
| Clients | Claude Desktop, VSCode, etc. | Claude Code CLI |
| Architecture | Long-running service process (stdio/HTTP) | Direct CLI invocation |

The two are complementary: the MCP Server works with any MCP client, while the Skills are optimized for command-line interaction in Claude Code.

Testing

# Install dependencies
pip install -r skills/pp-ocrv5/scripts/requirements.txt
pip install -r skills/paddleocr-vl/scripts/requirements.txt

# Configure the API (requires paddleocr.com credentials)
python skills/pp-ocrv5/scripts/configure.py
python skills/paddleocr-vl/scripts/configure.py

# Run smoke tests
python skills/pp-ocrv5/scripts/smoke_test.py
python skills/paddleocr-vl/scripts/smoke_test.py

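Checklist

  • All Python files include the Apache 2.0 license header
  • Code formatted with Black (24.10.0)
  • Code passes Flake8 (7.1.1) checks
  • Bilingual Chinese/English documentation (README.md / README_en.md)
  • No large files (all files < 512 KB)
  • No hard-coded credentials or secrets
  • Zero impact on existing code (new directory only)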

@paddle-bot commented Feb 6, 2026

Thanks for your contribution!

@Bobholamovic (Member) left a comment

Thanks for the contribution! I've left a few suggestions.

@Aidenwu0209 force-pushed the add-claude-code-skills branch 4 times, most recently from 92fafc5 to 084c906 on February 6, 2026 10:55
Pillow>=10.0.0

# PDF processing
PyMuPDF>=1.23.0
Review comment (Member):

Suggest switching to pypdfium2 to avoid the licensing issue (PyMuPDF is copyleft).

@@ -0,0 +1,7 @@
# PaddleOCR-VL 1.5 Dependencies
Review comment (Member):

This comment was not updated to match. Please check the whole project for anything else that was missed.


**POST** `<PADDLEOCR_VL_API_URL>`

Where the URL is obtained from [Paddle AI Studio](https://paddleocr.com) (select VL model).
Review comment (Member):

Please don't mention Paddle AI Studio in our documentation; refer to the PaddleOCR official website consistently instead.


Where `<ACCESS_TOKEN>` is the API token obtained from Paddle AI Studio.

## Request Body
Review comment (Member):

This interface description is incorrect and needs to be fixed against the API documentation. `parse_all` and similar fields are not parameters the API supports.

## Best Practices

1. **Use URL for large files**: Prefer `file_url` over base64 for files >5MB
2. **Handle timeouts**: VL processing can take 3-10 seconds per page
Review comment (Member):

Suggest only stating that large documents may take several minutes to process.

3. **Retry on 503/504**: Use exponential backoff (up to 2 retries)
4. **Never log tokens**: Keep credentials secure
5. **Cache responses**: Results can be cached for 10 minutes
Review comment (Member):

The API does not cache results by default; suggest removing this item.
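Items 2 and 3 of the Best Practices list (generous timeouts, retry on 503/504 with exponential backoff, up to 2 retries) could look roughly like the sketch below. It is a minimal illustration using only the standard library, not code from the PR: `make_post_sender` and `call_with_retry` are hypothetical names, the 600-second default timeout is an assumption based on the reviewer's note that large documents may take minutes, and the request body and headers (including authentication) are left to the caller because the reviewer flagged the documented request format as incorrect.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Mapping

# Statuses the Best Practices list says are worth retrying.
RETRYABLE_STATUS = {503, 504}


def make_post_sender(
    url: str, body: bytes, headers: Mapping[str, str], timeout: float = 600.0
) -> Callable[[], bytes]:
    """Return a zero-argument callable that POSTs `body` to `url` once."""

    def send() -> bytes:
        req = urllib.request.Request(
            url, data=body, headers=dict(headers), method="POST"
        )
        # Generous timeout (assumed value): large documents may take minutes.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()

    return send


def call_with_retry(
    send: Callable[[], bytes],
    max_retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send()`, retrying HTTP 503/504 with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise other statuses, or the last failure once retries run out.
            if err.code not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # Back off 1 s, 2 s, 4 s, ... between attempts.
            sleep(base_delay * 2**attempt)
    raise ValueError("max_retries must be >= 0")
```

Injecting `send` and `sleep` keeps the retry policy separate from the transport, so it can be tested without network access or real delays.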

key = key.strip()
# Skip old and new VL keys (will be overwritten)
if key not in [
"VL_API_URL",
Review comment (Member):

This doesn't need to stay backward compatible. Suggest dropping support for VL_API_URL and VL_TOKEN entirely, and also removing this comment: "# Skip old and new VL keys (will be overwritten)"

"PADDLEOCR_VL_API_URL",
Review comment (Member):

Suggest renaming so the name isn't tied to "VL" but matches the skill name, e.g. reflecting "document parsing".

"PADDLEOCR_VL_ACCESS_TOKEN",
Review comment (Member):

Access tokens are usually the same across tasks, so suggest simply naming this "PADDLEOCR_ACCESS_TOKEN".
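A single shared token variable, as suggested here, means both skills read the same setting. A minimal sketch under that assumption follows; `get_access_token` is a hypothetical helper, not part of the PR, and reading from the process environment is itself an assumption about how the skills load credentials.

```python
import os
from typing import Mapping

# Single token variable shared by both skills, as suggested in review.
TOKEN_VAR = "PADDLEOCR_ACCESS_TOKEN"


def get_access_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the shared access token from `env`, or raise if it is missing."""
    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        # Fail loudly instead of falling back to a hard-coded value.
        raise RuntimeError(f"{TOKEN_VAR} is not set")
    return token
```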

"VL_TOKEN", ""
)

print("Please provide your PaddleOCR-VL API credentials:")
Review comment (Member):

Please check all documentation and code across the project and make sure names like PaddleOCR-VL and PP-OCRv5 no longer appear on their own; use only "document parsing" or "text recognition".

@Aidenwu0209 force-pushed the add-claude-code-skills branch from 084c906 to a6fd5d2 on February 9, 2026 08:48
f.write("# ========================================\n")
f.write("# PaddleOCR Document Parsing Configuration\n")
f.write("# ========================================\n")
f.write(f"PADDLEOCR_PARSING_API_URL={api_url}\n")
Review comment (Member):

How about PADDLEOCR_DOC_PARSING_API_URL instead?
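The naming suggestions in this review (no legacy `VL_API_URL`/`VL_TOKEN` handling, a shared `PADDLEOCR_ACCESS_TOKEN`, and `PADDLEOCR_DOC_PARSING_API_URL`) could be applied to the config writer roughly as follows. This is a minimal sketch, not the PR's code. `write_config` is a hypothetical name, and the `KEY=value` file layout is inferred from the `f.write(...)` lines above. The separator lines are collapsed into a single title comment so that rerunning the script does not duplicate the header.

```python
from pathlib import Path

# Key names proposed in review (hypothetical until adopted in the PR).
MANAGED_KEYS = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")
TITLE = "# PaddleOCR Document Parsing Configuration"


def write_config(path: Path, api_url: str, token: str) -> None:
    """Rewrite `path`, replacing this skill's keys and keeping all other lines."""
    kept: list[str] = []
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            key = line.split("=", 1)[0].strip()
            # Drop previous values of our keys and our title comment. Legacy
            # VL_* keys get no special handling: they are kept like any other line.
            if key in MANAGED_KEYS or line.strip() == TITLE:
                continue
            kept.append(line)
    kept += [
        TITLE,
        f"PADDLEOCR_DOC_PARSING_API_URL={api_url}",
        f"PADDLEOCR_ACCESS_TOKEN={token}",
    ]
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
```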


```json
{
"file_url": "https://example.com/document.pdf"
```
Review comment (Member):

The interface is still wrong. Suggest writing this part by hand rather than with AI coding tools (they are prone to hallucination).


## Overview

This directory provides two Claude Code skills for OCR text recognition and document parsing via Baidu AI Studio APIs.
Review comment (Member):

Suggest checking all files, removing references to "Baidu AI Studio APIs", and replacing them with "PaddleOCR official API".

@Aidenwu0209 force-pushed the add-claude-code-skills branch 2 times, most recently from 6cd19ac to 3e18a70 on February 9, 2026 10:32
…Metrics

- Change typescript code blocks to json in output_schema.md
- Remove "using PP-OCRv5" / "using PaddleOCR-VL" from directory comments
- Delete unnecessary Quality Metrics section
- Fix _extract_text() to handle real API response (array of pages with markdown.text)
- Rewrite output_schema.md to match actual PaddleOCR-VL API response structure
- Fix provider_api.md response structure documentation
- Fix SKILL.md JSON examples and block labels to match real API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
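One bullet in this commit message says `_extract_text()` was fixed to handle a response that is an array of pages, each carrying `markdown.text`. Below is a minimal sketch of that traversal, assuming exactly that shape. How the array is nested inside the full response envelope is not shown here and should be checked against the official API documentation. `extract_text_from_pages` is a hypothetical name, not the PR's function.

```python
from typing import Any


def extract_text_from_pages(pages: list[dict[str, Any]]) -> str:
    """Join each page's `markdown.text`, separated by blank lines."""
    parts = []
    for page in pages:
        # Tolerate pages without a markdown section instead of raising KeyError.
        markdown = page.get("markdown") or {}
        text = markdown.get("text", "")
        if text:
            parts.append(text.strip())
    return "\n\n".join(parts)
```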
@Aidenwu0209 closed this on Feb 9, 2026
@Aidenwu0209 force-pushed the add-claude-code-skills branch from 3e18a70 to f773f2c
@Aidenwu0209 (Contributor, Author)

This PR has been superseded by #17690 due to branch history issues. All review feedback has been addressed in the new PR.



Development

Successfully merging this pull request may close these issues.

【third-party】Add Claude Code Skills for PP-OCRv5 and PaddleOCR-VL

2 participants