
Commit 83f7661

Add PaddleOCR-VL in vllm inference document
1 parent c395e29 commit 83f7661

File tree

1 file changed: +37 -2 lines changed


scripts/docs/vllm推理手册.md

Lines changed: 37 additions & 2 deletions
````diff
@@ -39,7 +39,8 @@
 - [3.4.2 Client request format example](#342-client-端请求格式样例)
 - [3.4.3 FP8 static quant](#343-fp8-static-quant)
 - [3.4.4 FP8 dynamic quant](#344-fp8-dynamic-quant)
-- [3.4.5 FAQ](#345-问题解答)
+- [3.4.5 PaddleOCR-VL model](#345-paddleocr-vl模型)
+- [3.4.6 FAQ](#346-问题解答)
 
 ## 1.0 Environment setup
 
````
````diff
@@ -959,7 +960,41 @@ PT_HPU_LAZY_MODE=1 VLLM_GRAPH_RESERVED_MEM=0.5 vllm serve \
 --mm_processor_kwargs max_pixels=1003520,min_pixels=3136
 ```
 
-#### 3.4.5 FAQ
+#### 3.4.5 PaddleOCR-VL model
+**Start the service**\
+
+```bash
+PT_HPU_LAZY_MODE=1 vllm serve \
+PaddlePaddle/PaddleOCR-VL \
+--host 0.0.0.0 \
+--port 8080 \
+--trust-remote-code \
+--gpu-memory-utilization 0.5 \
+--max-model-len 16384 \
+--served-model-name 'PaddleOCR-VL-0.9B'
+```
+
````
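Once the server is up, a quick sanity check confirms the endpoint responds. A minimal sketch, assuming the host, port, and served model name from the serve command above (`/v1/models` is part of vLLM's OpenAI-compatible API):

```bash
# List the served models; the response should include 'PaddleOCR-VL-0.9B'.
curl http://127.0.0.1:8080/v1/models
```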
````diff
+**Client request format example**\
+The client side of the PaddleOCR-VL model relies on the PaddleOCR pipeline, so first install the required Paddle packages:
+
+```bash
+pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+pip install paddlex==3.3.4
+pip install "paddleocr[doc-parser]"
+```
+
+Then send a request with the PaddleOCR CLI:
+
+```bash
+paddleocr doc_parser \
+-i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png \
+--enable_mkldnn False \
+--vl_rec_backend vllm-server \
+--vl_rec_server_url http://127.0.0.1:8080/v1 \
+--save_path ./output
+```
+
````
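Besides the documented PaddleOCR CLI path, the OpenAI-compatible endpoint can be exercised directly as a smoke test. A hedged sketch, assuming the server above; the `"OCR:"` text prompt is an illustrative placeholder, since the PaddleOCR pipeline is what constructs the real task prompts for the model:

```bash
# Sketch only: direct multimodal chat completion against the vLLM server.
# The text prompt below is an assumption for illustration; production requests
# should go through the PaddleOCR pipeline, which formats prompts for PaddleOCR-VL.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "PaddleOCR-VL-0.9B",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url",
         "image_url": {"url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"}},
        {"type": "text", "text": "OCR:"}
      ]
    }]
  }'
```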
````diff
+#### 3.4.6 FAQ
 
 - If the server reports timeout errors while fetching images, video, or audio, increase the timeouts via the environment variables `VLLM_IMAGE_FETCH_TIMEOUT`, `VLLM_VIDEO_FETCH_TIMEOUT`, and `VLLM_AUDIO_FETCH_TIMEOUT`. The defaults are 5/30/10 seconds.
 - Overly large input images require more device memory; setting a smaller `--gpu-memory-utilization` (default 0.9) resolves this. For example, the images in the reference script `openai_chat_completion_client_for_multimodal.py` reach resolutions up to 7952x5304, which makes server-side inference fail; setting `--gpu-memory-utilization` to 0.6~0.7 fixes it.
````
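For the first FAQ item, a minimal sketch of raising the fetch timeouts before launching the server; the variable names come from the FAQ above, while the values below are example numbers, not recommendations:

```bash
# Multimodal fetch timeouts in seconds (defaults: image 5, video 30, audio 10).
export VLLM_IMAGE_FETCH_TIMEOUT=30
export VLLM_VIDEO_FETCH_TIMEOUT=60
export VLLM_AUDIO_FETCH_TIMEOUT=30
```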
