
### To be completed by MetaX (沐曦)

### Building new `FastDeploy` applications on Enflame (燧原) cards
* Tech tags: PaddlePaddle, FastDeploy, Python

* Description: This task leverages the compute power of the Enflame S60 accelerator (GCU) together with the high-performance FastDeploy inference framework to build secondary development and applications on top of the ERNIE-4.5-0.3B-Paddle model. We encourage developers to create innovative showcases with real-world value, a closed-loop workflow, and a polished user experience. See the [PaddlePaddle AI Studio application gallery](https://aistudio.baidu.com/topic/applications) for reference.
* Deliverables:
  * Stage 1: RFC proposal
    1. Submission: 1) submit a markdown file to https://aistudio.baidu.com/projectoverview; 2) prefix the title with 【PaddlePaddle Hackathon 10】.
    2. Basic requirements: 1) avoid application scenarios that duplicate existing demos (such as simple sentiment analysis); 2) the proposal should fully exploit the lightweight yet efficient character of `ERNIE-4.5-0.3B-Paddle`.
    3. Screening criteria: 1) whether the example has real application value in practical scenarios; 2) whether its workflow logic is clear; 3) whether the expected inference quality matches the business metrics.

  * Stage 2: PR code submission
    1. Submission: submit the complete code in Notebook (ipynb) format to your own project at https://aistudio.baidu.com/projectoverview, add 【PaddlePaddle Hackathon 10】 to the title, and link the earlier RFC in the description.
    2. The PR must follow the notebook contribution guidelines, and developers should promptly revise the PR according to review feedback.
    3. A mid-term checkpoint meeting is held halfway through the competition; developers report progress, demonstrate completed features, summarize current problems and challenges, and outline their plan for the remainder of the competition.
* Reference examples: for broader applicability, English-language scenarios are preferred. Recommended directions:
  * Intelligent text processing: long-document summarization, domain-specific translation.
  * Semantic understanding: industry knowledge-base QA, advanced sentiment mining.
  * Reference demos:
    * [ERNIE-4.5-0.3B fine-tuned for an old-Beijing speaking style](https://aistudio.baidu.com/projectdetail/10000880?channelType=0&channel=0)
    * [Hands-on tutorial: Chinese sentiment analysis with ERNIE-4.5-0.3B](https://aistudio.baidu.com/projectdetail/9385231)

* Technical requirements: proficiency with Python and the FastDeploy deployment workflow, plus familiarity with the related tooling.
* References: [FastDeploy](https://paddlepaddle.github.io/FastDeploy/zh/), [PaddlePaddle AI Studio](https://aistudio.baidu.com/overview)
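To make the recommended scenarios concrete, below is a minimal sketch of how a summarization request could be assembled for a FastDeploy OpenAI-compatible endpoint. The host, port, and helper name are illustrative assumptions, not part of the task requirements:

```python
import json

# Hypothetical endpoint of a local FastDeploy OpenAI-compatible server.
API_URL = "http://127.0.0.1:8180/v1/chat/completions"

def build_summary_request(document: str, max_words: int = 80) -> dict:
    """Build an OpenAI-style chat payload asking the model to summarize."""
    return {
        "messages": [
            {"role": "system", "content": "You are a concise summarizer."},
            {"role": "user",
             "content": f"Summarize in at most {max_words} words:\n{document}"},
        ],
        "stream": False,
    }

payload = build_summary_request("FastDeploy serves ERNIE models over an OpenAI-style API.")
print(len(payload["messages"]))  # 2
```

The payload can then be POSTed to `API_URL` with any HTTP client, e.g. `requests.post(API_URL, json=payload, timeout=60)`.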

### To be completed by Hygon (海光)

### pfcc/paddle-hardware/requirements-gcu.txt (new file, 58 additions)
setuptools==62.3.0
pre-commit
yapf
flake8
ruamel.yaml
zmq
aiozmq
openai>=1.93.0
tqdm
pynvml
uvicorn>=0.38.0
fastapi
paddleformers @ https://paddle-qa.bj.bcebos.com/ernie/paddleformers-0.4.0.post20251222-py3-none-any.whl
redis
etcd3
httpx
tool_helpers
cupy-cuda12x
pybind11[global]
tabulate
gradio
xlwt
visualdl
setuptools-scm>=8
prometheus-client
decord
moviepy
triton==3.3
crcmod
msgpack
gunicorn==25.0.3
modelscope
safetensors>=0.7.0
opentelemetry-api>=1.24.0
opentelemetry-sdk>=1.24.0
opentelemetry-instrumentation-redis
opentelemetry-instrumentation-mysql
opentelemetry-distro
opentelemetry-exporter-otlp
opentelemetry-instrumentation-fastapi
opentelemetry-instrumentation-logging>=0.57b0
partial_json_parser
msgspec
einops
setproctitle
aistudio_sdk
p2pstore
py-cpuinfo
flashinfer-python-paddle
flash_mask @ https://paddle-qa.bj.bcebos.com/ernie/flash_mask-4.0.post20260128-py3-none-any.whl
arctic_inference @ https://paddle-qa.bj.bcebos.com/ernie/arctic_inference-0.1.3-cp310-cp310-linux_x86_64.whl
paddlefsl
colorama
seqeval
paddle2onnx
dill<0.3.5
jieba
onnx
# Enflame (燧原科技): Running ERNIE-4.5-0.3B-Paddle with FastDeploy

Walk through the FastDeploy deployment flow on the Enflame S60 accelerator (GCU) first-hand, and experience the deep integration of domestic compute hardware with the PaddlePaddle ecosystem.

## 🎯 Task goals
After completing this check-in you will have mastered:
* Hardware adaptation: understanding how PaddlePaddle cooperates with PaddleCustomDevice (for GCU).
* Inference framework usage: integrating the Paddle runtime with FastDeploy's dependencies.
* End-to-end deployment: independently completing environment setup, model download, and API calls for ERNIE-4.5-0.3B on a domestic compute platform.

## Submission
Take part in the warm-up check-in and, following the email template below, send your screenshots to ext_paddle_oss@baidu.com, teemo.wang@enflame-tech.com, and wenhao.zhang@enflame-tech.com.

## Compute / environment
This task must be completed on an Enflame S60 instance rented from the Gitee AI compute marketplace.
> Platform: [Gitee AI compute marketplace](https://ai.gitee.com/compute) \
> Image: `vLLM / 0.8.0 / Python 3.10 / ef 1.5.0.604`

## Task guide

### Create a virtual environment
To keep the environment clean, it is recommended to create a dedicated Python virtual environment on the host.
```
cd ~
apt install python3.10-venv
python3 -m venv .venv
source .venv/bin/activate
```

### Install PaddlePaddle & PaddleCustomDevice
```
# PaddlePaddle deep learning framework, providing the core compute capabilities
python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

# PaddleCustomDevice is PaddlePaddle's custom-hardware plugin layer, providing the GCU operator implementations
python -m pip install paddle-custom-gcu==3.0.0.dev20250716 -i https://www.paddlepaddle.org.cn/packages/nightly/gcu/
```

#### Check the installed versions

```
python -c "import paddle_custom_device; paddle_custom_device.gcu.version()"
```
```
version: 3.0.0.dev20260205
commit: e3dbd3b36a0b6913fd8da10a51251e89acafaeff
TopsPlatform: 1.5.0.601
```
```
python -c "import paddle; paddle.utils.run_check()"
```
```
I0310 07:41:04.107565 961 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0310 07:41:04.107585 961 init.cc:146] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0310 07:41:04.269114 961 runtime.cc:804] InitPlugin for backend GCU successfully.
I0310 07:41:04.280309 961 runtime.cc:95] Backend GCU Init, get GCU count:1, current device id:0
I0310 07:41:04.280344 961 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-gcu.so
I0310 07:41:04.284910 961 custom_device_load.cc:78] Succeed in loading custom engine in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-gcu.so
I0310 07:41:04.287516 961 custom_kernel.cc:68] Succeed in loading 275 custom kernel(s) from loaded lib(s), will be used like native ones.
I0310 07:41:04.287611 961 init.cc:158] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0310 07:41:04.287631 961 init.cc:244] CustomDevice: gcu, visible devices count: 1
Running verify PaddlePaddle program ...
I0310 07:41:04.597394 961 pir_interpreter.cc:1524] New Executor is Running ...
I0310 07:41:04.598099 961 runtime.cc:133] Backend GCU init device:0
I0310 07:41:04.617556 961 pir_interpreter.cc:1547] pir interpreter is running by multi-thread mode ...
I0310 07:41:04.619024 1082 utils.cc:136] Kernels launch in JIT ONLY mode:false
I0310 07:41:04.632437 1082 op_utils.cc:191] AOT kernel stream mode:async
I0310 07:41:04.670130 1094 gcu_layout_funcs.cc:54] Enable transpose optimize:false
PaddlePaddle works well on 1 gcu.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
I0310 07:41:04.741544 961 runtime.cc:149] Backend GCU finalize device:0
I0310 07:41:04.741559 961 runtime.cc:101] Backend GCU Finalize
```


### Install FastDeploy
#### Install the FastDeploy dependencies
Install the FastDeploy dependencies from [requirements-gcu.txt](./requirements-gcu.txt):
```
python -m pip install -r requirements-gcu.txt --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```

#### Install FastDeploy
```
python -m pip install fastdeploy -i https://www.paddlepaddle.org.cn/packages/stable/gcu/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```

### Download the ERNIE-4.5-0.3B-Paddle model

```
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
```

### Inference
Run the following commands to start the inference server:

```
export ENABLE_V1_KVCACHE_SCHEDULER=1

# This environment variable works around a minor bug on single-card setups.
export CUDA_VISIBLE_DEVICES=0

python -m fastdeploy.entrypoints.openai.api_server --model baidu/ERNIE-4.5-0.3B-Paddle --port 8180 --metrics-port 8181 --engine-worker-queue-port 8182 --max-model-len 32768 --max-num-seqs 32 --num-gpu-blocks-override 4896
```

Open a new terminal and send a request to the model service with:

```
curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Where is Beijing?"}
]
}'
```

Once the server runs successfully, you can see the generated inference result; a sample follows:
```
{"id":"chatcmpl-525a4d8f-2f65-480e-b520-f69cc73547fb","object":"chat.completion","created":1773196831,"model":"default","choices":[{"index":0,"message":{"role":"assistant","content":"北京是中国的首都,位于中国北京市,是一个历史文化名城。","reasoning_content":null,"tool_calls":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"total_tokens":26,"completion_tokens":15,"prompt_tokens_details":{"cached_tokens":0}}}
```
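The response above can also be consumed programmatically. A minimal stdlib sketch that extracts the assistant's reply from a non-streaming `chat.completion` JSON of this shape (the sample text here is abridged and illustrative):

```python
import json

# Abridged chat.completion response with the same structure as the sample above.
raw = """{"id": "chatcmpl-xyz", "object": "chat.completion",
 "choices": [{"index": 0,
              "message": {"role": "assistant",
                          "content": "Beijing is the capital of China."},
              "finish_reason": "stop"}],
 "usage": {"prompt_tokens": 11, "total_tokens": 26, "completion_tokens": 15}}"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
print(answer)  # Beijing is the capital of China.
```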

The FastDeploy service is compatible with the OpenAI API protocol; you can issue requests with the following Python code.
```
import openai
host = "0.0.0.0"
port = "8180"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

response = client.chat.completions.create(
    model="null",
    messages=[
        {"role": "system", "content": "I'm a helpful AI assistant."},
        {"role": "user", "content": "把李白的静夜思改写为现代诗"},
    ],
    stream=True,
)
for chunk in response:
    if chunk.choices[0].delta and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
print('\n')
```
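The chunk-handling loop above can be sanity-checked without a running server. The sketch below mimics streamed delta objects with plain dataclasses; these stand-in types are illustrative, not the actual openai SDK classes:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Delta:
    content: Optional[str] = None

@dataclass
class Choice:
    delta: Optional[Delta] = None

@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)

def collect(chunks) -> str:
    """Accumulate streamed delta contents into the full reply text."""
    parts = []
    for chunk in chunks:
        if chunk.choices[0].delta and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

stream = [Chunk([Choice(Delta("Hello"))]), Chunk([Choice(Delta(", world"))])]
print(collect(stream))  # Hello, world
```

Guarding on both `delta` and `delta.content` matters: the final streamed chunk typically carries an empty delta, and printing its `None` content verbatim would corrupt the output.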

## ✉️ Submission & check-in
After completing the workflow above, submit as follows:
* Check-in content: a screenshot of the inference result for a custom prompt (it must include the command entered in the terminal and the returned JSON or streamed text).
## Email format
* Subject: [PaddlePaddle Hackathon 10 - Enflame S60 - xx task check-in]
* Body:
  * Hi PaddlePaddle team,
  * 【GitHub ID】: XXX
  * 【Check-in content】: Run ERNIE-4.5-0.3B-Paddle with FastDeploy
  * 【Check-in screenshot】: