
### To be completed by MetaX (沐曦)

### Building new `FastDeploy` applications on Enflame (燧原) cards
* Tech tags: PaddlePaddle, FastDeploy, Python

* Description: This task leverages the compute power of the Enflame S60 accelerator (GCU) together with the high-performance FastDeploy inference framework to build secondary development and applications on top of the ERNIE-4.5-0.3B-Paddle model. We encourage developers to create innovative showcases with real-world value, a closed-loop workflow, and a polished user experience. See the [PaddlePaddle AI Studio application gallery](https://aistudio.baidu.com/topic/applications) for reference.
* Deliverables:
  * Stage 1: RFC proposal
    1. Submission: 1) submit a markdown file to https://aistudio.baidu.com/projectoverview; 2) prefix the title with 【PaddlePaddle Hackathon 10】.
    2. Basic requirements: 1) avoid application scenarios that duplicate existing demos (such as simple sentiment analysis); 2) the proposal should fully exploit the lightweight yet efficient character of `ERNIE-4.5-0.3B-Paddle`.
    3. Screening criteria: 1) whether the example has real application value in practical scenarios; 2) whether its workflow logic is clear; 3) whether the expected inference quality matches the business metrics.

  * Stage 2: PR code submission
    1. Submission: submit the complete code in Notebook (ipynb) format to your own project at https://aistudio.baidu.com/projectoverview, add 【PaddlePaddle Hackathon 10】 to the title, and link the earlier RFC in the description.
    2. The PR must follow the notebook contribution guidelines, and developers should promptly revise the PR according to review feedback.
    3. A mid-term checkpoint meeting is held halfway through the competition; developers report progress, demonstrate completed features, summarize current problems and challenges, and outline their plan for the remainder of the competition.
* Reference examples: for broader applicability, English-language scenarios are preferred. Recommended directions:
  * Intelligent text processing: long-document summarization, domain-specific translation.
  * Semantic understanding: industry knowledge-base QA, advanced sentiment mining.
  * Reference demos:
    * [ERNIE-4.5-0.3B fine-tuned for an old-Beijing speaking style](https://aistudio.baidu.com/projectdetail/10000880?channelType=0&channel=0)
    * [Hands-on tutorial: Chinese sentiment analysis with ERNIE-4.5-0.3B](https://aistudio.baidu.com/projectdetail/9385231)

* Technical requirements: proficiency with Python and the FastDeploy deployment workflow, plus familiarity with the related tooling.
* References: [FastDeploy](https://paddlepaddle.github.io/FastDeploy/zh/), [PaddlePaddle AI Studio](https://aistudio.baidu.com/overview)
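To make the recommended scenarios concrete, below is a minimal sketch of how a summarization request could be assembled for a FastDeploy OpenAI-compatible endpoint. The host, port, and helper name are illustrative assumptions, not part of the task requirements:

```python
import json

# Hypothetical endpoint of a local FastDeploy OpenAI-compatible server.
API_URL = "http://127.0.0.1:8180/v1/chat/completions"

def build_summary_request(document: str, max_words: int = 80) -> dict:
    """Build an OpenAI-style chat payload asking the model to summarize."""
    return {
        "messages": [
            {"role": "system", "content": "You are a concise summarizer."},
            {"role": "user",
             "content": f"Summarize in at most {max_words} words:\n{document}"},
        ],
        "stream": False,
    }

payload = build_summary_request("FastDeploy serves ERNIE models over an OpenAI-style API.")
print(len(payload["messages"]))  # 2
```

The payload can then be POSTed to `API_URL` with any HTTP client, e.g. `requests.post(API_URL, json=payload, timeout=60)`.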

### To be completed by Hygon (海光)

### pfcc/paddle-hardware/requirements-gcu.txt (new file, 58 additions)
setuptools==62.3.0
pre-commit
yapf
flake8
ruamel.yaml
zmq
aiozmq
openai>=1.93.0
tqdm
pynvml
uvicorn>=0.38.0
fastapi
paddleformers @ https://paddle-qa.bj.bcebos.com/ernie/paddleformers-0.4.0.post20251222-py3-none-any.whl
redis
etcd3
httpx
tool_helpers
cupy-cuda12x
pybind11[global]
tabulate
gradio
xlwt
visualdl
setuptools-scm>=8
prometheus-client
decord
moviepy
triton==3.3
crcmod
msgpack
gunicorn==25.0.3
modelscope
safetensors>=0.7.0
opentelemetry-api>=1.24.0
opentelemetry-sdk>=1.24.0
opentelemetry-instrumentation-redis
opentelemetry-instrumentation-mysql
opentelemetry-distro
opentelemetry-exporter-otlp
opentelemetry-instrumentation-fastapi
opentelemetry-instrumentation-logging>=0.57b0
partial_json_parser
msgspec
einops
setproctitle
aistudio_sdk
p2pstore
py-cpuinfo
flashinfer-python-paddle
flash_mask @ https://paddle-qa.bj.bcebos.com/ernie/flash_mask-4.0.post20260128-py3-none-any.whl
arctic_inference @ https://paddle-qa.bj.bcebos.com/ernie/arctic_inference-0.1.3-cp310-cp310-linux_x86_64.whl
paddlefsl
colorama
seqeval
paddle2onnx
dill<0.3.5
jieba
onnx
# Enflame (燧原科技): Running ERNIE-4.5-0.3B-Paddle with FastDeploy

Walk through the FastDeploy deployment flow on the Enflame S60 accelerator (GCU) first-hand, and experience the deep integration of domestic compute hardware with the PaddlePaddle ecosystem.

## 🎯 Task goals
After completing this check-in you will have mastered:
* Hardware adaptation: understanding how PaddlePaddle cooperates with PaddleCustomDevice (for GCU).
* Inference framework usage: integrating the Paddle runtime with FastDeploy's dependencies.
* End-to-end deployment: independently completing environment setup, model download, and API calls for ERNIE-4.5-0.3B on a domestic compute platform.

## Submission
Take part in the warm-up check-in and, following the email template below, send your screenshots to ext_paddle_oss@baidu.com, teemo.wang@enflame-tech.com, and wenhao.zhang@enflame-tech.com.

## Compute / environment
This task must be completed on an Enflame S60 instance rented from the Gitee AI compute marketplace.
> Platform: [Gitee AI compute marketplace](https://ai.gitee.com/compute) \
> Image: `vLLM / 0.8.0 / Python 3.10 / ef 1.5.0.604`

## Task guide

### Create a virtual environment
To keep the environment clean, it is recommended to create a dedicated Python virtual environment on the host.
```
cd ~
apt install python3.10-venv
python3 -m venv .venv
source .venv/bin/activate
```

### Install PaddlePaddle & PaddleCustomDevice
```
# PaddlePaddle deep learning framework, providing the core compute capabilities
python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

# PaddleCustomDevice is PaddlePaddle's custom-hardware plugin layer, providing the GCU operator implementations
python -m pip install paddle-custom-gcu==3.0.0.dev20250716 -i https://www.paddlepaddle.org.cn/packages/nightly/gcu/
```

#### Check the installed versions

```
python -c "import paddle_custom_device; paddle_custom_device.gcu.version()"
```
```
version: 3.0.0.dev20260205
commit: e3dbd3b36a0b6913fd8da10a51251e89acafaeff
TopsPlatform: 1.5.0.601
```
```
python -c "import paddle; paddle.utils.run_check()"
```
```
I0310 07:41:04.107565 961 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0310 07:41:04.107585 961 init.cc:146] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0310 07:41:04.269114 961 runtime.cc:804] InitPlugin for backend GCU successfully.
I0310 07:41:04.280309 961 runtime.cc:95] Backend GCU Init, get GCU count:1, current device id:0
I0310 07:41:04.280344 961 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-gcu.so
I0310 07:41:04.284910 961 custom_device_load.cc:78] Succeed in loading custom engine in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-gcu.so
I0310 07:41:04.287516 961 custom_kernel.cc:68] Succeed in loading 275 custom kernel(s) from loaded lib(s), will be used like native ones.
I0310 07:41:04.287611 961 init.cc:158] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0310 07:41:04.287631 961 init.cc:244] CustomDevice: gcu, visible devices count: 1
Running verify PaddlePaddle program ...
I0310 07:41:04.597394 961 pir_interpreter.cc:1524] New Executor is Running ...
I0310 07:41:04.598099 961 runtime.cc:133] Backend GCU init device:0
I0310 07:41:04.617556 961 pir_interpreter.cc:1547] pir interpreter is running by multi-thread mode ...
I0310 07:41:04.619024 1082 utils.cc:136] Kernels launch in JIT ONLY mode:false
I0310 07:41:04.632437 1082 op_utils.cc:191] AOT kernel stream mode:async
I0310 07:41:04.670130 1094 gcu_layout_funcs.cc:54] Enable transpose optimize:false
PaddlePaddle works well on 1 gcu.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
I0310 07:41:04.741544 961 runtime.cc:149] Backend GCU finalize device:0
I0310 07:41:04.741559 961 runtime.cc:101] Backend GCU Finalize
```


### Install FastDeploy
#### Install the FastDeploy dependencies
Install the FastDeploy dependencies from [requirements-gcu.txt](./requirements-gcu.txt):
```
python -m pip install -r requirements-gcu.txt --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```

#### Install FastDeploy
```
python -m pip install fastdeploy -i https://www.paddlepaddle.org.cn/packages/stable/gcu/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```

### Download the ERNIE-4.5-0.3B-Paddle model

```
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
```

### Inference
Run the following commands to start the inference server:

```
export ENABLE_V1_KVCACHE_SCHEDULER=1

# This environment variable works around a minor bug on single-card setups.
export CUDA_VISIBLE_DEVICES=0

python -m fastdeploy.entrypoints.openai.api_server --model baidu/ERNIE-4.5-0.3B-Paddle --port 8180 --metrics-port 8181 --engine-worker-queue-port 8182 --max-model-len 32768 --max-num-seqs 32 --num-gpu-blocks-override 4896
```

Open a new terminal and send a request to the model service with:

```
curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Where is Beijing?"}
]
}'
```

Once the server runs successfully, you can see the generated inference result; a sample follows:
```
{"id":"chatcmpl-525a4d8f-2f65-480e-b520-f69cc73547fb","object":"chat.completion","created":1773196831,"model":"default","choices":[{"index":0,"message":{"role":"assistant","content":"北京是中国的首都,位于中国北京市,是一个历史文化名城。","reasoning_content":null,"tool_calls":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"total_tokens":26,"completion_tokens":15,"prompt_tokens_details":{"cached_tokens":0}}}
```
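The response above can also be consumed programmatically. A minimal stdlib sketch that extracts the assistant's reply from a non-streaming `chat.completion` JSON of this shape (the sample text here is abridged and illustrative):

```python
import json

# Abridged chat.completion response with the same structure as the sample above.
raw = """{"id": "chatcmpl-xyz", "object": "chat.completion",
 "choices": [{"index": 0,
              "message": {"role": "assistant",
                          "content": "Beijing is the capital of China."},
              "finish_reason": "stop"}],
 "usage": {"prompt_tokens": 11, "total_tokens": 26, "completion_tokens": 15}}"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
print(answer)  # Beijing is the capital of China.
```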

The FastDeploy service is compatible with the OpenAI API protocol; you can issue requests with the following Python code.
```
import openai
host = "0.0.0.0"
port = "8180"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

response = client.chat.completions.create(
    model="null",
    messages=[
        {"role": "system", "content": "I'm a helpful AI assistant."},
        {"role": "user", "content": "把李白的静夜思改写为现代诗"},
    ],
    stream=True,
)
for chunk in response:
    if chunk.choices[0].delta and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
print('\n')
```
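The chunk-handling loop above can be sanity-checked without a running server. The sketch below mimics streamed delta objects with plain dataclasses; these stand-in types are illustrative, not the actual openai SDK classes:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Delta:
    content: Optional[str] = None

@dataclass
class Choice:
    delta: Optional[Delta] = None

@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)

def collect(chunks) -> str:
    """Accumulate streamed delta contents into the full reply text."""
    parts = []
    for chunk in chunks:
        if chunk.choices[0].delta and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

stream = [Chunk([Choice(Delta("Hello"))]), Chunk([Choice(Delta(", world"))])]
print(collect(stream))  # Hello, world
```

Guarding on both `delta` and `delta.content` matters: the final streamed chunk typically carries an empty delta, and printing its `None` content verbatim would corrupt the output.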

## ✉️ Submission & check-in
After completing the workflow above, submit as follows:
* Check-in content: a screenshot of the inference result for a custom prompt (it must include the command entered in the terminal and the returned JSON or streamed text).
## Email format
* Subject: [PaddlePaddle Hackathon 10 - Enflame S60 - xx task check-in]
* Body:
  * Hi PaddlePaddle team,
  * 【GitHub ID】: XXX
  * 【Check-in content】: Run ERNIE-4.5-0.3B-Paddle with FastDeploy
  * 【Check-in screenshot】: