- Environment on the A100: NVIDIA-SMI 565.57.01, Driver Version 565.57.01, CUDA Version 12.7
- Installed the stable versions of paddlepaddle-gpu and fastdeploy-gpu with the following commands:
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
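A quick sanity check of the install before launching the server (a minimal sketch; paddle.utils.run_check() and paddle.version.cuda() are documented PaddlePaddle helpers, while the FastDeploy version attribute is an assumption):
# Verify that PaddlePaddle sees the GPUs and report the CUDA build it was compiled against
python -c "import paddle; print(paddle.version.cuda()); paddle.utils.run_check()"
# Confirm the installed FastDeploy build (the __version__ attribute is assumed to exist)
python -c "import fastdeploy; print(getattr(fastdeploy, '__version__', 'unknown'))"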
- Ran the server and the benchmark on the A100 with the following commands:
nohup python -m fastdeploy.entrypoints.openai.api_server \
--model /dataset/models/squad3/baidu/ERNIE-4.5-300B-A47B-Paddle \
--metrics-port 8387 \
--port 8388 \
--engine-worker-queue-port 8389 \
--use-cudagraph \
--tensor-parallel-size 8 \
--quantization="wint8" \
--max-model-len 8192 \
> console0_graph.log 2>&1 &
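Before kicking off the benchmark it can be worth smoke-testing the endpoint the server exposes (a minimal sketch; the port and the /v1/chat/completions path come from the commands here, while using "EB45T" as the served model name is an assumption):
# Send one small chat request to confirm the OpenAI-compatible endpoint responds
curl -s http://0.0.0.0:8388/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "EB45T", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'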
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 8388 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log_cuda_graph.txt 2>&1 &
a. Testing showed that without --use-cudagraph, benchmark_serving runs all 2000 cases to completion.
b. With --use-cudagraph, three runs failed in different ways: one hit "connection reset by peer" (after 1000+ cases); one had nvidia-smi time out without responding and the log stopped updating for over an hour (after 500+ cases); one reported no error but the log stopped updating for over an hour (after 500+ cases).
- Installed the nightly builds (paddlepaddle-gpu 3.3.0.dev20250928 + fastdeploy-gpu 2.3.0.dev20250928) and retested:
a. No error was reported, but the log stopped updating for over an hour (after 500+ cases).
b. With --max-concurrency reduced to 8, benchmark_serving runs normally (1700+ cases completed so far, still running without issue).
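For the runs above where the log stopped updating, a quick liveness check can help tell a hung engine from a stalled client (a minimal sketch; port 8387 is the --metrics-port from the launch command, and serving metrics at /metrics is an assumption):
# Probe the metrics port with a short timeout; no response suggests the server itself is stuck
curl -s -m 5 http://0.0.0.0:8387/metrics | head -n 20
# Query GPU state with a hard timeout, since nvidia-smi itself was seen to hang
timeout 30 nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv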