A100, 8 GPUs: benchmark serving fails when CUDA graph is enabled #4321

@zhang-chenyi

Description

  1. Versions on the A100: NVIDIA-SMI 565.57.01, Driver Version: 565.57.01, CUDA Version: 12.7

  2. Installed the stable versions of paddle-gpu and fastdeploy-gpu with the following commands:
    python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
    python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

  3. On the A100, started the server and the benchmark with the following commands:
    nohup python -m fastdeploy.entrypoints.openai.api_server \
    --model /dataset/models/squad3/baidu/ERNIE-4.5-300B-A47B-Paddle \
    --metrics-port 8387 \
    --port 8388 \
    --engine-worker-queue-port 8389 \
    --use-cudagraph \
    --tensor-parallel-size 8 \
    --quantization="wint8" \
    --max-model-len 8192 \
    > console0_graph.log 2>&1 &

    python benchmark_serving.py \
    --backend openai-chat \
    --model EB45T \
    --endpoint /v1/chat/completions \
    --host 0.0.0.0 \
    --port 8388 \
    --dataset-name EBChat \
    --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
    --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
    --num-prompts 2000 \
    --max-concurrency 100 \
    --save-result > infer_log_cuda_graph.txt 2>&1 &

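    a. Without --use-cudagraph, benchmark_serving runs normally and completes all 2000 cases.
    b. With --use-cudagraph, three runs all failed: one ended with "connection reset by peer" (after 1000+ cases); in one, nvidia-smi timed out and stopped responding, and the log had no updates for over an hour (after 500+ cases); in one there was no error, but the log had no updates for over an hour (after 500+ cases).
  4. Tested again with the Nightly builds installed: paddlepaddle-gpu 3.3.0.dev20250928 + fastdeploy-gpu 2.3.0.dev20250928
    a. No error, but the log had no updates for over an hour (after 500+ cases).
    b. With --max-concurrency changed to 8, benchmark_serving runs normally (1700+ cases completed so far, no problems yet).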
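
The "log had no updates for over an hour" symptom can be checked automatically rather than by eye. Below is a minimal sketch of such a check, assuming that a log file's last-modified time is a good enough signal for a hang. The helper name `is_stalled` and the one-hour default threshold are illustrative assumptions and are not part of FastDeploy or benchmark_serving.py.

```python
import os
import time


def is_stalled(log_path, max_idle_seconds=3600.0, now=None):
    """Return True if log_path was last modified more than max_idle_seconds ago.

    Hypothetical helper for illustration only; `now` can be passed
    explicitly (seconds since the epoch) to make the check testable.
    """
    if now is None:
        now = time.time()
    # Seconds elapsed since the file was last written to.
    idle_seconds = now - os.path.getmtime(log_path)
    return idle_seconds > max_idle_seconds
```

For example, calling `is_stalled("infer_log_cuda_graph.txt")` periodically while the benchmark runs would flag the hang described above without waiting for someone to notice it manually.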
