- Environment on the A100: NVIDIA-SMI 565.57.01, Driver Version 565.57.01, CUDA Version 12.7
- Installed the stable versions of paddlepaddle-gpu and fastdeploy-gpu with the following commands:
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
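A quick sanity check of the install before launching the server (a minimal sketch; paddle.utils.run_check() and paddle.version.cuda() are documented PaddlePaddle helpers, while the FastDeploy version attribute is an assumption):
# Verify that PaddlePaddle sees the GPUs and report the CUDA build it was compiled against
python -c "import paddle; print(paddle.version.cuda()); paddle.utils.run_check()"
# Confirm the installed FastDeploy build (the __version__ attribute is assumed to exist)
python -c "import fastdeploy; print(getattr(fastdeploy, '__version__', 'unknown'))"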
- Ran the server and the benchmark on the A100 with the following commands:
nohup python -m fastdeploy.entrypoints.openai.api_server \
--model /dataset/models/squad3/baidu/ERNIE-4.5-300B-A47B-Paddle \
--metrics-port 8387 \
--port 8388 \
--engine-worker-queue-port 8389 \
--use-cudagraph \
--tensor-parallel-size 8 \
--quantization="wint8" \
--max-model-len 8192 \
> console0_graph.log 2>&1 &
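Before kicking off the benchmark it can be worth smoke-testing the endpoint the server exposes (a minimal sketch; the port and the /v1/chat/completions path come from the commands here, while using "EB45T" as the served model name is an assumption):
# Send one small chat request to confirm the OpenAI-compatible endpoint responds
curl -s http://0.0.0.0:8388/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "EB45T", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'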
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 8388 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log_cuda_graph.txt 2>&1 &
a. Testing showed that without --use-cudagraph, benchmark_serving runs all 2000 cases to completion.
b. With --use-cudagraph, three runs failed in different ways: one hit "connection reset by peer" (after 1000+ cases); one had nvidia-smi time out without responding and the log stopped updating for over an hour (after 500+ cases); one reported no error but the log stopped updating for over an hour (after 500+ cases).
- Installed the nightly builds (paddlepaddle-gpu 3.3.0.dev20250928 + fastdeploy-gpu 2.3.0.dev20250928) and retested:
a. No error was reported, but the log stopped updating for over an hour (after 500+ cases).
b. With --max-concurrency reduced to 8, benchmark_serving runs normally (1700+ cases completed so far, still running without issue).
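For the runs above where the log stopped updating, a quick liveness check can help tell a hung engine from a stalled client (a minimal sketch; port 8387 is the --metrics-port from the launch command, and serving metrics at /metrics is an assumption):
# Probe the metrics port with a short timeout; no response suggests the server itself is stuck
curl -s -m 5 http://0.0.0.0:8387/metrics | head -n 20
# Query GPU state with a hard timeout, since nvidia-smi itself was seen to hang
timeout 30 nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv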