Deploying the InternVL2.5-26B model on MetaX (沐曦) C500 fails #4131

@Itsanewday

Following
https://lmdeploy.readthedocs.io/en/latest/get_started/maca/get_started.html
I downloaded the provided lmdeploy image and deployed it with docker compose. The compose file is as follows:

version: "3.8"

x-common: &common
  pull_policy: always # always, never, missing, build
  restart: unless-stopped
  stop_signal: SIGINT
  stop_grace_period: 1m
  logging:
    driver: "json-file"
    options:
      max-file: "10"
      max-size: "100m"

services:
  lmdeploy-internvl25-26B:
    image: localhost:5000/lmdeploy:maca
    container_name: lmdeploy-internvl25-26B
    shm_size: 100gb
    environment:
      - CUDA_VISIBLE_DEVICES=4,5
      - GLOO_SOCKET_IFNAME=lo
    devices:
      - "/dev/dri:/dev/dri"
      - "/dev/mxcd:/dev/mxcd"
      - "/dev/infiniband:/dev/infiniband"
    group_add:
      - "video"
    volumes:
      - /tmp:/tmp
      - /mnt/data0/models:/models/
    ports:
      - 20003:23333
    entrypoint: ["/bin/bash", "-c"]
    command:
      [
        "lmdeploy serve api_server --backend pytorch --device cuda --cache-block-seq-len 16 /models/OpenGVLab/InternVL2_5-26B --model-name internvl2 --tp 2 --cache-max-entry-count 0.9 "
      ]
    <<: *common

However, the deployment fails. The log shows:
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/workspace/framework/mcPytorch/aten/src/ATen/cuda/CUDAContext.cpp":49, please report a bug to PyTorch. device=1, num_gpus=

With tp=1 the model deploys, but inference then runs out of GPU memory.
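
The assertion in the log (`device >= 0 && device < num_gpus`, with `device=1`) says that device index 1 fell outside the range PyTorch could see, which would be consistent with `--tp 2` needing two devices while fewer were visible in the container. As a rough diagnostic sketch, the snippet below compares the requested `CUDA_VISIBLE_DEVICES` with the device count that PyTorch reports. The helper names (`visible_device_ids`, `tp_fits`, `report`) are hypothetical and only for illustration; whether the MACA stack honors `CUDA_VISIBLE_DEVICES=4,5` the same way CUDA does is an assumption that this check is meant to probe, not a confirmed cause.

```python
import os


def visible_device_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device ID strings."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in raw.split(",") if d.strip()]


def tp_fits(tp, num_gpus):
    """Each tensor-parallel rank needs its own device index in [0, num_gpus)."""
    return all(0 <= rank < num_gpus for rank in range(tp))


def report(tp, env=None, num_gpus=None):
    """Build a one-line summary; num_gpus=None means 'ask PyTorch'."""
    if num_gpus is None:
        # Imported lazily so the helpers above also work without torch installed.
        import torch
        num_gpus = torch.cuda.device_count()
    ids = visible_device_ids(env)
    status = "ok" if tp_fits(tp, num_gpus) else "too few devices"
    return f"requested={ids} num_gpus={num_gpus} tp={tp}: {status}"


# Inside the container (assumption: torch is importable there):
#     print(report(tp=2))
```

If the reported `num_gpus` is smaller than the number of IDs requested, the mismatch is between the environment variable and what the runtime exposes, rather than in lmdeploy's tensor-parallel setup itself.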
