
PD-disaggregated deployment of the DeepSeek-R1-FP8 model: starting the prefill service with tp=16 GPUs fails with an error #1074

@wenruihua

Description

[Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 0
[Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 1
[Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 2
[Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 3
[Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 4
[Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 5
[Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 6
[Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 7
INFO 09-30 03:02:12 [manager.py:193] use req queue ChunkedPrefillQueue
INFO 09-30 03:02:14 [cache_tensor_manager.py:17] USE_GPU_TENSOR_CACHE is On
All deep_gemm operations loaded successfully!
INFO 09-30 03:02:15 [__init__.py:216] Automatically detected platform cuda.
WARNING 09-30 03:02:15 [light_utils.py:13] lightllm_kernel is not installed, you can't use the api of it.
WARNING 09-30 03:02:16 [nixl_kv_transporter.py:19] nixl is not installed, which is required for pd disagreggation!!!
Process Process-2:9:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/lightllm/lightllm/server/router/model_infer/mode_backend/continues_batch/pd_mode/prefill_node_impl/prefill_kv_move_manager.py", line 233, in _init_env
    manager = PrefillKVMoveManager(args, info_queue, mem_queues)
  File "/lightllm/lightllm/server/router/model_infer/mode_backend/continues_batch/pd_mode/prefill_node_impl/prefill_kv_move_manager.py", line 40, in __init__
    assert self.dp_world_size <= self.node_world_size
AssertionError
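
For context on the failing check: the assertion in PrefillKVMoveManager.__init__ appears to require that one data-parallel group fits on a single node. Below is a minimal sketch, not lightllm's actual code; check_prefill_kv_move_config, tp_size, dp, and gpus_per_node are hypothetical names, and it assumes dp_world_size is derived as tp_size // dp while node_world_size is the per-node GPU count. Under those assumptions, tp=16 with dp=1 on 8-GPU nodes (ranks 0-7 visible in the Gloo/lock_nccl_group log above) would trip the same assertion.

```python
# Illustrative sketch only -- not lightllm's implementation.
# Assumption: dp_world_size = tp_size // dp (ranks spanned by one dp group),
#             node_world_size = gpus_per_node (ranks available on one node).
def check_prefill_kv_move_config(tp_size: int, dp: int, gpus_per_node: int) -> None:
    dp_world_size = tp_size // dp    # hypothetical derivation
    node_world_size = gpus_per_node  # hypothetical derivation
    # Mirrors the failing assertion: a dp group must fit on a single node.
    assert dp_world_size <= node_world_size, (
        f"dp_world_size={dp_world_size} > node_world_size={node_world_size}"
    )

# tp=16, dp=1 on 8-GPU nodes reproduces the AssertionError seen in the traceback.
try:
    check_prefill_kv_move_config(tp_size=16, dp=1, gpus_per_node=8)
except AssertionError as exc:
    print("AssertionError:", exc)
```

If that reading is right, the question is whether a multi-node tp=16 prefill service is expected to work here, or whether the configuration needs dp raised so that each dp group stays within one 8-GPU node.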
