Description
[Gloo] Rank 0 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 0
[Gloo] Rank 1 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 1
[Gloo] Rank 2 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 2
[Gloo] Rank 3 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 3
[Gloo] Rank 4 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 4
[Gloo] Rank 5 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 5
[Gloo] Rank 6 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 6
[Gloo] Rank 7 is connected to 15 peer ranks. Expected number of connected peer ranks is : 15
INFO 09-30 03:02:12 [prefill_impl.py:33] lock_nccl_group ranks 7
INFO 09-30 03:02:12 [manager.py:193] use req queue ChunkedPrefillQueue
INFO 09-30 03:02:14 [cache_tensor_manager.py:17] USE_GPU_TENSOR_CACHE is On
All deep_gemm operations loaded successfully!
INFO 09-30 03:02:15 [__init__.py:216] Automatically detected platform cuda.
WARNING 09-30 03:02:15 [light_utils.py:13] lightllm_kernel is not installed, you can't use the api of it.
WARNING 09-30 03:02:16 [nixl_kv_transporter.py:19] nixl is not installed, which is required for pd disagreggation!!!
Process Process-2:9:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/lightllm/lightllm/server/router/model_infer/mode_backend/continues_batch/pd_mode/prefill_node_impl/prefill_kv_move_manager.py", line 233, in _init_env
    manager = PrefillKVMoveManager(args, info_queue, mem_queues)
  File "/lightllm/lightllm/server/router/model_infer/mode_backend/continues_batch/pd_mode/prefill_node_impl/prefill_kv_move_manager.py", line 40, in __init__
    assert self.dp_world_size <= self.node_world_size
AssertionError
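For context, the failing line compares two world sizes in the prefill KV move manager. The sketch below shows what that check likely enforces. It assumes, without verifying against this LightLLM version, that `dp_world_size` is `tp // dp` (GPUs per data-parallel group) and `node_world_size` is `tp // nnodes` (GPUs per node). The function name `check_pd_prefill_layout` is hypothetical. The "connected to 15 peer ranks" lines suggest a 16-rank group with 8 ranks on this node, so the 2-node layout used below is also an assumption.

```python
# Hypothetical reconstruction of the check in PrefillKVMoveManager.__init__.
# ASSUMPTION: dp_world_size = tp // dp and node_world_size = tp // nnodes;
# verify both definitions against the LightLLM source before relying on this.

def check_pd_prefill_layout(tp: int, dp: int, nnodes: int) -> bool:
    """Return True if one data-parallel group fits inside a single node."""
    dp_world_size = tp // dp          # GPUs in each DP group
    node_world_size = tp // nnodes    # GPUs on each node
    return dp_world_size <= node_world_size


if __name__ == "__main__":
    # 16 ranks over 2 nodes with dp=1: one DP group spans both nodes,
    # so the assertion in the traceback would fail.
    print(check_pd_prefill_layout(tp=16, dp=1, nnodes=2))
    # Same hardware with dp=2: each DP group has 8 GPUs and fits on one node.
    print(check_pd_prefill_layout(tp=16, dp=2, nnodes=2))
```

Under these assumptions, the check rejects any layout where a single DP group needs GPUs from more than one node. That would mean prefill in PD mode does not support tensor parallelism across nodes.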