pd with nixl backend (rebase main) #1002

base: main
Summary of Changes

Hello @kingder, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I'm introducing a new `nixl` backend to LightLLM, which significantly enhances our distributed inference capabilities. This change allows for efficient KV cache transfer between prefill and decode nodes using the `nixl` library, addressing performance bottlenecks in large-scale model serving. It integrates `nixl` into our existing P/D split architecture, providing a specialized and optimized path for distributed operations.

Highlights

- Nixl Backend Integration: New `nixl_prefill` and `nixl_decode` run modes have been introduced, enabling distributed KV cache management and transfer via the `nixl` library for enhanced performance.
- Paged KV Cache Movement: A paged KV cache mechanism has been implemented for `nixl` modes, allowing for efficient, chunked transfer of KV states between prefill and decode nodes.
- Dedicated Communication Channels: Dedicated ZMQ-based communication channels and ports have been added for `nixl` prefill and meta-exchange, ensuring robust and high-throughput inter-node communication.
- Docker Support: New Dockerfiles (`Dockerfile.nixl`, `Dockerfile.nixl.deepep`) are provided to simplify the setup of environments with `nixl`, `openucx`, `nvshmem`, and `DeepEP`.
- Enhanced Request Management: Existing request objects (`PDNIXLChunkedPrefillReq`) and command types (`NIXLRemotePrefillDoneCmd`) have been extended to manage `nixl`-specific distributed states and remote prefill completion signals.
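The paged KV cache movement described above can be sketched generically. The following toy example is illustrative only (names like `split_into_pages` and `send_page` are invented for this sketch, not the PR's actual API): a flat KV buffer is cut into fixed-size pages so each page can be shipped independently.

```python
# Toy illustration of paged/chunked KV transfer; NOT the PR's actual API.
from typing import Callable, List


def split_into_pages(kv_buffer: List[float], page_size: int) -> List[List[float]]:
    """Split a flat KV buffer into fixed-size pages (the last page may be short)."""
    return [kv_buffer[i:i + page_size] for i in range(0, len(kv_buffer), page_size)]


def transfer_paged(
    kv_buffer: List[float],
    page_size: int,
    send_page: Callable[[List[float]], None],
) -> int:
    """Send the buffer page by page; returns the number of pages sent."""
    pages = split_into_pages(kv_buffer, page_size)
    for page in pages:
        # In a real backend each page would be a remote write (e.g. RDMA),
        # letting the decode node start consuming before the whole cache lands.
        send_page(page)
    return len(pages)
```

Chunking at page granularity is what lets prefill and transfer overlap: the receiver can register fixed-size regions once and reuse them for every page.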
Code Review

This pull request introduces a new backend for prefill/decode (PD) separation using the `nixl` library, which appears to be a communication library for high-performance networking such as RDMA. The changes are extensive, adding new Dockerfiles, run modes, and a significant amount of logic for handling remote prefill and KV cache transport. While the overall direction is clear, several areas need attention, particularly Docker best practices, code maintainability, and correctness in the new asynchronous logic. I've identified some potential bugs, such as a blocking `time.sleep` in an async function and incomplete state-update logic, which are critical to address. There are also opportunities to improve the Dockerfiles for better security and reproducibility.
```python
del self.remote_prefill_requests[req_id]
del self.inflght_transfer_requests[req_id]
```

```python
time.sleep(PDNIXLBackendBase._THREAD_WAIT_INTERVAL)
```
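The review flags this blocking sleep. The usual fix, when the wait happens inside a coroutine, is `asyncio.sleep`, which yields to the event loop instead of stalling it. A minimal sketch (the polling loop and names here are illustrative, not the PR's code):

```python
# Sketch: replace a blocking time.sleep inside a coroutine with asyncio.sleep,
# so other tasks on the event loop keep running while this one waits.
import asyncio

WAIT_INTERVAL = 0.01  # illustrative stand-in for _THREAD_WAIT_INTERVAL


async def poll_until_done(is_done, timeout: float = 1.0) -> bool:
    """Poll a condition without blocking the event loop.

    Returns True if the condition became true within `timeout`.
    """
    elapsed = 0.0
    while elapsed < timeout:
        if is_done():
            return True
        await asyncio.sleep(WAIT_INTERVAL)  # yields control, unlike time.sleep
        elapsed += WAIT_INTERVAL
    return False
```

If the sleep instead lives on a dedicated worker thread (not a coroutine), a blocking sleep is fine; the bug only bites when `time.sleep` runs on the event-loop thread.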
```python
# (TODO) figure out how to update req_to_next_token_ids
# req.cur_output_len += 1

# pack = InferReqUpdatePack(req, req.cur_output_len)
# pack.handle(
#     token_id,
#     token_logprob,
#     eos_ids=self.eos_id,
#     extra_post_req_handle_func=None,
#     is_master_in_dp=self.is_master_in_dp,
#     call_post_handle_for_chunk=False
# )
return token_id

def _decode_filter_reqs(
    self, prefill_reqs: List[InferReq], decode_reqs: List[InferReq]
):
    new_prefill_reqs: List[InferReq] = []
    remote_prefill_reqs: List[InferReq] = []
    failed_prefill_reqs: List[InferReq] = []
    next_token_ids: List[int] = []
    rpd_reqs: List[InferReq] = []

    for req in prefill_reqs:
        if req.in_prefill_or_transfer:
            if req.infer_nixl_rpd:
                shm_req: PDNIXLChunkedPrefillReq = req.shm_req
                # state is updated by router
                state = shm_req.get_pd_req_state()
                if state == RemoteTransferStatusType.SUCCESS.value:  # success
                    next_token_ids.append(self._post_remote_prefill(req))
                    rpd_reqs.append(req)
                elif state == RemoteTransferStatusType.FAILED.value:
                    self._post_remote_prefill(req, False)
                    failed_prefill_reqs.append(req)
                else:
                    logger.warning(f"remote prefill request {shm_req.group_req_id} unexpected state {state}")
            else:
                remote_prefill_reqs.append(req)
        else:
            new_prefill_reqs.append(req)

    if rpd_reqs:
        # g_infer_context.req_manager.req_sampling_params_manager.update_reqs_token_counter(
        #     rpd_reqs,
        #     next_token_ids,
        # )
        decode_reqs.extend(rpd_reqs)
```
The logic for updating request state after remote prefill seems incomplete. In `_post_remote_prefill`, the logic to update `req_to_next_token_ids` is commented out with a `TODO`. In `_decode_filter_reqs`, `update_reqs_token_counter` is also commented out. This will lead to incorrect state tracking for requests, potentially causing bugs in token generation, length checks, and sampling.
```dockerfile
RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    g++ \
    make \
    git && \
    rm -rf /var/lib/apt/lists/*
```
This `RUN` command has a couple of issues that go against Docker best practices:

- `chmod 777 -R /tmp`: This is insecure; it strips the sticky bit from `/tmp` (which is normally mode `1777`) and makes every existing file under it world-writable.
- Multiple `apt-get update` calls: This Dockerfile contains multiple `RUN apt-get update` commands (here and on lines 42, 44, 69). This is inefficient and can lead to caching problems.

It's recommended to address these for a more secure and efficient Docker image.
```dockerfile
RUN chmod 1777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    g++ \
    make \
    git && \
    rm -rf /var/lib/apt/lists/*
```
```dockerfile
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124

RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
```
Cloning from the `main` branch for `LightKernel` (here) and `nixl` (L72) makes the build non-reproducible, as `main` can change at any time. This can lead to unexpected behavior or build failures in the future.
```dockerfile
RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && git checkout <SPECIFIC_COMMIT_OR_TAG> && pip install --no-deps -v .
```
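One common way to make such a pin easy to audit and bump is a build argument; a sketch (the `<SPECIFIC_COMMIT_OR_TAG>` placeholder is kept deliberately, and `LIGHTKERNEL_REF` is an illustrative name, not something this PR defines):

```dockerfile
# Sketch: pin the clone via a build ARG so the revision is explicit and can be
# overridden at build time: docker build --build-arg LIGHTKERNEL_REF=<rev> ...
ARG LIGHTKERNEL_REF=<SPECIFIC_COMMIT_OR_TAG>
RUN --mount=type=cache,target=/root/.cache/pip \
    git clone https://github.com/ModelTC/LightKernel.git && \
    cd LightKernel && \
    git checkout "${LIGHTKERNEL_REF}" && \
    pip install --no-deps -v .
```

This keeps the default build reproducible while still allowing CI to test newer revisions without editing the Dockerfile.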
```dockerfile
RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    g++ \
    make \
    git && \
    rm -rf /var/lib/apt/lists/*
```
This `RUN` command has a couple of issues that go against Docker best practices:

- `chmod 777 -R /tmp`: This is insecure; it strips the sticky bit from `/tmp` (which is normally mode `1777`) and makes every existing file under it world-writable.
- Multiple `apt-get update` calls: This Dockerfile contains multiple `RUN apt-get update` commands (here and on lines 44, 82, 107). This is inefficient and can lead to caching problems.

It's recommended to address these for a more secure and efficient Docker image.
```dockerfile
RUN chmod 1777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    g++ \
    make \
    git && \
    rm -rf /var/lib/apt/lists/*
```
```dockerfile
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124

RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
```
Cloning from the `main` branch for `LightKernel` (here) and `nixl` (L110) makes the build non-reproducible, as `main` can change at any time. This can lead to unexpected behavior or build failures in the future.
```dockerfile
RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && git checkout <SPECIFIC_COMMIT_OR_TAG> && pip install --no-deps -v .
```
```python
if isinstance(self, MemoryManager) and type(self) != MemoryManager:
    raise NotImplementedError("subclass need reimpl this method")
```
The check `isinstance(self, MemoryManager) and type(self) != MemoryManager` to enforce subclass implementation is fragile and not the standard Python way. It will not work for subclasses of subclasses (e.g., `class B(A)` where `class A(MemoryManager)`). Also, there's a typo: "reimpl".

Using Python's `abc` module to define `MemoryManager` as an abstract base class and `alloc_paged_kv_move_buffer` as an abstract method would be a more robust approach.
```diff
-if isinstance(self, MemoryManager) and type(self) != MemoryManager:
-    raise NotImplementedError("subclass need reimpl this method")
+if type(self) is not MemoryManager:
+    raise NotImplementedError("subclass need to reimplement this method")
```
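The `abc`-based approach the review recommends might look like the following sketch (the class and method names follow the review's comment; the subclass and its buffer logic are purely illustrative, not the PR's implementation):

```python
# Sketch of the abc-based enforcement suggested above; the subclass body is
# illustrative only.
from abc import ABC, abstractmethod
from typing import List, Optional


class MemoryManager(ABC):
    @abstractmethod
    def alloc_paged_kv_move_buffer(self, num_pages: int) -> List[Optional[object]]:
        """Subclasses must implement paged KV move-buffer allocation."""
        raise NotImplementedError


class GpuMemoryManager(MemoryManager):  # hypothetical subclass for illustration
    def alloc_paged_kv_move_buffer(self, num_pages: int) -> List[Optional[object]]:
        # Stand-in for real page allocation (e.g. pinned or device memory).
        return [None] * num_pages
```

The key difference: forgetting the override fails fast with a `TypeError` at instantiation time, instead of a `NotImplementedError` surfacing only when the method is finally called, and it works correctly at any depth of subclassing.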
No description provided.