
Optimize scheduler for chunk prefill #7454

Open
liyonghua0910 wants to merge 1 commit into PaddlePaddle:release/2.4 from liyonghua0910:release/2.4+20260416_opt_prefill

Conversation

@liyonghua0910
Collaborator

Motivation

💡 If this PR is a cherry pick, the PR title must follow the required format: add the [Cherry-Pick] label at the very beginning and append the original PR ID at the end, e.g. [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You may add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests, or explain in this PR why none are included.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has already been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 17, 2026

Thanks for your contribution!


@PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-17 11:01 CST

📋 Review Summary

PR overview: optimizes the scheduling policy for RUNNING requests in the chunk prefill scenario, removing the preemption logic in favor of directly interrupting the request and blocking WAITING requests from being scheduled
Scope of change: scheduler logic (resource_manager_v1.py)
Affected tags: Scheduler, KVCache

📝 PR Compliance Check

The PR title is missing a required tag, and the Motivation and Modifications sections of the description are empty.

Suggested title (ready to copy):

  • [Scheduler][Optimization] Optimize scheduler for chunk prefill

Description suggestion: please fill in Motivation (why this optimization is needed) and Modifications (what exactly was changed).

Issues

Severity  File  Summary
🔴 Bug  resource_manager_v1.py:698  the call to _get_can_schedule_prefill_threshold_block is missing the request argument and raises a TypeError at runtime
🔴 Bug  resource_manager_v1.py:702  allocate_gpu_blocks is passed an extra request_id argument and raises a TypeError at runtime

Overall Assessment

The direction of the scheduling-policy optimization is sound, but two method calls have mismatched arguments. Both crash with a runtime TypeError and must be fixed before this PR can be merged.
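For context, the control flow the review asks for can be sketched as below. This is a toy reconstruction, not the PR's actual code: CacheManager, free_blocks, and try_schedule_prefill are all invented stand-ins for the real FastDeploy classes and methods.

```python
class CacheManager:
    """Toy GPU-block pool standing in for the real cache manager (assumption)."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks

    def can_allocate_gpu_blocks(self, num_blocks):
        return self.free_blocks >= num_blocks

    def allocate_gpu_blocks(self, num_blocks):
        # One argument only, matching the signature cited in the review.
        allocated = list(range(self.free_blocks - num_blocks, self.free_blocks))
        self.free_blocks -= num_blocks
        return allocated


def try_schedule_prefill(block_tables, cache_manager, num_new_block, threshold):
    """Extend the request's block tables only when the threshold is satisfied.

    Mirrors the corrected snippet in the review: check the threshold first,
    then allocate exactly num_new_block blocks (no request_id argument).
    """
    if not cache_manager.can_allocate_gpu_blocks(threshold):
        # Interrupt this prefill rather than preempting other requests.
        return False
    block_tables.extend(cache_manager.allocate_gpu_blocks(num_new_block))
    return True
```

With 8 free blocks, a threshold of 6, and 4 new blocks per chunk, the first call succeeds and the second is refused, since only 4 free blocks remain.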

num_new_tokens = self._get_num_new_tokens(request, token_budget)
num_new_block = self.get_new_block_nums(request, num_new_tokens)
can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(num_new_block)

🔴 Bug: _get_can_schedule_prefill_threshold_block has the signature (self, request, num_chunk_new_block) and requires two arguments, but only num_new_block is passed here; the request argument is missing. At runtime this raises TypeError: _get_can_schedule_prefill_threshold_block() missing 1 required positional argument.

Note: the calls in the WAITING-request scheduling path of the same file (lines 762 and 810) are correct and pass both arguments, (request, num_new_block).

Suggested fix:

can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(request, num_new_block)
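A minimal repro of this failure mode (the class below is a stub with the two-argument signature cited in the review; its body is a placeholder, not the real implementation):

```python
class ResourceManagerSketch:
    """Stub reproducing the two-argument signature cited in the review."""

    def _get_can_schedule_prefill_threshold_block(self, request, num_chunk_new_block):
        # Placeholder body; the real method derives a block threshold.
        return num_chunk_new_block + 1


mgr = ResourceManagerSketch()

# Correct: both positional arguments supplied.
threshold = mgr._get_can_schedule_prefill_threshold_block("req-1", 4)

# Buggy call from the diff: `request` omitted, so Python raises TypeError.
try:
    mgr._get_can_schedule_prefill_threshold_block(4)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```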

request.block_tables.extend(self.cache_manager.allocate_gpu_blocks(num_new_block))
if self.cache_manager.can_allocate_gpu_blocks(can_schedule_block_num_threshold):
    request.block_tables.extend(
        self.cache_manager.allocate_gpu_blocks(num_new_block, request.request_id)

🔴 Bug: allocate_gpu_blocks has the signature (self, num_blocks) and accepts only one argument, but an extra request.request_id is passed here. At runtime this raises TypeError: allocate_gpu_blocks() takes 2 positional arguments but 3 were given.

Note: all other allocate_gpu_blocks calls in the same file (lines 620, 633, 648, 662, 768, and 816) pass exactly one argument.

Suggested fix:

request.block_tables.extend(
    self.cache_manager.allocate_gpu_blocks(num_new_block)
)
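A minimal repro of this second failure mode (the stub below only mimics the one-argument signature cited in the review; its body is a placeholder):

```python
class CacheManagerStub:
    """Stub reproducing the one-argument allocate_gpu_blocks signature."""

    def allocate_gpu_blocks(self, num_blocks):
        # Placeholder: return dummy block ids.
        return list(range(num_blocks))


cm = CacheManagerStub()

# Correct: a single argument.
blocks = cm.allocate_gpu_blocks(3)  # [0, 1, 2]

# Buggy call from the diff: extra request_id argument raises TypeError.
try:
    cm.allocate_gpu_blocks(3, "req-1")
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```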

3 participants