[Scheduler] [Optimization] Only preempt decode requests and better manage reserved blocks in scheduler by liyonghua0910 · Pull Request #7444 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-04-16T13:02:48Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot · 2026-04-16T13:02:55Z

Thanks for your contribution!

PaddlePaddle-bot

🤖 AI Code Review | 2026-04-16 21:19 CST

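📋 Review Summary

PR overview: Refactors the scheduler's preemption policy (only decode requests are preempted) and the reserved-blocks management mechanism (aligned with SGLang's new_token_ratio scheme)
Scope of changes: Scheduler, Engine (RequestStatus enum split), TokenProcessor (status transition), environment variables
Affected tags: Scheduler, Engine

📝 PR Convention Check

The Motivation and Modifications sections of the PR description are empty. Please fill in the motivation for the change and the specific modifications so the change can be traced later.

Description template (can be copied directly):

## Motivation
Optimize the scheduler's preemption policy: (1) split RequestStatus.RUNNING into RUNNING_PREFILL and RUNNING_DECODE, and allow only decode-phase requests to be preempted; (2) replace the fixed number of reserved blocks with a dynamic reservation policy based on new_token_ratio (aligned with SGLang's retract_decode logic); (3) chunk prefill requests in the running queue no longer trigger preemption; their scheduling is deferred instead.

## Modifications
- `request.py`: add the finer-grained RUNNING_PREFILL/RUNNING_DECODE states to the RequestStatus enum
- `resource_manager_v1.py`: refactor `_trigger_preempt` to preempt only decode requests; add `_recompute_new_token_ratio_on_preemption` and `_get_running_request_reserve_blocks`; chunk prefill no longer triggers preemption when block allocation falls short
- `token_processor.py`: switch RUNNING_PREFILL to RUNNING_DECODE when the first token is output
- `envs.py`: add 5 environment variables, including FD_INIT_NEW_TOKEN_RATIO
- `requirements.txt`: add the mooncake-transfer-engine dependency (unrelated to the scheduling optimization in this PR)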
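The status split described in the template can be illustrated with a minimal sketch. Only the names `RequestStatus`, `RUNNING_PREFILL` and `RUNNING_DECODE` come from the PR; the `Request` dataclass, the other enum members, `on_token_emitted` and `is_preemptible` are hypothetical stand-ins, not FastDeploy code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class RequestStatus(Enum):
    # Hypothetical subset; the real enum in request.py may have other members.
    WAITING = auto()
    RUNNING_PREFILL = auto()
    RUNNING_DECODE = auto()
    PREEMPTED = auto()


@dataclass
class Request:
    # Hypothetical stand-in for the engine's request object.
    request_id: str
    status: RequestStatus = RequestStatus.WAITING
    output_token_ids: list = field(default_factory=list)


def on_token_emitted(request: Request, token_id: int) -> None:
    """Record an output token; the first one moves the request into decode."""
    if request.status is RequestStatus.RUNNING_PREFILL:
        request.status = RequestStatus.RUNNING_DECODE
    request.output_token_ids.append(token_id)


def is_preemptible(request: Request) -> bool:
    """Only decode-phase requests are candidates for preemption."""
    return request.status is RequestStatus.RUNNING_DECODE


req = Request("req-0", status=RequestStatus.RUNNING_PREFILL)
before = is_preemptible(req)   # False: still prefilling
on_token_emitted(req, 42)
after = is_preemptible(req)    # True: first token has been produced
```

With a single `RUNNING` state the scheduler cannot tell these two phases apart, which is presumably why the split is a prerequisite for decode-only preemption.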

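Issues

Level	File	Summary
🟡 Suggestion	`resource_manager_v1.py:301`	The `_can_preempt` method no longer has any callers and is dead code
🟡 Suggestion	`resource_manager_v1.py:991`	The behavior change where a failed chunk prefill blocks all waiting requests needs confirmation
🟡 Suggestion	`tests/v1/test_resource_manager_v1.py`	`test_trigger_preempt_records_tasks` was deleted, but no tests were added for the new preemption logic
❓ Question	`requirements.txt:48`	The `mooncake-transfer-engine` dependency is unrelated to the scheduling optimization in this PR

Overall assessment

The overall approach is sound; aligning with SGLang's new_token_ratio + retract_decode policy is mature engineering practice. The RequestStatus split and the decode-only preemption design are clear. Suggested follow-ups: remove the dead code, add unit tests for the new preemption logic, and confirm whether the mooncake-transfer-engine dependency should be split into a separate PR.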
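To make the new_token_ratio idea concrete, here is a rough sketch of a ratio-based reservation. It is not taken from the PR diff: the formulas, `DecodeReq`, `reserve_blocks`, `ratio_after_preemption`, `decay_ratio` and the `lookahead_steps` parameter are all assumptions, modelled loosely on the general shape of such schemes.

```python
import math
from dataclasses import dataclass


@dataclass
class DecodeReq:
    # Hypothetical view of a running decode request.
    max_new_tokens: int
    num_generated: int


def reserve_blocks(running, new_token_ratio, block_size):
    """Blocks held back for the expected future growth of running requests.

    Assumption: each request is expected to produce only a fraction
    (new_token_ratio) of its remaining token budget.
    """
    expected_tokens = sum(
        (r.max_new_tokens - r.num_generated) * new_token_ratio for r in running
    )
    return math.ceil(expected_tokens / block_size)


def ratio_after_preemption(remaining, lookahead_steps=20):
    """Raise the ratio after a preemption, based on observed decode progress."""
    total_generated = sum(r.num_generated for r in remaining)
    total_budget = sum(r.max_new_tokens for r in remaining)
    if total_budget == 0:
        return 1.0
    ratio = (total_generated + lookahead_steps * len(remaining)) / total_budget
    return min(1.0, ratio)


def decay_ratio(ratio, decay, min_ratio):
    """Relax the ratio gradually while no preemption occurs."""
    return max(ratio - decay, min_ratio)


running = [DecodeReq(100, 20), DecodeReq(200, 100)]
blocks = reserve_blocks(running, new_token_ratio=0.5, block_size=64)  # ceil(90 / 64) = 2
raised = ratio_after_preemption(running)                              # (120 + 40) / 300
decayed = decay_ratio(0.5, decay=0.1, min_ratio=0.45)                 # clamped to 0.45
```

The point of the sketch is the feedback loop: a preemption signals that the reservation was too optimistic, so the ratio jumps up, and it then decays again while the system stays stable.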

PaddlePaddle-bot · 2026-04-16T13:19:24Z

                return True
        return False

+    def _can_preempt_with_decode_task(self):


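🟡 Suggestion: the _can_preempt method is now dead code

_trigger_preempt has switched from self._can_preempt() to self._can_preempt_with_decode_task(), and a repository-wide search confirms that _can_preempt no longer has any callers. Consider deleting the method so that future maintainers do not mistakenly use the old version, which does not filter on decode status.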
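The risk of keeping an unfiltered check around can be seen in a small sketch. The bodies below are assumptions: the diff context does not show what `_can_preempt` or `_can_preempt_with_decode_task` actually compute, so `can_preempt_unfiltered`, `can_preempt_with_decode_task` and `pick_victim` are hypothetical stand-ins operating on plain tuples.

```python
from collections import namedtuple

# Hypothetical lightweight request record; status is a plain string here.
Req = namedtuple("Req", ["request_id", "status"])


def can_preempt_unfiltered(running):
    """Stand-in for a check with no status filter (assumed behaviour)."""
    return len(running) > 0


def can_preempt_with_decode_task(running):
    """True only if at least one running request is in the decode phase."""
    return any(r.status == "RUNNING_DECODE" for r in running)


def pick_victim(running):
    """Scan from the tail and return the first decode request found."""
    for r in reversed(running):
        if r.status == "RUNNING_DECODE":
            return r
    return None


mixed = [
    Req("a", "RUNNING_DECODE"),
    Req("b", "RUNNING_PREFILL"),
    Req("c", "RUNNING_DECODE"),
    Req("d", "RUNNING_PREFILL"),
]
all_prefill = [Req("b", "RUNNING_PREFILL"), Req("d", "RUNNING_PREFILL")]

victim = pick_victim(mixed)                                  # request "c"
unfiltered = can_preempt_unfiltered(all_prefill)             # True
filtered = can_preempt_with_decode_task(all_prefill)         # False
```

On an all-prefill queue the two checks disagree, which is the kind of mismatch a leftover helper could reintroduce if it were called by mistake.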

PaddlePaddle-bot · 2026-04-16T13:19:24Z

                        # Prepare prefill task
                        scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
+                    else:  # Not enough blocks to allocate
+                        chunk_prefill_in_running_not_satisfied = True


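❓ Question: a failed chunk prefill allocation now blocks all waiting requests. This is a significant behavior change; please confirm it is intended.

Previously, when chunk prefill could not allocate enough blocks, this code triggered _trigger_preempt to try to free space. It now breaks out directly and sets chunk_prefill_in_running_not_satisfied = True, which causes every waiting request to be skipped in this scheduling round (including small requests that may need only a few blocks).

Under high load this could starve the waiting queue. Please confirm:

Was it considered to skip only the waiting requests that need a large number of blocks, while letting small requests through?

When the running queue holds several chunk prefill requests, should a failure of an earlier request still allow later requests to try (the current code breaks out of the whole loop)?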
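The difference between the two loop policies raised in the second question can be sketched with a toy allocator. Only the flag name `chunk_prefill_in_running_not_satisfied` comes from the diff; `schedule_chunk_prefill`, its `(request_id, needed_blocks)` input shape and the `stop_on_first_failure` switch are hypothetical.

```python
def schedule_chunk_prefill(requests, free_blocks, stop_on_first_failure=True):
    """Try to allocate blocks for chunk prefill requests in queue order.

    requests: list of (request_id, needed_blocks) pairs (hypothetical shape).
    Returns (scheduled_ids, not_satisfied), where not_satisfied plays the role
    of chunk_prefill_in_running_not_satisfied.
    """
    scheduled = []
    not_satisfied = False
    for request_id, needed in requests:
        if needed <= free_blocks:
            free_blocks -= needed
            scheduled.append(request_id)
        else:
            not_satisfied = True
            if stop_on_first_failure:
                break  # later requests are not attempted at all
    return scheduled, not_satisfied


queue = [("big", 10), ("small", 2)]

# Break on first failure: "small" never gets a chance.
strict = schedule_chunk_prefill(queue, free_blocks=4)
# Keep scanning: "small" fits even though "big" does not.
lenient = schedule_chunk_prefill(queue, free_blocks=4, stop_on_first_failure=False)
```

Breaking early preserves queue order and avoids a large request being overtaken indefinitely, while continuing improves utilization; which trade-off the PR intends is exactly what the question asks the author to confirm.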

PaddlePaddle-bot · 2026-04-16T13:19:24Z

-        self.assertIn(preempted_req.request_id, manager.to_be_rescheduled_request_id_set)
-        self.assertEqual(preempted_reqs[0], preempted_req)
-        self.assertEqual(scheduled_reqs[0].request_id, preempted_req.request_id)
+        self.assertEqual(waiting_request.status, RequestStatus.RUNNING_PREFILL)


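🟡 Suggestion: the new preemption logic lacks unit test coverage

test_trigger_preempt_records_tasks was deleted, but the new logic (preempting only RUNNING_DECODE requests, scanning from back to front, _recompute_new_token_ratio_on_preemption) has no corresponding new tests. Consider adding tests for the following scenarios:

When the running queue mixes RUNNING_PREFILL and RUNNING_DECODE requests, only RUNNING_DECODE requests are preempted

When all running requests are RUNNING_PREFILL, _trigger_preempt returns False

_recompute_new_token_ratio_on_preemption computes the correct ratio at different stages of decode progress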
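A possible shape for the first two scenarios is sketched below with `unittest`. `FakeManager` is a hypothetical stand-in rather than the real resource manager, so the tests only show how the assertions might be structured; real tests would need to build the actual manager and requests from the repository's fixtures.

```python
import unittest


class FakeManager:
    """Hypothetical stand-in that preempts only decode requests, tail first."""

    def __init__(self, running):
        self.running = list(running)
        self.preempted = []

    def trigger_preempt(self):
        for i in range(len(self.running) - 1, -1, -1):
            if self.running[i]["status"] == "RUNNING_DECODE":
                self.preempted.append(self.running.pop(i))
                return True
        return False


class TestDecodeOnlyPreemption(unittest.TestCase):
    def test_mixed_queue_preempts_only_decode(self):
        manager = FakeManager(
            [
                {"id": "d0", "status": "RUNNING_DECODE"},
                {"id": "p0", "status": "RUNNING_PREFILL"},
                {"id": "d1", "status": "RUNNING_DECODE"},
                {"id": "p1", "status": "RUNNING_PREFILL"},
            ]
        )
        self.assertTrue(manager.trigger_preempt())
        # The last decode request is taken; prefill requests are untouched.
        self.assertEqual([r["id"] for r in manager.preempted], ["d1"])
        self.assertEqual([r["id"] for r in manager.running], ["d0", "p0", "p1"])

    def test_all_prefill_returns_false(self):
        manager = FakeManager(
            [
                {"id": "p0", "status": "RUNNING_PREFILL"},
                {"id": "p1", "status": "RUNNING_PREFILL"},
            ]
        )
        self.assertFalse(manager.trigger_preempt())
        self.assertEqual(manager.preempted, [])
        self.assertEqual(len(manager.running), 2)
```

Saved to a file, the cases can be run with `python -m unittest <module>`.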

PaddlePaddle-bot · 2026-04-16T13:19:24Z

 setproctitle
 aistudio_sdk
 p2pstore
+mooncake-transfer-engine>=0.3.10.post1


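❓ Question: the mooncake-transfer-engine dependency is unrelated to this PR's scheduling optimization

This PR is titled Scheduler Optimization, yet it adds the mooncake-transfer-engine dependency here. Mixing unrelated changes makes review harder and increases rollback risk. Consider moving this dependency change into a separate PR.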

codecov-commenter · 2026-04-16T14:47:48Z

Codecov Report

❌ Patch coverage is 78.94737% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@2d8338f). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/engine/sched/resource_manager_v1.py	79.71%	2 Missing and 12 partials ⚠️
fastdeploy/output/token_processor.py	0.00%	2 Missing ⚠️
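The percentages in this report can be cross-checked with a few lines of arithmetic, assuming coverage is computed as hits / (hits + misses + partials). The hit counts for the patch (55 and 60) are not shown in the report; they are inferred here as the values that reproduce the stated percentages, and `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits, misses, partials):
    """Coverage as a percentage, counting partial lines as not covered."""
    total = hits + misses + partials
    return 100.0 * hits / total


# resource_manager_v1.py: 2 missing and 12 partials, reported as 79.71%.
resource_manager = coverage_percent(hits=55, misses=2, partials=12)

# Whole patch: 16 uncovered lines (2 + 12 + 2), reported as 78.94737%.
patch = coverage_percent(hits=60, misses=4, partials=12)

# Project totals taken directly from the coverage diff table.
project = coverage_percent(hits=40565, misses=11685, partials=2730)

print(round(resource_manager, 2))  # 79.71
print(round(patch, 5))             # 78.94737
print(round(project, 2))           # 73.78
```

Under these inferred counts the patch touches 76 coverable lines, 71 of which are in the two files listed, so the remaining 5 would sit in files that are fully covered.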

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7444   +/-   ##
==========================================
  Coverage           ?   73.78%           
==========================================
  Files              ?      398           
  Lines              ?    54980           
  Branches           ?     8614           
==========================================
  Hits               ?    40565           
  Misses             ?    11685           
  Partials           ?     2730

Flag	Coverage Δ
GPU	`73.78% <78.94%> (?)`


liyonghua0910 and others added 3 commits April 16, 2026 13:01

[Optimization] Use new_token_ratio to control reserved blocks in sche…

75f786c

…duler

Only decode req can be preempted

ed1c51f

Optimize scheduler for chunk prefill

28bea0c

liyonghua0910 temporarily deployed to Metax_ci April 16, 2026 13:02 — with GitHub Actions Inactive

PaddlePaddle-bot reviewed Apr 16, 2026


liyonghua0910 wants to merge 3 commits into PaddlePaddle:develop from liyonghua0910:develop+20260416_opt_schedule


5 participants
