[Scheduler] [Optimization] Only preempt decode requests and better manage reserved blocks in scheduler#7444
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-16 21:19 CST
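📋 Review Summary
PR overview: Refactors the scheduler's preemption policy (only decode requests are preempted) and the reserved-block management mechanism (aligned with SGLang's new_token_ratio scheme).
Scope of changes: Scheduler, Engine (RequestStatus enum split), TokenProcessor (state transition), environment variables
Impact tags: Scheduler Engine
📝 PR Convention Check
The Motivation and Modifications sections of the PR description are empty. Please add the motivation and the specific changes so the PR can be traced later.
Description template (can be copied as-is):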
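## Motivation
Optimize the scheduler's preemption policy: (1) split RequestStatus.RUNNING into RUNNING_PREFILL and RUNNING_DECODE, and allow preemption only of requests in the decode phase; (2) replace the fixed number of reserved blocks with a dynamic reservation policy based on new_token_ratio (aligned with SGLang's retract_decode logic); (3) chunk prefill in the running queue no longer triggers preemption and is deferred to a later scheduling round instead.
## Modifications
- `request.py`: adds the finer-grained RUNNING_PREFILL/RUNNING_DECODE states to the RequestStatus enum
- `resource_manager_v1.py`: refactors `_trigger_preempt` to preempt only decode requests; adds `_recompute_new_token_ratio_on_preemption` and `_get_running_request_reserve_blocks`; chunk prefill no longer triggers preemption when block allocation falls short
- `token_processor.py`: switches RUNNING_PREFILL to RUNNING_DECODE when the first token is output
- `envs.py`: adds 5 environment variables, including FD_INIT_NEW_TOKEN_RATIO
- `requirements.txt`: adds the mooncake-transfer-engine dependency (unrelated to the scheduling optimization in this PR)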
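The status split and the first-token transition listed above can be sketched as follows. This is a minimal illustration, not the code in `request.py` or `token_processor.py`: the `Request` class, the `on_token_generated` helper, and the reduced set of enum members are hypothetical stand-ins.

```python
from enum import Enum, auto


class RequestStatus(Enum):
    # Hypothetical subset of states; the real enum has more members.
    WAITING = auto()
    RUNNING_PREFILL = auto()
    RUNNING_DECODE = auto()
    PREEMPTED = auto()


class Request:
    """Toy request object used only for this sketch."""

    def __init__(self, request_id):
        self.request_id = request_id
        self.status = RequestStatus.WAITING
        self.output_token_ids = []


def on_token_generated(request, token_id):
    """Record a token; the first output token moves prefill -> decode."""
    is_first_token = not request.output_token_ids
    if request.status is RequestStatus.RUNNING_PREFILL and is_first_token:
        request.status = RequestStatus.RUNNING_DECODE
    request.output_token_ids.append(token_id)


req = Request("req-0")
req.status = RequestStatus.RUNNING_PREFILL
on_token_generated(req, 101)
print(req.status.name)
```

Keeping the transition in one place makes it easy to assert that a request never becomes preemptible before it has produced output.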
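Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | resource_manager_v1.py:301 | The `_can_preempt` method no longer has any callers and is dead code |
| 🟡 Suggestion | resource_manager_v1.py:991 | The behavior change where a failed chunk prefill blocks all waiting requests needs confirmation |
| 🟡 Suggestion | tests/v1/test_resource_manager_v1.py | `test_trigger_preempt_records_tasks` was deleted, but no tests were added for the new preemption logic |
| ❓ Question | requirements.txt:48 | The mooncake-transfer-engine dependency is unrelated to the scheduling optimization in this PR |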
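Overall Assessment
The overall approach is sound: aligning with SGLang's new_token_ratio + retract_decode strategy is a mature engineering practice, and the RequestStatus split and decode-only preemption design are clear. Recommended follow-ups: remove the dead code, add unit tests for the new preemption logic, and confirm whether the mooncake-transfer-engine dependency should be split into a separate PR.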
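For readers unfamiliar with ratio-based reservation, the idea can be sketched roughly as below. This is an assumption-laden illustration, loosely modeled on a retract-style estimate, and not the PR's `_get_running_request_reserve_blocks` or `_recompute_new_token_ratio_on_preemption`: all function names, parameters (`retract_steps`, `decay`, `min_ratio`), and the exact formulas are hypothetical.

```python
import math


def reserve_blocks(remaining_max_new_tokens, new_token_ratio, block_size):
    """Blocks to hold back for running requests.

    Assumes each running request will use only `new_token_ratio` of the
    tokens it is still allowed to generate.
    """
    expected_tokens = sum(remaining_max_new_tokens) * new_token_ratio
    return math.ceil(expected_tokens / block_size)


def ratio_after_preemption(decoded_tokens, max_new_tokens, retract_steps=20):
    """Re-estimate the ratio from observed decode progress (assumed formula)."""
    total_max = sum(max_new_tokens)
    if total_max == 0:
        return 1.0
    estimate = (sum(decoded_tokens) + retract_steps * len(decoded_tokens)) / total_max
    return min(1.0, estimate)


def decay_ratio(ratio, decay, min_ratio):
    """Relax the ratio a little after each round without preemption."""
    return max(ratio - decay, min_ratio)


print(reserve_blocks([100, 300], new_token_ratio=0.5, block_size=64))
print(ratio_after_preemption([10, 30], [100, 300]))
```

A higher ratio reserves more blocks and admits fewer new requests; a lower ratio admits more but risks preemption, which is why the ratio is raised after a preemption and decayed otherwise.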
| return True | ||
| return False | ||
|
|
||
| def _can_preempt_with_decode_task(self): |
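🟡 Suggestion: the `_can_preempt` method is now dead code
`_trigger_preempt` has switched from `self._can_preempt()` to `self._can_preempt_with_decode_task()`, and a repository-wide search confirms that `_can_preempt` no longer has any callers. Consider deleting the method so that future maintainers do not mistakenly use the old version, which does not filter on decode status.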
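To make the difference concrete, a decode-filtered check and a back-to-front victim selection might look like the sketch below. It is only an illustration under assumptions: the free functions, the `Request` dataclass, and the two-member enum are hypothetical and do not reproduce the methods in `resource_manager_v1.py`.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class RequestStatus(Enum):
    RUNNING_PREFILL = auto()
    RUNNING_DECODE = auto()


@dataclass
class Request:
    request_id: str
    status: RequestStatus


def can_preempt_with_decode_task(running: List[Request]) -> bool:
    """True only if at least one running request is in the decode phase."""
    return any(r.status is RequestStatus.RUNNING_DECODE for r in running)


def pick_victim(running: List[Request]) -> Optional[Request]:
    """Scan from the back of the queue and return the first decode request."""
    for req in reversed(running):
        if req.status is RequestStatus.RUNNING_DECODE:
            return req
    return None


running = [
    Request("a", RequestStatus.RUNNING_DECODE),
    Request("b", RequestStatus.RUNNING_DECODE),
    Request("c", RequestStatus.RUNNING_PREFILL),
]
victim = pick_victim(running)
print(victim.request_id if victim else None)
```

An unfiltered check would report that preemption is possible even when every running request is still prefilling, which is the misuse the suggestion aims to prevent.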
| # Prepare prefill task | ||
| scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens)) | ||
| else: # Not enough blocks to allocate | ||
| chunk_prefill_in_running_not_satisfied = True |
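❓ Question: a failed chunk prefill allocation now blocks all waiting requests. This is a significant behavior change; please confirm it is intended.
Previously, when chunk prefill could not get enough blocks, this branch called `_trigger_preempt` to try to free space. It now breaks out of the loop directly and sets `chunk_prefill_in_running_not_satisfied = True`, which causes every waiting request to be skipped in this scheduling round, including small requests that may need only a few blocks.
Under high load this may starve the waiting queue. Please confirm:
- Was it considered to skip only the waiting requests that need a large number of blocks while letting small requests through?
- When the running queue holds multiple chunk prefill requests, should a failure of an earlier request still allow later requests to try? (The current code breaks out of the whole loop.)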
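The two loop behaviors raised in the second question can be compared with a toy allocator. This sketch is hypothetical throughout: `schedule_chunk_prefill`, its tuple-based request representation, and the `stop_on_first_failure` flag are illustrative names, not part of the scheduler.

```python
def schedule_chunk_prefill(requests, free_blocks, stop_on_first_failure):
    """Try to allocate blocks for chunk prefill requests in queue order.

    requests: list of (request_id, blocks_needed) tuples.
    Returns (scheduled_ids, not_satisfied_flag).
    """
    scheduled = []
    not_satisfied = False
    for request_id, needed in requests:
        if needed <= free_blocks:
            free_blocks -= needed
            scheduled.append(request_id)
        else:
            not_satisfied = True
            if stop_on_first_failure:
                break  # mirrors breaking out of the whole loop
            # otherwise keep going so later, smaller requests can still try
    return scheduled, not_satisfied


queue = [("big", 8), ("small", 1)]
print(schedule_chunk_prefill(queue, free_blocks=4, stop_on_first_failure=True))
print(schedule_chunk_prefill(queue, free_blocks=4, stop_on_first_failure=False))
```

Continuing past a failure improves utilization but lets small requests overtake a large one repeatedly, so a fairness bound would likely be needed if that variant were adopted.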
| self.assertIn(preempted_req.request_id, manager.to_be_rescheduled_request_id_set) | ||
| self.assertEqual(preempted_reqs[0], preempted_req) | ||
| self.assertEqual(scheduled_reqs[0].request_id, preempted_req.request_id) | ||
| self.assertEqual(waiting_request.status, RequestStatus.RUNNING_PREFILL) |
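🟡 Suggestion: the new preemption logic lacks unit test coverage
`test_trigger_preempt_records_tasks` was deleted, but the new logic (preempting only RUNNING_DECODE requests, scanning from back to front, `_recompute_new_token_ratio_on_preemption`) has no corresponding new tests. Consider adding tests for the following scenarios:
- When the running queue mixes RUNNING_PREFILL and RUNNING_DECODE requests, only RUNNING_DECODE requests are preempted
- When all running requests are RUNNING_PREFILL, `_trigger_preempt` returns False
- `_recompute_new_token_ratio_on_preemption` computes the ratio correctly at different stages of decode progress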
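The first two scenarios could be tested along these lines. The sketch runs against a toy `trigger_preempt` stand-in written for this example, since the real `_trigger_preempt` signature and the manager fixtures are not shown here; the dict-based requests and the `blocks_needed` parameter are assumptions.

```python
import unittest

PREFILL, DECODE = "RUNNING_PREFILL", "RUNNING_DECODE"


def trigger_preempt(running, blocks_needed):
    """Toy stand-in: preempt decode requests from the back of `running`
    until enough blocks are freed. Returns (success, preempted_ids)."""
    freed, preempted = 0, []
    for req in reversed(list(running)):  # iterate over a copy while mutating
        if freed >= blocks_needed:
            break
        if req["status"] == DECODE:
            running.remove(req)
            preempted.append(req["id"])
            freed += req["blocks"]
    return freed >= blocks_needed, preempted


class TestDecodeOnlyPreemption(unittest.TestCase):
    def test_mixed_queue_preempts_only_decode(self):
        running = [
            {"id": "d0", "status": DECODE, "blocks": 2},
            {"id": "p0", "status": PREFILL, "blocks": 4},
            {"id": "d1", "status": DECODE, "blocks": 2},
        ]
        ok, preempted = trigger_preempt(running, blocks_needed=3)
        self.assertTrue(ok)
        self.assertEqual(preempted, ["d1", "d0"])
        self.assertEqual([r["id"] for r in running], ["p0"])

    def test_all_prefill_returns_false(self):
        running = [{"id": "p0", "status": PREFILL, "blocks": 4}]
        ok, preempted = trigger_preempt(running, blocks_needed=1)
        self.assertFalse(ok)
        self.assertEqual(preempted, [])
        self.assertEqual(len(running), 1)
```

Saved as a module, the cases can be run with `python -m unittest`; the ratio scenario is omitted because its expected values depend on the formula the PR actually uses.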
| setproctitle | ||
| aistudio_sdk | ||
| p2pstore | ||
| mooncake-transfer-engine>=0.3.10.post1 |
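❓ Question: the mooncake-transfer-engine dependency is unrelated to this PR's scheduling optimization
The PR is titled as a Scheduler Optimization, yet it adds the mooncake-transfer-engine dependency here. Mixing unrelated changes makes review harder and increases rollback risk. Consider moving this dependency change into a separate PR.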
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7444 +/- ##
==========================================
Coverage ? 73.78%
==========================================
Files ? 398
Lines ? 54980
Branches ? 8614
==========================================
Hits ? 40565
Misses ? 11685
Partials ? 2730
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]
`pre-commit` before commit.
`release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.