[Scheduler] [Optimization] Only preempt decode requests and better manage reserved blocks in scheduler #7444

Open
liyonghua0910 wants to merge 3 commits into PaddlePaddle:develop from liyonghua0910:develop+20260416_opt_schedule

@liyonghua0910
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 16, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-16 21:19 CST

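📋 Review Summary

PR overview: Refactors the scheduler's preemption policy (only decode requests are preempted) and its reserved-blocks management (aligned with SGLang's new_token_ratio scheme)
Scope of changes: Scheduler, Engine (RequestStatus enum split), TokenProcessor (state transition), environment variables
Impact tags: Scheduler, Engine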

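📝 PR Convention Check

The Motivation and Modifications sections of the PR description are empty. Please fill in the motivation for the change and the specific modifications so the change can be traced later.

Description template (can be copied directly):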

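## Motivation
Optimize the scheduler's preemption policy: (1) split RequestStatus.RUNNING into RUNNING_PREFILL and RUNNING_DECODE, and allow preemption of decode-stage requests only; (2) replace the fixed number of reserved blocks with a dynamic reservation policy based on new_token_ratio (aligned with SGLang's retract_decode logic); (3) chunk prefill in the running queue no longer triggers preemption; its scheduling is deferred instead.

## Modifications
- `request.py`: add the finer-grained RUNNING_PREFILL/RUNNING_DECODE states to the RequestStatus enum
- `resource_manager_v1.py`: refactor `_trigger_preempt` to preempt decode requests only; add `_recompute_new_token_ratio_on_preemption` and `_get_running_request_reserve_blocks`; chunk prefill no longer triggers preemption when block allocation falls short
- `token_processor.py`: switch RUNNING_PREFILL to RUNNING_DECODE when the first token is output
- `envs.py`: add 5 environment variables, including FD_INIT_NEW_TOKEN_RATIO
- `requirements.txt`: add the mooncake-transfer-engine dependency (unrelated to the scheduling optimization in this PR)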

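Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | resource_manager_v1.py:301 | The `_can_preempt` method no longer has any callers and is dead code |
| 🟡 Suggestion | resource_manager_v1.py:991 | The behavior change where a failed chunk prefill blocks all waiting requests needs confirmation |
| 🟡 Suggestion | tests/v1/test_resource_manager_v1.py | `test_trigger_preempt_records_tasks` was deleted, but no tests were added for the new preemption logic |
| ❓ Question | requirements.txt:48 | The mooncake-transfer-engine dependency is unrelated to the scheduling optimization in this PR |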

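Overall Assessment

The overall approach is sound; aligning with SGLang's new_token_ratio + retract_decode strategy is mature engineering practice. The RequestStatus split and the decode-only preemption design are clear. Suggested follow-ups: remove the dead code, add unit tests for the new preemption logic, and confirm whether the mooncake-transfer-engine dependency should be split into a separate PR.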
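For readers unfamiliar with the new_token_ratio idea, the following is a minimal sketch of how a ratio-based reservation and its recomputation after a preemption could look. It is not the code from this PR: `DecodeReq`, `reserve_blocks`, `ratio_after_preemption`, and the `retract_steps` default are hypothetical names and values chosen for illustration, and the formula is only an approximation of the general approach the review describes.

```python
import math
from dataclasses import dataclass


@dataclass
class DecodeReq:
    # Hypothetical stand-in for a running decode request.
    max_new_tokens: int
    generated_tokens: int


def reserve_blocks(reqs, new_token_ratio, block_size):
    """Blocks to hold back for tokens the running decode requests may still emit.

    Only a fraction (new_token_ratio) of the remaining budget is reserved,
    on the assumption that most requests stop before max_new_tokens.
    """
    remaining = sum(max(r.max_new_tokens - r.generated_tokens, 0) for r in reqs)
    return math.ceil(remaining * new_token_ratio / block_size)


def ratio_after_preemption(reqs, retract_steps=20):
    """Re-estimate the ratio from observed decode progress, capped at 1.0.

    retract_steps is an assumed safety margin of extra decode steps per request.
    """
    total_max = sum(r.max_new_tokens for r in reqs)
    if total_max == 0:
        return 1.0
    decoded = sum(r.generated_tokens for r in reqs)
    return min(1.0, (decoded + retract_steps * len(reqs)) / total_max)
```

A lower ratio admits more prefill work at the cost of more frequent preemption; recomputing it upward after a preemption makes the scheduler more conservative for the following rounds.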

return True
return False

def _can_preempt_with_decode_task(self):

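🟡 Suggestion: the `_can_preempt` method is now dead code

`_trigger_preempt` has switched from `self._can_preempt()` to `self._can_preempt_with_decode_task()`, and a repository-wide search confirms that `_can_preempt` no longer has any callers. Consider deleting the method so that future maintainers do not mistakenly use the old version, which does not filter on decode status.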

# Prepare prefill task
scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
else: # Not enough blocks to allocate
chunk_prefill_in_running_not_satisfied = True

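❓ Question: a failed chunk prefill allocation now blocks all waiting requests. This is a significant behavior change; please confirm it is intended.

Previously, when chunk prefill could not get enough blocks, this code called `_trigger_preempt` to try to free space. It now breaks immediately and sets `chunk_prefill_in_running_not_satisfied = True`, which causes every waiting request to be skipped in this scheduling round (including small requests that might need only a few blocks).

Under high load this could starve the waiting queue. Please confirm:

  1. Was it considered to skip only the waiting requests that need many blocks while letting small requests through?
  2. When the running queue contains several chunk prefill requests, should a failure of an earlier request still allow later requests to try (currently the whole loop is exited with break)?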
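The difference between the two loop policies raised in question 2 can be illustrated with a small standalone simulation. This is a sketch only: `schedule_chunk_prefills` and its parameters are hypothetical and do not correspond to functions in `resource_manager_v1.py`; the returned flag merely plays the role that `chunk_prefill_in_running_not_satisfied` plays in the diff.

```python
def schedule_chunk_prefills(needed_blocks, free_blocks, stop_on_first_failure=True):
    """Try to allocate blocks for each chunk prefill request, in queue order.

    needed_blocks: blocks each running chunk prefill request needs this round.
    Returns (indices of scheduled requests, whether any allocation failed).
    """
    scheduled = []
    not_satisfied = False
    for idx, need in enumerate(needed_blocks):
        if need <= free_blocks:
            free_blocks -= need
            scheduled.append(idx)
        else:
            not_satisfied = True
            if stop_on_first_failure:
                break  # policy described in the review: exit the whole loop
            # alternative policy: keep going so smaller requests can still fit
    return scheduled, not_satisfied


# With 6 free blocks and requests needing 4, 10, and 2 blocks:
print(schedule_chunk_prefills([4, 10, 2], 6, stop_on_first_failure=True))
print(schedule_chunk_prefills([4, 10, 2], 6, stop_on_first_failure=False))
```

With the break policy only the first request is scheduled; with the alternative, the third request (2 blocks) also fits. In both cases the flag is set, so whether waiting requests are then skipped is a separate decision from the loop policy.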

self.assertIn(preempted_req.request_id, manager.to_be_rescheduled_request_id_set)
self.assertEqual(preempted_reqs[0], preempted_req)
self.assertEqual(scheduled_reqs[0].request_id, preempted_req.request_id)
self.assertEqual(waiting_request.status, RequestStatus.RUNNING_PREFILL)

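🟡 Suggestion: the new preemption logic lacks unit test coverage

`test_trigger_preempt_records_tasks` was deleted, but the new logic (preempting only RUNNING_DECODE requests, scanning from back to front, `_recompute_new_token_ratio_on_preemption`) has no corresponding new tests. Consider adding tests for the following scenarios:

  1. When the running queue mixes RUNNING_PREFILL and RUNNING_DECODE requests, only RUNNING_DECODE requests are preempted
  2. When all running requests are RUNNING_PREFILL, `_trigger_preempt` returns False
  3. `_recompute_new_token_ratio_on_preemption` computes the correct ratio at different stages of decode progress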
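As a starting point, scenarios 1 and 2 could be shaped roughly like the sketch below. It deliberately tests a toy stand-in rather than the real resource manager: `Status`, `Req`, and `pick_preempt_victim` are hypothetical and only mimic the back-to-front, decode-only selection described above. A real test would construct the manager from `resource_manager_v1.py` and call `_trigger_preempt` instead.

```python
import unittest
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    # Mirrors the two states named in the PR; the enum itself is a stand-in.
    RUNNING_PREFILL = auto()
    RUNNING_DECODE = auto()


@dataclass
class Req:
    request_id: str
    status: Status


def pick_preempt_victim(running):
    """Scan the running queue from back to front and remove the first decode request.

    Returns the removed request, or None when no decode request exists.
    """
    for i in range(len(running) - 1, -1, -1):
        if running[i].status is Status.RUNNING_DECODE:
            return running.pop(i)
    return None


class PreemptDecodeOnlyTest(unittest.TestCase):
    def test_mixed_queue_preempts_only_decode(self):
        running = [
            Req("a", Status.RUNNING_DECODE),
            Req("b", Status.RUNNING_DECODE),
            Req("c", Status.RUNNING_PREFILL),
        ]
        victim = pick_preempt_victim(running)
        # "c" is last in the queue but is still prefilling, so "b" is chosen.
        self.assertEqual(victim.request_id, "b")
        self.assertEqual([r.request_id for r in running], ["a", "c"])

    def test_all_prefill_yields_no_victim(self):
        running = [Req("a", Status.RUNNING_PREFILL), Req("b", Status.RUNNING_PREFILL)]
        self.assertIsNone(pick_preempt_victim(running))
        self.assertEqual(len(running), 2)
```

The test cases can be run with `python -m unittest` once saved to a module.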

Comment thread requirements.txt
setproctitle
aistudio_sdk
p2pstore
mooncake-transfer-engine>=0.3.10.post1

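❓ Question: the mooncake-transfer-engine dependency is unrelated to the scheduling optimization in this PR

This PR is titled Scheduler Optimization, yet it adds the mooncake-transfer-engine dependency here. Mixing unrelated changes makes review harder and increases rollback risk. Consider moving this dependency change into a separate PR.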

@codecov-commenter

Codecov Report

❌ Patch coverage is 78.94737% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@2d8338f). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/engine/sched/resource_manager_v1.py | 79.71% | 2 Missing and 12 partials ⚠️ |
| fastdeploy/output/token_processor.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7444   +/-   ##
==========================================
  Coverage           ?   73.78%           
==========================================
  Files              ?      398           
  Lines              ?    54980           
  Branches           ?     8614           
==========================================
  Hits               ?    40565           
  Misses             ?    11685           
  Partials           ?     2730           
Flag Coverage Δ
GPU 73.78% <78.94%> (?)
