23 changes: 12 additions & 11 deletions fastdeploy/engine/sched/resource_manager_v1.py
@@ -587,6 +587,7 @@ def schedule(self):
preempted_reqs: list[Request] = []
error_reqs: list[tuple[str, str]] = []
token_budget = self.config.scheduler_config.max_num_batched_tokens
chunk_prefill_in_running_not_satisfied = False

# First, schedule the RUNNING requests.
req_index = 0
@@ -694,27 +695,27 @@ def _allocate_decode_and_extend():
)
num_new_tokens = self._get_num_new_tokens(request, token_budget)
num_new_block = self.get_new_block_nums(request, num_new_tokens)
can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(num_new_block)
🔴 Bug: `_get_can_schedule_prefill_threshold_block` is defined as `(self, request, num_chunk_new_block)` and requires two arguments, but only `num_new_block` is passed here; `request` is missing. At runtime this raises `TypeError: _get_can_schedule_prefill_threshold_block() missing 1 required positional argument`.

Note: the calls in the WAITING-request scheduling section of this file (lines 762 and 810) are correct and pass both arguments, `(request, num_new_block)`.

Suggested fix:

can_schedule_block_num_threshold = self._get_can_schedule_prefill_threshold_block(request, num_new_block)
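The failure mode described above can be reproduced in isolation with a minimal sketch (a hypothetical stand-in class, not the actual FastDeploy scheduler):

```python
class Scheduler:
    # Hypothetical stand-in mirroring the real method's two-argument signature.
    def _get_can_schedule_prefill_threshold_block(self, request, num_chunk_new_block):
        return num_chunk_new_block

s = Scheduler()
try:
    # Buggy call: only one positional argument, as in the diff above.
    s._get_can_schedule_prefill_threshold_block(4)
except TypeError as e:
    # Message mentions the missing positional argument.
    print(e)
```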

# Allocate blocks to prefill
if self.cache_manager.can_allocate_gpu_blocks(num_new_block):
request.block_tables.extend(self.cache_manager.allocate_gpu_blocks(num_new_block))
# Prepare prefill task
scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
else: # Not enough blocks to allocate, trigger preemption
can_schedule = self._trigger_preempt(request, num_new_block, preempted_reqs, scheduled_reqs)
if not can_schedule:
break
request.block_tables.extend(self.cache_manager.allocate_gpu_blocks(num_new_block))
if self.cache_manager.can_allocate_gpu_blocks(can_schedule_block_num_threshold):
request.block_tables.extend(
self.cache_manager.allocate_gpu_blocks(num_new_block, request.request_id)
🔴 Bug: `allocate_gpu_blocks` is defined as `(self, num_blocks)` and accepts a single argument, but `request.request_id` is additionally passed here. At runtime this raises `TypeError: allocate_gpu_blocks() takes 2 positional arguments but 3 were given`.

Note: every other `allocate_gpu_blocks` call in this file (lines 620, 633, 648, 662, 768, 816) passes only one argument.

Suggested fix:

request.block_tables.extend(
    self.cache_manager.allocate_gpu_blocks(num_new_block)
)

)
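The second failure mode can likewise be demonstrated with a simplified, assumed stand-in for the cache manager:

```python
class CacheManager:
    # Hypothetical stand-in mirroring the real method's one-argument signature.
    def allocate_gpu_blocks(self, num_blocks):
        return list(range(num_blocks))

cm = CacheManager()
try:
    # Buggy call: request_id passed as an extra positional argument.
    cm.allocate_gpu_blocks(4, "req-0")
except TypeError as e:
    print(e)  # "... takes 2 positional arguments but 3 were given"
```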
# Prepare prefill task
scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
else: # Not enough blocks to allocate
chunk_prefill_in_running_not_satisfied = True
break # For chunk prefill request, if not satisfy condition for prefill, just break
token_budget -= num_new_tokens
request.num_computed_tokens += num_new_tokens
if self.config.cache_config.enable_prefix_caching:
self.cache_manager.update_cache_blocks(
request, self.config.cache_config.block_size, request.num_computed_tokens
)
req_index += 1
# schedule the WAITING requests.
if not preempted_reqs:

# Second, schedule the WAITING requests.
if (not preempted_reqs) and (not chunk_prefill_in_running_not_satisfied):
skip_requests: list[Request] = []
while self.waiting and token_budget > 0:
if len(self.running) == self.max_num_seqs:
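The gating behavior this diff introduces — when a RUNNING chunked-prefill request cannot get its blocks, stop and skip WAITING scheduling for this step — can be sketched as follows (a simplified model with assumed names, not the actual FastDeploy implementation):

```python
def schedule_step(running, waiting, can_allocate, token_budget):
    """Sketch of one scheduling step: RUNNING requests first, then WAITING
    requests only if no RUNNING chunked-prefill request was left unsatisfied."""
    scheduled = []
    chunk_prefill_in_running_not_satisfied = False
    for req in running:
        num_new_tokens = min(req["remaining"], token_budget)
        if num_new_tokens == 0:
            break
        if can_allocate(num_new_tokens):
            scheduled.append(req["id"])
            token_budget -= num_new_tokens
        else:
            # Not enough blocks for a running chunked-prefill request:
            # record it and stop, so WAITING requests are not scheduled.
            chunk_prefill_in_running_not_satisfied = True
            break
    if not chunk_prefill_in_running_not_satisfied:
        for req in waiting:
            if token_budget <= 0:
                break
            num_new_tokens = min(req["remaining"], token_budget)
            if can_allocate(num_new_tokens):
                scheduled.append(req["id"])
                token_budget -= num_new_tokens
    return scheduled, chunk_prefill_in_running_not_satisfied
```

When allocation fails for a running request, the returned flag is set and no waiting request is scheduled, matching the `(not preempted_reqs) and (not chunk_prefill_in_running_not_satisfied)` condition in the diff.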