Commit 016b6bd

[DeepSeek R1] Fix the performance regression related to use graph (#2146)
Fix the issue caused by context_blocks being set to a non-zero value for decode, which degrades decode performance.
1 parent 46ad52b commit 016b6bd

File tree

1 file changed

+2
-2
lines changed


vllm/worker/hpu_model_runner.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -2908,12 +2908,12 @@ def execute_model(
         assert is_prompt is not None
         batch_size = input_tokens.size(0)
         seq_len = self._seq_len(attn_metadata)
-        num_blocks = self._num_blocks(attn_metadata)
+        context_blocks = self._num_blocks(attn_metadata) if is_prompt else 0
         use_graphs = self._use_graphs(batch_size,
                                       seq_len,
                                       is_prompt,
                                       is_profile_run=profile_run_mode,
-                                      context_blocks=num_blocks)
+                                      context_blocks=context_blocks)
         self._check_config(batch_size, seq_len, attn_metadata, warmup_mode)

         lora_mask: torch.Tensor = None
```
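The core of the fix is gating the context-block count on `is_prompt`: decode steps must pass `context_blocks=0` so the graph lookup in `_use_graphs` matches the configuration captured during warmup, rather than missing and falling back to slower eager execution. A minimal sketch of that guard, using a hypothetical standalone helper (the real logic is inline in `execute_model` in `vllm/worker/hpu_model_runner.py`):

```python
def pick_context_blocks(is_prompt: bool, num_blocks: int) -> int:
    """Return the context-block count to use for the graph-bucket lookup.

    Prefill (prompt) steps keep their real block count; decode steps
    report 0 so they hit the same bucket that was warmed up for decode.
    """
    return num_blocks if is_prompt else 0

# Prefill keeps its block count; decode ignores it.
assert pick_context_blocks(True, 7) == 7
assert pick_context_blocks(False, 7) == 0
```

This mirrors the one-line change in the diff above: `context_blocks = self._num_blocks(attn_metadata) if is_prompt else 0`.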
