Commit bd34973

address comments

Signed-off-by: Perkz Zheng <[email protected]>

1 parent a2ed602

File tree

1 file changed: +3 additions, -0 deletions

tensorrt_llm/_torch/pyexecutor/resource_manager.py

Lines changed: 3 additions & 0 deletions

@@ -713,6 +713,9 @@ def get_cache_size_per_token(model_config: ModelConfigPython,
             mem_per_token / 16)
     else:
         # All other cases (fp16/bf16 kv cache), we need 2 bytes per token for K and V.
+        assert quant_config is None or (
+            not quant_config.quant_mode.has_kv_cache_quant()
+        ), "Quantized kv cache is not expected"
         mem_per_token *= 2
     return mem_per_token

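The assertion added here guards the fp16/bf16 branch: if a quantized kv cache slipped into this path, the 2-bytes-per-token arithmetic would overestimate memory. A minimal sketch of the guard's logic, using simplified stand-ins for TensorRT-LLM's quant config objects (`QuantMode` and `QuantConfig` below are hypothetical; only the attribute and method names mirror the diff):

```python
from dataclasses import dataclass


@dataclass
class QuantMode:
    """Hypothetical stand-in for the real quant mode object."""
    kv_cache_quant: bool = False

    def has_kv_cache_quant(self) -> bool:
        return self.kv_cache_quant


@dataclass
class QuantConfig:
    """Hypothetical stand-in holding a quant_mode, as the diff assumes."""
    quant_mode: QuantMode


def kv_bytes_per_token(mem_per_token: int, quant_config=None) -> int:
    # fp16/bf16 kv cache: 2 bytes per token for each of K and V.
    # A quantized kv cache must never reach this branch, so assert,
    # exactly as the added lines do.
    assert quant_config is None or (
        not quant_config.quant_mode.has_kv_cache_quant()
    ), "Quantized kv cache is not expected"
    return mem_per_token * 2


# Unquantized (or absent) config passes; a quantized one trips the assert.
print(kv_bytes_per_token(8))
print(kv_bytes_per_token(8, QuantConfig(QuantMode(False))))
```

The guard is cheap and fails fast at engine setup rather than letting a mis-sized kv-cache budget surface later as an out-of-memory error.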
0 commit comments