
Commit c0bd6a6

Fix Auto_Round Quantization Loading on SM75 and Lower GPUs (vllm-project#24217)
Signed-off-by: RoadToNowhereX <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
1 parent 3144d90 commit c0bd6a6


1 file changed (+2, -1 lines)


vllm/model_executor/layers/quantization/auto_round.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -327,6 +327,8 @@ def apply_gptq_quant_layer(self,
 
         if isinstance(layer, FusedMoE):
             if use_marlin:
+                return GPTQMarlinMoEMethod(quant_args_marlin, layer.moe)
+            else:
                 from vllm.model_executor.layers.quantization.moe_wna16 import (
                     MoeWNA16Config)
 
@@ -339,7 +341,6 @@ def apply_gptq_quant_layer(self,
                 }
                 return MoeWNA16Config.from_config(config).get_quant_method(
                     layer, prefix)
-            return GPTQMarlinMoEMethod(quant_args_marlin, layer.moe)
 
         if isinstance(layer, (LinearBase, ParallelLMHead)):
             if use_marlin:
```
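Before this change the two `FusedMoE` branches were swapped: when `use_marlin` was true the layer got the `MoeWNA16` path, and when it was false (as on GPUs that cannot run Marlin) it fell through to `GPTQMarlinMoEMethod`. The minimal sketch below contrasts the two orderings. It is not vLLM code: the string labels stand in for the real quant-method objects, and the hypothetical `marlin_supported` helper with its SM80 threshold is an assumption based on Marlin kernels targeting Ampere-or-newer GPUs.

```python
# Hypothetical labels standing in for the quant methods vLLM would return.
MARLIN = "GPTQMarlinMoEMethod"
WNA16 = "MoeWNA16"

# Assumption: Marlin kernels need compute capability >= 8.0 (SM80, Ampere).
MARLIN_MIN_CAPABILITY = 80


def marlin_supported(capability: int) -> bool:
    """Hypothetical helper; capability is major*10 + minor, e.g. 75 for SM75."""
    return capability >= MARLIN_MIN_CAPABILITY


def select_moe_method_buggy(capability: int) -> str:
    # Pre-fix ordering: branches swapped, so unsupported GPUs get Marlin.
    use_marlin = marlin_supported(capability)
    if use_marlin:
        return WNA16
    return MARLIN


def select_moe_method_fixed(capability: int) -> str:
    # Post-fix ordering: Marlin only when supported, else the WNA16 fallback.
    use_marlin = marlin_supported(capability)
    if use_marlin:
        return MARLIN
    else:
        return WNA16


if __name__ == "__main__":
    for cap in (70, 75, 80, 90):
        print(cap, select_moe_method_buggy(cap), select_moe_method_fixed(cap))
```

Under these assumptions an SM75 card resolves to the WNA16 fallback with the fixed ordering, instead of a Marlin method it cannot run.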
