
Commit 20f2be7

AMD-specific optimizations
1 parent 92e1236 commit 20f2be7


1 file changed (+3 -0 lines changed)

lib/inference_max_lib.py

Lines changed: 3 additions & 0 deletions
@@ -205,6 +205,9 @@ def build_server_inference_job_cmd( self, ):
 export TP={self.bp_dict['tensor_parallelism']}
 export CONC={self.bp_dict['max_concurrency']}
 export HF_TOKEN={self.hf_token}
+export VLLM_USE_AITER_UNIFIED_ATTENTION=1
+export VLLM_ROCM_USE_AITER_MHA=0
+export VLLM_ROCM_USE_AITER_FUSED_MOE_A16W4=1
 export PORT={self.bp_dict['port_no']}' > /tmp/server_env_script.sh"
 '''
 time.sleep(3)
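The three new exports are inserted into the shell string that is redirected into `/tmp/server_env_script.sh`, between `HF_TOKEN` and `PORT`. As a minimal sketch, assuming that script is just a sequence of `export` lines, the same block could be generated from a dict so the AMD-specific flags live in one place. `build_env_script` and `AMD_AITER_FLAGS` are hypothetical names, not part of `inference_max_lib.py`; the flag names and values are copied from the diff, and what each flag actually does is determined by the vLLM ROCm build in use, not by this sketch.

```python
import shlex

# Hypothetical constant: the AMD/AITER exports added by this commit.
# Names and values are copied verbatim from the diff.
AMD_AITER_FLAGS = {
    "VLLM_USE_AITER_UNIFIED_ATTENTION": "1",
    "VLLM_ROCM_USE_AITER_MHA": "0",
    "VLLM_ROCM_USE_AITER_FUSED_MOE_A16W4": "1",
}


def build_env_script(bp_dict: dict, hf_token: str) -> str:
    """Hypothetical helper: return the env script body, one export per line,
    in the same order as the diff (TP, CONC, HF_TOKEN, AMD flags, PORT)."""
    env = {
        "TP": bp_dict["tensor_parallelism"],
        "CONC": bp_dict["max_concurrency"],
        "HF_TOKEN": hf_token,
        **AMD_AITER_FLAGS,  # dicts keep insertion order, so flags stay before PORT
        "PORT": bp_dict["port_no"],
    }
    # shlex.quote guards against values (e.g. tokens) containing shell metacharacters.
    return "\n".join(f"export {k}={shlex.quote(str(v))}" for k, v in env.items()) + "\n"


if __name__ == "__main__":
    script = build_env_script(
        {"tensor_parallelism": 8, "max_concurrency": 64, "port_no": 8000},
        hf_token="hf_dummy",
    )
    print(script, end="")
```

Keeping the flags in a single mapping makes it easy to drop or toggle them for non-AMD runs without editing the quoted shell string by hand.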
