diff --git a/WorkflowExecAgent/docker_compose/intel/cpu/xeon/README.md b/WorkflowExecAgent/docker_compose/intel/cpu/xeon/README.md
index 8ce9d1e097..6a132a9ef0 100644
--- a/WorkflowExecAgent/docker_compose/intel/cpu/xeon/README.md
+++ b/WorkflowExecAgent/docker_compose/intel/cpu/xeon/README.md
@@ -60,6 +60,24 @@ export temperature=0
 export max_new_tokens=1000
 ```

+<details>
+<summary> Using Remote LLM Endpoints </summary>
+When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and obtain the base URL and API key, refer to the Intel® AI for Enterprise Inference offerings.
+
+Set the following environment variables:
+
+- `llm_endpoint_url` is the HTTPS endpoint of the remote server hosting the model of choice (e.g. https://api.inference.denvrdata.com). **Note:** If not using LiteLLM, append the second part of the model card to the URL, e.g. `/Llama-3.3-70B-Instruct` from `meta-llama/Llama-3.3-70B-Instruct`.
+- `llm_endpoint_api_key` is the access token or key used to access the model(s) on the server.
+- `LLM_MODEL_ID` is the model card; it may need to override the value already set in `set_env.sh`.
+
+```bash
+export llm_endpoint_url=
+export llm_endpoint_api_key=
+export LLM_MODEL_ID=
+```
+
+</details>
+
 ### Deploy the Services Using Docker Compose

 For an out-of-the-box experience, this guide uses an example workflow serving API service. There are 3 services needed for the setup: the agent microservice, an LLM inference service, and the workflow serving API.
diff --git a/WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml b/WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml
index 44a2b365d0..9804f003a7 100644
--- a/WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml
+++ b/WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml
@@ -17,6 +17,7 @@ services:
       recursion_limit: ${recursion_limit}
       llm_engine: ${llm_engine}
       llm_endpoint_url: ${llm_endpoint_url}
+      api_key: ${llm_endpoint_api_key}
       model: ${model}
       temperature: ${temperature}
       max_new_tokens: ${max_new_tokens}
diff --git a/WorkflowExecAgent/tests/2_start_vllm_service.sh b/WorkflowExecAgent/tests/2_start_vllm_service.sh
index 4522c1b94f..c39990ac51 100644
--- a/WorkflowExecAgent/tests/2_start_vllm_service.sh
+++ b/WorkflowExecAgent/tests/2_start_vllm_service.sh
@@ -38,7 +38,7 @@ function build_vllm_docker_image() {
 function start_vllm_service() {
     echo "start vllm service"
     export VLLM_SKIP_WARMUP=true
-    docker run -d -p ${vllm_port}:${vllm_port} --rm --network=host --name test-comps-vllm-service -v ~/.cache/huggingface:/root/.cache/huggingface -v ${WORKPATH}/tests/tool_chat_template_mistral_custom.jinja:/root/tool_chat_template_mistral_custom.jinja -e HF_TOKEN=$HF_TOKEN -e http_proxy=$http_proxy -e https_proxy=$https_proxy -it vllm-cpu-env --model ${model} --port ${vllm_port} --chat-template /root/tool_chat_template_mistral_custom.jinja --enable-auto-tool-choice --tool-call-parser mistral
+    docker run -d -p ${vllm_port}:${vllm_port} --rm --network=host --name test-comps-vllm-service -v ~/.cache/huggingface:/root/.cache/huggingface -v ${WORKPATH}/tests/tool_chat_template_mistral_custom.jinja:/root/tool_chat_template_mistral_custom.jinja -e HF_TOKEN=$HF_TOKEN -e http_proxy=$http_proxy -e https_proxy=$https_proxy -it public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.10.0 --model ${model} --port ${vllm_port} --chat-template /root/tool_chat_template_mistral_custom.jinja --enable-auto-tool-choice --tool-call-parser mistral
     echo ${LOG_PATH}/vllm-service.log
     sleep 10s
     echo "Waiting vllm ready"
@@ -64,9 +64,9 @@ function start_vllm_service() {
 }

 function main() {
-    echo "==================== Build vllm docker image ===================="
-    build_vllm_docker_image
-    echo "==================== Build vllm docker image completed ===================="
+    # echo "==================== Build vllm docker image ===================="
+    # build_vllm_docker_image
+    # echo "==================== Build vllm docker image completed ===================="

     echo "==================== Start vllm docker service ===================="
     start_vllm_service
diff --git a/WorkflowExecAgent/tests/3_launch_and_validate_agent.sh b/WorkflowExecAgent/tests/3_launch_and_validate_agent.sh
index 3fa75920c3..f537a7f503 100644
--- a/WorkflowExecAgent/tests/3_launch_and_validate_agent.sh
+++ b/WorkflowExecAgent/tests/3_launch_and_validate_agent.sh
@@ -16,6 +16,7 @@ export HF_TOKEN=${HF_TOKEN}
 export llm_engine=vllm
 export ip_address=$(hostname -I | awk '{print $1}')
 export llm_endpoint_url=http://${ip_address}:${vllm_port}
+export api_key=""
 export model=mistralai/Mistral-7B-Instruct-v0.3
 export recursion_limit=25
 export temperature=0
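The remote-endpoint variables introduced in the README hunk above can be sanity-checked before the agent is deployed. The snippet below is a minimal sketch, assuming the remote server exposes an OpenAI-compatible `/v1/models` route (as vLLM-based deployments typically do); the exported values are placeholders, not values taken from this PR.

```bash
# Minimal sanity check for the remote endpoint credentials.
# Assumes an OpenAI-compatible API; adjust the route if the server differs.
export llm_endpoint_url="https://api.inference.denvrdata.com"   # example endpoint from the README
export llm_endpoint_api_key="<your-api-key>"                    # placeholder, replace with the real token

# A 200 response listing the expected model card confirms the URL and key
# are usable by the agent microservice.
curl -s -H "Authorization: Bearer ${llm_endpoint_api_key}" \
  "${llm_endpoint_url}/v1/models"
```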
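For the compose path, the new `api_key: ${llm_endpoint_api_key}` entry in `compose_vllm.yaml` is filled in from the shell environment at launch time. The sketch below shows the expected flow, assuming the stack is brought up with a standard `docker compose` invocation from the Xeon compose directory as the README describes; the exported values are placeholders.

```bash
# Export the remote-endpoint settings (placeholder values), then launch the
# stack; docker compose substitutes them into the agent service environment,
# including the newly added api_key entry.
export llm_endpoint_url="https://api.inference.denvrdata.com"
export llm_endpoint_api_key="<token-for-the-remote-server>"
export model="meta-llama/Llama-3.3-70B-Instruct"   # example model card

cd WorkflowExecAgent/docker_compose/intel/cpu/xeon
docker compose -f compose_vllm.yaml up -d
```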