
Codetrans: enable remote endpoints #2100

Status: Open. This PR wants to merge 58 commits into base: main.

Commits (58, showing changes from all commits)
3cc227c
add support for remote endpoints
alexsin368 May 24, 2025
26557e5
Merge branch 'opea-project:main' into codetrans-remote-endpoint
alexsin368 May 30, 2025
4b224f5
add api_key to megaservice
alexsin368 May 30, 2025
ec3ca49
add align_inputs to ensure model gets LLM_MODEL_ID
alexsin368 May 30, 2025
b6fcb93
fix typos and add note to run with compose_remote.yaml
alexsin368 May 30, 2025
0be911c
add align_outputs and align_generator
alexsin368 Jun 2, 2025
aec7391
remove eos_token in finish reason
alexsin368 Jun 2, 2025
49a59f5
remove unnecessary service types
alexsin368 Jun 2, 2025
e295c17
Merge branch 'opea-project:main' into codetrans-remote-endpoint
alexsin368 Jun 11, 2025
5a7f000
Merge branch 'opea-project:main' into codetrans-remote-endpoint
alexsin368 Jun 13, 2025
85bf0aa
add support for remote endpoints
alexsin368 May 24, 2025
c69fe78
add api_key to megaservice
alexsin368 May 30, 2025
165a233
add align_inputs to ensure model gets LLM_MODEL_ID
alexsin368 May 30, 2025
a627a3f
fix typos and add note to run with compose_remote.yaml
alexsin368 May 30, 2025
a19bd62
add align_outputs and align_generator
alexsin368 Jun 2, 2025
6d43f3e
remove eos_token in finish reason
alexsin368 Jun 2, 2025
7fdff42
remove unnecessary service types
alexsin368 Jun 2, 2025
fd7267a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 27, 2025
59386be
merge
alexsin368 Jun 27, 2025
0b30743
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 27, 2025
990df02
import json library
alexsin368 Jun 27, 2025
70f60b2
Merge branch 'codetrans-remote-endpoint' of https://github.com/alexsi…
alexsin368 Jun 27, 2025
0877b78
resolve merge conflicts
alexsin368 Jun 27, 2025
08300ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 27, 2025
da62dd1
update remote endpoint instructions
alexsin368 Jun 27, 2025
35d193a
update remote endpoint instructions
alexsin368 Jun 27, 2025
673c7e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 27, 2025
ad61169
change to API_KEY for naming
alexsin368 Jun 27, 2025
743c743
fix env variable
alexsin368 Jun 27, 2025
1204dc7
add back llm microservice, fix naming, update instructions
alexsin368 Jun 27, 2025
2862f8f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 27, 2025
3a7690a
update instructions
alexsin368 Jun 27, 2025
a4b7093
Merge branch 'codetrans-remote-endpoint' of https://github.com/alexsi…
alexsin368 Jun 27, 2025
772565b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 27, 2025
3f199e6
remove vllm dependency
alexsin368 Jun 27, 2025
e3bbaed
use LLM_ENDPOINT instead of REMOTE_ENDPOINT
alexsin368 Jun 27, 2025
6c8b4e7
fix API_KEY env var
alexsin368 Jun 27, 2025
3a6bd22
set env variables for llm and backend services
alexsin368 Jun 30, 2025
04275de
Merge branch 'opea-project:main' into codetrans-remote-endpoint
alexsin368 Jun 30, 2025
e57221b
Merge branch 'main' into codetrans-remote-endpoint
alexsin368 Jul 2, 2025
e18b566
update landing page, add CI test with vllm litellm server for remote …
alexsin368 Jul 2, 2025
823643d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2025
67e1e5e
add test instruction for remote endpoints
alexsin368 Jul 2, 2025
b64642c
Merge branch 'codetrans-remote-endpoint' of https://github.com/alexsi…
alexsin368 Jul 2, 2025
a08204b
update with new vllm-ci-server files
alexsin368 Jul 2, 2025
7baeb8d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2025
4064dde
update test model and steps
alexsin368 Jul 3, 2025
8ceb0ab
Merge branch 'codetrans-remote-endpoint' of https://github.com/alexsi…
alexsin368 Jul 3, 2025
a9d29b6
Merge branch 'main' into codetrans-remote-endpoint
alexsin368 Jul 3, 2025
5f4bdcc
fix path from set_env.sh to top .set_env.sh
alexsin368 Jul 3, 2025
ada7ff4
remove CI test for remote endpoints
alexsin368 Jul 4, 2025
2e75ed3
add note about vLLM
alexsin368 Jul 4, 2025
1890138
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 4, 2025
127b4e1
Merge branch 'main' into codetrans-remote-endpoint
alexsin368 Jul 17, 2025
29314ac
remove mention of denvr
alexsin368 Jul 17, 2025
401ed82
Merge branch 'main' into codetrans-remote-endpoint
alexsin368 Aug 5, 2025
603ad6f
Merge branch 'main' into codetrans-remote-endpoint
chensuyue Aug 11, 2025
2d4cfc5
Merge branch 'main' into codetrans-remote-endpoint
alexsin368 Aug 11, 2025
76 changes: 76 additions & 0 deletions CodeTrans/code_translation.py
@@ -1,6 +1,7 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import json
import os

from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
@@ -17,12 +18,85 @@
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 7777))
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
LLM_MODEL_ID = os.getenv("LLM_MODEL_ID", "mistralai/Mistral-7B-Instruct-v0.3")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", None)


def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
"""Aligns the inputs based on the service type of the current node.

Parameters:
- self: Reference to the current instance of the class.
- inputs: Dictionary containing the inputs for the current node.
- cur_node: The current node in the service orchestrator.
- runtime_graph: The runtime graph of the service orchestrator.
- llm_parameters_dict: Dictionary containing the LLM parameters.
- kwargs: Additional keyword arguments.

Returns:
- inputs: The aligned inputs for the current node.
"""

# Check if the current service type is LLM
if self.services[cur_node].service_type == ServiceType.LLM:
# convert TGI/vLLM to unified OpenAI /v1/chat/completions format
next_inputs = {}
next_inputs["model"] = LLM_MODEL_ID
next_inputs["messages"] = [{"role": "user", "content": inputs["query"]}]
next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
next_inputs["top_p"] = llm_parameters_dict["top_p"]
next_inputs["stream"] = inputs["stream"]
next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
next_inputs["temperature"] = inputs["temperature"]
inputs = next_inputs

return inputs


def align_outputs(self, data, cur_node, inputs, runtime_graph, llm_parameters_dict, **kwargs):
next_data = {}
if self.services[cur_node].service_type == ServiceType.LLM and not llm_parameters_dict["stream"]:
if "faqgen" in self.services[cur_node].endpoint:
next_data = data
else:
next_data["text"] = data["choices"][0]["message"]["content"]
else:
next_data = data

return next_data


def align_generator(self, gen, **kwargs):
# OpenAI response format
# b'data:{"id":"","object":"text_completion","created":1725530204,"model":"meta-llama/Meta-Llama-3-8B-Instruct","system_fingerprint":"2.0.1-native","choices":[{"index":0,"delta":{"role":"assistant","content":"?"},"logprobs":null,"finish_reason":null}]}\n\n'
for line in gen:
line = line.decode("utf-8")
start = line.find("{")
end = line.rfind("}") + 1

json_str = line[start:end]
try:
# sometimes yield empty chunk, do a fallback here
json_data = json.loads(json_str)
if "ops" in json_data and "op" in json_data["ops"][0]:
if "value" in json_data["ops"][0] and isinstance(json_data["ops"][0]["value"], str):
yield f"data: {repr(json_data['ops'][0]['value'].encode('utf-8'))}\n\n"
else:
pass
elif "content" in json_data["choices"][0]["delta"]:
yield f"data: {repr(json_data['choices'][0]['delta']['content'].encode('utf-8'))}\n\n"
except Exception as e:
yield f"data: {repr(json_str.encode('utf-8'))}\n\n"
yield "data: [DONE]\n\n"


class CodeTransService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
ServiceOrchestrator.align_inputs = align_inputs
ServiceOrchestrator.align_outputs = align_outputs
ServiceOrchestrator.align_generator = align_generator
self.megaservice = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.CODE_TRANS)

@@ -31,8 +105,10 @@ def add_remote_service(self):
name="llm",
host=LLM_SERVICE_HOST_IP,
port=LLM_SERVICE_PORT,
api_key=OPENAI_API_KEY,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
)
self.megaservice.add(llm)

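For reference, below is a minimal standalone sketch of the payload mapping that `align_inputs` applies to an LLM node, written outside of the `ServiceOrchestrator` hooks. The field names mirror the diff above; the sample query and parameter values are made up for illustration.

```python
# Illustration only: the same mapping align_inputs performs for an LLM node,
# turning the megaservice's internal {"query": ...} payload into an
# OpenAI-style /v1/chat/completions request body for the remote endpoint.
import json

LLM_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"  # default from code_translation.py


def to_chat_completions(inputs: dict, llm_parameters: dict) -> dict:
    return {
        "model": LLM_MODEL_ID,
        "messages": [{"role": "user", "content": inputs["query"]}],
        "max_tokens": llm_parameters["max_tokens"],
        "top_p": llm_parameters["top_p"],
        "stream": inputs["stream"],
        "frequency_penalty": inputs["frequency_penalty"],
        "temperature": inputs["temperature"],
    }


# Sample internal request; the values here are illustrative only.
inputs = {
    "query": "Translate this Go snippet to Python: ...",
    "stream": False,
    "frequency_penalty": 0.0,
    "temperature": 0.2,
}
llm_parameters = {"max_tokens": 512, "top_p": 0.95}

print(json.dumps(to_chat_completions(inputs, llm_parameters), indent=2))
```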
31 changes: 27 additions & 4 deletions CodeTrans/docker_compose/intel/cpu/xeon/README.md
@@ -137,10 +137,33 @@ Key parameters are configured via environment variables set before running `dock

In the context of deploying a CodeTrans pipeline on an Intel® Xeon® platform, we can pick and choose different large language model serving frameworks. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).

| File | Description |
| -------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default |
| File | Description |
| -------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database. |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default. |
| [compose_remote.yaml](./compose_remote.yaml) | The LLM used is hosted on a remote server and an endpoint is used to access this model. vLLM is the serving framework. Additional environment variables need to be set before running. See [instructions](#running-llm-models-with-remote-endpoints) below. |

### Running LLM models with remote endpoints

When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to [Intel® AI for Enterprise Inference](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html) offerings.

Set the following environment variables:

- `REMOTE_ENDPOINT` is the HTTPS endpoint of the remote server hosting the model of choice (e.g. https://api.example.com). **Note:** If the API for the models does not use LiteLLM, the second part of the model card needs to be appended to the URL. For example, set `REMOTE_ENDPOINT` to https://api.example.com/Llama-3.3-70B-Instruct if the model card is `meta-llama/Llama-3.3-70B-Instruct`.
- `API_KEY` is the access token or key to access the model(s) on the server.
- `LLM_MODEL_ID` is the model card, which may need to be overridden depending on what it is set to in `set_env.sh`.

```bash
export REMOTE_ENDPOINT=<https-endpoint-of-remote-server>
export API_KEY=<your-api-key>
export LLM_MODEL_ID=<model-card>
```
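Optionally, sanity-check the remote endpoint and key before bringing up the stack. The sketch below uses only the Python standard library and assumes the server exposes the OpenAI-compatible `/v1/chat/completions` route with bearer-token authentication; adjust the model name and URL suffix per the note above.

```python
# Quick sanity check of the remote OpenAI-compatible endpoint (sketch; assumes
# bearer-token auth and the /v1/chat/completions route used by the megaservice).
import json
import os
import urllib.request

url = os.environ["REMOTE_ENDPOINT"].rstrip("/") + "/v1/chat/completions"
payload = {
    "model": os.environ["LLM_MODEL_ID"],
    "messages": [{"role": "user", "content": "Translate print('hi') from Python to Go."}],
    "max_tokens": 64,
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['API_KEY']}",
    },
)
with urllib.request.urlopen(request) as response:
    body = json.loads(response.read().decode("utf-8"))
print(body["choices"][0]["message"]["content"])
```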

After setting these environment variables, run `docker compose` with `compose_remote.yaml`:

```bash
docker compose -f compose_remote.yaml up -d
```
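Once the containers are up, the megaservice can be exercised directly. The following sketch assumes the backend is reachable on the default port 7777 at the `/v1/codetrans` route and accepts the usual CodeTrans request fields (`language_from`, `language_to`, `source_code`); the response is streamed back as server-sent events.

```python
# Send a small translation request to the CodeTrans megaservice (sketch; the
# route, port, and request fields assume the default CodeTrans deployment).
import json
import urllib.request

payload = {
    "language_from": "Golang",
    "language_to": "Python",
    "source_code": 'package main\n\nimport "fmt"\n\nfunc main() { fmt.Println("Hello") }\n',
}
request = urllib.request.Request(
    "http://localhost:7777/v1/codetrans",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    # The backend streams server-sent events; print each chunk as it arrives.
    for raw_line in response:
        line = raw_line.decode("utf-8").strip()
        if line:
            print(line)
```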

## Validate Microservices

72 changes: 72 additions & 0 deletions CodeTrans/docker_compose/intel/cpu/xeon/compose_remote.yaml
@@ -0,0 +1,72 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
llm:
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
container_name: codetrans-xeon-llm-server
ports:
- "9000:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LLM_ENDPOINT: ${REMOTE_ENDPOINT}
LLM_MODEL_ID: ${LLM_MODEL_ID}
OPENAI_API_KEY: ${API_KEY}
LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME}
HF_TOKEN: ${HF_TOKEN}
restart: unless-stopped
codetrans-xeon-backend-server:
image: ${REGISTRY:-opea}/codetrans:${TAG:-latest}
container_name: codetrans-xeon-backend-server
depends_on:
- llm
ports:
- "${BACKEND_SERVICE_PORT:-7777}:7777"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
ipc: host
restart: always
codetrans-xeon-ui-server:
image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest}
container_name: codetrans-xeon-ui-server
depends_on:
- codetrans-xeon-backend-server
ports:
- "${FRONTEND_SERVICE_PORT:-5173}:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BASE_URL=${BACKEND_SERVICE_ENDPOINT}
ipc: host
restart: always
codetrans-xeon-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: codetrans-xeon-nginx-server
depends_on:
- codetrans-xeon-backend-server
- codetrans-xeon-ui-server
ports:
- "${NGINX_PORT:-80}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
ipc: host
restart: always

networks:
default:
driver: bridge