ChatQnA Example with OpenAI-Compatible Endpoint #2091

Open

wants to merge 53 commits into base: main

Commits (53), showing changes from all commits

985cddd
Compose file for ChatQnA example with openai-like endpoint
edlee123 Jun 10, 2025
eec42f2
Adding README.md for ChatQnA + endpoint
edlee123 Jun 10, 2025
39962f6
In chatqna.py handle null openai api response since UI would show sho…
edlee123 Jun 10, 2025
d227878
Update ChatQnA/docker_compose/intel/cpu/xeon/compose_endpoint_openai.…
edlee123 Jun 10, 2025
139972c
Add tests for different input formats (#2006)
ZePan110 May 29, 2025
9fc1235
Fix security issues in workflows (#1977)
ZePan110 May 29, 2025
70db6c5
Integrate MultimodalQnA set_env to ut scripts. (#1965)
ZePan110 May 29, 2025
7ada28f
Optimize benchmark scripts (#1949)
chensuyue May 30, 2025
8c80b08
Fix permissions error. (#2008)
ZePan110 May 30, 2025
a943345
Build comps-base:ci for AgentQnA test (#2010)
chensuyue Jun 3, 2025
b63bdb3
Stop CI test on rocm due to lack of test machine (#2017)
chensuyue Jun 3, 2025
2a8f3fb
Fix workflow permission issues. (#2018)
ZePan110 Jun 3, 2025
34d40dc
Refine the README, folder/file hierarchy and test file for FinanceAge…
MSCetin37 Jun 3, 2025
a0f7ea0
Add code owners. (#2022)
joshuayao Jun 3, 2025
9776593
Fix MultimodalQnA UT issues (#2011)
ZePan110 Jun 3, 2025
31cd99f
update secrets token name for AgentQnA. (#2023)
ZePan110 Jun 5, 2025
9b089dd
update secrets token name for AudioQnA. (#2024)
ZePan110 Jun 5, 2025
313f671
update secrets token name for AvatarChatbot and DBQnA. (#2030)
ZePan110 Jun 5, 2025
39b53a2
update secrets token name for ChatQnA. (#2029)
ZePan110 Jun 5, 2025
d18bc9b
update secrets token name for CodeGen and CodeTrans (#2031)
ZePan110 Jun 5, 2025
0798004
[DocSum] Aligned the output format (#1948)
Zhenzhong1 Jun 6, 2025
229f2b1
update secrets token name for DocIndexRetriever. (#2035)
ZePan110 Jun 6, 2025
4d0b5c4
update secrets token name for EdgeCraftRag, FinanceAgent, GraphRAG an…
ZePan110 Jun 6, 2025
fdbb0bf
update secrets token name for ProductivitySuite, RerankFinetuning, Se…
ZePan110 Jun 6, 2025
b27a6d3
update secrets token name for InstructionTuning, MultimodalQnA and Wo…
ZePan110 Jun 6, 2025
a797945
update secrets token name for DocSum. (#2036)
ZePan110 Jun 6, 2025
81a8841
update secrets token name for VideoQnA and VisualQnA (#2040)
ZePan110 Jun 6, 2025
d91de60
Fix shellcheck issues and update secrets TOKEN name (#2043)
ZePan110 Jun 9, 2025
1e20459
add new feature for EC-RAG (#2013)
Yongbozzz Jun 9, 2025
09ceb36
[CodeGen] Aligned the output format and fixed acc benchmark issues. (…
Zhenzhong1 Jun 9, 2025
1bb9fa4
Update vLLM version to v0.9.0.1 (#1921)
CICD-at-OPEA Jun 9, 2025
c0f5f56
Add if validation to retrieval tool (#2007)
ezelanza Jun 9, 2025
35ee101
Refine documents for DBQnA (#2034)
ZePan110 Jun 10, 2025
a9cb8e9
Restore secrets for _helm-e2e.yml (#2055)
ZePan110 Jun 10, 2025
4a313a9
Update EdgeCraftRAG README and ENV (#2052)
Yongbozzz Jun 10, 2025
506ead3
Fix TGI image HF_TOKEN environment variable rename (#2059)
xiguiw Jun 11, 2025
7575316
Release v1.3 ChatQnA OOB benchmark data (#2041)
chensuyue Jun 11, 2025
9d94879
Fix CodeGen non stream output issue (#2058)
xiguiw Jun 11, 2025
fdeadf9
Fixed typos in readme per copilot review
edlee123 Jun 24, 2025
bf18321
Merge branch 'chatqna_w_endpoints' of github.com:edlee123/GenAIExampl…
edlee123 Jun 24, 2025
532cb66
Reverting accidentally modified CodeGen in merge
edlee123 Jun 24, 2025
c8b14e3
Reverting accidentally modified CodeGen in merge
edlee123 Jun 24, 2025
2a0d757
Merge branch 'main' into chatqna_w_endpoints
edlee123 Jun 26, 2025
ad679a6
Merge branch 'main' into chatqna_w_endpoints
edlee123 Jun 28, 2025
c097939
Use OPEA CustomLogger style
edlee123 Jul 2, 2025
0bc4fd4
Merge branch 'chatqna_w_endpoints' of github.com:edlee123/GenAIExampl…
edlee123 Jul 2, 2025
9227eae
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2025
16641a2
Merge branch 'main' into chatqna_w_endpoints
edlee123 Jul 2, 2025
7c85779
Merge branch 'main' into chatqna_w_endpoints
edlee123 Jul 3, 2025
81a2e02
Merge branch 'main' into chatqna_w_endpoints
edlee123 Jul 8, 2025
204702e
Merge branch 'main' into chatqna_w_endpoints
edlee123 Jul 22, 2025
0184157
Updated compose_endpoint_openai.yaml to use newer text-embeddings-inf…
edlee123 Jul 23, 2025
f000373
Merge branch 'chatqna_w_endpoints' of github.com:edlee123/GenAIExampl…
edlee123 Jul 23, 2025
88 changes: 69 additions & 19 deletions ChatQnA/chatqna.py
@@ -3,10 +3,11 @@

import argparse
import json
import logging
import os
import re

from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps import CustomLogger, MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps.cores.mega.utils import handle_message
from comps.cores.proto.api_protocol import (
ChatCompletionRequest,
@@ -20,6 +21,10 @@
from fastapi.responses import StreamingResponse
from langchain_core.prompts import PromptTemplate

logger = CustomLogger(__name__)
log_level = logging.DEBUG if os.getenv("LOGFLAG", "").lower() == "true" else logging.INFO
logging.basicConfig(level=log_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")


class ChatTemplate:
@staticmethod
@@ -62,6 +67,10 @@ def generate_rag_prompt(question, documents):


def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
logger.debug(
f"Aligning inputs for service: {self.services[cur_node].name}, type: {self.services[cur_node].service_type}"
)

if self.services[cur_node].service_type == ServiceType.EMBEDDING:
inputs["inputs"] = inputs["text"]
del inputs["text"]
@@ -83,6 +92,9 @@ def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **k
# next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
next_inputs["temperature"] = inputs["temperature"]
inputs = next_inputs

# Log the aligned inputs (be careful with sensitive data)
logger.debug(f"Aligned inputs for {self.services[cur_node].name}: {type(inputs)}")
return inputs


@@ -123,7 +135,9 @@ def align_outputs(self, data, cur_node, inputs, runtime_graph, llm_parameters_di
elif input_variables == ["question"]:
prompt = prompt_template.format(question=data["initial_query"])
else:
print(f"{prompt_template} not used, we only support 2 input variables ['question', 'context']")
logger.warning(
f"{prompt_template} not used, we only support 2 input variables ['question', 'context']"
)
prompt = ChatTemplate.generate_rag_prompt(data["initial_query"], docs)
else:
prompt = ChatTemplate.generate_rag_prompt(data["initial_query"], docs)
@@ -152,7 +166,7 @@ def align_outputs(self, data, cur_node, inputs, runtime_graph, llm_parameters_di
elif input_variables == ["question"]:
prompt = prompt_template.format(question=prompt)
else:
print(f"{prompt_template} not used, we only support 2 input variables ['question', 'context']")
logger.warning(f"{prompt_template} not used, we only support 2 input variables ['question', 'context']")
prompt = ChatTemplate.generate_rag_prompt(prompt, reranked_docs)
else:
prompt = ChatTemplate.generate_rag_prompt(prompt, reranked_docs)
@@ -171,29 +185,65 @@ def align_outputs(self, data, cur_node, inputs, runtime_graph, llm_parameters_di


def align_generator(self, gen, **kwargs):
# OpenAI response format
# b'data:{"id":"","object":"text_completion","created":1725530204,"model":"meta-llama/Meta-Llama-3-8B-Instruct","system_fingerprint":"2.0.1-native","choices":[{"index":0,"delta":{"role":"assistant","content":"?"},"logprobs":null,"finish_reason":null}]}\n\n'
for line in gen:
line = line.decode("utf-8")
start = line.find("{")
end = line.rfind("}") + 1
"""Aligns the generator output to match ChatQnA's format of sending bytes.

Handles different LLM output formats (TGI, OpenAI) and properly filters
empty or null content chunks to avoid UI display issues.
"""
# OpenAI response format example:
# b'data:{"id":"","object":"text_completion","created":1725530204,"model":"meta-llama/Meta-Llama-3-8B-Instruct",
# "system_fingerprint":"2.0.1-native","choices":[{"index":0,"delta":{"role":"assistant","content":"?"},
# "logprobs":null,"finish_reason":null}]}\n\n'

json_str = line[start:end]
for line in gen:
try:
# Sometimes the stream yields an empty chunk; fall back gracefully here
line = line.decode("utf-8")
start = line.find("{")
end = line.rfind("}") + 1

# Skip lines with invalid JSON structure
if start == -1 or end <= start:
logger.debug("Skipping line with invalid JSON structure")
continue

json_str = line[start:end]

# Parse the JSON data
json_data = json.loads(json_str)

# Handle TGI format responses
if "ops" in json_data and "op" in json_data["ops"][0]:
if "value" in json_data["ops"][0] and isinstance(json_data["ops"][0]["value"], str):
yield f"data: {repr(json_data['ops'][0]['value'].encode('utf-8'))}\n\n"
else:
pass
elif (
json_data["choices"][0]["finish_reason"] != "eos_token"
and "content" in json_data["choices"][0]["delta"]
):
yield f"data: {repr(json_data['choices'][0]['delta']['content'].encode('utf-8'))}\n\n"
# Empty value chunks are silently skipped

# Handle OpenAI format responses
elif "choices" in json_data and len(json_data["choices"]) > 0:
# Only yield content if it exists and is not null
if (
"delta" in json_data["choices"][0]
and "content" in json_data["choices"][0]["delta"]
and json_data["choices"][0]["delta"]["content"] is not None
):
content = json_data["choices"][0]["delta"]["content"]
yield f"data: {repr(content.encode('utf-8'))}\n\n"
# Null content chunks are silently skipped
elif (
"delta" in json_data["choices"][0]
and "content" in json_data["choices"][0]["delta"]
and json_data["choices"][0]["delta"]["content"] is None
):
logger.debug("Skipping null content chunk")

except json.JSONDecodeError as e:
# Log the error with the problematic JSON string for better debugging
logger.error(f"JSON parsing error in align_generator: {e}\nProblematic JSON: {json_str[:200]}")
# Skip sending invalid JSON to avoid UI issues
continue
except Exception as e:
yield f"data: {repr(json_str.encode('utf-8'))}\n\n"
logger.error(f"Unexpected error in align_generator: {e}, line snippet: {line[:100]}...")
# Skip sending to avoid UI issues
continue
yield "data: [DONE]\n\n"


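For reference, here is a self-contained sketch of the brace-slicing and null-delta filtering strategy the new align_generator applies (the sample chunks below are fabricated for illustration, not taken from this PR):

```python
import json

def parse_sse_chunks(lines):
    """Yield only non-null delta content from OpenAI-style SSE byte lines."""
    for raw in lines:
        line = raw.decode("utf-8")
        start, end = line.find("{"), line.rfind("}") + 1
        if start == -1 or end <= start:
            continue  # no JSON object on this line
        try:
            data = json.loads(line[start:end])
        except json.JSONDecodeError:
            continue  # drop malformed chunks instead of breaking the stream
        choices = data.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content is not None:
                yield content  # null-content chunks never reach the UI

# Fabricated stream: a role-only chunk (null content), a text chunk,
# and a final chunk with null content plus a finish_reason.
sample = [
    b'data:{"choices":[{"delta":{"role":"assistant","content":null}}]}\n\n',
    b'data:{"choices":[{"delta":{"content":"Hello"}}]}\n\n',
    b'data:{"choices":[{"delta":{"content":null},"finish_reason":"stop"}]}\n\n',
]
assert list(parse_sse_chunks(sample)) == ["Hello"]
```

This is the behavior the "handle null openai api response" commit targets: without the filter, null-content chunks could be serialized and sent downstream to the UI.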