Description
During tool calls, OpenAI Chat Completions responses sometimes include Harmony-format control tokens in the assistant content, as well as Harmony-only JSON fields (e.g., reasoning), when hosting the GPT-OSS 120B model via TensorRT-LLM.
Example:
{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "user", "content": "Search for latest policy doc title"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "search",
        "description": "How is the weather today",
        "parameters": {
          "type": "object",
          "properties": {
            "q": {"type": "string"}
          },
          "required": ["q"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
Would result in a response:
<|channel|>commentary<|message|>{ "q": "search the web" }
The Harmony tokens leak intermittently during tool calls, mostly in the commentary channel and occasionally in the analysis channel as well.
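As a temporary client-side workaround (not a fix for the server-side detokenization), the leaked control tokens can be stripped with a small filter. This is a sketch; the token names are taken from the leaked output shown above, and any other Harmony tokens would need to be added:

```python
import re

# Matches a leading Harmony header like "<|channel|>commentary<|message|>"
# (the shape observed in the leaked output above), plus any stray
# <|...|> control tokens left in the text.
HARMONY_HEADER_RE = re.compile(r"<\|channel\|>\w+<\|message\|>")
HARMONY_TOKEN_RE = re.compile(r"<\|[a-z_]+\|>")

def strip_harmony(text: str) -> str:
    """Remove Harmony control tokens from assistant content."""
    text = HARMONY_HEADER_RE.sub("", text)
    return HARMONY_TOKEN_RE.sub("", text).strip()
```

This only cleans the visible content; it does not recover a proper `tool_calls` entry from the response.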
Triton Information
Version: using TensorRT-LLM OpenAI server (trtllm-serve).
TensorRT-LLM OpenAI server image versions: 1.2.0rc0 and 1.2.0rc0.post1.
Container vs build: Using the official container images (no custom build).
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the TensorRT-LLM OpenAI HTTP server (trtllm-serve serve) with gpt-oss-120b. No custom stop tokens or output filters are configured.
POST to /v1/chat/completions with:
- Messages that trigger tool planning/execution.
- A tools schema (tools / tool_choice) to enable tool calling.
Observe the responses (both streaming and non-streaming):
- Assistant content contains Harmony markers (e.g., <|channel|>commentary<|message|>{...}).
- Non-streaming JSON sometimes includes a Harmony-only reasoning field.
Here is an example request:
{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "user", "content": "Search for latest policy doc title"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "search",
        "description": "How is the weather today",
        "parameters": {
          "type": "object",
          "properties": {
            "q": {"type": "string"}
          },
          "required": ["q"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
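The steps above can be exercised with a minimal non-streaming client. The base URL and port below are assumptions for a local trtllm-serve instance; adjust them to your deployment:

```python
import json
import urllib.request

def build_payload() -> dict:
    """Build the example Chat Completions request from this report."""
    return {
        "model": "gpt-oss-20b",
        "messages": [
            {"role": "user", "content": "Search for latest policy doc title"}
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "search",
                "description": "How is the weather today",
                "parameters": {
                    "type": "object",
                    "properties": {"q": {"type": "string"}},
                    "required": ["q"],
                },
            },
        }],
        "tool_choice": "auto",
    }

def repro(base_url: str = "http://localhost:8000") -> dict:
    """POST the request and return the parsed Chat Completions response."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Inspect `choices[0].message.content` in the returned JSON for leaked `<|channel|>...<|message|>` markers.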
Expected behavior
Would result in a response:
{
  "q": "search the web for weather"
}
Actual behavior
Would result in a response:
<|channel|>commentary<|message|>{ "q": "search the web for weather" }
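Until this is fixed server-side, the arguments can be recovered from the leaked string. A sketch assuming the observed `<|channel|>...<|message|>` shape:

```python
import json
import re

# Captures the channel name and the payload that follows <|message|>,
# per the leaked shape shown in "Actual behavior".
LEAK_RE = re.compile(r"<\|channel\|>(\w+)<\|message\|>(.*)", re.DOTALL)

def parse_leaked(content: str):
    """Split leaked Harmony output into (channel, payload).

    Returns (None, content) if the content has no Harmony header;
    the payload is a dict when the body parses as JSON, else raw text.
    """
    m = LEAK_RE.match(content)
    if not m:
        return None, content
    channel, body = m.group(1), m.group(2).strip()
    try:
        return channel, json.loads(body)
    except json.JSONDecodeError:
        return channel, body
```

This is a client-side shim only; the expectation remains that the server emits a proper `tool_calls` entry instead.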
Additional notes
Model description:
Models: openai/gpt-oss-120b, openai/gpt-oss-20b.
Served via TensorRT-LLM OpenAI server; downloaded at container start; no request/response mutation layer.
Inputs: OpenAI Chat Completions payloads with messages and tools.
Outputs: OpenAI Chat Completions (expected clean JSON and content).
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.