
fix(ai-protocols): flatten structured message content in the protocol layer #13634

Merged
nic-6443 merged 7 commits into apache:master from nic-6443:fix/ai-prompt-guard-structured-content on Jul 2, 2026

Conversation

nic-6443 (Member) commented on Jun 30, 2026

Description

ai-prompt-guard returns 500 when a chat message's content is a structured array instead of a plain string.

OpenAI Chat Completions allows messages[].content to be either a string or an array of typed parts, e.g.:

{ "role": "user", "content": [ { "type": "text", "text": "hello" } ] }

The root cause is in the protocol layer: openai-chat.get_messages() returned body.messages verbatim, so a table-valued content leaked to consumers. ai-prompt-guard then concatenates message content with table.concat, which raises:

ai-prompt-guard.lua: invalid value (table) at index N in table for 'concat'
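
The failure can be reproduced outside APISIX. The snippet below is a minimal sketch, not the plugin's code: the `messages` and `contents` names are illustrative, and the only behaviour it relies on is that `table.concat` accepts strings and numbers but raises on a table element.

```lua
-- Minimal reproduction (plain Lua / LuaJIT, no APISIX required).
local messages = {
    { role = "system", content = "You are a helpful assistant." },
    -- Structured content: an array of typed parts instead of a string.
    { role = "user", content = { { type = "text", text = "hello" } } },
}

-- Collect content verbatim, the way a naive consumer would.
local contents = {}
for i, msg in ipairs(messages) do
    contents[i] = msg.content
end

-- table.concat accepts only strings and numbers, so the table at index 2
-- makes it raise; pcall captures the error instead of aborting.
local ok, err = pcall(table.concat, contents, " ")
print(ok, err)
```

The exact wording of the error differs between Lua versions, so the sketch prints it rather than matching on it.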

The canonical {role, content} contract is that get_messages returns content already flattened to a plain string. This PR makes every adapter honor that contract and keeps the flattening in exactly one place per adapter:

  • openai-chat / anthropic-messages: get_messages now reuses the adapter's existing append_message_text helper instead of re-implementing the flatten loop.
  • bedrock-converse: adds append_block_texts, replacing six copies of the content-block text extraction across extract_response_text, extract_request_content, extract_user_content and get_messages.
  • openai-responses: adds append_item_text shared by extract_request_content, extract_user_content and get_messages. This also fixes get_messages silently dropping structured (array) input content parts — the same class of bug, previously masked because it dropped the content instead of crashing.

Non-text parts (e.g. image_url) are dropped, consistent across all adapters. Every get_messages consumer (ai-prompt-guard, ai-lakera-guard, ai-cache) then receives flattened text without re-implementing the extraction; ai-lakera-guard.normalize_messages is reduced to filtering empty turns.
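
As an illustration of the flattening rule described above, here is a rough sketch. `flatten_content` is a hypothetical name (the adapters use their own helpers such as append_message_text, whose signatures are not shown in this PR description), and joining text parts with a single space is an assumption.

```lua
-- Hypothetical helper: flatten `content` (a string, or an array of typed
-- parts) into one plain string, keeping text parts only.
local function flatten_content(content)
    if type(content) == "string" then
        return content
    end
    if type(content) ~= "table" then
        -- nil, JSON null, numbers, ...: nothing to scan
        return ""
    end

    local texts = {}
    for _, part in ipairs(content) do
        -- Non-text parts (e.g. image_url) are dropped.
        if type(part) == "table" and part.type == "text"
           and type(part.text) == "string" then
            texts[#texts + 1] = part.text
        end
    end
    -- Assumption: parts are joined with a single space.
    return table.concat(texts, " ")
end

local mixed = {
    { type = "text", text = "describe this image" },
    { type = "image_url", image_url = { url = "https://example.com/cat.png" } },
    { type = "text", text = "in one sentence" },
}
local flat = flatten_content(mixed)
print(flat)
```

With this shape, a deny pattern that appears only inside an image_url part is never inspected, because only the text parts survive.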

Regression tests cover structured content in conversation history, the latest user message, mixed text/non-text parts (openai-chat), and structured input parts (Responses API); they fail before the fix and pass after. The ai-prompt-guard, ai-lakera-guard, and ai-cache suites are green locally.

Which issue(s) this PR fixes:

Fixes #

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

OpenAI Chat allows messages[].content to be either a plain string or an
array of typed parts (e.g. [{type="text", text="..."}]). The plugin
collected msg.content as-is and then called table.concat, raising
"invalid value (table) ... for 'concat'" and returning 500 whenever an
inspected message carried array content.

Flatten each message's content (string, or the text parts of an array)
before concatenation, matching how the protocol adapters already extract
text. Add regression tests for array content in conversation history, in
the latest user message, and mixed text/non-text parts.
Copilot AI review requested due to automatic review settings June 30, 2026 10:21
dosubot (bot) added the size:L and bug labels on Jun 30, 2026

Copilot AI left a comment

Pull request overview

This PR fixes a Lua runtime crash in the ai-prompt-guard plugin when inspecting OpenAI Chat-style messages that use structured (array) messages[].content, by flattening text parts before concatenation.

Changes:

  • Add a helper to flatten message content (string or typed-parts array) into plain text prior to table.concat.
  • Update prompt aggregation to use the new flattening helper.
  • Add regression tests covering structured content in history and the latest user message (including mixed text + non-text parts).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| apisix/plugins/ai-prompt-guard.lua | Flattens structured message content into text before concatenation to prevent table.concat runtime errors. |
| t/plugin/ai-prompt-guard.t | Adds tests ensuring structured content is scanned/denied correctly without crashing under different match modes. |

Comment thread apisix/plugins/ai-prompt-guard.lua Outdated
nic-6443 added 2 commits June 30, 2026 18:29
Add a case where the deny word lives only in a non-text part (image_url)
to lock in that only text parts are inspected. Also declare
`local type = type` so the content-flatten helper passes the
localize-globals style check.
…_messages

Move the OpenAI Chat structured-content handling out of ai-prompt-guard and
into openai-chat.get_messages, so it returns canonical string content like
every other adapter (anthropic, bedrock, responses, embeddings) already does.

get_messages previously returned body.messages verbatim, so a message whose
content is an array of typed parts (e.g. [{type="text", text="..."}]) leaked a
table to consumers. ai-prompt-guard then hit "invalid value (table) ... for
'concat'" (500). Flattening in the protocol layer keeps protocol details out of
the plugins: ai-prompt-guard, ai-lakera-guard and ai-cache all get flattened
text without duplicating the logic.
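
The layering this commit describes can be sketched as follows. This is only an illustration under stated assumptions: `text_of`, `joined_text`, the `get_messages(body)` signature and the space separator are hypothetical, not the openai-chat adapter's actual code.

```lua
-- Hypothetical text extraction: string as-is, or the text parts of an array.
local function text_of(content)
    if type(content) == "string" then
        return content
    end
    local texts = {}
    if type(content) == "table" then
        for _, part in ipairs(content) do
            if type(part) == "table" and part.type == "text"
               and type(part.text) == "string" then
                texts[#texts + 1] = part.text
            end
        end
    end
    return table.concat(texts, " ")
end

-- Hypothetical adapter entry point: returns {role, content} tables whose
-- content is always a plain string.
local function get_messages(body)
    local out = {}
    if type(body) ~= "table" or type(body.messages) ~= "table" then
        return out
    end
    for _, msg in ipairs(body.messages) do
        if type(msg) == "table" then
            out[#out + 1] = { role = msg.role, content = text_of(msg.content) }
        end
    end
    return out
end

-- Consumer side: concatenation is safe without any protocol knowledge.
local function joined_text(body)
    local contents = {}
    for i, msg in ipairs(get_messages(body)) do
        contents[i] = msg.content
    end
    return table.concat(contents, " ")
end

local body = { messages = {
    { role = "system", content = "be brief" },
    { role = "user", content = { { type = "text", text = "hello" } } },
} }
local joined = joined_text(body)
print(joined)
```

The consumer never checks the type of content; that guarantee lives in the adapter.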
nic-6443 changed the title from "fix(ai-prompt-guard): handle structured message content" to "fix(ai-protocols): flatten structured content in openai-chat get_messages" on Jul 1, 2026
nic-6443 added 3 commits July 1, 2026 10:36
openai-chat.get_messages now flattens content like the other adapters, so no
adapter returns body.messages verbatim anymore. normalize_messages stays as
idempotent defense-in-depth; reword the comment to match.
…lize_messages

openai-chat.get_messages now flattens content to a string like every other
adapter, so normalize_messages no longer needs its own text-part extraction
(it only ever runs on get_messages output). Reduce it to the Lakera-specific
filtering: keep role-tagged messages with non-empty string content.
…oss adapters

Make the protocol layer the single place that flattens structured message
content into plain strings, removing the duplicated flatten loops that the
get_messages fix would otherwise leave scattered across adapters:

- openai-chat / anthropic-messages: get_messages reuses the existing
  append_message_text helper instead of re-implementing the loop
- bedrock-converse: add append_block_texts, replacing six copies of the
  content-block text extraction
- openai-responses: add append_item_text shared by extract_request_content,
  extract_user_content and get_messages; this also fixes get_messages
  silently dropping structured (array) input content parts

Add an ai-prompt-guard test for Responses API structured content.
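
For the Responses API case, a sketch of the same idea follows. It assumes `input` is either a string or an array of role-tagged items whose content is a string or an array of parts typed input_text / output_text; `item_text` and this `get_messages` are hypothetical stand-ins, not the adapter's append_item_text, and treating a bare string input as a user turn is also an assumption.

```lua
-- Hypothetical: extract text from one item's content.
local function item_text(content)
    if type(content) == "string" then
        return content
    end
    local texts = {}
    if type(content) == "table" then
        for _, part in ipairs(content) do
            -- Assumed text part types; other parts (e.g. input_image) are skipped.
            if type(part) == "table" and type(part.text) == "string"
               and (part.type == "input_text" or part.type == "output_text") then
                texts[#texts + 1] = part.text
            end
        end
    end
    return table.concat(texts, " ")
end

-- Hypothetical get_messages for a Responses-style request body.
local function get_messages(body)
    local input = body.input
    if type(input) == "string" then
        -- Assumption: a bare string input is treated as a single user turn.
        return { { role = "user", content = input } }
    end
    local out = {}
    if type(input) == "table" then
        for _, item in ipairs(input) do
            if type(item) == "table" and item.role then
                out[#out + 1] = { role = item.role,
                                  content = item_text(item.content) }
            end
        end
    end
    return out
end

local msgs = get_messages({ input = {
    { role = "user", content = {
        { type = "input_text", text = "hi" },
        { type = "input_image", image_url = "https://example.com/a.png" },
    } },
} })
print(msgs[1].content)
```

Before the fix, array content like this was silently dropped rather than crashing, which is why a guard plugin could miss it without any visible error.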
nic-6443 changed the title from "fix(ai-protocols): flatten structured content in openai-chat get_messages" to "fix(ai-protocols): flatten structured message content in the protocol layer" on Jul 1, 2026
… contract

get_messages always returns a table of {role, content} tables it constructs
itself, with content already flattened to a string, so normalize_messages
keeps only the two live filters: turns without a role (adapters pass the
client role through verbatim) and empty content that Lakera /v2/guard has
nothing to scan. The re-copy into a fresh table is also unnecessary.
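
A rough sketch of the reduced filter, assuming the input is the list of {role, content} tables produced by get_messages. The function body is illustrative, not the ai-lakera-guard source.

```lua
-- Hypothetical sketch: keep only turns that have a role and non-empty
-- string content. Message tables are passed through, not re-copied.
local function normalize_messages(messages)
    local kept = {}
    for _, msg in ipairs(messages) do
        if msg.role and type(msg.content) == "string" and msg.content ~= "" then
            kept[#kept + 1] = msg
        end
    end
    return kept
end

local valid = { role = "user", content = "hello" }
local kept = normalize_messages({
    { content = "no role on this turn" },  -- dropped: missing role
    { role = "assistant", content = "" },   -- dropped: nothing to scan
    valid,
})
print(#kept)
```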
@nic-6443 nic-6443 merged commit 11dfb18 into apache:master Jul 2, 2026
19 checks passed
@nic-6443 nic-6443 deleted the fix/ai-prompt-guard-structured-content branch July 2, 2026 09:32
Labels

bug: Something isn't working
size:L: This PR changes 100-499 lines, ignoring generated files.

5 participants