Skip to content

refactor(ai-aws-content-moderation): moderate decoded LLM content in access phase#13647

Open
shreemaan-abhishek wants to merge 2 commits into
apache:masterfrom
shreemaan-abhishek:refactor/ai-aws-content-moderation-access
Open

refactor(ai-aws-content-moderation): moderate decoded LLM content in access phase#13647
shreemaan-abhishek wants to merge 2 commits into
apache:masterfrom
shreemaan-abhishek:refactor/ai-aws-content-moderation-access

Conversation

@shreemaan-abhishek

@shreemaan-abhishek shreemaan-abhishek commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Description

Refactor ai-aws-content-moderation to be structured like its sibling ai-aliyun-content-moderation.

Previously the plugin ran in the rewrite phase (before ai-proxy) and sent the raw HTTP request body to AWS Comprehend. As a result Comprehend scored the undecoded JSON envelope (e.g. the literal escaped string "content":"toxic" and {"model":...,"messages":[...]}), while the upstream LLM acts on the decoded prompt, so the two saw different text.

What changed

  • Move the plugin from the rewrite phase to the access phase (priority 1050 -> 1031, below ai-proxy's 1040), so it runs after ai-proxy and can reuse ctx.ai_client_protocol and ctx.picked_ai_instance.
  • Make it protocol-aware: parse the JSON body and extract the LLM-visible prompt content via the detected protocol (ai-protocols), sending only the normalized, decoded content to Comprehend. This is what the sibling ai-aliyun-content-moderation plugin does.
  • Guard on ctx.picked_ai_instance via fail_mode, so the plugin reports clearly when it is used without ai-proxy/ai-proxy-multi.
  • Return a provider-compatible deny response (via ai-protocols) instead of a raw text body, so AI clients are not broken. Adds check_request, deny_code (default 200) and deny_message options.

The AWS Comprehend decision model (moderation_categories per-category thresholds and the overall moderation_threshold) is unchanged.

Behavior change

The plugin now must be used together with ai-proxy/ai-proxy-multi, consistent with ai-aliyun-content-moderation and with how the plugin is documented. A request that does not pass through ai-proxy is governed by fail_mode (default skip, i.e. passes through unchecked; set fail_mode: error to reject). This supersedes #13528.

Which issue(s) this PR fixes:

Fixes #

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

…access phase

Restructure the plugin to match ai-aliyun-content-moderation: run in the
access phase after ai-proxy, extract the decoded LLM prompt content via the
detected client protocol instead of sending the raw request body to
Comprehend, guard on ctx.picked_ai_instance via fail_mode, and return a
provider-compatible deny response. Adds check_request, deny_code and
deny_message options.
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jul 2, 2026
1030 collided with ai-rate-limiting; move to 1031 (still below ai-proxy) and
update the priority-ordered plugin list in t/admin/plugins.t.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the ai-aws-content-moderation plugin to run after ai-proxy in the access phase, so it can moderate the decoded, protocol-normalized LLM prompt content (instead of scoring the raw JSON envelope). It also updates the deny behavior to return protocol/provider-compatible responses, and aligns tests/docs with the new usage model (co-deployed with ai-proxy/ai-proxy-multi).

Changes:

  • Move moderation logic to access (priority 10501031) and extract LLM-visible text via ai-protocols.
  • Add request moderation controls and provider-compatible deny responses (check_request, deny_code, deny_message).
  • Update tests and documentation to use ai-proxy and validate the refactored behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
apisix/plugins/ai-aws-content-moderation.lua Refactors plugin to access, extracts decoded prompt content via ai-protocols, and builds provider-compatible deny bodies.
t/plugin/ai-aws-content-moderation.t Updates core moderation tests to run with ai-proxy and JSON LLM chat bodies; adjusts assertions for new deny response format.
t/plugin/ai-aws-content-moderation2.t Updates “Comprehend unreachable” test to run with ai-proxy and JSON chat requests.
t/plugin/ai-aws-content-moderation-secrets.t Updates secret-resolution tests to run with ai-proxy and JSON chat requests; keeps SigV4 credential propagation validation.
t/admin/plugins.t Reorders plugin listing to reflect ai-proxy adjacency/dependency expectations.
docs/en/latest/plugins/ai-aws-content-moderation.md Documents protocol-aware extraction, ai-proxy co-usage, and provider-compatible deny responses; adds new config options.
docs/zh/latest/plugins/ai-aws-content-moderation.md Same as EN docs: protocol-aware behavior, ai-proxy co-usage, deny response format, and new config options.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@membphis membphis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Legacy configurations without ai-proxy now fail open

Before this PR, ai-aws-content-moderation ran in the rewrite phase and moderated the raw request body for routes that used only this plugin. After the refactor, access() returns through binding.on_unsupported when ctx.picked_ai_instance is absent; with the existing default fail_mode=skip, the request is passed through unchecked. The new test also asserts that a toxic request without ai-proxy returns 200.

This is a content-moderation policy regression for existing deployments that already use this plugin without ai-proxy or ai-proxy-multi. Please keep a legacy raw-body moderation fallback, or make this migration fail closed / explicit opt-in with a documented compatibility path.

@membphis membphis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this PR. For the legacy configurations without ai-proxy case, I recommend handling the compatibility and fail-open behavior in a separate follow-up PR, with an explicit migration path for standalone ai-aws-content-moderation usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants