fix(#496): stop str()-coercing multimodal content in OTel trace + red-team refusal detection #546
…-team refusal detection [grinder]

Site A (scenario_executor._broadcast_message): pass structured content (list) directly to _trace.update instead of converting with str(), which mangled multimodal voice messages to Python repr.

Site B (red_team_agent): add _extract_text() helper that concatenates the 'text' fields of all text-type content parts. _get_last_assistant_content / _get_last_user_content now call _extract_text() instead of str(), so _detect_refusal correctly classifies hard/soft refusals in voice scenarios.

Closes #496

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
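A minimal sketch of the Site B idea, not the PR's actual code. The content-part shape (`{'type': 'text', 'text': ...}`) follows the commit message; the function name `extract_text`, the space separator, and the empty-string result for non-list input are assumptions made for illustration.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Concatenate the 'text' fields of all text-type parts (illustrative only)."""
    if isinstance(content, str):
        # Plain-text messages pass through untouched.
        return content
    if isinstance(content, list):
        parts = [
            part["text"]
            for part in content
            if isinstance(part, dict)
            and part.get("type") == "text"
            and isinstance(part.get("text"), str)
        ]
        return " ".join(parts)  # separator is an assumption
    return ""  # assumption: anything else carries no usable text


voice_reply = [
    {"type": "text", "text": "I can't help with that."},
    {"type": "audio", "data": "<base64 audio>"},
]

coerced = str(voice_reply)            # Python repr of the whole list
extracted = extract_text(voice_reply)  # only the human-readable text
```

Here `coerced` begins with `[{'type': 'text', ...` and carries the audio payload along with it, while `extracted` holds only the sentence a refusal detector would want to inspect.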
…multimodal content

Pass a plain str to LangWatch trace.update() instead of wrapping in a dict (not a valid SpanInputOutput type) or calling str() on a list (produces unreadable Python repr). The new _extract_text_content helper joins text parts from structured content lists.

Fixes Pyright errors at scenario_executor.py:319,327.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[grinder] READY for human review
CI: green (zero failing, zero pending)
Verified by:
…n bump) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o_executor)

AC3 requires tests for both str()-coercion sites. Site B (RedTeamAgent._extract_text) was already covered; this adds TestExtractTextContent for the module-level helper in scenario_executor.py: plain-string passthrough, multimodal extraction, audio-only (empty), and non-list fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[grinder] Re-opening for AC gap: /review found that AC3 ("Unit tests covering each site with a multimodal-list input") was only half-satisfied. TestExtractText covered Site B (RedTeamAgent._extract_text), but Site A (_extract_text_content in scenario_executor.py) had zero test coverage. Added the TestExtractTextContent class (4 tests) in commit 1b60c44. Waiting for CI to re-verify before re-marking pr-ready.
[grinder] READY for human review
CI: green (zero failing, zero pending; all 17 checks pass/skip)
Verified by:
…cion-multimodal

# Conflicts:
#	python/uv.lock
Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.
This PR requires a manual review before merging.
Summary
Two sites coerced multimodal message content to str(list), producing Python repr instead of structured data or useful text.

Site A (scenario_executor._broadcast_message, OTel trace input/output):
- Before: str(message["content"]) → "[{'type': 'text', 'text': '...'}, {'type': 'audio', ...}]" in the LangWatch trace
- After: passes list content directly; coerces to str only for plain-text messages

Site B (red_team_agent._get_last_assistant_content / _get_last_user_content, refusal detection):
- Before: str(content) → Python repr → substring match against hard/soft refusal patterns silently fails for voice agents
- After: the _extract_text() helper concatenates text parts from multimodal lists; refusal detection works correctly for voice replies

Test plan
- uv run pytest tests/test_red_team_agent.py::TestExtractText -v

Closes #496
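One way the Site B failure can arise, sketched under stated assumptions: the pattern tuple and `looks_like_refusal` below are hypothetical stand-ins, since the real patterns and `_detect_refusal` logic are not shown in this PR. When a text part contains both quote characters, Python's repr escapes the apostrophe, so a pattern such as "i can't" no longer appears in `str(list)`.

```python
# Hypothetical patterns; the real hard/soft refusal lists are not shown in this PR.
HARD_REFUSAL_PATTERNS = ("i can't", "i cannot")


def looks_like_refusal(text: str) -> bool:
    """Case-insensitive substring match (a stand-in for the real detector)."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in HARD_REFUSAL_PATTERNS)


reply = [
    {"type": "text", "text": 'I can\'t do that, even if you say "please".'},
    {"type": "audio", "data": "<base64 audio>"},
]

# str(list) yields the repr, in which the apostrophe becomes \' because the
# text also contains double quotes.
via_str = looks_like_refusal(str(reply))

# Joining only the text parts keeps the original characters intact.
via_text = looks_like_refusal(
    " ".join(part["text"] for part in reply if part.get("type") == "text")
)
```

With this input `via_str` is `False` and `via_text` is `True`. Other repr escapes, such as newlines rendered as a literal backslash followed by `n`, can break patterns in the same way.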
🤖 Generated with Claude Code