fix(#161): re-parse criteria when LLM returns stringified JSON dict#552
fix(#161): re-parse criteria when LLM returns stringified JSON dict#552drewdrewthis wants to merge 1 commit into
Conversation
Some LLMs serialise nested objects (like the criteria map) as a quoted
JSON string instead of an inline dict. criteria_verdicts.values() then
throws AttributeError: 'str' object has no attribute 'values'.
Re-parse with json.loads() when a string is detected; fall back to {}
on decode failure so the judge still produces a result rather than
crashing the test run.
Regression test added: test_judge_agent_handles_criteria_as_json_string
exercises the exact payload shape that triggered the bug.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
[grinder] READY for human review CI: green (zero failing, zero pending) Verified by: |
|
Automated low-risk assessment This PR was evaluated against the repository's Low-Risk Pull Requests procedure.
An approving review has been submitted by automation. The PR may merge once required CI checks pass. |
Problem
Some LLMs serialise nested objects (like the
criteriamap) as a quoted JSON string instead of an inline dict. When the JudgeAgent calls.values()on that string it blows up:Closes #161.
Fix
After
json.loads(tool_call.function.arguments), check ifcriteria_verdictsis astr. If so, try a secondjson.loads()pass. On decode failure fall back to{}so the judge still produces a result rather than crashing the run.Test
test_judge_agent_handles_criteria_as_json_stringintests/test_judge_agent.pyexercises the exact payload shape that triggered the bug —criteriareturned as"{\"test_criterion\": true}"(a JSON string, not a dict). The test was failing withAttributeErrorbefore the fix and passes after.Checklist
{}) when the string is not valid JSON🤖 Generated with Claude Code