Skip to content

fix(#161): re-parse criteria when LLM returns stringified JSON dict#552

Open
drewdrewthis wants to merge 1 commit into
mainfrom
issue161/fix-judge-criteria-string
Open

fix(#161): re-parse criteria when LLM returns stringified JSON dict#552
drewdrewthis wants to merge 1 commit into
mainfrom
issue161/fix-judge-criteria-string

Conversation

@drewdrewthis
Copy link
Copy Markdown
Collaborator

Problem

Some LLMs serialise nested objects (like the criteria map) as a quoted JSON string instead of an inline dict. When the JudgeAgent calls .values() on that string it blows up:

AttributeError: 'str' object has no attribute 'values'

Closes #161.

Fix

After json.loads(tool_call.function.arguments), check if criteria_verdicts is a str. If so, try a second json.loads() pass. On decode failure fall back to {} so the judge still produces a result rather than crashing the run.

Test

test_judge_agent_handles_criteria_as_json_string in tests/test_judge_agent.py exercises the exact payload shape that triggered the bug — criteria returned as "{\"test_criterion\": true}" (a JSON string, not a dict). The test was failing with AttributeError before the fix and passes after.

Checklist

  • Regression test added
  • No behaviour change when LLM returns a normal dict
  • Graceful fallback ({}) when the string is not valid JSON

🤖 Generated with Claude Code

Some LLMs serialise nested objects (like the criteria map) as a quoted
JSON string instead of an inline dict.  criteria_verdicts.values() then
throws AttributeError: 'str' object has no attribute 'values'.

Re-parse with json.loads() when a string is detected; fall back to {}
on decode failure so the judge still produces a result rather than
crashing the test run.

Regression test added: test_judge_agent_handles_criteria_as_json_string
exercises the exact payload shape that triggered the bug.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@drewdrewthis drewdrewthis added grinding Grinder is actively managing this PR low-risk-change PR qualifies as low-risk per policy and can be merged without manual review labels May 25, 2026
@github-actions github-actions Bot removed the low-risk-change PR qualifies as low-risk per policy and can be merged without manual review label May 25, 2026
@drewdrewthis
Copy link
Copy Markdown
Collaborator Author

[grinder] READY for human review

CI: green (zero failing, zero pending)
ACs: met — JudgeAgent now re-parses criteria when LLM returns a JSON string instead of a dict; graceful fallback to {} on parse failure; regression test test_judge_agent_handles_criteria_as_json_string verifies the exact payload shape that triggered the bug
Threads: zero unresolved
Note: LLM evaluator declined auto-approve (modifies runtime parsing logic in JudgeAgent) — human review required

Verified by:
`command gh pr checks 552` → all checks pass or skipping; test (3.12) pass 7m31s, evaluate pass, javascript-complete pass
regression test in worktree: `PASSED` (.worktrees/issue161)

@drewdrewthis drewdrewthis added pr-ready and removed grinding Grinder is actively managing this PR labels May 25, 2026
@github-actions github-actions Bot added the low-risk-change PR qualifies as low-risk per policy and can be merged without manual review label May 25, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure.

  • Scope: Re-parse 'criteria' if it's returned as a JSON string in JudgeAgent (fallback to {} on decode failure) and add a regression test exercising this case.
  • Exclusions confirmed: no changes to auth, security settings, database schema, business-critical logic, or external integrations.
  • Classification: low-risk-change under the documented policy.

The PR only adjusts how the JudgeAgent parses LLM-returned arguments (re-parsing a stringified JSON 'criteria' and falling back to an empty dict) and adds a regression test. It does not touch authentication/authorization, secrets/encryption, database schemas/migrations, business-critical logic, or external integrations, so it fits the low-risk criteria and is easy to revert if needed.

An approving review has been submitted by automation. The PR may merge once required CI checks pass.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved by automation: PR qualifies as low-risk-change under the documented policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

low-risk-change PR qualifies as low-risk per policy and can be merged without manual review pr-ready

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JudgeAgent AttributeError when LLM returns criteria as string instead of dict

1 participant