fix: preflight check now validates reasoning_content for thinking models#2420
Fixes #2419. Models with `enable_thinking=true` (e.g. Nemotron) put their output into `reasoning_content` rather than `content`. The preflight check now checks both fields, so thinking models pass correctly.

Co-authored-by: openhands <openhands@all-hands.dev>
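A minimal Python sketch of the idea, not the SDK's actual code: `has_preflight_output`, `ThinkingMessage`, and `PlainMessage` are hypothetical names, and the sketch assumes both fields hold plain strings or `None`. Reading both fields with `getattr` and a `None` default mirrors the follow-up fix in this thread, so the same check also tolerates `Message` objects that have no `reasoning_content` attribute.

```python
from dataclasses import dataclass
from typing import Any, Optional


def has_preflight_output(message: Any) -> bool:
    """Return True if the response carries any non-empty text.

    Thinking models may leave `content` empty and put their output in
    `reasoning_content`; other models may not define `reasoning_content`
    at all, so both fields are read with getattr() and a None default.
    """
    content = getattr(message, "content", None)
    reasoning = getattr(message, "reasoning_content", None)
    # Assumption: both fields are plain strings or None.
    return any(
        isinstance(text, str) and text.strip() != ""
        for text in (content, reasoning)
    )


@dataclass
class ThinkingMessage:
    """Hypothetical stand-in for a response from a thinking model."""
    content: Optional[str]
    reasoning_content: Optional[str]


@dataclass
class PlainMessage:
    """Hypothetical stand-in for a Message type without reasoning_content."""
    content: Optional[str]


print(has_preflight_output(ThinkingMessage(content="", reasoning_content="OK")))  # True
print(has_preflight_output(PlainMessage(content="OK")))  # True
print(has_preflight_output(PlainMessage(content="")))  # False
```

Checking for non-empty text in either field, rather than requiring `content`, is what lets a thinking model with an empty `content` pass while a genuinely empty response still fails.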
API breakage checks (Griffe)
Result: Passed

Agent server REST API breakage checks (OpenAPI)
Result: Failed
@OpenHands This worked with NVIDIA Nemotron-3 Super 120B, but when testing with Claude Sonnet 4.5 we got:

✗ Claude Sonnet 4.5: AttributeError - 'Message' object has no attribute 'reasoning_content'

Please fix it so it also works with models that don't have `reasoning_content`.
I'm on it! juanmichelini can track my progress at all-hands.dev
…models

Models like Claude Sonnet 4.5 don't have `reasoning_content` on their `Message` object, causing an `AttributeError`. Use `getattr` with a default of `None` to safely handle both thinking and non-thinking model responses.

Also fix `test_empty_response` (which was passing incorrectly because `MagicMock` auto-generates truthy attributes) and add two new tests:
- `test_thinking_model_success`: `reasoning_content` only, no `content`
- `test_model_without_reasoning_content_attribute`: no attribute at all

Co-authored-by: openhands <openhands@all-hands.dev>
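The `MagicMock` pitfall described in the commit message can be reproduced with the standard library's `unittest.mock`. This is an illustrative sketch of how such fixtures could be set up, not the PR's actual test code; the variable names are made up.

```python
from unittest.mock import MagicMock

# Pitfall: a bare MagicMock auto-creates any attribute on access, and the
# auto-created child is itself a MagicMock, which is truthy. A test that
# sets only `content = ""` therefore still sees a "non-empty"
# reasoning_content and can pass for the wrong reason.
naive = MagicMock()
naive.content = ""
assert bool(naive.reasoning_content) is True

# Empty response: pin both fields explicitly.
empty = MagicMock()
empty.content = ""
empty.reasoning_content = None

# Thinking model: output only in reasoning_content.
thinking = MagicMock()
thinking.content = None
thinking.reasoning_content = "OK"

# Model without the attribute at all: deleting it on the mock makes any
# later access raise AttributeError, like a Message class that lacks it.
no_attr = MagicMock()
no_attr.content = "OK"
del no_attr.reasoning_content

for name, msg in [("empty", empty), ("thinking", thinking), ("no_attr", no_attr)]:
    # getattr with a None default never raises, whichever fixture is used.
    print(name, repr(getattr(msg, "content", None)),
          repr(getattr(msg, "reasoning_content", None)))
```

Deleting the attribute on the mock is what distinguishes "field present but empty" from "field not defined", which is exactly the case that raised `AttributeError` with Claude Sonnet 4.5.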
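Fixed in bf48dd9. The issue was that the original fix used direct attribute access, which throws an `AttributeError` when the `Message` object has no `reasoning_content` attribute. Changed to `getattr` with a default of `None`.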
Also fixed the `test_empty_response` test.
Summary of Changes
Problem addressed: PR #2420's fix for thinking models broke non-thinking models (like Claude Sonnet 4.5) with an `AttributeError`.
Changes made (commit …
Good, this works. Tested with glm-4.7, claude-sonnet-4-5-20250929, qwen3.5-flash, claude-sonnet-4-6, and nemotron-3-super-120b-a12b here: https://github.com/OpenHands/software-agent-sdk/actions/runs/23074729849
all-hands-bot left a comment
🟢 Good taste: clean fix for a real problem. The `getattr` fallback is appropriate here since we're dealing with external API response structures we don't control.
Fixes #2419
Problem
The preflight LLM check fails for thinking models like NVIDIA Nemotron-3 Super 120B. When `enable_thinking: true` is set in the model config, the model puts all its output into `reasoning_content` rather than `content`. The preflight check only validated `content`, so it always saw an empty response and aborted the evaluation.

Change
Also check `reasoning_content` alongside `content`.

Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.13-nodejs22
• golang:1.21-bookworm

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d3fcdec-python

Run
All tags pushed for this build
About Multi-Architecture Support
• The variant tag (d3fcdec-python) is a multi-arch manifest supporting both amd64 and arm64.
• Architecture-specific tags such as d3fcdec-python-amd64 are also available if needed.