Skip to content

fix: preflight check now validates reasoning_content for thinking models#2420

Merged
xingyaoww merged 4 commits intomainfrom
fix/preflight-thinking-models
Mar 17, 2026
Merged

fix: preflight check now validates reasoning_content for thinking models#2420
xingyaoww merged 4 commits intomainfrom
fix/preflight-thinking-models

Conversation

@juanmichelini
Copy link
Collaborator

@juanmichelini juanmichelini commented Mar 13, 2026

Fixes #2419

Problem

The preflight LLM check fails for thinking models like NVIDIA Nemotron-3 Super 120B. When enable_thinking: true is set in the model config, the model puts all its output into reasoning_content rather than content. The preflight check only validated content, so it always saw an empty response and aborted the evaluation.

Change

Also check reasoning_content alongside content:

response_content = response.choices[0].message.content if response.choices else None
reasoning_content = response.choices[0].message.reasoning_content if response.choices else None

if response_content or reasoning_content:

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d3fcdec-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d3fcdec-python \
  ghcr.io/openhands/agent-server:d3fcdec-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d3fcdec-golang-amd64
ghcr.io/openhands/agent-server:d3fcdec-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d3fcdec-golang-arm64
ghcr.io/openhands/agent-server:d3fcdec-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d3fcdec-java-amd64
ghcr.io/openhands/agent-server:d3fcdec-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d3fcdec-java-arm64
ghcr.io/openhands/agent-server:d3fcdec-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d3fcdec-python-amd64
ghcr.io/openhands/agent-server:d3fcdec-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:d3fcdec-python-arm64
ghcr.io/openhands/agent-server:d3fcdec-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:d3fcdec-golang
ghcr.io/openhands/agent-server:d3fcdec-java
ghcr.io/openhands/agent-server:d3fcdec-python

About Multi-Architecture Support

  • Each variant tag (e.g., d3fcdec-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d3fcdec-python-amd64) are also available if needed

Fixes #2419

Models with enable_thinking=true (e.g. Nemotron) put output into
reasoning_content rather than content. The preflight check now checks
both fields so thinking models pass correctly.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 13, 2026

API breakage checks (Griffe)

Result: Passed

Action log

@github-actions
Copy link
Contributor

github-actions bot commented Mar 13, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.14.0 -> 1.14.0).

Breaking REST API changes detected compared to baseline release:
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the '/items/anyOf[subschema #1: ConversationInfo]/agent' response property 'oneOf' list for the response status '200'
- the '/items/anyOf[subschema #1: ConversationInfo]/agent' response's property type/format changed from 'object'/'' to ''/'' for status '200'
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/kind' from the response with the '200' status
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/llm' from the response with the '200' status
- the 'agent' request property type/format changed from 'object'/'' to ''/''
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the 'agent' response property 'oneOf' li

Action log

@juanmichelini
Copy link
Collaborator Author

@OpenHands This worked with NVIDIA Nemotron-3 Super 120B

but when testing with Claude Sonnet 4.5 we got:

✗ Claude Sonnet 4.5: AttributeError - 'Message' object has no attribute 'reasoning_content'

Please fix it so it works with models that don't have reasoning_content too.

@openhands-ai
Copy link

openhands-ai bot commented Mar 13, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

…models

Models like Claude Sonnet 4.5 don't have reasoning_content on their
Message object, causing AttributeError. Use getattr with a default of
None to safely handle both thinking and non-thinking model responses.

Also fix test_empty_response (which was passing incorrectly since
MagicMock auto-generates truthy attributes) and add two new tests:
- test_thinking_model_success: reasoning_content only, no content
- test_model_without_reasoning_content_attribute: no attr at all

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Collaborator Author

Fixed in bf48dd9.

The issue was that the original fix used direct attribute access () which throws AttributeError on model response objects that don't have that attribute (e.g. Claude Sonnet 4.5).

Changed to getattr(message, "reasoning_content", None) which safely returns None when the attribute is absent, so the check works correctly for:

  • Standard models (Claude, etc.): reasoning_content absent → falls back to content
  • Thinking models (Nemotron, etc.): reasoning_content present → passes preflight
  • Empty responses: both None → correctly reports failure

Also fixed the test_empty_response test (which was silently passing incorrectly because MagicMock auto-generates truthy attributes for any access), and added two new targeted tests.

@openhands-ai
Copy link

openhands-ai bot commented Mar 13, 2026

Summary of Changes

Problem addressed: PR #2420's fix for thinking models broke non-thinking models (like Claude Sonnet 4.5) with AttributeError: 'Message' object has no attribute 'reasoning_content'.

Changes made (commit bf48dd95):

.github/run-eval/resolve_model_config.py

  • Replaced direct attribute access .reasoning_content with getattr(message, "reasoning_content", None) so models whose response objects lack that attribute (Claude, etc.) return None gracefully instead of raising AttributeError.

tests/github_workflows/test_resolve_model_config.py

  • Fixed test_empty_response — it was silently broken: MagicMock auto-generates truthy values for any attribute access, so .reasoning_content on the mock returned a truthy MagicMock instead of falsy. Added reasoning_content=None explicitly to the mock.
  • Added test_thinking_model_success — verifies that a model response with only reasoning_content (empty content) still passes the preflight check.
  • Added test_model_without_reasoning_content_attribute — uses SimpleNamespace (a real object with no reasoning_content attribute) to confirm the fix handles models like Claude without raising AttributeError.

All 8 tests in TestTestModel pass, pre-commit hooks are clean, and the changes have been pushed to fix/preflight-thinking-models.

@juanmichelini
Copy link
Collaborator Author

good this works, tested for glm-4.7, claude-sonnet-4-5-20250929, qwen3.5-flash, claude-sonnet-4-6, nemotron-3-super-120b-a12b

here https://github.com/OpenHands/software-agent-sdk/actions/runs/23074729849

@juanmichelini juanmichelini marked this pull request as ready for review March 16, 2026 16:27
Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean fix for a real problem. The getattr fallback is appropriate here since we're dealing with external API response structures we don't control.

@xingyaoww xingyaoww merged commit c823cde into main Mar 17, 2026
57 of 60 checks passed
@xingyaoww xingyaoww deleted the fix/preflight-thinking-models branch March 17, 2026 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Preflight check fails for thinking models (e.g. Nemotron) due to empty content field

4 participants