fix: preflight check now validates reasoning_content for thinking models by juanmichelini · Pull Request #2420 · OpenHands/software-agent-sdk

juanmichelini · 2026-03-13T16:52:42Z

Problem

The preflight LLM check fails for thinking models like NVIDIA Nemotron-3 Super 120B. When enable_thinking: true is set in the model config, the model puts all its output into reasoning_content rather than content. The preflight check only validated content, so it always saw an empty response and aborted the evaluation.

Change

Also check reasoning_content alongside content:

response_content = response.choices[0].message.content if response.choices else None
reasoning_content = response.choices[0].message.reasoning_content if response.choices else None

if response_content or reasoning_content:

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d3fcdec-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d3fcdec-python \
  ghcr.io/openhands/agent-server:d3fcdec-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d3fcdec-golang-amd64
ghcr.io/openhands/agent-server:d3fcdec-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d3fcdec-golang-arm64
ghcr.io/openhands/agent-server:d3fcdec-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d3fcdec-java-amd64
ghcr.io/openhands/agent-server:d3fcdec-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d3fcdec-java-arm64
ghcr.io/openhands/agent-server:d3fcdec-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d3fcdec-python-amd64
ghcr.io/openhands/agent-server:d3fcdec-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:d3fcdec-python-arm64
ghcr.io/openhands/agent-server:d3fcdec-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:d3fcdec-golang
ghcr.io/openhands/agent-server:d3fcdec-java
ghcr.io/openhands/agent-server:d3fcdec-python

About Multi-Architecture Support

Each variant tag (e.g., d3fcdec-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., d3fcdec-python-amd64) are also available if needed

Fixes #2419 Models with enable_thinking=true (e.g. Nemotron) put output into reasoning_content rather than content. The preflight check now checks both fields so thinking models pass correctly. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-13T16:53:12Z

API breakage checks (Griffe)

Result: Passed

Action log

github-actions · 2026-03-13T16:53:17Z

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)

::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.14.0 -> 1.14.0).

Breaking REST API changes detected compared to baseline release:
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the '/items/anyOf[subschema #1: ConversationInfo]/agent' response property 'oneOf' list for the response status '200'
- the '/items/anyOf[subschema #1: ConversationInfo]/agent' response's property type/format changed from 'object'/'' to ''/'' for status '200'
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/kind' from the response with the '200' status
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/llm' from the response with the '200' status
- the 'agent' request property type/format changed from 'object'/'' to ''/''
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the 'agent' response property 'oneOf' li

Action log

juanmichelini · 2026-03-13T23:24:27Z

@OpenHands This worked with NVIDIA Nemotron-3 Super 120B

but when testing with Claude Sonnet 4.5 we got:

✗ Claude Sonnet 4.5: AttributeError - 'Message' object has no attribute 'reasoning_content'

Please fix it so it works with models that don't have reasoning_content too.

openhands-ai · 2026-03-13T23:24:42Z

I'm on it! juanmichelini can track my progress at all-hands.dev

…models Models like Claude Sonnet 4.5 don't have reasoning_content on their Message object, causing AttributeError. Use getattr with a default of None to safely handle both thinking and non-thinking model responses. Also fix test_empty_response (which was passing incorrectly since MagicMock auto-generates truthy attributes) and add two new tests: - test_thinking_model_success: reasoning_content only, no content - test_model_without_reasoning_content_attribute: no attr at all Co-authored-by: openhands <openhands@all-hands.dev>

juanmichelini · 2026-03-13T23:28:01Z

Fixed in bf48dd9.

The issue was that the original fix used direct attribute access () which throws AttributeError on model response objects that don't have that attribute (e.g. Claude Sonnet 4.5).

Changed to getattr(message, "reasoning_content", None) which safely returns None when the attribute is absent, so the check works correctly for:

Standard models (Claude, etc.): reasoning_content absent → falls back to content
Thinking models (Nemotron, etc.): reasoning_content present → passes preflight
Empty responses: both None → correctly reports failure

Also fixed the test_empty_response test (which was silently passing incorrectly because MagicMock auto-generates truthy attributes for any access), and added two new targeted tests.

openhands-ai · 2026-03-13T23:28:53Z

Summary of Changes

Problem addressed: PR #2420's fix for thinking models broke non-thinking models (like Claude Sonnet 4.5) with AttributeError: 'Message' object has no attribute 'reasoning_content'.

Changes made (commit `bf48dd95`):

.github/run-eval/resolve_model_config.py

Replaced direct attribute access .reasoning_content with getattr(message, "reasoning_content", None) so models whose response objects lack that attribute (Claude, etc.) return None gracefully instead of raising AttributeError.

tests/github_workflows/test_resolve_model_config.py

Fixed test_empty_response — it was silently broken: MagicMock auto-generates truthy values for any attribute access, so .reasoning_content on the mock returned a truthy MagicMock instead of falsy. Added reasoning_content=None explicitly to the mock.
Added test_thinking_model_success — verifies that a model response with only reasoning_content (empty content) still passes the preflight check.
Added test_model_without_reasoning_content_attribute — uses SimpleNamespace (a real object with no reasoning_content attribute) to confirm the fix handles models like Claude without raising AttributeError.

All 8 tests in TestTestModel pass, pre-commit hooks are clean, and the changes have been pushed to fix/preflight-thinking-models.

juanmichelini · 2026-03-13T23:38:01Z

good this works, tested for glm-4.7, claude-sonnet-4-5-20250929, qwen3.5-flash, claude-sonnet-4-6, nemotron-3-super-120b-a12b

here https://github.com/OpenHands/software-agent-sdk/actions/runs/23074729849

all-hands-bot

🟢 Good taste - Clean fix for a real problem. The getattr fallback is appropriate here since we're dealing with external API response structures we don't control.

juanmichelini added 2 commits March 13, 2026 18:15

Bumping max_tokens for nemotron 3 super

6272e06

Update resolve_model_config.py

552a384

juanmichelini marked this pull request as ready for review March 16, 2026 16:27

all-hands-bot approved these changes Mar 16, 2026

View reviewed changes

xingyaoww approved these changes Mar 17, 2026

View reviewed changes

xingyaoww merged commit c823cde into main Mar 17, 2026
57 of 60 checks passed

xingyaoww deleted the fix/preflight-thinking-models branch March 17, 2026 14:31

juanmichelini mentioned this pull request Mar 17, 2026

[BUG] Remote conversation got stuck: monologue detector false positive on extended thinking models #2482

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: preflight check now validates reasoning_content for thinking models#2420

fix: preflight check now validates reasoning_content for thinking models#2420
xingyaoww merged 4 commits intomainfrom
fix/preflight-thinking-models

juanmichelini commented Mar 13, 2026 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

juanmichelini commented Mar 13, 2026

Uh oh!

openhands-ai bot commented Mar 13, 2026

Uh oh!

juanmichelini commented Mar 13, 2026

Uh oh!

openhands-ai bot commented Mar 13, 2026

Uh oh!

juanmichelini commented Mar 13, 2026

Uh oh!

all-hands-bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

juanmichelini commented Mar 13, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Change

Uh oh!

github-actions bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

API breakage checks (Griffe)

Uh oh!

github-actions bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Agent server REST API breakage checks (OpenAPI)

Uh oh!

juanmichelini commented Mar 13, 2026

Uh oh!

openhands-ai bot commented Mar 13, 2026

Uh oh!

juanmichelini commented Mar 13, 2026

Uh oh!

openhands-ai bot commented Mar 13, 2026

Summary of Changes

Changes made (commit bf48dd95):

Uh oh!

juanmichelini commented Mar 13, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

juanmichelini commented Mar 13, 2026 •

edited by github-actions bot

Loading

github-actions bot commented Mar 13, 2026 •

edited

Loading

github-actions bot commented Mar 13, 2026 •

edited

Loading

Changes made (commit `bf48dd95`):