
Preflight check fails for thinking models (e.g. Nemotron) due to empty content field #2419

@juanmichelini

Description


Problem

The preflight LLM check in .github/run-eval/resolve_model_config.py fails for thinking models like NVIDIA Nemotron-3 Super 120B.

When enable_thinking: true is set in the model config, the model puts all its output into reasoning_content rather than content. The current preflight check only validates content, so it always sees an empty response and aborts the evaluation.

✗ NVIDIA Nemotron-3 Super 120B: Empty response (finish_reason=length, usage=Usage(completion_tokens=100, ...))

Fix

Also check reasoning_content alongside content:

choice = response.choices[0] if response.choices else None
response_content = choice.message.content if choice else None
# reasoning_content is not a standard field on the message object, so guard with getattr
reasoning_content = getattr(choice.message, "reasoning_content", None) if choice else None

if response_content or reasoning_content:
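A minimal, self-contained sketch of the amended check (the helper name preflight_ok is illustrative, not from resolve_model_config.py; SimpleNamespace objects stand in for the OpenAI SDK's ChatCompletion response):

```python
from types import SimpleNamespace

def preflight_ok(response) -> bool:
    """Return True if the model produced any output, accepting either
    content or reasoning_content (used by thinking models)."""
    if not response.choices:
        return False
    message = response.choices[0].message
    content = getattr(message, "content", None)
    # Thinking models (e.g. Nemotron with enable_thinking: true) may put
    # all of their output here instead of in content.
    reasoning = getattr(message, "reasoning_content", None)
    return bool(content or reasoning)

# Simulated responses, mimicking the shapes the preflight check sees:
thinking = SimpleNamespace(choices=[SimpleNamespace(
    message=SimpleNamespace(content=None, reasoning_content="Let me think..."))])
empty = SimpleNamespace(choices=[SimpleNamespace(
    message=SimpleNamespace(content=None))])

print(preflight_ok(thinking))  # True  (would previously fail the check)
print(preflight_ok(empty))     # False (still correctly rejected)
```

With this change a thinking model that fills only reasoning_content passes preflight, while a genuinely empty response is still rejected.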
