Surface remote eval errors instead of silently failing #322

shfunc · 2026-02-09T20:00:07Z

Note

Medium Risk
Behavior changes from best-effort to raising on remote API failures, which may break existing callers that relied on silent fallback but improves correctness and debuggability.

Overview
hud eval --remote now treats remote submission as fail-fast: it captures the trace_ids returned by submit_rollouts() and raises an error if none are accepted, rather than always reporting success.
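The fail-fast check described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name require_accepted and the exception type RemoteSubmissionError are hypothetical, and it assumes submit_rollouts() returns a list of trace ID strings.

```python
class RemoteSubmissionError(RuntimeError):
    """Hypothetical error type; the PR does not name the exception it raises."""


def require_accepted(trace_ids):
    """Return the accepted trace IDs, or raise if the backend accepted none.

    `trace_ids` stands in for the value returned by submit_rollouts();
    its exact shape is an assumption (a list of strings, possibly empty or None).
    """
    accepted = [t for t in (trace_ids or []) if t]
    if not accepted:
        # Fail fast instead of reporting success for a submission nobody accepted.
        raise RemoteSubmissionError(
            "Remote submission failed: no rollouts were accepted"
        )
    return accepted
```

A caller would wrap the submission result, for example `trace_ids = require_accepted(returned_ids)`, so that an empty result stops the command instead of printing a success message.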

_send_job_enter() in hud/eval/manager.py now surfaces backend failures: the broad try/except fallback is removed, raise_for_status() is called on the response, and a ValueError is raised on unexpected responses. It still returns None, but only when telemetry is disabled or no API key is set.
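A rough sketch of the control flow this paragraph describes, under stated assumptions: the function name send_job_enter_sketch, its parameters, the StubResponse class, and the isinstance(data, dict) guard are all hypothetical. Only the task_version_ids check and the ValueError message are taken from the diff in this PR.

```python
from typing import Any, Callable, List, Optional


class StubResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, payload: Any, status_code: int = 200) -> None:
        self._payload = payload
        self.status_code = status_code

    def raise_for_status(self) -> None:
        # Mimics the usual client behavior: raise on 4xx/5xx.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

    def json(self) -> Any:
        return self._payload


def send_job_enter_sketch(
    post: Callable[[], Any],
    telemetry_enabled: bool,
    api_key: Optional[str],
) -> Optional[List[str]]:
    # The only quiet path: telemetry disabled or no API key.
    if not telemetry_enabled or not api_key:
        return None

    # No try/except here: transport and HTTP errors propagate to the caller.
    response = post()
    response.raise_for_status()
    data = response.json()

    if isinstance(data, dict):  # assumed guard; not visible in the diff
        ids = data.get("task_version_ids")
        if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
            return ids
    raise ValueError(f"Job registration failed: unexpected response: {data}")
```

Passing the request as a callable keeps the sketch runnable without a network; the real function presumably builds and sends the request itself.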

Written by Cursor Bugbot for commit 9980d62. This will update automatically on new commits.

hud/eval/manager.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

cursor · 2026-02-10T23:34:25Z

hud/eval/manager.py

+        ids = data.get("task_version_ids")
+        if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
+            return ids
+    raise ValueError(f"Job registration failed: unexpected response: {data}")


Raises on valid responses missing task_version_ids

High Severity

_send_job_enter now unconditionally raises ValueError when the API response doesn't contain task_version_ids as a list of strings (line 114). Previously it returned None in that case, which callers handle gracefully. When called from run_eval's parallel path without a taskset, the server likely returns a success response without task_version_ids, and the new code will crash with "Job registration failed: unexpected response" even though registration succeeded. This breaks all parallel evals that don't use tasksets.
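One way to address the concern raised here would be to distinguish a missing task_version_ids key from a malformed one. The sketch below is illustrative only and is not the behavior merged in this PR (the final commit message reads "always raise, strict branching not needed"). The function name extract_task_version_ids is hypothetical, and whether the server actually omits the key for non-taskset jobs is the reviewer's assumption, not a confirmed fact.

```python
from typing import Any, List, Optional


def extract_task_version_ids(data: Any) -> Optional[List[str]]:
    """Return task_version_ids if present and well-formed, None if absent.

    Raises ValueError only when the response itself is malformed.
    """
    if not isinstance(data, dict):
        raise ValueError(f"Job registration failed: unexpected response: {data}")

    ids = data.get("task_version_ids")
    if ids is None:
        # Key absent (or null): treated as a successful registration
        # that simply has no task versions to report.
        return None

    if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
        return ids

    # Key present but not a list of strings: still an error.
    raise ValueError(f"Job registration failed: unexpected response: {data}")
```

The trade-off is that a lenient branch like this can hide a backend that silently stops returning IDs, which is presumably why the stricter always-raise behavior was kept.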

surface remote eval errors instead of silently swallowing

b5fcba5

cursor bot reviewed Feb 9, 2026

hud/eval/manager.py (resolved)

strict fix

112afff

cursor bot reviewed Feb 9, 2026

hud/eval/manager.py (outdated, resolved)

shfunc added 2 commits February 9, 2026 21:15

fix

bd7f86d

ruff fix

2f66d89

shfunc assigned lorenss-m Feb 10, 2026

lorenss-m requested a review from jdchawla29 February 10, 2026 18:51

always raise, strict branching not needed

9980d62

jdchawla29 approved these changes Feb 10, 2026

cursor bot reviewed Feb 10, 2026

jdchawla29 merged commit c983ca8 into hud-evals:main Feb 10, 2026
6 checks passed

3 participants
