@shfunc shfunc commented Feb 9, 2026

Note

Medium Risk
Behavior changes from best-effort to raising on remote API failures, which may break existing callers that relied on silent fallback but improves correctness and debuggability.

Overview
hud eval --remote now treats remote submission as fail-fast: it captures the trace_ids returned by submit_rollouts() and raises an error if none are accepted, rather than always reporting success.
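The fail-fast check described above could look roughly like the sketch below. The PR text does not show the actual code, so `require_accepted` and `RemoteSubmissionError` are hypothetical names, and the assumption is that `submit_rollouts()` returns a (possibly empty) list of accepted trace IDs.

```python
from typing import Optional, Sequence


class RemoteSubmissionError(RuntimeError):
    """Hypothetical error type; the PR does not name the exception it raises."""


def require_accepted(trace_ids: Optional[Sequence[str]], submitted: int) -> list[str]:
    """Return the accepted trace IDs, or raise if the backend accepted none.

    `trace_ids` stands in for whatever submit_rollouts() returned (assumed shape).
    """
    accepted = list(trace_ids or [])
    if not accepted:
        # Fail fast instead of reporting success with nothing submitted.
        raise RemoteSubmissionError(
            f"Remote submission failed: 0 of {submitted} rollouts were accepted"
        )
    return accepted
```

A caller would wrap the submission result in this check before printing any success message, so an empty result stops the command instead of being reported as a successful run.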

_send_job_enter() in hud/eval/manager.py now surfaces backend failures: the broad try/except fallback is removed, the function calls raise_for_status(), and it raises a ValueError on unexpected responses. It still returns None, but only when telemetry is disabled or no API key is set.
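As a rough illustration of that control flow, the sketch below reproduces the three outcomes (None, a list of IDs, or an exception) without making a network call. `FakeResponse` and `send_job_enter_sketch` are hypothetical stand-ins, not the code in hud/eval/manager.py, and the `isinstance(data, dict)` guard is an addition for safety in this sketch.

```python
from typing import Any, Optional


class FakeResponse:
    """Stand-in for an HTTP response object (hypothetical, for illustration only)."""

    def __init__(self, status_code: int, payload: Any) -> None:
        self.status_code = status_code
        self._payload = payload

    def raise_for_status(self) -> None:
        # Mirrors the usual HTTP-client convention: 4xx/5xx raise.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

    def json(self) -> Any:
        return self._payload


def send_job_enter_sketch(response: Optional[FakeResponse]) -> Optional[list[str]]:
    # None models the "telemetry disabled / no API key" case: no request is made.
    if response is None:
        return None
    # No broad try/except: HTTP errors propagate to the caller.
    response.raise_for_status()
    data = response.json()
    ids = data.get("task_version_ids") if isinstance(data, dict) else None
    if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
        return ids
    raise ValueError(f"Job registration failed: unexpected response: {data}")
```

In this sketch a 2xx response that lacks `task_version_ids` still raises ValueError, which is the strict behavior the summary describes.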

Written by Cursor Bugbot for commit 9980d62. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

ids = data.get("task_version_ids")
if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
    return ids
raise ValueError(f"Job registration failed: unexpected response: {data}")

Raises on valid responses missing task_version_ids

High Severity

_send_job_enter now unconditionally raises ValueError when the API response doesn't contain task_version_ids as a list of strings (line 114). Previously it returned None in that case, which callers handle gracefully. When called from run_eval's parallel path without a taskset, the server likely returns a success response without task_version_ids, and the new code will crash with "Job registration failed: unexpected response" even though registration succeeded. This breaks all parallel evals that don't use tasksets.
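One hypothetical way to address this concern is to distinguish a missing field from a malformed one. The sketch below is not the merged code, and `extract_task_version_ids` is an illustrative name; it assumes that a successful response without `task_version_ids` (or with a null value) means "registered, no IDs to report".

```python
from typing import Any, Optional


def extract_task_version_ids(data: Any) -> Optional[list[str]]:
    """Return the IDs if present and well-formed, None if absent, else raise."""
    if not isinstance(data, dict):
        raise ValueError(f"Job registration failed: unexpected response: {data}")
    ids = data.get("task_version_ids")
    if ids is None:
        # Assumption: the backend omits the field when no taskset is used.
        return None
    if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
        return ids
    # The field is present but has the wrong shape: still treated as an error.
    raise ValueError(f"Job registration failed: unexpected response: {data}")
```

With this variant, callers that already handle None keep working for runs without tasksets, while a malformed payload still fails loudly.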

Collaborator

ignore

@jdchawla29 jdchawla29 merged commit c983ca8 into hud-evals:main Feb 10, 2026
6 checks passed