Surface remote eval errors instead of silently failing #322
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    ids = data.get("task_version_ids")
    if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
        return ids
    raise ValueError(f"Job registration failed: unexpected response: {data}")
Raises on valid responses missing task_version_ids
High Severity
_send_job_enter now unconditionally raises ValueError when the API response doesn't contain task_version_ids as a list of strings (line 114). Previously it returned None in that case, which callers handle gracefully. When called from run_eval's parallel path without a taskset, the server likely returns a success response without task_version_ids, and the new code will crash with "Job registration failed: unexpected response" even though registration succeeded. This breaks all parallel evals that don't use tasksets.
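One possible way to address this concern, sketched under assumptions: treat a response that simply omits task_version_ids as a successful registration (returning None, as the old code did) and raise only when the key is present but malformed. The helper name `parse_task_version_ids` and the example response shapes are hypothetical; they are not taken from hud/eval/manager.py, and the real server contract may differ.

```python
from typing import Any, List, Optional


def parse_task_version_ids(data: Any) -> Optional[List[str]]:
    """Hypothetical helper, not the actual code in hud/eval/manager.py.

    Returns the list of IDs when present and well-formed, None when the
    key is absent, and raises ValueError for anything else.
    """
    if not isinstance(data, dict):
        # A non-object body is never a valid registration response.
        raise ValueError(f"Job registration failed: unexpected response: {data!r}")

    if "task_version_ids" not in data:
        # Assumption: a success response without a taskset omits the key.
        return None

    ids = data["task_version_ids"]
    if isinstance(ids, list) and all(isinstance(x, str) for x in ids):
        return ids

    # Key is present but has the wrong shape: surface it instead of hiding it.
    raise ValueError(f"Job registration failed: unexpected response: {data!r}")
```

With this shape, callers that already handle None keep working, while a malformed payload still fails loudly.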
ignore


Note
Medium Risk
Behavior changes from best-effort to raising on remote API failures, which may break existing callers that relied on silent fallback but improves correctness and debuggability.
Overview
Remote hud eval --remote now treats remote submission as fail-fast: it captures the trace_ids returned by submit_rollouts() and raises an error if none are accepted, rather than always reporting success. _send_job_enter() in hud/eval/manager.py was tightened to surface backend failures by removing the broad try/except fallback and instead calling raise_for_status() and raising a ValueError on unexpected responses (still returning None only when telemetry is disabled/no API key).
Written by Cursor Bugbot for commit 9980d62. This will update automatically on new commits.
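The fail-fast behavior described in the overview can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `submit_or_fail`, `RemoteSubmissionError`, and the callable standing in for submit_rollouts() are hypothetical, and the real signature of submit_rollouts() is not shown in this PR summary.

```python
from typing import Callable, List, Sequence


class RemoteSubmissionError(RuntimeError):
    """Hypothetical error type for a remote submission that accepted nothing."""


def submit_or_fail(
    submit: Callable[[Sequence[str]], List[str]],
    task_ids: Sequence[str],
) -> List[str]:
    """Call the submit function and raise if no trace IDs come back.

    `submit` stands in for submit_rollouts(); its real signature is assumed.
    """
    trace_ids = submit(task_ids)
    if not trace_ids:
        # Fail fast instead of reporting success for an empty submission.
        raise RemoteSubmissionError(
            f"Remote submission accepted 0 of {len(task_ids)} task(s)"
        )
    return trace_ids
```

A caller would wrap the real submission function and let the exception propagate to the CLI, so an empty result is reported as a failure rather than a success.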