Skip to content

Conversation

@MagellaX
Copy link
Contributor

@MagellaX MagellaX commented Jan 26, 2026

Summary

  • add hud validate to check task files or HF datasets without running them
  • surface per‑task validation errors with clear CLI output
  • include a small unit test for valid/invalid task files

Note

Low Risk
Adds a new CLI command and validation-only code paths; main risk is false positives/negatives in task parsing/validation rather than runtime behavior changes.

Overview
Adds a new hud validate command that loads tasks from a local .json/.jsonl file or a dataset slug and validates each entry without running an eval.

Validation now checks v4-style tasks via validate_v4_task (when detected) and always attempts Pydantic Task construction, aggregating and printing per-task errors before exiting non-zero on failure. Includes unit tests covering valid tasks, missing required fields, and non-dict entries in the tasks list.

Written by Cursor Bugbot for commit 0191b4d. This will update automatically on new commits. Configure here.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b4f45c79d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@MagellaX
Copy link
Contributor Author

any thoughts? @lorenss-m

task_list = [t if isinstance(t, Task) else Task.from_v4(t) for t in tasks]

if not task_list:
raise ValueError("No tasks to run")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated task normalization logic in runner functions

Medium Severity

The task normalization logic in run_dataset_async (lines 159-175) is nearly identical to run_dataset (lines 76-94). Both functions normalize agent_type from string to AgentType enum and normalize tasks from various input types to list[Task]. This ~17 lines of duplicated code should be extracted into a shared helper function like _normalize_tasks().

Fix in Cursor Fix in Web

@MagellaX MagellaX force-pushed the feature/task-validate-cli branch from 579b4f0 to 0191b4d Compare February 2, 2026 07:41
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.


def _load_raw_tasks(source: str) -> tuple[list[dict[str, Any]], list[str]]:
path = Path(source)
if path.exists() and path.suffix.lower() in {".json", ".jsonl"}:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case sensitivity mismatch between validation and loading

Low Severity

The _load_raw_tasks and _load_raw_from_file functions use case-insensitive extension matching via .suffix.lower(), while the existing load_tasks function in loader.py uses case-sensitive matching. This means a file like tasks.JSONL would pass validation but fail when actually loaded via load_tasks, because loader.py wouldn't recognize the uppercase extension and would incorrectly try to fetch it as a HuggingFace dataset.

Additional Locations (1)

Fix in Cursor Fix in Web

module = importlib.util.module_from_spec(spec) # type: ignore[arg-type]
assert spec and spec.loader
spec.loader.exec_module(module)
return module.validate_command
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test uses unnecessarily complex importlib module loading

Medium Severity

The _load_validate_command() function uses importlib.util.spec_from_file_location to manually load the module when a simple import would work: from hud.cli.validate import validate_command. This pattern is inconsistent with other tests in hud/cli/tests/ which use standard imports.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant