Integrate agent skill evals into testing #107

@kergoth

Description

I have agent skills in this repository that need testing, but the current structure doesn't accommodate that yet.

This issue should create a structure under test/ for agent testing, likely test/agents/skills/SKILLNAME/evals.json and potentially a similar structure for rules, with an accompanying framework for running the evals.
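The issue leaves the evals.json format open. The following minimal sketch assumes a hypothetical schema in which each case has an `id`, a `prompt`, and optional plain-language `expectations`, and shows how a runner could load and validate one skill's file. `EvalCase`, `parse_evals`, and `load_evals` are illustrative names, not an existing API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical evals.json layout (not a settled format):
# {
#   "evals": [
#     {"id": "basic-usage",
#      "prompt": "Ask the agent for something the skill covers",
#      "expectations": ["The response follows the skill's conventions"]}
#   ]
# }


@dataclass
class EvalCase:
    id: str
    prompt: str
    # Plain-language checks for a grader (human or model) to apply to the output.
    expectations: list[str] = field(default_factory=list)


def parse_evals(data: object, source: str = "<evals>") -> list[EvalCase]:
    """Validate decoded JSON and return its cases, raising ValueError on bad input."""
    if not isinstance(data, dict) or not isinstance(data.get("evals"), list):
        raise ValueError(f"{source}: expected an object with an 'evals' list")
    cases: list[EvalCase] = []
    seen: set[str] = set()
    for i, raw in enumerate(data["evals"]):
        if not isinstance(raw, dict):
            raise ValueError(f"{source}: eval #{i} is not an object")
        case_id, prompt = raw.get("id"), raw.get("prompt")
        if not isinstance(case_id, str) or not case_id:
            raise ValueError(f"{source}: eval #{i} needs a non-empty string 'id'")
        if case_id in seen:
            raise ValueError(f"{source}: duplicate eval id {case_id!r}")
        if not isinstance(prompt, str) or not prompt:
            raise ValueError(f"{source}: eval {case_id!r} needs a non-empty 'prompt'")
        expectations = raw.get("expectations", [])
        if not isinstance(expectations, list) or not all(isinstance(e, str) for e in expectations):
            raise ValueError(f"{source}: eval {case_id!r} has non-string expectations")
        seen.add(case_id)
        cases.append(EvalCase(case_id, prompt, list(expectations)))
    return cases


def load_evals(path: Path) -> list[EvalCase]:
    """Read and validate one test/agents/skills/SKILLNAME/evals.json file."""
    return parse_evals(json.loads(path.read_text(encoding="utf-8")), str(path))
```

Separating `parse_evals` from file I/O keeps the validation testable without touching the filesystem.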

Open Questions

  • Should the behavior of ./script/test be changed to encompass all tests as part of this issue? No. Although this adds another type of test, we can use a separate runner for now and handle the broader test entry point question separately.

Success Criteria

  • ./test/agents/rules/ folder exists
  • ./test/agents/skills/ folder exists
  • One or more eval runner scripts exist in test/agents/, with precise naming
    and behavior to be determined
  • Rules evals may be a placeholder for now, since running "without rules"
    evals is harder than one might hope due to how agents load context.
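One possible shape for a runner that meets these criteria is sketched below. The function names (`ensure_layout`, `discover_skill_evals`, `plan_run`) are hypothetical. The sketch creates the two folders, maps each skill to its evals.json, and reports rules as a skipped placeholder.

```python
from __future__ import annotations

from pathlib import Path


def ensure_layout(agents_dir: Path) -> None:
    """Create the skills/ and rules/ folders named in the success criteria."""
    for sub in ("skills", "rules"):
        (agents_dir / sub).mkdir(parents=True, exist_ok=True)


def discover_skill_evals(agents_dir: Path) -> dict[str, Path]:
    """Map each skill name to its evals.json; skill folders without one are skipped."""
    skills_dir = agents_dir / "skills"
    if not skills_dir.is_dir():
        return {}
    return {p.parent.name: p for p in sorted(skills_dir.glob("*/evals.json"))}


def plan_run(agents_dir: Path) -> list[str]:
    """Describe what one run would do, reporting rules as a skipped placeholder."""
    lines = [f"skill {name}: {path}" for name, path in discover_skill_evals(agents_dir).items()]
    if (agents_dir / "rules").is_dir():
        # Rules evals need a "without rules" baseline, which is deferred for now.
        lines.append("rules: skipped (placeholder)")
    return lines
```

Keeping discovery separate from execution leaves room to settle naming and behavior later without changing the folder convention.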

Implementation Notes

  • Claude Code, at least, can be run with an overridden HOME so it does not load the usual user-level CLAUDE.md. If we pass through any token environment variables, or run on a Mac (keychain access), it should still be authenticated, unlike when using claude --bare.
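The HOME override could be wrapped roughly as follows. This is a minimal sketch with hypothetical helper names. It assumes `claude -p` for non-interactive runs, an allowlist of passed-through variables (treating `ANTHROPIC_API_KEY` and `CLAUDE_CODE_OAUTH_TOKEN` as the tokens in use is an assumption), and that copying a skill into the fake HOME's `.claude/skills/` makes it available to the agent.

```python
from __future__ import annotations

import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Assumed allowlist: PATH so `claude` can be found, plus whichever token
# variables the local setup uses. Everything else, including the real HOME,
# is dropped.
PASSTHROUGH = ("PATH", "LANG", "ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN")


def isolated_env(fake_home: Path, base: Optional[Mapping[str, str]] = None,
                 passthrough: Sequence[str] = PASSTHROUGH) -> dict[str, str]:
    """Copy only allowlisted variables and point HOME at fake_home."""
    source = os.environ if base is None else base
    env = {name: source[name] for name in passthrough if name in source}
    env["HOME"] = str(fake_home)
    return env


def install_skill(skill_dir: Path, fake_home: Path) -> Path:
    """Copy one skill into the fake HOME's .claude/skills/ (assumed location)."""
    dest = fake_home / ".claude" / "skills" / skill_dir.name
    shutil.copytree(skill_dir, dest)  # creates intermediate directories
    return dest


def run_prompt(prompt: str, workdir: Path, skill_dir: Optional[Path] = None,
               timeout: float = 600) -> subprocess.CompletedProcess:
    """Run one non-interactive `claude -p` prompt under a throwaway HOME."""
    with tempfile.TemporaryDirectory(prefix="agent-eval-home-") as tmp:
        home = Path(tmp)
        if skill_dir is not None:
            install_skill(skill_dir, home)
        return subprocess.run(
            ["claude", "-p", prompt],
            cwd=workdir, env=isolated_env(home),
            capture_output=True, text=True, timeout=timeout, check=False,
        )
```

An allowlist, rather than copying `os.environ` and overriding HOME, avoids leaking other configuration into the run. Overriding HOME hides only user-level files; a CLAUDE.md in `workdir` or its parent directories would still be loaded, so a scratch directory outside the repository is the safer choice for `workdir`.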

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
