I have agent skills in this repository that need testing, but the current structure doesn't accommodate that yet.
This issue should create a structure under test for agent testing, likely test/agents/skills/SKILLNAME/evals.json and potentially a similar structure for rules, with an accompany eval run framework.
Open Questions
- Should the behavior of ./script/test be changed to encompass all tests as part of this issue? No, while this is another addition of another type of test, we can use a separate runner at this time and handle the broader test entrypoint question separately.
Success Criteria
- ./test/agents/rules/ folder exists
- ./test/agents/skills/ folder exists
- One or more eval runner scripts exist in test/agents/, precise naming and
behavior to be determined
- Rules evals may be a placeholder at this time, as running "without rules"
evals is less easy than one might hope due to agent context loading
behavior.
Implementation Notes
- Claude Code at least can run with an overridden
HOME to prevent usual CLAUDE.md loading. If we pass through any token env variables, or run on a Mac (keychain access), it should still be authenticated, unlike use of claude --bare.
I have agent skills in this repository that need testing, but the current structure doesn't accommodate that yet.
This issue should create a structure under test for agent testing, likely test/agents/skills/SKILLNAME/evals.json and potentially a similar structure for rules, with an accompany eval run framework.
Open Questions
Success Criteria
behavior to be determined
evals is less easy than one might hope due to agent context loading
behavior.
Implementation Notes
HOMEto prevent usual CLAUDE.md loading. If we pass through any token env variables, or run on a Mac (keychain access), it should still be authenticated, unlike use ofclaude --bare.