7 changes: 7 additions & 0 deletions .claude/CLAUDE.md
@@ -39,6 +39,13 @@ npm run validate # Fix all errors before committing. Warnings are acceptable
```
Validate uses [skill-validator](https://github.com/agent-ecosystem/skill-validator) for structure, links, content analysis, and contamination checks. It runs in CI and blocks deployment on errors.

When adding new behaviors, commands, or pitfalls to a skill, also consider whether the `evaluations/<skill-name>.json` file needs new eval cases to cover them. New pitfalls and non-obvious behaviors are strong candidates for evals — especially adversarial ones where an agent would likely get it wrong without the skill.

**PR eval requirements:**
- **New skill:** run the full suite (`node scripts/evaluate-skills.js <skill-name>`) and include both output eval and trigger eval results in the PR description. PRs without eval results are not accepted.
- **Skill improvement with new evals:** run only the new eval cases and include both with-skill and baseline results.
- Always wrap eval output in a collapsed `<details>` block in the PR description.

## LLM Quality Scoring

Before submitting a PR, run LLM scoring locally to check skill quality:
23 changes: 20 additions & 3 deletions CONTRIBUTING.md
@@ -168,7 +168,7 @@ if the user is already authenticated. Keep it minimal — no backend code,
no icp.yaml, no deploy steps."
```

**Running evaluations** (optional, requires `claude` CLI):
**Running evaluations** (requires `claude` CLI):

```bash
node scripts/evaluate-skills.js <skill-name> # All evals, with + without skill
@@ -181,7 +181,7 @@ node scripts/evaluate-skills.js <skill-name> --triggers-only # Trigger evals

This sends each prompt to Claude with and without the skill, then has a judge score the output. Results are saved to `evaluations/results/` (gitignored).

Including a summary of eval results in your PR description is recommended but not required — running evals needs `claude` CLI access and costs API credits.
**Eval results are required in the PR for new skills** — see [Step 7](#7-submit-a-pr) for the required format.

### 6. That's it — the website auto-discovers skills

@@ -195,6 +195,20 @@ Stats (skill count, categories) all update automatically.
- Include a brief description of what the skill covers and why it's needed
- Include LLM scoring output in your PR description if you ran it locally (see step 4)
- Make sure the SKILL.md is tested — code examples should compile and deploy
- **Eval results are required.** Run the full evaluation suite locally and paste the results into the PR description. Both output evals and trigger evals must be included. PRs without eval results will not be accepted.
- **Collapse the results** using a `<details>` block to keep the PR description readable:

````markdown
<details>
<summary>Evaluation results</summary>

```
[paste eval output here]
```

</details>
````

- **All PRs require approval from a repo admin before merge.** No skill additions or updates go live without review.
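
As a rough illustration of the collapsed format described above, the following Node.js sketch wraps raw eval output in a `<details>` block. The helper name `wrapInDetails` is hypothetical and not part of this repository, and the sketch assumes the eval output itself contains no triple-backtick fences.

```javascript
// Hypothetical helper (not part of this repo): wraps raw eval output in a
// collapsed <details> block suitable for pasting into a PR description.
function wrapInDetails(output, summary = "Evaluation results") {
  // Build the fence dynamically so this snippet nests cleanly inside Markdown.
  const fence = "`".repeat(3);
  return [
    "<details>",
    `<summary>${summary}</summary>`,
    "",
    fence,
    output.trimEnd(), // drop trailing newlines so the closing fence sits right after the output
    fence,
    "",
    "</details>",
  ].join("\n");
}

// Placeholder text stands in for real eval output here.
console.log(wrapInDetails("[paste eval output here]\n"));
```

The blank lines around the inner fence matter: GitHub generally renders Markdown inside `<details>` only when it is separated from the HTML tags by an empty line.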

---
@@ -204,7 +218,10 @@ Stats (skill count, categories) all update automatically.
1. Edit the `SKILL.md` content
2. Run `npm run validate`
3. Optionally run LLM scoring (see step 4 above)
4. Submit a PR with a summary of what changed
4. If you added new evaluation cases, run those evals locally and include the results in the PR
5. Submit a PR with a summary of what changed

**Eval results for skill improvements:** If you added new eval cases, you only need to provide results for those new cases — not the full suite. Both the with-skill and baseline (without-skill) results must be included. Collapse them in the PR description using a `<details>` block (see [Submit a PR](#7-submit-a-pr) above).
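
The requirement above can be sanity-checked before submitting. The sketch below is a hypothetical pre-submit check, not a script that exists in this repository; the function name `missingEvalParts` and the marker strings it searches for ("with-skill", "baseline", and so on) are assumptions about how results might be labeled in a PR description.

```javascript
// Hypothetical pre-submit check (not part of this repo). The marker strings
// are assumptions about how eval results are labeled in a PR description.
function missingEvalParts(description) {
  const text = description.toLowerCase();
  const checks = {
    "collapsed <details> block":
      text.includes("<details>") && text.includes("</details>"),
    "with-skill results":
      text.includes("with-skill") || text.includes("with skill"),
    "baseline results":
      text.includes("baseline") ||
      text.includes("without-skill") ||
      text.includes("without skill"),
  };
  // Return the names of every requirement that was not found.
  return Object.keys(checks).filter((name) => !checks[name]);
}

// Example: a draft that has the collapsed block and with-skill results only.
const draft = [
  "<details>",
  "<summary>Evaluation results</summary>",
  "with-skill: (results here)",
  "</details>",
].join("\n");
console.log(missingEvalParts(draft)); // reports that baseline results are missing
```

A plain substring check like this only catches omissions; it says nothing about whether the pasted results are complete or correct, so it complements reviewer judgment rather than replacing it.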

The website auto-generates from SKILL.md frontmatter — no need to edit any source files.
