The skill that builds skills. Write a draft, run evals against a baseline, review results in an interactive viewer, improve, and repeat — until your skill actually works. Based on Anthropic's official skill-creator plugin, with bug fixes and best practices baked in.
| | Official skill-creator | skill-creator-plus |
|---|---|---|
| Best practices guide | — | 620+ line Anthropic patterns reference |
| Script vs. Instruct guidance | — | Decision framework for when to bundle scripts vs. use instructions |
| Cross-host portability | — | Validates full agentskills.io spec + Claude-specific fields |
| Claude Code runtime docs | — | v2.1.116 listing budget, collapse mode, truncation thresholds, live-reload behavior |
| Eval viewer in Cowork | Silent fail on submit | Copyable JSON textarea (fixed) |
| Description optimizer | Requires separate ANTHROPIC_API_KEY; drifts toward 1,024-char bloat | Uses your existing claude session; length-aware selection + plateau early-stop |
| Benchmarking script | Silent empty results | Fixed directory handling |
| Skill type taxonomy | 3 broad categories | 3 + 9 Anthropic internal types |
| Pre-packaging checklist | — | Official checklist built in |
Anthropic ships a skill-creator plugin. It's good, but several parts are broken or missing:
- Best practices guide included — 620+ lines of patterns, structural templates, troubleshooting guidance, and checklists extracted from Anthropic's Complete Guide to Building Skills for Claude and Thariq's Lessons from Building Claude Code Skills. Includes a "Script vs. Instruct" decision framework for when to bundle pre-made scripts vs. keep logic as instructions — covering context window efficiency, reliability, and auditability. The built-in doesn't ship any of this.
- Cross-host portability — skills produced here validate against the full agentskills.io cross-host spec, so they run on Claude, Gemini CLI, Cursor, OpenCode, and other hosts. Claude-specific fields (`when_to_use`, `model`, `paths`, `hooks`, etc.) are supported but flagged as optional extensions, never load-bearing.
- Claude Code runtime docs — documents v2.1.116 mechanics most skill authors hit the hard way: the ~8 KB total listing budget, the ~20-char collapse threshold that silently breaks every skill when any one description gets too long, per-entry 1,536-char truncation, gitignore-syntax `paths:` matching (the docs say "glob" — they're wrong), and chokidar depth-2 live-reload.
- Eval viewer actually works in Cowork — the built-in silently fails: you write feedback, click "Submit All Reviews", it says "saved" — but nothing reaches Claude. This version reliably shows copyable JSON you can paste back.
- Description optimizer doesn't bloat or crash — the built-in calls the Anthropic SDK directly, requiring a separate `ANTHROPIC_API_KEY` most users don't have, and drifts toward 1,024-char descriptions that trigger the listing-collapse failure mode. This version uses `claude -p` (it just works with your existing session), plus length-aware tie-break-shortest selection, plateau early-stop, and a tunable target length so descriptions stay tight.
- Benchmarking script fixed — the built-in's aggregation script silently produces empty results due to undocumented directory structure requirements. Fixed and tested.
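The listing-budget mechanics above can be sanity-checked with a short script. This is an illustrative sketch, not part of the plugin: the only inputs taken from the text are the ~8 KB total budget and the 1,536-char per-entry truncation; the skill names and descriptions are hypothetical.

```python
# Sketch: check a set of skill descriptions against the Claude Code
# listing thresholds quoted above. Numbers are from the text; the
# checking logic itself is illustrative, not the runtime's actual code.
LISTING_BUDGET = 8 * 1024   # ~8 KB total listing budget
ENTRY_TRUNCATION = 1536     # per-entry truncation threshold (chars)

def check_listing(descriptions):
    """Return (total chars counted toward the budget, names truncated)."""
    truncated = [name for name, desc in descriptions.items()
                 if len(desc) > ENTRY_TRUNCATION]
    total = sum(min(len(d), ENTRY_TRUNCATION) for d in descriptions.values())
    return total, truncated

skills = {
    "pr-security-review": "Reviews pull requests for security issues.",
    "bloated-skill": "x" * 2000,  # would be cut at 1,536 chars
}
total, truncated = check_listing(skills)
print(total, truncated)
```

A check like this makes the failure mode concrete: one overlong description both loses its own tail to truncation and eats most of the shared budget.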
See the CHANGELOG for the full list of fixes.
Install (Claude Code):
```
claude plugin marketplace add https://github.com/yaniv-golan/skill-creator-plus
claude plugin install skill-creator-plus@skill-creator-plus-marketplace
```

Then just ask:

```
/skill-creator-plus Create a skill that reviews pull requests for security issues
```
The skill takes it from there — intent capture, drafting, test cases, evaluation, and iteration.
Note: If you also have Anthropic's built-in `skill-creator` installed, Claude may pick that one instead. Either uninstall the built-in, or use `/skill-creator-plus` to invoke this version explicitly.
Captures your intent through structured questions, researches existing patterns, then writes a SKILL.md with metadata, instructions, and test cases.
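As a sketch of the output, a generated SKILL.md might look like the following. Only the overall shape — YAML frontmatter with a name and description, followed by markdown instructions and test cases — is implied by the text; every name, description, and instruction here is hypothetical.

```markdown
---
name: pr-security-review
description: Reviews pull requests for common security issues.
---

# PR Security Review

When asked to review a pull request for security issues:

1. Read the diff and flag injection, auth, and secrets-handling risks.
2. Cite the file and line for each finding.

## Test cases

- "Review this PR for security issues" → should trigger the skill.
- "What's the weather today?" → should not trigger.
```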
Spawns parallel runs (with-skill and baseline) on test prompts. While runs execute, drafts quantitative assertions. Grades results via the grader agent and shows them in an interactive browser-based viewer.
Analyzes evaluation results, identifies weaknesses, and rewrites the skill. Each iteration is benchmarked against the previous version.
For rigorous validation, runs blind A/B comparison: the comparator agent scores two outputs without knowing which skill produced them, then the analyzer agent unblinds and explains the differences.
Generates trigger/non-trigger test queries, runs an optimization loop with train/test split, and selects the best-performing description.
— or install manually —
- Click Customize in the sidebar
- Click Browse Plugins
- Go to the Personal tab and click +
- Choose Add marketplace
- Type `yaniv-golan/skill-creator-plus` and click Sync
From your terminal:
```
claude plugin marketplace add https://github.com/yaniv-golan/skill-creator-plus
claude plugin install skill-creator-plus@skill-creator-plus-marketplace
```

Or from within a Claude Code session:

```
/plugin marketplace add yaniv-golan/skill-creator-plus
/plugin install skill-creator-plus@skill-creator-plus-marketplace
```
- Download `skill-creator-plus.zip`
- Click Customize in the sidebar
- Go to Skills and click +
- Choose Upload a skill and upload the zip file
```
/skill-creator-plus Create a skill that reviews pull requests for security issues
/skill-creator-plus Run evals on my skill and show me the results
/skill-creator-plus Optimize my skill's description for better triggering
/skill-creator-plus Do a blind A/B comparison between the old and new version of my skill
```
If you built a skill using Skill Creator Plus, add this badge to your README:
[](https://github.com/yaniv-golan/skill-creator-plus)

MIT — see LICENSE. Built on Anthropic's skill-creator (Apache 2.0) — see NOTICE.
