Skill Auditor

Audit Claude Code skills for duplicate and near-duplicate functionality using embedding similarity and LLM evaluation.

Features

  • Skill Discovery: Automatically finds all SKILL.md files in Claude Code plugin marketplaces and user skills directories
  • Metadata Parsing: Extracts frontmatter metadata (name, description, triggers) from skill files
  • Embedding Generation: Uses sentence-transformers to create semantic embeddings of skill content
  • Similarity Detection: Finds potentially duplicate skills using cosine similarity
  • LLM Evaluation: Uses Ollama to evaluate candidate duplicates for functional equivalence
  • Markdown Reports: Generates human-readable reports grouped by purpose
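The Metadata Parsing step above splits each SKILL.md into frontmatter and body. A minimal sketch of that split (function and field names here are illustrative, not the project's actual API; a real implementation would hand the frontmatter to a YAML parser, while this sketch handles only flat `key: value` lines):

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (metadata dict, body text)."""
    meta: dict[str, str] = {}
    if text.startswith("---"):
        # Frontmatter is delimited by a leading and a trailing '---' line.
        head, sep, body = text[3:].partition("\n---")
        if sep:
            for line in head.strip().splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            return meta, body.strip()
    # No frontmatter: return empty metadata and the whole file as body.
    return meta, text.strip()
```

On malformed or missing frontmatter this degrades to empty metadata rather than raising, matching the tool's documented error-handling behavior.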

Requirements

  • Python 3.14+
  • Ollama running locally with a model available

Installation

# Clone the repository
git clone https://github.com/katmandoo212/skill_auditor.git
cd skill_auditor

# Install dependencies with uv
uv sync

# Or with pip
pip install -e .

Usage

# Scan default plugin cache
skill-auditor

# Scan custom paths
skill-auditor -p /path/to/plugins -p /another/path

# Specify output file
skill-auditor -o my_report.md

# Adjust similarity threshold (default: 0.8)
skill-auditor -t 0.75

# Use a different Ollama model
skill-auditor -m llama3.2

# Verbose output
skill-auditor -v

Options

| Option | Short | Default | Description |
|---|---|---|---|
| --path | -p | ~/.claude/plugins/marketplaces, ~/.claude/skills | Paths to scan for skills (can be specified multiple times) |
| --output | -o | skill_audit_report.md | Output markdown file path |
| --threshold | -t | 0.8 | Embedding similarity threshold (0.0-1.0) |
| --model | -m | glm-5:cloud | Ollama model for LLM evaluation |
| --max-candidates | | 10 | Maximum candidates per skill to send to the LLM |
| --verbose | -v | False | Enable verbose logging |
| --version | | | Show version and exit |

Output

The tool generates a markdown report with:

Summary

  • Total skills scanned
  • Number of duplicate groups found
  • Skills with potential duplicates
  • Similarity threshold and model used

Duplicate Groups

Each group contains:

  • Purpose tag (e.g., "code-review", "brainstorming")
  • Confidence level (high/medium/low)
  • Notes explaining the similarity
  • Table of duplicate skills with plugin and path
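A duplicate group can be modeled roughly as the following dataclasses (field and class names are illustrative; see models.py for the project's actual definitions):

```python
from dataclasses import dataclass, field

@dataclass
class SkillRef:
    """One skill entry in a group's table."""
    name: str
    plugin: str
    path: str

@dataclass
class DuplicateGroup:
    purpose: str     # e.g. "code-review", "brainstorming"
    confidence: str  # "high" | "medium" | "low"
    notes: str       # why the skills are considered functionally similar
    skills: list[SkillRef] = field(default_factory=list)
```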

All Scanned Skills

A complete table of all discovered skills with their plugins and descriptions.

How It Works

  1. Discovery: Scans directories for skills/*/SKILL.md pattern
  2. Parsing: Extracts YAML frontmatter and content from each skill file
  3. Embedding: Generates semantic embeddings using all-MiniLM-L6-v2
  4. Similarity: Computes cosine similarity between all skill pairs
  5. Candidate Selection: Skills with similarity >= threshold become candidates
  6. LLM Evaluation: Sends skill + candidates to Ollama for functional analysis
  7. Report Generation: Creates markdown report grouped by purpose
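Steps 4-5 above boil down to a pairwise cosine-similarity pass over the embeddings. A minimal sketch in plain Python (standing in for the project's sentence-transformers pipeline; names are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def candidates(
    embeddings: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return skill pairs whose similarity meets the threshold."""
    names = sorted(embeddings)
    out = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # each unordered pair once
            score = cosine(embeddings[a], embeddings[b])
            if score >= threshold:
                out.append((a, b, score))
    return out
```

Only pairs that clear the threshold are forwarded to the LLM, which keeps the expensive evaluation step proportional to the number of plausible duplicates rather than all pairs.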

Error Handling

The tool includes robust error handling for:

  • File Read Errors: Handles missing files, permission errors, and encoding issues
  • YAML Parsing: Logs warnings for malformed frontmatter, continues with empty metadata
  • LLM Connection: Retries failed Ollama requests with exponential backoff (3 attempts)
  • JSON Parsing: Multi-pass extraction from LLM responses with brace-matching
  • Thread Safety: Thread-safe model initialization using double-checked locking

Development

# Install dev dependencies
uv sync

# Run tests
uv run pytest

# Run tests with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_integration.py

Project Structure

skill_auditor/
├── __init__.py      # CLI entry point and main orchestration
├── __main__.py      # Package runner (python -m skill_auditor)
├── config.py        # Configuration defaults and constants
├── models.py        # Data models (Skill, DuplicateGroup, SimilarityCandidate)
├── scanner.py       # Skill discovery and parsing with error handling
├── embeddings.py    # Thread-safe embedding generation with DI support
├── evaluator.py     # LLM evaluation with retry logic and JSON parsing
└── reporter.py      # Markdown report generation

tests/
├── test_cli.py         # CLI tests
├── test_config.py      # Configuration tests
├── test_embeddings.py  # Embedding and similarity tests
├── test_evaluator.py   # LLM evaluation and JSON parsing tests
├── test_integration.py # End-to-end integration tests
├── test_models.py      # Data model tests
├── test_reporter.py    # Report generation tests
└── test_scanner.py     # Discovery and parsing tests

License

MIT License
