
feat(skills): auto-activate skills by fuzzy-matching user prompt#253

Open
Prithvi1994 wants to merge 5 commits into mpfaffenberger:main from Prithvi1994:feature/auto-skill-activator

Conversation


@Prithvi1994 commented Apr 1, 2026

Problem

Skills in Code Puppy are injected into the system prompt as an XML list of names + descriptions, with guidance text telling the LLM to call activate_skill() when a match is detected.

In practice, skills rarely activate unless the user explicitly says "skill" in their prompt. The LLM doesn't proactively match task intent against skill descriptions — it needs a very direct signal.

This is a real usability gap: a user asking "help me create a pull request" won't benefit from an installed github-pr-workflow skill, even though it's a perfect match.

Root Cause

Agents are activated deterministically: invoke_agent is an explicit tool call. Skills rely entirely on LLM discretion — passive XML in the prompt with a vague instruction to "match". There's no pre-run matching layer.

Solution

This PR adds a new plugin auto_skill_activator that hooks into get_model_system_prompt and:

  1. Scores every installed skill's name + description + tags against the user's prompt using rapidfuzz.token_set_ratio (already a project dependency)
  2. Skills scoring ≥ 65 (configurable via AUTO_ACTIVATE_THRESHOLD) have their full SKILL.md content auto-injected into the system prompt before the agent runs
  3. Caps at 3 skills max (MAX_AUTO_ACTIVATE) to protect context window
  4. Always sets handled=False so claude-code, antigravity, and other model handlers still fire
  5. Degrades gracefully on any error — never crashes the agent run

Why This Approach

  • Pure plugin — zero changes to core (base_agent.py, model_utils.py, callbacks.py)
  • Follows AGENTS.md — uses existing callback architecture exactly as documented
  • Uses an existing dependency: rapidfuzz is already in pyproject.toml
  • Non-breaking — users with no skills installed see zero behavior change

Files Changed

| File | Purpose |
| --- | --- |
| code_puppy/plugins/auto_skill_activator/__init__.py | Plugin package |
| code_puppy/plugins/auto_skill_activator/register_callbacks.py | Core logic (158 lines) |
| tests/plugins/test_auto_skill_activator.py | 16 test cases covering activation, ranking, caps, error resilience |

Test Coverage

  • ✅ Matching skill is injected with full content
  • ✅ Unrelated prompt returns None (no-op)
  • ✅ Empty/whitespace prompt returns None
  • ✅ Globally disabled skills respected
  • ✅ Per-skill disabled list respected
  • ✅ Skills without SKILL.md skipped
  • ✅ Multiple skills ranked by score, capped at MAX_AUTO_ACTIVATE
  • ✅ handled=False always set
  • ✅ User prompt preserved in result
  • ✅ Exception in discovery returns None gracefully

Summary by CodeRabbit

  • New Features
    • Auto Skill Activator: scores discovered skills against the user prompt and auto-injects the most relevant SKILL.md content into the system prompt, honoring disabled/missing skills and a maximum-injection cap; preserves the original user prompt and degrades gracefully on errors.
  • Tests
    • Comprehensive test suite for LLM and fuzzy scoring, injection behavior, handling of disabled/missing skills, selection caps, state reset after history compaction, and robust error handling.

Skills were only activated if the LLM proactively called activate_skill(),
which rarely happened unless the user explicitly mentioned "skill".

This plugin hooks into get_model_system_prompt and scores all installed
skill descriptions against the user prompt using rapidfuzz token_set_ratio
(already a project dependency). Skills scoring >= 65 have their full
SKILL.md content auto-injected into the system prompt before the agent runs.

Key design decisions:
- handled=False so claude-code/antigravity handlers still fire
- Capped at MAX_AUTO_ACTIVATE=3 skills to protect context window
- Degrades gracefully on any error (returns None, never crashes agent)
- Uses existing plugin/callback architecture (no core changes)

Fixes: skills not activating unless prompt contains the word "skill"

coderabbitai bot commented Apr 1, 2026

Warning

Rate limit exceeded

@Prithvi1994 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 43 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 45 minutes and 43 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a608323b-fda4-4b28-a1e6-1be882172e22

📥 Commits

Reviewing files that changed from the base of the PR and between efb23c4 and a85c5cd.

📒 Files selected for processing (1)
  • code_puppy/plugins/auto_skill_activator/register_callbacks.py
📝 Walkthrough

Walkthrough

New Auto Skill Activator plugin that discovers installed skills, scores their relevance to the user prompt (LLM steering model with fuzzy fallback), and auto-injects qualifying SKILL.md contents into the model system prompt; includes a compaction hook to detect removed injections and comprehensive unit tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Plugin entry<br>code_puppy/plugins/auto_skill_activator/__init__.py | New module placeholder for the Auto Skill Activator plugin. |
| Callback implementation<br>code_puppy/plugins/auto_skill_activator/register_callbacks.py | Added core plugin: discovery/filtering of skills, LLM-driven scoring with steering model + JSON parsing, fuzzy fallback scoring (rapidfuzz or word-overlap), constants (AUTO_ACTIVATE_THRESHOLD, MAX_AUTO_ACTIVATE, STEERING_MODEL_DEFAULT), session-scoped activation state, _auto_inject_skills registered to get_model_system_prompt, and _on_message_history_processor_end compaction hook. |
| Tests<br>tests/plugins/test_auto_skill_activator.py | Comprehensive unit tests covering fuzzy scoring, LLM scoring parsing and fallback, end-to-end injection flows, thresholding/ranking, MAX_AUTO_ACTIVATE behavior, disabled/missing-skill cases, compaction re-injection behavior, and error resilience. |

Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant PluginSystem as Plugin System
    participant Callback as Auto-Activate Callback
    participant Discovery as Skill Discovery
    participant Scorer as Scoring Engine
    participant Loader as Skill Content Loader
    participant Prompt as System Prompt

    User->>PluginSystem: request model with user prompt
    PluginSystem->>Callback: invoke get_model_system_prompt
    Callback->>Discovery: discover installed skills & metadata
    Discovery-->>Callback: skill list
    Callback->>Callback: filter disabled / missing SKILL.md
    loop each eligible skill
        Callback->>Scorer: score prompt vs skill metadata
        Scorer-->>Callback: relevance score (0-100)
    end
    Callback->>Callback: select top N by threshold & MAX_AUTO_ACTIVATE
    loop each selected skill
        Callback->>Loader: load SKILL.md content
        Loader-->>Callback: skill content
        Callback->>Prompt: append skill content with relevance %
    end
    Callback-->>PluginSystem: return modified instructions + user_prompt
    PluginSystem-->>User: model prompt ready
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • mpfaffenberger

Poem

🐰 I sniff the prompt and hop to find the best,
I count the clues and rate each little quest,
I tuck SKILL.md pages into the prompt's bright nest,
Top helpers wake to lend their clever zest,
Hooray — the agent's toolbox hops to its best!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.50%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main feature: auto-activation of skills based on user prompt matching. It aligns with the changeset's core functionality of scoring skills and injecting relevant ones into the system prompt. |




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/plugins/test_auto_skill_activator.py`:
- Around line 139-145: The string value passed in skill_contents for the
"github-pr" key uses a raw newline inside single quotes which causes a
SyntaxError; update the literal in the test (where _run is called and
skill_contents={"github-pr": "..."} and the similar occurrence around lines
158-159) to a valid Python multiline string by either using a triple-quoted
string ("""...""") or replacing the raw newline with an explicit "\n" escape so
the module parses correctly.
- Around line 62-66: The test test_synonym_scores_above_threshold is failing
because _score_prompt_against_skill with token_set_ratio returns 50 for "push my
docker app" vs "deploy docker container deployment" which is below
AUTO_ACTIVATE_THRESHOLD; change the test input to increase token overlap (e.g.,
use a prompt like "deploy my docker container" or "deploy docker container") so
the token_set_ratio rises above AUTO_ACTIVATE_THRESHOLD when calling
_score_prompt_against_skill.
- Around line 106-130: The patches target attributes on register_callbacks but
those functions are imported inside register_callbacks.register_callbacks(), so
patching fails; update the patch targets to the modules that actually define
get_skills_enabled, get_skill_directories, get_disabled_skills, discover_skills,
parse_skill_metadata, and load_full_skill_content (i.e., patch the original
defining modules where those functions live) in the test block that patches
these names around register_callbacks.register_callbacks(), and make the same
replacements for the second patch block at lines 283–293 so both sets of patches
point to the real defining modules instead of register_callbacks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d6a2cc72-0f3f-42c5-b4d5-32ef63b95908

📥 Commits

Reviewing files that changed from the base of the PR and between 5c7338d and 70e4adb.

📒 Files selected for processing (3)
  • code_puppy/plugins/auto_skill_activator/__init__.py
  • code_puppy/plugins/auto_skill_activator/register_callbacks.py
  • tests/plugins/test_auto_skill_activator.py

Comment thread tests/plugins/test_auto_skill_activator.py
- Fix test_synonym_scores_above_threshold: 'push my docker app' scores 50
  (below threshold 65) with token_set_ratio; use 'deploy my docker container'
  which shares tokens with the skill description and scores 94
- Fix SyntaxError: raw newlines in single-quoted strings replaced with \n
- Fix mock patch targets: functions are imported inside the function body so
  patching register_callbacks.<name> raises AttributeError; patch the defining
  modules directly (agent_skills.config, .discovery, .metadata)
@mpfaffenberger (Owner)

I like the idea, but I think it would be much more powerful to leverage a smaller model like Haiku, one of the GPT Nanos, or even an open source one like Gemma 4 / GLM Flash to choose which skill to inject - if any.

We would also need the background model to note if a compaction occurs at any given time to re-inject the skill if necessary.

Then we could further generalize this concept beyond skill injection. Background steering agents.

@Prithvi1994 (Author)

Great feedback — I'm on board with the direction. Here's my plan:

  1. Swap fuzzy matching for a small background model — Replace rapidfuzz.token_set_ratio with a lightweight LLM call (Haiku / GPT-4o-mini / Gemma 4 IT / GLM Flash) to semantically evaluate which skills to inject (if any). This gives us actual understanding of intent vs. token overlap, which is the right call.

  2. Re-inject on compaction — Add a hook so the background steering model re-evaluates and re-injects the relevant skill content after any context compaction event, keeping the skill context alive across the full session.

  3. Generalize to background steering agent — Once the above is solid, extend the pattern beyond skill injection into a broader "background steering" concept — a lightweight model that runs alongside the main agent to handle skill selection, context management, and other steering decisions.

I'll push an updated commit with items 1 and 2 first, then follow up with the generalization. Thanks for the direction!

…action re-injection

- Replace rapidfuzz token_set_ratio with lightweight LLM call (Haiku/GPT-4o-mini/Gemma)
  for semantic skill relevance scoring
- Add compaction re-injection hook via message_history_processor_end callback
  to re-evaluate and re-inject skills after context compaction
- Keep rapidfuzz as fallback when steering model is unavailable
- Add STEERING_MODEL_DEFAULT config and auto_skill_steering_model setting
- Update tests for new LLM-based scoring approach + compaction re-injection

Per mpfaffenberger feedback: mpfaffenberger#253 (comment)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
code_puppy/plugins/auto_skill_activator/register_callbacks.py (1)

46-48: Module-level mutable state lacks thread safety for concurrent agents.

If multiple agents run concurrently in the same process, _last_activated_skills and _last_user_prompt could race. Consider using a thread-local or session-keyed dictionary if concurrent agent support is planned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/plugins/auto_skill_activator/register_callbacks.py` around lines
46 - 48, The module-level mutable state variables _last_activated_skills and
_last_user_prompt are not thread-safe and can race when multiple agents run
concurrently; change them to a concurrency-safe mechanism (e.g., use
threading.local() or contextvars.ContextVar, or a session-keyed dict keyed by
agent/session id) within register_callbacks.py so each agent has isolated state;
locate usages of _last_activated_skills and _last_user_prompt in this module
(functions that read/write them) and replace reads/writes with the chosen
thread-local/session-scoped accessor to ensure no cross-agent contamination.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code_puppy/plugins/auto_skill_activator/register_callbacks.py`:
- Around line 309-313: The handler in register_callbacks.py is returning
"handled": False which prevents model_utils.py (see result.get("handled") at
line ~99) from accepting the PreparedPrompt; change the response so the callback
returns "handled": True (or refactor the callback chain) so the injected prompt
is used—update the return dict in register_callbacks.py that builds the
PreparedPrompt (keys "instructions", "user_prompt", "handled") to set handled to
True (or implement a proper chaining mechanism in model_utils.py to allow
multiple handlers to contribute).
- Around line 119-141: The current use of ThreadPoolExecutor with
Future.result(timeout=15) can leak threads because the underlying asyncio task
(scoring_agent.run) keeps running after the Future times out; replace this
pattern by enforcing the timeout inside the asyncio task so it gets canceled
instead of leaving a running thread: when running in the async context (the
branch where loop.is_running() is true), run scoring_agent.run(user_message)
inside a thread but wrap the coroutine with asyncio.wait_for(..., timeout=15)
(or run a helper that calls asyncio.run(asyncio.wait_for(scoring_agent.run(...),
15))) so the coroutine is canceled on timeout, and update the exception handling
to catch asyncio.TimeoutError and handle cancellation/cleanup; alternatively use
ProcessPoolExecutor with explicit termination if cancellation inside the worker
is insufficient. Ensure you modify the code paths that call scoring_agent.run
(the ThreadPoolExecutor block and the else branch if you also need timeouts
there) and keep the existing logging/timeout handling consistent.

---

Nitpick comments:
In `@code_puppy/plugins/auto_skill_activator/register_callbacks.py`:
- Around line 46-48: The module-level mutable state variables
_last_activated_skills and _last_user_prompt are not thread-safe and can race
when multiple agents run concurrently; change them to a concurrency-safe
mechanism (e.g., use threading.local() or contextvars.ContextVar, or a
session-keyed dict keyed by agent/session id) within register_callbacks.py so
each agent has isolated state; locate usages of _last_activated_skills and
_last_user_prompt in this module (functions that read/write them) and replace
reads/writes with the chosen thread-local/session-scoped accessor to ensure no
cross-agent contamination.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96f1bb48-b8d9-4530-89bb-a261bafd34a4

📥 Commits

Reviewing files that changed from the base of the PR and between eab8f68 and a27d5bd.

📒 Files selected for processing (2)
  • code_puppy/plugins/auto_skill_activator/register_callbacks.py
  • tests/plugins/test_auto_skill_activator.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/plugins/test_auto_skill_activator.py

Comment thread code_puppy/plugins/auto_skill_activator/register_callbacks.py
… thread leakage

- Change handled: False → handled: True so model_utils.py actually
  accepts the injected prompt (critical — feature was non-functional)
- Replace module-level mutable state with ContextVar for thread-safe
  concurrent agent support
- Replace ThreadPoolExecutor + Future.result(timeout) with
  asyncio.wait_for() to properly cancel async tasks on timeout,
  preventing thread leakage

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
code_puppy/plugins/auto_skill_activator/register_callbacks.py (1)

49-52: Consider using a factory for the ContextVar default to prevent accidental mutation.

The default=[] creates a single shared list object. While the current code correctly uses .set() to replace values rather than mutating, a future modification like _last_activated_skills.get().append(...) would unexpectedly mutate the shared default.

🛡️ Defensive alternative using factory pattern

```diff
-_last_activated_skills: ContextVar[List[str]] = ContextVar(
-    "_last_activated_skills", default=[]
-)
+_last_activated_skills: ContextVar[List[str]] = ContextVar("_last_activated_skills")
+
+def _get_activated_skills() -> List[str]:
+    try:
+        return _last_activated_skills.get()
+    except LookupError:
+        return []
```

Alternatively, document explicitly that the returned list must not be mutated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_puppy/plugins/auto_skill_activator/register_callbacks.py` around lines
49 - 52, The ContextVar _last_activated_skills currently uses default=[] which
creates a shared mutable list; change its default to an immutable value (e.g.,
default=()) and update the type hint accordingly (e.g., ContextVar[Tuple[str,
...]] _last_activated_skills = ContextVar("_last_activated_skills", default=())
so callers cannot accidentally share/mutate the same list, and when a mutable
list is needed explicitly call _last_activated_skills.set(list(...)) or convert
the tuple to a list before modifying; leave _last_user_prompt unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code_puppy/plugins/auto_skill_activator/register_callbacks.py`:
- Around line 286-294: The LLM-returned skill name in the loop over scored can
differ in formatting from keys in skill_path_map causing silent misses; fix by
normalizing both sides and looking up using a normalized map: build a
normalized_skill_map (e.g., map normalized_key -> original_key/path) from
skill_path_map (normalize via lowercasing and a consistent replacement/strip
strategy), then in the loop normalize item["name"] and check against
normalized_skill_map instead of raw skill_path_map, append the canonical
name/path to matching and update the logger.info to include both the LLM name
and the matched canonical skill name for clarity (referencing variables scored,
AUTO_ACTIVATE_THRESHOLD, skill_path_map, matching, and logger.info).

---

Nitpick comments:
In `@code_puppy/plugins/auto_skill_activator/register_callbacks.py`:
- Around line 49-52: The ContextVar _last_activated_skills currently uses
default=[] which creates a shared mutable list; change its default to an
immutable value (e.g., default=()) and update the type hint accordingly (e.g.,
ContextVar[Tuple[str, ...]] _last_activated_skills =
ContextVar("_last_activated_skills", default=()) so callers cannot accidentally
share/mutate the same list, and when a mutable list is needed explicitly call
_last_activated_skills.set(list(...)) or convert the tuple to a list before
modifying; leave _last_user_prompt unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 791266a3-388f-43e1-854b-3745b99933a8

📥 Commits

Reviewing files that changed from the base of the PR and between a27d5bd and efb23c4.

📒 Files selected for processing (1)
  • code_puppy/plugins/auto_skill_activator/register_callbacks.py

Comment thread code_puppy/plugins/auto_skill_activator/register_callbacks.py
The LLM may return reformatted skill names (e.g. 'GitHub PR Workflow'
instead of 'github-pr-workflow'), causing silent lookup failures.
Normalize both sides to lowercase before matching.
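That normalization can be sketched as below; the helper names are illustrative, not the plugin's actual functions.

```python
import re


def _normalize(name: str) -> str:
    """Lowercase and collapse spaces/underscores/hyphens to one hyphen."""
    return re.sub(r"[\s_\-]+", "-", name.strip().lower())


def match_skill(llm_name, skill_path_map):
    """Look up an LLM-returned skill name against the known skill keys.

    Returns the canonical key, or None if nothing matches even after
    normalization.
    """
    normalized = {_normalize(key): key for key in skill_path_map}
    return normalized.get(_normalize(llm_name))
```

This way 'GitHub PR Workflow', 'github_pr_workflow', and 'github-pr-workflow' all resolve to the same canonical key instead of silently missing.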
@Prithvi1994 (Author)

@mpfaffenberger, please review the changes when you get a chance. Thank you.


axacode commented Apr 14, 2026

Hey @Prithvi1994, I reviewed the code just now. Looks good from my end.
