From 62bf41102b3876e1eaa6c74d751f79de63e1ec38 Mon Sep 17 00:00:00 2001
From: Justyna Wojtczak
Date: Mon, 4 May 2026 15:52:36 +0200
Subject: [PATCH] docs: scrub real project names + sync MCP tool list
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two cleanup chores rolled into one branch:

1. `docs/mcp.md`: the tool table only listed five tools and the
   lifecycle note still said "five schemas". The codebase has shipped
   `armillary_steal` and `armillary_revive` since, so the public MCP
   reference was lying about its surface. Add the two missing rows and
   fix the count.

2. Real project names had crept into shipped docstrings/comments,
   violating the project's private-data rule (no real project names in
   public content):
   - `mcp_tools.py` `armillary_context` examples used a real project
     name and a real dormant one. Replace with the generic placeholders
     the rule explicitly recommends (`my-saas-app`, `old-prototype`).
   - `code_index.py` FTS-tokeniser docstring quoted a real project name
     three times to illustrate prefix matching. Substitute a synthetic
     identifier (`user_session`) that shows the same behaviour without
     leaking a private name.
   - `revive_enhanced.py` module docstring listed three real project
     names as evidence for single-token query strategy. Drop the names;
     the empirical claim survives without them.
   - `CHANGELOG.md` quoted two of the same names in the v0.1 write-up.
     Same fix.

Side benefit of #2: removing real names from indexed Python files also
removes a class of false-positive matches in `armillary_revive`
STEAL_HITS. When a user revives a project whose name token appears in
armillary's own docstrings, the code index used to surface armillary's
source as a "cross-repo match", which was noise from a private-data
leak rather than useful cross-repo signal.
---
 CHANGELOG.md                     | 7 +++----
 docs/mcp.md                      | 4 +++-
 src/armillary/code_index.py      | 8 ++++----
 src/armillary/mcp_tools.py       | 4 ++--
 src/armillary/revive_enhanced.py | 3 +--
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 551c225..105842d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -25,10 +25,9 @@ skipped while we are still in 0.x.
   delivers cross-repo matches. An earlier draft also folded in the last
   commit subject, but a 6-token AND-FTS query is precision-extreme and
   produced zero matches across a real, sizeable cache.
-  Single-token ranking lets underscore/dash names tokenise naturally
-  (`pdf_to_quiz`, `reddit_promo_planner`) and BM25 picks the best
-  matches. Sanity check on three real projects returned 1+ relevant
-  cross-repo block each.
+  Single-token ranking lets underscore- and dash-naming tokenise
+  naturally and BM25 picks the best matches. Sanity check on three
+  real projects returned 1+ relevant cross-repo block each.
 - Scope decision: only `STEAL_HITS` is in v0.1. Other candidate fields
   (project status, last-touched timestamp, journal entries) were
   deferred — status carries a real risk of eroding trust in the whole
diff --git a/docs/mcp.md b/docs/mcp.md
index cfaf6a6..75f5e1b 100644
--- a/docs/mcp.md
+++ b/docs/mcp.md
@@ -80,6 +80,8 @@ treating the return as literal text.
 | `armillary_search` | `(query: str, max_results: int = 20) → str` | <50 ms per repo hit |
 | `armillary_projects` | `(status_filter: str \| None) → str` | <20 ms |
 | `armillary_pulse` | `() → str` | <30 ms |
+| `armillary_steal` | `(query: str, limit: int = 5, language: str \| None) → str` | <100 ms |
+| `armillary_revive` | `(project_path: str) → str` | <500 ms (revive subprocess + steal) |
 
 Schemas are introspected from the Python function signatures; the agent
 receives them in the `tools/list` response during the MCP handshake.
@@ -90,7 +92,7 @@ receives them in the `tools/list` response during the MCP handshake.
    on stdin. FastMCP replies with the server's name, version, and
    capabilities, then emits `initialized`.
 2. **Tool discovery.** The agent immediately calls `tools/list`. FastMCP
-   returns the five schemas so the agent knows what it can invoke. Any
+   returns the seven schemas so the agent knows what it can invoke. Any
    system-level prompt instructions declared on the `FastMCP()`
    constructor ride along here — armillary's say *"ALWAYS call
    `armillary_next` at the very start of every conversation,"* which is
diff --git a/src/armillary/code_index.py b/src/armillary/code_index.py
index 231a0e1..0eb18b3 100644
--- a/src/armillary/code_index.py
+++ b/src/armillary/code_index.py
@@ -294,13 +294,13 @@ def _sanitize_fts_query(query: str) -> str:
 
     Each whitespace-delimited token becomes a prefix-matched quoted
     phrase (``"token" *``). Prefix matching widens the recall: a
-    query like ``linked_flow`` now matches ``linked_flow_policy``,
-    ``linked_flow_service``, etc., which FTS5's simple tokenizer
+    query like ``user_session`` now matches ``user_session_policy``,
+    ``user_session_service``, etc., which FTS5's simple tokenizer
     would otherwise treat as distinct tokens.
 
     CamelCase vs snake_case is a separate concern — FTS5's simple
-    tokenizer does not split ``LinkedFlow`` into ``linked`` + ``flow``,
-    so users searching for ``linked_flow`` will not hit ``LinkedFlow``.
+    tokenizer does not split ``UserSession`` into ``user`` + ``session``,
+    so users searching for ``user_session`` will not hit ``UserSession``.
     Callers who care should submit both variants or use the dedicated
     matcher (future work).
 
diff --git a/src/armillary/mcp_tools.py b/src/armillary/mcp_tools.py
index 3257c7c..d9350dd 100644
--- a/src/armillary/mcp_tools.py
+++ b/src/armillary/mcp_tools.py
@@ -278,8 +278,8 @@ def armillary_context(project_name: str) -> str:
     or "what's the state of X". NOT auto-triggered on directory change.
 
     Examples:
-    - armillary_context("pdf_to_quiz") → branch, 1 dirty file, last 5 commits
-    - armillary_context("speak-faster") → dormant, last commit 3 months ago
+    - armillary_context("my-saas-app") → branch, 1 dirty file, last 5 commits
+    - armillary_context("old-prototype") → dormant, last commit 3 months ago
     """
     from armillary.context_service import get_context
 
diff --git a/src/armillary/revive_enhanced.py b/src/armillary/revive_enhanced.py
index 2c26b28..1b219b7 100644
--- a/src/armillary/revive_enhanced.py
+++ b/src/armillary/revive_enhanced.py
@@ -2,8 +2,7 @@
 
 Query strategy v0.1: the project name is the only signal we send to
 ``steal()``. Empirically this gives 5–8 cross-repo matches for typical
-underscore / dash naming (``pdf_to_quiz``, ``reddit_promo_planner``,
-``claude-code-project-boundary``) because FTS5 tokenises the separators
+underscore / dash naming because FTS5's tokeniser splits separators
 into meaningful sub-tokens. An earlier draft also folded in the last
 commit subject, but a 6-token AND query returns zero hits in practice.
 Single-token ranking is the simplest thing that delivers real value.