Skill audit: freshness, pagination, graph-coverage honesty + vec_cache #6
Merged
The 1.2.2 version bump (e8714a2) left .claude-plugin/plugin.json at 1.2.1 and requirements.lock pinned to the v1.2.0 release tarball, breaking the version-consistency contract enforced by test_plugin_manifest.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ools
explain() now passes root/config into the retrieval pipeline so the index freshness block (stale / files_changed_since_build) is real instead of a hardcoded "fresh" fallback. Previously the skill freshness check silently never fired for "how does X work" questions. explain also blends vectors when embeddings are enabled, matching search --mode hybrid.
Skill (all targets: claude/codex/opencode + plugin skills/ + bin/ wrappers):
- add `doctor` to the cbx whitelist (the skill fallback already invokes it)
- narrow allowed-tools python to `-m codebase_index` so the skill cannot run arbitrary Python
- document the --mode vector path, the intent/mode/pagination response fields, and clarify graph --open is a human-facing HTML view (use impact/refs for agent-readable dependency answers)
Regenerated the three installed skill copies via skill-update so they match the authored skill/ and wheel-bundled skill_template/ sources.
Tests: regression test that explain reports staleness after an edit; update the packaging whitelist assertion for the new `doctor` entry.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
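As an illustration of what a non-hardcoded freshness block might compute, here is a minimal sketch. It assumes staleness is judged by comparing file modification times against the index build time, and that `files_changed_since_build` is a count; the commit does not state either, and the `freshness` helper and its parameters are hypothetical names, not the project's code.

```python
from pathlib import Path
from typing import Dict, Iterable, Union


def freshness(
    root: Path, built_at: float, suffixes: Iterable[str] = (".py",)
) -> Dict[str, Union[bool, int]]:
    """Report whether any tracked file changed after the index was built.

    Assumption: "changed" means mtime newer than `built_at` (a Unix timestamp).
    """
    wanted = set(suffixes)
    changed = [
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix in wanted
        and path.stat().st_mtime > built_at
    ]
    return {"stale": bool(changed), "files_changed_since_build": len(changed)}
```

A caller that cannot supply `root` has nothing to compare against, which is one plausible way a command ends up always reporting "fresh".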
The retrieval pipeline and MCP server already supported result paging (offset -> pagination.next_offset), but the CLI `search` command never surfaced an --offset flag. Every invocation silently returned page one and the advertised `pagination.next_offset` was a dead end for the skill.
- Add `--offset` to `search` (rejects negative values with exit 2).
- Surface a "more available — --offset N" note in markdown output.
- Update SKILL.md (authored + packaged template + the three installed copies) to document paging via --offset, replacing the stale "CLI search is single-page" guidance.
- Regression tests at the CLI layer for paging and negative-offset rejection.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
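The paging behaviour described in this commit can be illustrated with a minimal sketch. It assumes an argparse-based CLI; `non_negative_int`, `build_parser`, `paginate`, and the `--limit` flag are hypothetical names for illustration, not the project's actual code. argparse reports a failing `type=` callable through `parser.error`, which exits with status 2, consistent with the rejection behaviour described above.

```python
import argparse
from typing import Any, Dict, List


def non_negative_int(value: str) -> int:
    """argparse type callable: accept integers >= 0, reject everything else."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid integer: {value!r}")
    if number < 0:
        raise argparse.ArgumentTypeError("must be >= 0")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="search-sketch")
    parser.add_argument("query")
    parser.add_argument("--offset", type=non_negative_int, default=0)
    parser.add_argument("--limit", type=non_negative_int, default=10)  # hypothetical flag
    return parser


def paginate(results: List[str], offset: int, limit: int) -> Dict[str, Any]:
    """Slice one page and advertise where the next page starts, if any."""
    page = results[offset : offset + limit]
    end = offset + len(page)
    next_offset = end if end < len(results) else None
    return {
        "results": page,
        "pagination": {"offset": offset, "next_offset": next_offset},
    }
```

A caller keeps passing the returned `next_offset` back as `--offset` until it comes back as `None`.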
Import/inheritance edges are only extracted for the hand-tuned (Tier-A)
languages. A symbol or file in a Tier-B language (generic tree-sitter walk,
e.g. Lua) yields symbols and best-effort call sites but no import/extends/
implements edges, so `refs`/`impact` can silently undercount — an empty
result read as "nothing references this" is a footgun for an agent.
- Add a `GraphCoverage` model (`partial`, `languages`, `reason`) and attach it
to RefsResponse / ImpactResponse. `for_paths` classifies a symbol/target's
defining language(s): Tier-B (tree-sitter routed, no LangSpec) -> partial.
- refs_lookup judges coverage by the symbol's definition language; impact_lookup
judges it by the language(s) of the resolved target file(s).
- Markdown output prints a "Partial graph coverage" warning (including on the
empty-result path, where it matters most).
- Document the `coverage` field in SKILL.md (authored + packaged template + the
four installed/plugin copies); regenerate the refs/impact goldens (Tier-A
Python = partial:false).
- Regression test over a mixed Lua/Python repo asserts partial for the Tier-B
symbol/file and full coverage for the Tier-A one.
Also syncs the plugin skill copy with the prior --offset doc change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
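The coverage model described in this commit might look roughly like the sketch below. Only the `GraphCoverage` name, its three fields, and `for_paths` come from the commit message; the lookup tables, the extension-based language detection, the contents of `languages`, and `render_refs_markdown` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterable, List, Optional

# Hypothetical tables: the real project presumably derives these from its language specs.
EXTENSION_TO_LANGUAGE = {".py": "python", ".lua": "lua"}
TIER_A_LANGUAGES = {"python"}


@dataclass
class GraphCoverage:
    partial: bool = False
    languages: List[str] = field(default_factory=list)
    reason: Optional[str] = None

    @classmethod
    def for_paths(cls, paths: Iterable[str]) -> "GraphCoverage":
        """Mark coverage partial if any defining file is in a Tier-B language."""
        langs = {EXTENSION_TO_LANGUAGE.get(Path(p).suffix) for p in paths}
        tier_b = sorted(
            lang for lang in langs if lang is not None and lang not in TIER_A_LANGUAGES
        )
        if not tier_b:
            return cls()
        return cls(
            partial=True,
            languages=tier_b,
            reason="import/inheritance edges are not extracted for these languages",
        )


def render_refs_markdown(symbol: str, refs: List[str], coverage: GraphCoverage) -> str:
    """Print the warning even when there are no results, where it matters most."""
    lines = [f"References to `{symbol}`:"]
    lines += [f"- {ref}" for ref in refs] or ["(no references found)"]
    if coverage.partial:
        langs = ", ".join(coverage.languages)
        lines.append(f"Partial graph coverage ({langs}): {coverage.reason}")
    return "\n".join(lines)
```

The point of the design is that an empty list plus `partial: true` reads as "inconclusive" rather than "nothing references this".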
Extends the Tier-B graph-coverage honesty from per-query (refs/impact `coverage`) to a repo-wide, upfront signal.
- `stats`: each tree-sitter language now carries `graph: full|partial` (`full` = Tier-A spec with import/inheritance edges; `partial` = Tier-B, symbols only). Human output appends "· partial graph (Tier-B)".
- `doctor`: new informational `graph_coverage` finding listing Tier-B languages present in the index, so refs/impact undercounting is visible during diagnostics rather than only when an answer comes back empty.
- Add `languages.has_full_graph(lang)` as the single source of truth for the Tier-A/Tier-B distinction, shared by stats and doctor.
- Document the field in SKILL.md (all copies); regenerate the stats golden.
- Regression tests: Tier-B (Lua) index flags partial; Tier-A-only is full.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
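A rough sketch of how one predicate can feed both `stats` and `doctor` follows. Only `has_full_graph`, the `graph: full|partial` values, the `graph_coverage` finding name, and the human-output suffix come from the commit message; the tier table, the dictionary shapes, and the helper names `tag_stats`, `format_stats_line`, and `graph_coverage_finding` are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical tier table; the real list lives in the project's language specs.
TIER_A_LANGUAGES = frozenset({"python"})


def has_full_graph(lang: str) -> bool:
    """Single source of truth for the Tier-A/Tier-B distinction."""
    return lang in TIER_A_LANGUAGES


def tag_stats(file_counts: Dict[str, int]) -> Dict[str, Dict[str, Any]]:
    """Attach a graph: full|partial tag to each language entry."""
    return {
        lang: {"files": count, "graph": "full" if has_full_graph(lang) else "partial"}
        for lang, count in file_counts.items()
    }


def format_stats_line(lang: str, entry: Dict[str, Any]) -> str:
    line = f"{lang}: {entry['files']} files"
    if entry["graph"] == "partial":
        line += " · partial graph (Tier-B)"
    return line


def graph_coverage_finding(file_counts: Dict[str, int]) -> Optional[Dict[str, Any]]:
    """Informational doctor finding; None when every indexed language is Tier-A."""
    tier_b = sorted(lang for lang in file_counts if not has_full_graph(lang))
    if not tier_b:
        return None
    return {"check": "graph_coverage", "level": "info", "languages": tier_b}
```

Routing both commands through the same predicate means a language promoted to Tier-A changes `stats` and `doctor` output together.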
…changed chunks
Chunk ids churn on every full rebuild (replace_chunks), so a chunk-id-keyed skip alone re-embeds the entire repo each time. The embedding pass now hashes each chunk's content (sha256) and consults a `vec_cache` table keyed by (model, content_sha): only text never embedded under the active model hits the (potentially slow or paid) backend; unchanged content reuses its cached vector.
- New `vec_cache` table + repo helpers (cached_embeddings, store_cached_embeddings, upsert_chunk_vector_blob); orphan vectors pruned in a single batched executemany.
- `_embed_chunks` reports cache misses (vectors actually computed) as its count.
- Schema/pipeline docs updated for the cache table and reuse flow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
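The reuse flow can be sketched with the standard-library `sqlite3` and `hashlib` modules. The table name and its `(model, content_sha)` key come from the commit message; the column layout, the function names (`ensure_cache_table`, `embed_with_cache`), and the use of raw bytes for vectors are assumptions, and the project's own helpers may differ.

```python
import hashlib
import sqlite3
from typing import Callable, Dict, List, Tuple


def content_sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ensure_cache_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vec_cache ("
        " model TEXT NOT NULL,"
        " content_sha TEXT NOT NULL,"
        " vector BLOB NOT NULL,"
        " PRIMARY KEY (model, content_sha))"
    )


def embed_with_cache(
    conn: sqlite3.Connection,
    model: str,
    texts: List[str],
    embed_fn: Callable[[List[str]], List[bytes]],
) -> Tuple[Dict[str, bytes], int]:
    """Return ({content_sha: vector}, number_of_vectors_actually_computed)."""
    vectors: Dict[str, bytes] = {}
    misses: Dict[str, str] = {}  # sha -> text, deduplicated, insertion-ordered
    for text in texts:
        sha = content_sha(text)
        if sha in vectors or sha in misses:
            continue
        row = conn.execute(
            "SELECT vector FROM vec_cache WHERE model = ? AND content_sha = ?",
            (model, sha),
        ).fetchone()
        if row is not None:
            vectors[sha] = row[0]  # cache hit: no backend call
        else:
            misses[sha] = text
    if misses:
        computed = embed_fn(list(misses.values()))  # only misses reach the backend
        new_rows = list(zip(misses.keys(), computed))
        conn.executemany(
            "INSERT OR REPLACE INTO vec_cache (model, content_sha, vector)"
            " VALUES (?, ?, ?)",
            [(model, sha, vec) for sha, vec in new_rows],
        )
        vectors.update(new_rows)
    return vectors, len(misses)
```

Because the key is content rather than chunk id, a full rebuild that renumbers every chunk still finds its vectors, and switching models naturally misses the cache.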
The package is at 1.2.2 (pyproject/__init__), but the installed skill `.skill_version` stamps still read 1.2.1. Sync them so the auto-update check doesn't see phantom drift. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`_parse_one` runs in ProcessPoolExecutor workers and reads the module global `_PARSE_CONFIG` (typed `Optional[Config]`, set by the pool initializer `_pool_init`). Passing it straight to `_parse`, which expects `Config`, tripped mypy (`Config | None` vs `Config`). The global is always set before any worker parses, so assert that invariant — documents the contract and satisfies the type checker. Restores a clean `mypy src/codebase_index`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Acts on a skill/CLI audit of codebase-index, plus folds in two in-progress workstreams (embedding cache + release-sync). Each change ships with tests; the full suite passes at 80.9% coverage and `mypy src/codebase_index` is clean.

Audit fixes
- fix(skill) (eccfcac): `explain` now honors the index-freshness contract (passes `root`/`config` into the pipeline; blends vectors when embeddings are on). Added `doctor` to the `cbx` whitelist. Narrowed the skill `allowed-tools` from `Bash(python *)` to `Bash(python -m codebase_index *)`. Documented `intent`/`mode`/`pagination`, `--mode vector`, and `graph --open` (human-only).
- feat(search) (ace0f9b): expose `--offset` so the CLI/skill can actually page. The pipeline & MCP already supported it, but the CLI never surfaced the flag (every call silently returned page one). Markdown notes when more results exist.
- feat(graph) (3d86326): `refs`/`impact` now report a `coverage` block. Import/inheritance edges are only extracted for Tier-A languages, so an empty/short result for a Tier-B language (e.g. Lua) is inconclusive, not authoritative; `coverage.partial` flags it so agents fall back to Grep.
- feat(diagnostics) (9262ced): `stats` tags each language `graph: full|partial`; `doctor` adds a `graph_coverage` finding. Same honesty signal, surfaced repo-wide and upfront.

Folded-in workstreams
- feat(embeddings) (19df9d4): content-addressed `vec_cache` keyed by `(model, content_sha)`. Chunk ids churn on every rebuild, so this stops re-embedding unchanged content (only cache misses hit the backend).
- chore(release) (e92afe4): stamp installed skill `.skill_version` copies to 1.2.2.
- fix(types) (e1c0850): assert the `_PARSE_CONFIG` worker-global invariant to restore a clean `mypy` (preexisting latent error, unmasked by cache invalidation).

Notes
- `SKILL.md` is kept byte-identical across the authored copy, the packaged template, the three installed copies, and the plugin copy (parity tests enforce this).
- Goldens for `refs`/`impact`/`stats` were regenerated deliberately (new `coverage`/`graph` fields; Tier-A = full/non-partial).
- `ruff format` drift in a few files (`cli.py`, `pipeline.py`, `repo.py`, `doctor.py`, `markdown.py`) was left untouched to keep the diff focused; new code in this PR is format-clean.

Known failing test (preexisting, unrelated)
`tests/test_bootstrap.py::test_cold_run_installs_from_requirements_lock` fails on Windows due to a path-separator assertion (`root/requirements.lock` vs `root\`). Present on `main`; not touched by this PR.

🤖 Generated with Claude Code