Feature: Auto-reindex when source files change (3 strategies) #317

@camhoccode

Pain Point

When files are indexed via ctx_index(path), the index becomes stale silently if the file changes afterward. Users get outdated search results with no warning, leading to incorrect decisions based on old code/data.

Painful scenarios:

  • Long coding sessions where source files change frequently
  • Config files evolving during refactoring
  • git pull/rebase changing many files at once between sessions
  • The current 14-day pruning is too coarse: a file indexed 1 hour ago can be completely wrong after a rebase

Strategy 1: File Watcher (real-time, mid-session)

Opt-in fs.watch on indexed file paths.

  • On ctx_index(path, source) → register path in watch list
  • On file change (debounced 2-5s) → auto-reindex that source
  • Config: "autoReindex": true | false (default off)
  • Ignore patterns: node_modules, dist, .git
  • Clean up watchers on session end

Pro: Zero-effort freshness — search always current
Con: Memory overhead from watchers (cap max watch count)
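
A minimal sketch of how the watcher strategy above could look, assuming Node's built-in fs.watch. The names ReindexWatcher, isIgnored, and the reindex callback are hypothetical and not part of the existing API; the 2000 ms debounce and the cap of 256 watchers are placeholder defaults.

```typescript
import { watch } from "node:fs";
import type { FSWatcher } from "node:fs";

// Directory names taken from the ignore list proposed above.
const IGNORED_SEGMENTS = ["node_modules", "dist", ".git"];

// True if any path segment matches an ignored directory name exactly.
function isIgnored(path: string): boolean {
  return path.split(/[\\/]/).some((segment) => IGNORED_SEGMENTS.includes(segment));
}

// Hypothetical helper: watches indexed paths and reindexes after a quiet period.
class ReindexWatcher {
  private readonly watchers = new Map<string, FSWatcher>();
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>();
  private readonly reindex: (path: string) => void;
  private readonly debounceMs: number;
  private readonly maxWatchers: number;

  constructor(reindex: (path: string) => void, debounceMs = 2000, maxWatchers = 256) {
    this.reindex = reindex;
    this.debounceMs = debounceMs;
    this.maxWatchers = maxWatchers;
  }

  // Returns true if the path is watched after the call, false if it was
  // skipped because it is ignored or the watcher cap has been reached.
  register(path: string): boolean {
    if (this.watchers.has(path)) return true;
    if (isIgnored(path) || this.watchers.size >= this.maxWatchers) return false;
    this.watchers.set(path, watch(path, () => this.schedule(path)));
    return true;
  }

  // Restart the timer on every event so a burst of writes triggers one reindex.
  private schedule(path: string): void {
    const pending = this.timers.get(path);
    if (pending !== undefined) clearTimeout(pending);
    this.timers.set(
      path,
      setTimeout(() => {
        this.timers.delete(path);
        this.reindex(path);
      }, this.debounceMs),
    );
  }

  // Call on session end to release all watchers and pending timers.
  close(): void {
    this.timers.forEach((timer) => clearTimeout(timer));
    this.watchers.forEach((watcher) => watcher.close());
    this.timers.clear();
    this.watchers.clear();
  }
}
```

In this sketch, ctx_index(path, source) would call register(path) only when "autoReindex" is true, and the session teardown would call close().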


Strategy 2: Hash-based Stale Detection (lightweight, at search time)

SHA-256 hash check when returning search results.

  • On ctx_index(path) → store file hash + path in sources table
  • On ctx_search() → recompute hash for file-based results
  • If hash differs → mark as ⚠️ stale or auto-refresh

    ALTER TABLE sources ADD COLUMN content_hash TEXT;
    ALTER TABLE sources ADD COLUMN file_path TEXT;

Example output:

⚠️ [stale] Title of chunk
   Indexed: 2h ago | File modified: 5m ago

Pro: Near-zero overhead when unchanged (~1-5ms hash per file)
Con: Reactive: staleness is discovered only at search time
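
The hash check above can be sketched roughly as follows, assuming Node's crypto module. IndexedSource mirrors the two proposed columns (file_path, content_hash); checkFreshness and the injectable read parameter are hypothetical names introduced only for illustration.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Mirrors the proposed file_path and content_hash columns on the sources table.
interface IndexedSource {
  filePath: string;
  contentHash: string;
}

type Freshness = "fresh" | "stale" | "missing";

// Hex-encoded SHA-256 of the given content.
function sha256(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Recompute the hash of the file behind a search result and compare it with
// the hash stored at index time. The reader is injectable so the check can be
// exercised without touching the disk.
function checkFreshness(
  source: IndexedSource,
  read: (path: string) => string | Uint8Array = (path) => readFileSync(path),
): Freshness {
  let current: string | Uint8Array;
  try {
    current = read(source.filePath);
  } catch {
    // The file was deleted or is unreadable since it was indexed.
    return "missing";
  }
  return sha256(current) === source.contentHash ? "fresh" : "stale";
}
```

ctx_search() could then prefix results whose status is "stale" with the ⚠️ marker or trigger a refresh. A cheaper variant, not shown here, would compare mtime first and hash only when it has changed.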


Strategy 3: Git-aware Batch Reindex (between sessions)

git diff-driven reindex on session start.

  • Store last_reindex_commit SHA in content DB
  • On session start → git diff --name-only <last>..HEAD
  • Cross-reference with indexed source paths → batch reindex changed files only
  • Handle deletes (remove from index) and renames (git diff --name-status)
  • Non-git fallback: mtime comparison

    ctx_reindex              # reindex git-changed files
    ctx_reindex --all        # force reindex everything
    ctx_reindex --dry-run    # preview what would reindex

Pro: Smart — only reindexes what changed
Con: Git-only unless the mtime fallback is implemented; does not help mid-session
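
One way the cross-referencing step above might work, sketched under the assumption that the output of git diff --name-status is tab-separated with one entry per line. planReindex, ReindexPlan, and gitNameStatus are hypothetical names; the set of indexed paths is assumed to come from the sources table.

```typescript
import { execFileSync } from "node:child_process";

interface ReindexPlan {
  reindex: string[]; // paths whose content should be indexed again
  remove: string[]; // paths whose index entries should be dropped
}

// Runs `git diff --name-status <last>..HEAD` in the given working directory.
function gitNameStatus(lastCommit: string, cwd: string): string {
  return execFileSync("git", ["diff", "--name-status", `${lastCommit}..HEAD`], {
    cwd,
    encoding: "utf8",
  });
}

// Turns name-status output into a plan, limited to paths that are indexed.
function planReindex(nameStatus: string, indexed: ReadonlySet<string>): ReindexPlan {
  const reindex = new Set<string>();
  const remove = new Set<string>();
  for (const line of nameStatus.split("\n")) {
    const [status, first, second] = line.split("\t");
    if (!status || !first) continue; // blank or malformed line
    const kind = status[0]; // "R100" -> "R"
    if (kind === "D") {
      if (indexed.has(first)) remove.add(first);
    } else if (kind === "R") {
      // Rename: drop the old path and index the new one in its place.
      if (indexed.has(first)) {
        remove.add(first);
        if (second) reindex.add(second);
      }
    } else if (kind !== "C" && indexed.has(first)) {
      // Modified, added, or type-changed files that are already indexed.
      reindex.add(first);
    }
  }
  return { reindex: Array.from(reindex), remove: Array.from(remove) };
}
```

ctx_reindex --dry-run could print this plan without applying it. Paths containing unusual characters are quoted by git in this output format, so a more robust version would likely parse the -z form instead.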


Synergy

| Strategy | When it helps | Cost |
| --- | --- | --- |
| File watcher | Real-time, mid-session | Medium (watchers) |
| Hash check | On-demand at search time | Low (hash compare) |
| Git reindex | Between sessions, after pull/rebase | Low (one-time batch) |

Used together, the three strategies cover mid-session edits, search-time checks, and between-session changes. Each can be implemented independently.

Suggested Implementation Order

  1. Hash check (smallest change, biggest trust improvement)
  2. Git reindex (covers between-session gap)
  3. File watcher (full real-time, opt-in)
