Fix concurrency, SQL injection, and resource leaks #3
- api/server.py: disable engine.io upgrades (fixes WebSocket 500s on Werkzeug threading mode); add a coalescing dispatcher so a fast indexer cannot flood SocketIO clients or the browser.
- api/state.py: make `AppState.reload` atomic: build the new DB/vectors/engines first, swap references, then close the old DB on a delayed timer so in-flight requests don't hit a closed connection.
- storage/db.py: whitelist columns in `update_node_fields` to close a SQL-injection surface (column names cannot be parameterized).
- storage/vectors.py: add an `RLock` around all mutation/query/save paths so the watcher and indexer cannot corrupt the ids<->matrix mapping. Also reset the FAISS index and Python state consistently on load failure or dimension mismatch.
- watcher/incremental.py: accept and respect the API's single-flight `index_lock` so the watcher and `/api/index` never open two writers on the same DB; re-queue on contention; trigger `AppState.reload` after each incremental flush.
- engine/server_mode.py: track every spawned `llama_cpp.server` `Popen` and stop them via `atexit` so the parent does not leave orphan model servers running when it exits. (`atexit` handlers run on normal interpreter exit, including exit after an unhandled exception, but not when the process is killed by a signal Python does not handle, such as SIGKILL.)
- storage/compsrc.py: replace bare `except: pass` with narrowly-typed exception handlers and `logging.warning` so I/O failures are visible.

Amp-Thread-ID: https://ampcode.com/threads/T-019e8387-f758-775f-aaf2-2ed8d1209c8b
Co-authored-by: Amp <amp@ampcode.com>
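A minimal sketch of the column-whitelist idea from the storage/db.py item, assuming a simplified `nodes` table. The schema, the `_ALLOWED_COLUMNS` set, and this `update_node_fields` signature are hypothetical, not the project's code. SQLite placeholders bind values but not identifiers, so column names are checked against a fixed set before they are interpolated; the explicit `commit()` mirrors the later follow-up fix.

```python
import sqlite3

# Hypothetical whitelist; the real set of updatable columns lives in storage/db.py.
_ALLOWED_COLUMNS = frozenset({"summary", "state", "signature"})


def update_node_fields(conn: sqlite3.Connection, node_id: int, **fields: object) -> None:
    """Update whitelisted columns of one row in a hypothetical `nodes` table."""
    if not fields:
        return
    unknown = set(fields) - _ALLOWED_COLUMNS
    if unknown:
        # Placeholders can bind values but not identifiers, so refuse unknown names.
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    # Safe to interpolate: every key is a member of the fixed whitelist.
    assignments = ", ".join(f"{col} = ?" for col in fields)
    conn.execute(f"UPDATE nodes SET {assignments} WHERE id = ?", (*fields.values(), node_id))
    conn.commit()  # persist the change and end the implicit transaction


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, summary TEXT, state TEXT, signature TEXT)")
    conn.execute("INSERT INTO nodes (id, state) VALUES (1, 'new')")
    update_node_fields(conn, 1, state="indexed")
    print(conn.execute("SELECT state FROM nodes WHERE id = 1").fetchone())  # ('indexed',)
    try:
        update_node_fields(conn, 1, **{"state=? WHERE 1=1; --": "x"})
    except ValueError as exc:
        print("blocked:", exc)
```

The injection string from the PR's verification notes is rejected before any SQL is built, because the check runs on the key, not the value.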
Reviewer's Guide

Addresses production WebSocket failures, SQL-injection risk, vector-store race conditions, watcher/indexer contention, and orphaned model subprocesses by tightening concurrency controls, constraining SQL updates, coalescing SocketIO events, hardening vector-store load/save logic, coordinating the file watcher with API indexing, and tracking/cleaning up model server processes.

Sequence diagram for RepoWatcher flush coordination and AppState reload

```mermaid
sequenceDiagram
    participant RepoWatcher
    participant IndexLock as index_lock
    participant Indexer
    participant GraphDB as indexer.db
    participant VectorStore as indexer.vectors
    participant AppState as state
    RepoWatcher->>RepoWatcher: _flush()
    RepoWatcher->>IndexLock: acquire(blocking=False)
    alt lock_not_acquired
        RepoWatcher->>RepoWatcher: _pending.update(changed)
        RepoWatcher->>RepoWatcher: threading.Timer(..., _flush).start()
        RepoWatcher-->>IndexLock: (no lock held)
    else lock_acquired
        RepoWatcher->>Indexer: __init__(cfg, bus, do_summarize)
        loop each_deleted_path
            Indexer->>GraphDB: delete_file(path)
            GraphDB-->>Indexer: ids
            Indexer->>VectorStore: remove(set(ids))
            loop each_id
                Indexer->>RepoWatcher: bus.emit("node_remove", id)
            end
        end
        RepoWatcher->>Indexer: prune_deleted_files(cfg, db, vectors)
        alt existing_changed_files
            RepoWatcher->>Indexer: index(only_changed=existing)
        end
        RepoWatcher->>GraphDB: close()
        alt on_reload_callback_present
            RepoWatcher->>AppState: reload()
        end
        RepoWatcher->>IndexLock: release()
    end
```
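The flush path in the diagram above can be sketched as a small coordinator: try the shared lock without blocking, re-queue and retry on a timer when the API holds it, and log (rather than swallow) `on_reload` failures. `FlushCoordinator`, `index_fn`, and `retry_delay` are hypothetical stand-ins for `RepoWatcher` internals, not its real API.

```python
import logging
import threading
from typing import Callable, Iterable, Optional

log = logging.getLogger(__name__)


class FlushCoordinator:
    """Hypothetical, stripped-down stand-in for RepoWatcher's flush logic."""

    def __init__(
        self,
        index_lock: threading.Lock,
        index_fn: Callable[[set[str]], None],
        on_reload: Optional[Callable[[], None]] = None,
        retry_delay: float = 0.5,
    ) -> None:
        self.index_lock = index_lock  # shared with the API's single-flight indexer
        self.index_fn = index_fn
        self.on_reload = on_reload
        self.retry_delay = retry_delay
        self._pending: set[str] = set()
        self._mu = threading.Lock()  # guards _pending only

    def queue(self, paths: Iterable[str]) -> None:
        with self._mu:
            self._pending.update(paths)

    def flush(self) -> bool:
        """Index pending paths; return False if another writer holds the lock."""
        with self._mu:
            changed, self._pending = self._pending, set()
        if not changed:
            return True
        if not self.index_lock.acquire(blocking=False):
            # Someone else (e.g. /api/index) is writing: re-queue and retry later.
            self.queue(changed)
            timer = threading.Timer(self.retry_delay, self.flush)
            timer.daemon = True
            timer.start()
            return False
        try:
            self.index_fn(changed)
            if self.on_reload is not None:
                try:
                    self.on_reload()
                except Exception:
                    # Log instead of silently swallowing reload failures.
                    log.exception("on_reload callback failed")
            return True
        finally:
            self.index_lock.release()
```

Using `acquire(blocking=False)` keeps the watcher thread from stalling behind a long API-triggered index; the cost is that changes may be processed one retry interval later.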
Sequence diagram for coalesced SocketIO event dispatch

```mermaid
sequenceDiagram
    participant Indexer
    participant EventBus as state.bus
    participant Forward as _forward
    participant Dispatcher as _dispatcher
    participant SocketIO as socketio
    participant Client as BrowserClient
    Indexer->>EventBus: emit(IndexEvent)
    EventBus->>Forward: _forward(evt)
    Forward->>Forward: evt.to_dict()
    Forward->>Forward: pending_index/pending_ext.append(d)
    Forward->>Dispatcher: flush_event.set()
    loop dispatcher_loop
        Dispatcher->>Dispatcher: flush_event.wait()
        Dispatcher->>Dispatcher: flush_event.clear()
        Dispatcher->>Dispatcher: time.sleep(_BATCH_INTERVAL)
        Dispatcher->>Dispatcher: take up to _BATCH_MAX_EVENTS from pending_*
        loop idx_batch
            Dispatcher->>SocketIO: emit(index_event, d)
        end
        loop ext_batch
            Dispatcher->>SocketIO: emit(ext_event, d)
        end
        alt more_pending
            Dispatcher->>Dispatcher: flush_event.set()
        end
    end
    SocketIO-->>Client: index_event/ext_event messages
```
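The dispatcher loop in the diagram above can be sketched with only the standard library: producers append and set an `Event`; one daemon thread sleeps briefly so a burst accumulates, then emits at most a bounded batch. This is a sketch under assumptions: `CoalescingDispatcher`, `BATCH_INTERVAL`, and `BATCH_MAX_EVENTS` are hypothetical names and values, a single queue replaces the separate index/ext queues, and `emit` stands in for `socketio.emit`. It batches events; it does not merge duplicates.

```python
import logging
import threading
import time
from collections import deque
from typing import Callable

log = logging.getLogger(__name__)

# Hypothetical values; the PR's _BATCH_INTERVAL / _BATCH_MAX_EVENTS are not shown.
BATCH_INTERVAL = 0.05
BATCH_MAX_EVENTS = 200


class CoalescingDispatcher:
    """Buffer events from producer threads and emit them in bounded batches."""

    def __init__(self, emit: Callable[[str, dict], None]) -> None:
        self._emit = emit
        self._pending: deque = deque()
        self._lock = threading.Lock()
        self._flush_event = threading.Event()
        threading.Thread(target=self._run, daemon=True).start()

    def forward(self, name: str, payload: dict) -> None:
        # Called on the indexer thread: a cheap append plus a wake-up signal.
        with self._lock:
            self._pending.append((name, payload))
        self._flush_event.set()

    def _run(self) -> None:
        while True:
            self._flush_event.wait()
            self._flush_event.clear()
            time.sleep(BATCH_INTERVAL)  # let a burst accumulate
            with self._lock:
                n = min(len(self._pending), BATCH_MAX_EVENTS)
                batch = [self._pending.popleft() for _ in range(n)]
                more = bool(self._pending)
            for name, payload in batch:
                try:
                    self._emit(name, payload)
                except Exception:
                    # One failed emit must not kill the thread and stop forwarding.
                    log.exception("emit of %s failed", name)
            if more:
                self._flush_event.set()  # drain the rest on the next iteration
```

Because the batch is capped, a full re-index is spread over several intervals instead of being pushed to the browser in one burst.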
📝 Walkthrough

This PR hardens concurrency safety and enables safe hot-reloading across the system. AppState now atomically swaps components under a dedicated lock while preserving in-flight requests. The watch mode's incremental indexer coordinates with the foreground API via the shared lock. VectorStore, SocketIO dispatch, and server subprocess management all add thread-safety or resource-cleanup guarantees. Storage errors are now logged for visibility, and database column updates are whitelisted to prevent injection.

Changes: Concurrency, Hot-Reload, and Safety Hardening
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Hey - I've found 1 security issue and left some high-level feedback:
Security issues:
- Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.
General comments:
- In `AppState.reload`, `_build()` runs under `_reload_lock`, which could significantly block request threads if building the embedder/chat is slow; consider doing the potentially expensive construction outside the lock and only taking `_reload_lock` for the final attribute swap and old-db capture.
- In `RepoWatcher._flush`, exceptions from the `on_reload` callback are silently swallowed; capturing and logging these (similar to the new logging in `compsrc`) would make it much easier to debug reload failures without affecting robustness.
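A minimal sketch of the first suggestion: build the new snapshot with no lock held, then take `_reload_lock` only to publish it. `ReadState`, `AppStateSketch`, and the generation counter are hypothetical; the frozen dataclass echoes the `_ReadState` idea from the follow-up commit. Readers fetch one reference, so they see either the whole old snapshot or the whole new one. The generation check lets concurrent reloads finish without an older build overwriting a newer one.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ReadState:
    """Hypothetical immutable bundle of read-side components."""
    generation: int
    db: Any
    vectors: Any


class AppStateSketch:
    """Build snapshots outside the lock; hold the lock only to publish one."""

    def __init__(self, build: Callable[[int], ReadState]) -> None:
        self._build = build
        self._reload_lock = threading.Lock()
        self._generation = 0
        self._state = build(0)

    @property
    def state(self) -> ReadState:
        # One reference read: callers see a whole old or whole new snapshot.
        return self._state

    def reload(self) -> ReadState:
        """Publish a freshly built snapshot; return the one the caller should retire."""
        with self._reload_lock:
            self._generation += 1
            gen = self._generation
        candidate = self._build(gen)  # potentially slow (models, DB); no lock held
        with self._reload_lock:
            current = self._state
            if candidate.generation < current.generation:
                return candidate  # a newer reload already published; retire ours
            self._state = candidate
            return current
```

The caller is responsible for closing whatever `reload()` returns, which keeps the slow close path out of the lock as well.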
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `AppState.reload`, `_build()` runs under `_reload_lock`, which could significantly block request threads if building the embedder/chat is slow; consider doing the potentially expensive construction outside the lock and only taking `_reload_lock` for the final attribute swap and old-db capture.
- In `RepoWatcher._flush`, exceptions from the `on_reload` callback are silently swallowed; capturing and logging these (similar to the new logging in `compsrc`) would make it much easier to debug reload failures without affecting robustness.
## Individual Comments
### Comment 1
<location path="graphindex/graphindex/engine/server_mode.py" line_range="63" />
<code_context>
proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.
*Source: opengrep*
</issue_to_address>
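The opengrep rule flags any `Popen` whose argument is not a literal. The relevant property of a list argv with `shell=False` (the default) can be shown directly: each element reaches the child verbatim, and no shell interprets `;`, `&&`, or `$(...)`. `run_fixed_argv` is a hypothetical helper for illustration. Note that this does not stop argument injection: a value starting with `-` may still be parsed as an option by the target program, so the audit question remains where each argv element comes from.

```python
import subprocess
import sys


def run_fixed_argv(user_value: str) -> str:
    """Pass an untrusted string as one argv element; no shell ever parses it."""
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value]
    # shell=False with a list: the child receives argv as-is, so shell
    # metacharacters in user_value are plain characters, not syntax.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_fixed_argv("x; echo pwned"))
```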
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 59d52da79f
```python
socketio = SocketIO(app, cors_allowed_origins="*", async_mode="threading",
                    allow_upgrades=False)
```
Force polling instead of only disabling upgrades
In the inspected WebUI socket paths, `frontend/src/api/client.js` still connects with `transports: ['websocket', 'polling']`, so the browser's first request is a direct `/socket.io/?transport=websocket`. `allow_upgrades=False` only removes `websocket` from the polling handshake's upgrade list; python-engineio still handles an initial `transport == 'websocket'` request by entering the WebSocket request path. As a result, the Werkzeug WebSocket 500s this patch is trying to avoid can still occur for the bundled frontend. Force polling with the Engine.IO/Socket.IO `transports=['polling']` option on the server, or update the client transport list.
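Besides the library option Codex names, the follow-up commit says the server rejects connects that arrive directly on the websocket transport but does not show how. One stdlib-only way to express that check is a WSGI middleware. This is a hypothetical sketch, not the PR's implementation. It assumes Engine.IO's default `/socket.io` path and the `transport` query parameter. To take effect, it would have to sit in front of the Engine.IO middleware.

```python
from typing import Callable, Iterable
from urllib.parse import parse_qs

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def reject_websocket_transport(app: WSGIApp, prefix: str = "/socket.io") -> WSGIApp:
    """Wrap a WSGI app so Engine.IO requests on the websocket transport get a 400."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        query = parse_qs(environ.get("QUERY_STRING", ""))
        if path.startswith(prefix) and "websocket" in query.get("transport", []):
            body = b"websocket transport disabled; use polling"
            start_response("400 Bad Request", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Polling and non-Socket.IO requests pass through untouched.
        return app(environ, start_response)

    return middleware
```

On the client side, the matching change described in the follow-up commit is connecting with `transports: ['polling'], upgrade: false`, so a compliant browser never sends the rejected request in the first place.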
Fixes based on Gemini Code Assist, CodeRabbit, Codex, and Sourcery reviews of PR Undertaker-afk#3 in Undertaker-afk/Incodedex.

Critical:
- storage/db.py: `update_node_fields` now commits the transaction (it was leaving SQLite with an open implicit transaction, blocking other writers and never persisting the change).
- watcher/incremental.py: when an incremental flush contains only file deletions, call `vectors.save()` so the removals persist across restarts (previously only `Indexer.index()` saved vectors).

High:
- storage/vectors.py: validate dimension BEFORE mutating state in `add()` and `search()`; a wrong-dim input now raises `ValueError` without corrupting `ids` / `_id_to_idx` / `_matrix` / FAISS index.
- api/server.py: the SocketIO dispatcher thread now catches per-emit and loop-level exceptions, so a transient `socketio.emit` failure can no longer kill the thread and silently stop all event forwarding.

Medium:
- api/state.py: read-side components are now bundled in a frozen `_ReadState` dataclass and swapped via a single atomic `self._state` pointer. The build runs OUTSIDE `_reload_lock` (no blocking of request threads on slow model loads). The delayed close-old-db `Timer` is `daemon=True` so it cannot block interpreter shutdown. `__getattr__` proxies `db`/`vectors`/`embedder`/etc. through the snapshot for backwards compatibility; `build_extended` captures a single snapshot so it can no longer mix old and new components.
- api/server.py + frontend/src/api/client.js: belt-and-suspenders on the WebSocket fix. The server now also rejects connect attempts that arrive directly on the websocket transport (`allow_upgrades=False` only blocks the polling -> ws upgrade). The frontend connects with `transports: ['polling'], upgrade: false` so it never tries WS at all.
- engine/server_mode.py: refactored `_stop_all` and `stop_server` through a shared `_terminate_proc` helper that always `wait()`s after `kill()` and only removes a `Popen` handle from `_PROCS` once it has been reaped, so zombies cannot leak and concurrent cleanup paths cannot double-stop a child mid-shutdown. Added a security-audit comment on the `Popen` call (`shell=False`, fixed argv).

Low:
- watcher/incremental.py: `on_reload` exceptions are now logged (via `logging` + `bus.emit('log', ...)`) instead of being silently swallowed. Fixed `RepoWatcher.__init__` type annotations: replaced the stringy `'callable | None'` with `Optional[Callable[[], None]]` and a Lock-like annotation for `index_lock`.

Amp-Thread-ID: https://ampcode.com/threads/T-019e8387-f758-775f-aaf2-2ed8d1209c8b
Co-authored-by: Amp <amp@ampcode.com>
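A list-based sketch of the vector-store changes: validate the dimension before touching any state, and hold one `RLock` across every read and write of the three parallel structures. `TinyVectorStore` is hypothetical and deliberately omits FAISS, numpy, and persistence. Its `search` ranks by plain dot product, which may differ from the real store's metric.

```python
import math
import threading
from typing import Sequence


class TinyVectorStore:
    """Hypothetical, list-based stand-in for VectorStore (no FAISS, no numpy)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        # RLock, as in the PR, so one locked method may call another without deadlock.
        self._lock = threading.RLock()
        self.ids: list[str] = []
        self._id_to_idx: dict[str, int] = {}
        self._matrix: list[list[float]] = []

    def _check_dim(self, vec: Sequence[float]) -> None:
        # Validate BEFORE touching any state, so a bad input cannot leave
        # ids / _id_to_idx / _matrix out of sync.
        if len(vec) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vec)}")

    def add(self, node_id: str, vec: Sequence[float]) -> None:
        self._check_dim(vec)
        with self._lock:
            idx = self._id_to_idx.get(node_id)
            if idx is None:
                self._id_to_idx[node_id] = len(self.ids)
                self.ids.append(node_id)
                self._matrix.append(list(vec))
            else:
                self._matrix[idx] = list(vec)  # update in place

    def remove(self, node_ids: set[str]) -> None:
        with self._lock:
            keep = [i for i, nid in enumerate(self.ids) if nid not in node_ids]
            self.ids = [self.ids[i] for i in keep]
            self._matrix = [self._matrix[i] for i in keep]
            self._id_to_idx = {nid: i for i, nid in enumerate(self.ids)}

    def search(self, query: Sequence[float], k: int = 5) -> list[tuple[str, float]]:
        self._check_dim(query)
        with self._lock:
            scored = [(nid, math.fsum(a * b for a, b in zip(query, row)))
                      for nid, row in zip(self.ids, self._matrix)]
        scored.sort(key=lambda t: t[1], reverse=True)
        return scored[:k]
```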
Ohhh nice, I will have a look
Actionable comments posted: 2
♻️ Duplicate comments (1)
graphindex/graphindex/engine/server_mode.py (1)
52-64: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep tracked handles until process exit is confirmed.

`_stop_all()` and `stop_server()` remove entries from `_PROCS` even when `_terminate_proc()` may fail or time out. That can orphan a live child and prevents later cleanup retries.

🔧 Proposed fix

```diff
 def _stop_all() -> None:
@@
     for p in procs:
         _terminate_proc(p)
         with _PROCS_LOCK:
-            try:
-                _PROCS.remove(p)
-            except ValueError:
-                pass
+            if p in _PROCS and p.poll() is not None:
+                _PROCS.remove(p)
@@
 def stop_server(proc: subprocess.Popen) -> None:
@@
     _terminate_proc(proc)
     with _PROCS_LOCK:
-        try:
-            _PROCS.remove(proc)
-        except ValueError:
-            pass
+        if proc in _PROCS and proc.poll() is not None:
+            _PROCS.remove(proc)
```

Also applies to: 90-101
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphindex/graphindex/engine/server_mode.py` around lines 52 - 64, _stop_all() and stop_server() currently remove entries from _PROCS immediately after calling _terminate_proc(), which can orphan children if termination fails or times out; change the logic so entries in _PROCS are only removed once the child is confirmed reaped (e.g., after wait()/poll() shows exited or after _terminate_proc() returns success), by holding the handle in the local procs list and deferring removal under _PROCS_LOCK until the process exit is verified inside or after _terminate_proc(), and ensure any failed/timeout paths leave the handle in _PROCS for later retries; update both _stop_all and stop_server to use _PROCS_LOCK consistently and only remove from _PROCS when exit is confirmed.
🧹 Nitpick comments (2)
graphindex/graphindex/api/server.py (1)
38-45: 💤 Low value

Consider logging the exception at debug level.

The silent `except Exception: pass` makes it hard to diagnose if the package resource lookup fails unexpectedly (e.g., broken install, permissions, malformed package). A debug-level log would help troubleshooting without adding noise in normal operation.

🔧 Proposed fix

```diff
 try:
     import importlib.resources as ilr
     ref = ilr.files("graphindex").joinpath("frontend_dist", "index.html")
     if ref.is_file():
         # `index.html` is at the root of the bundle; return the parent dir.
         return Path(str(ref.parent))
-except Exception:
-    pass
+except Exception as exc:
+    log.debug("importlib.resources lookup failed, using source-tree fallback: %s", exc)
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphindex/graphindex/api/server.py` around lines 38 - 45, The silent except in the importlib.resources block swallows errors; change it to log the exception at debug level instead of passing. Add or reuse a module logger (e.g., logger = logging.getLogger(__name__)) and in the except block call logger.debug("failed to locate packaged frontend index.html", exc_info=True) (or logger.exception at debug level if configured) so failures from importlib.resources/ilr.files/ref.is_file() are recorded without changing control flow; keep returning Path(ref.parent) when found.

graphindex/graphindex/engine/server_mode.py (1)
34-35: ⚡ Quick win

Avoid silent broad catches in subprocess shutdown paths.

`except Exception: pass` in `_terminate_proc()` hides teardown failures that are critical for diagnosing leaked or stuck servers. Catch expected subprocess errors explicitly and log unexpected failures.

🔍 Proposed refactor

```diff
+import logging
@@
 _PROCS: list[subprocess.Popen] = []
 _PROCS_LOCK = threading.Lock()
+_LOG = logging.getLogger(__name__)
@@
 def _terminate_proc(proc: subprocess.Popen) -> None:
@@
     try:
         proc.terminate()
-    except Exception:
-        pass
+    except OSError:
+        _LOG.warning("terminate() failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
@@
     try:
         proc.wait(timeout=5)
         return
-    except Exception:
-        pass
+    except subprocess.TimeoutExpired:
+        pass
+    except OSError:
+        _LOG.warning("wait() after terminate failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
@@
     try:
         proc.kill()
-    except Exception:
-        pass
+    except OSError:
+        _LOG.warning("kill() failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
@@
     try:
         proc.wait(timeout=5)
-    except Exception:
-        pass
+    except (subprocess.TimeoutExpired, OSError):
+        _LOG.warning("final wait() failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
```

Also applies to: 39-40, 44-45, 48-49
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphindex/graphindex/engine/server_mode.py` around lines 34 - 35, The shutdown helper _terminate_proc() currently swallows all exceptions with "except Exception: pass", which hides teardown failures; update the try/except blocks in _terminate_proc() (and the other similar shutdown catches) to catch specific expected errors (e.g., subprocess.SubprocessError, OSError) and log failures instead of ignoring them—use the module logger or processLogger to call logger.error or logger.exception with the exception details; for truly unexpected errors log them with logger.exception and optionally re-raise or ensure best-effort cleanup (proc.kill()/proc.terminate() and proc.wait with timeout) in a finally block so leaked/stuck processes are diagnosable and cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@graphindex/graphindex/api/state.py`:
- Around line 125-129: The current logic uses a fixed 2s delayed close
(threading.Timer + _safe_close(old_state.db)) which can close the DB while
in-flight requests still use it; instead implement a usage-aware lifecycle: add
a reference count or an "in_use" tracking API on the state/db (e.g.,
acquire()/release() or increment/decrement a counter on old_state/db) and defer
calling _safe_close(old_state.db) until the usage count reaches zero (or wait on
an Event that release() sets), removing the fixed Timer; update callers to call
acquire() when they begin using state.db and release() when done so reload can
safely close the old_state only when no users remain.
- Around line 88-94: The ensure_chat method currently writes directly to proxied
read-side fields (self.chat, self.ask_engine) which shadows __getattr__-proxied
snapshot fields and can mix generations; fix by avoiding instance attribute
assignments: read dependent fields once into locals (e.g., db = self.db; vectors
= self.vectors; embedder = self.embedder), create chat = get_chat(self.cfg) and
ask_engine = AskEngine(self.cfg, db, vectors, embedder) locally, then publish
them back into the read-side snapshot using the component's atomic
update/publish mechanism (instead of setting self.chat/self.ask_engine directly)
so the update is atomic and does not break proxy semantics.
---
Duplicate comments:
In `@graphindex/graphindex/engine/server_mode.py`:
- Around line 52-64: _stop_all() and stop_server() currently remove entries from
_PROCS immediately after calling _terminate_proc(), which can orphan children if
termination fails or times out; change the logic so entries in _PROCS are only
removed once the child is confirmed reaped (e.g., after wait()/poll() shows
exited or after _terminate_proc() returns success), by holding the handle in the
local procs list and deferring removal under _PROCS_LOCK until the process exit
is verified inside or after _terminate_proc(), and ensure any failed/timeout
paths leave the handle in _PROCS for later retries; update both _stop_all and
stop_server to use _PROCS_LOCK consistently and only remove from _PROCS when
exit is confirmed.
---
Nitpick comments:
In `@graphindex/graphindex/api/server.py`:
- Around line 38-45: The silent except in the importlib.resources block swallows
errors; change it to log the exception at debug level instead of passing. Add or
reuse a module logger (e.g., logger = logging.getLogger(__name__)) and in the
except block call logger.debug("failed to locate packaged frontend index.html",
exc_info=True) (or logger.exception at debug level if configured) so failures
from importlib.resources/ilr.files/ref.is_file() are recorded without changing
control flow; keep returning Path(ref.parent) when found.
In `@graphindex/graphindex/engine/server_mode.py`:
- Around line 34-35: The shutdown helper _terminate_proc() currently swallows
all exceptions with "except Exception: pass", which hides teardown failures;
update the try/except blocks in _terminate_proc() (and the other similar
shutdown catches) to catch specific expected errors (e.g.,
subprocess.SubprocessError, OSError) and log failures instead of ignoring
them—use the module logger or processLogger to call logger.error or
logger.exception with the exception details; for truly unexpected errors log
them with logger.exception and optionally re-raise or ensure best-effort cleanup
(proc.kill()/proc.terminate() and proc.wait with timeout) in a finally block so
leaked/stuck processes are diagnosable and cleaned up.
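For the `api/server.py` nitpick, a self-contained sketch keeps the fallback behavior but logs the failure at debug level with a traceback. `find_frontend_dist` and its `fallback` parameter are hypothetical. The sketch calls `joinpath` one segment at a time and avoids `.parent`, because the `importlib.resources` Traversable protocol does not guarantee `parent`.

```python
import importlib.resources as ilr
import logging
from pathlib import Path
from typing import Optional

log = logging.getLogger(__name__)


def find_frontend_dist(package: str, fallback: Optional[Path] = None) -> Optional[Path]:
    """Return the directory holding index.html, preferring the installed package."""
    try:
        dist = ilr.files(package).joinpath("frontend_dist")
        if dist.joinpath("index.html").is_file():
            # Works for regular on-disk installs, where files() is a real path.
            return Path(str(dist))
    except Exception:
        # Keep the fallback behaviour, but leave a trace for debugging.
        log.debug("packaged frontend lookup failed for %r", package, exc_info=True)
    if fallback is not None and (fallback / "index.html").is_file():
        return fallback
    return None
```

With logging at DEBUG, a missing or broken package now produces one traceback line instead of silently falling through to the source-tree path.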
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8223fc3d-be3b-46d1-9337-f83313e64454
📒 Files selected for processing (8)
- graphindex/frontend/src/api/client.js
- graphindex/graphindex/api/server.py
- graphindex/graphindex/api/state.py
- graphindex/graphindex/cli.py
- graphindex/graphindex/engine/server_mode.py
- graphindex/graphindex/storage/db.py
- graphindex/graphindex/storage/vectors.py
- graphindex/graphindex/watcher/incremental.py
✅ Files skipped from review due to trivial changes (1)
- graphindex/frontend/src/api/client.js
🚧 Files skipped from review as they are similar to previous changes (4)
- graphindex/graphindex/storage/db.py
- graphindex/graphindex/watcher/incremental.py
- graphindex/graphindex/storage/vectors.py
- graphindex/graphindex/cli.py
```diff
@@ -53,20 +94,43 @@ def ensure_chat(self):
         return self.chat
```
Avoid direct writes to proxied read-side fields in ensure_chat.
Lines 91 and 92 assign `self.chat` / `self.ask_engine` on the instance, which shadows the `__getattr__`-proxied snapshot fields and breaks atomic snapshot semantics after the first call. This path can also mix generations by reading `self.db`, `self.vectors`, and `self.embedder` via separate lookups during a concurrent reload.
🔧 Proposed fix
```diff
-from dataclasses import dataclass
+from dataclasses import dataclass, replace
@@
     def ensure_chat(self):
-        """Retry chat-model discovery and keep AskEngine wired to it."""
-        if self.chat is None:
-            self.chat = get_chat(self.cfg)
-            self.ask_engine = AskEngine(self.cfg, self.db, self.vectors,
-                                        self.embedder, chat=self.chat)
-        return self.chat
+        """Retry chat-model discovery and atomically publish updated snapshot."""
+        snap = self._state
+        if snap.chat is not None:
+            return snap.chat
+        chat = get_chat(self.cfg)
+        with self._reload_lock:
+            cur = self._state
+            if cur.chat is None:
+                self._state = replace(
+                    cur,
+                    chat=chat,
+                    ask_engine=AskEngine(
+                        self.cfg, cur.db, cur.vectors, cur.embedder, chat=chat
+                    ),
+                )
+        return self._state.chat
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@graphindex/graphindex/api/state.py` around lines 88 - 94, The ensure_chat
method currently writes directly to proxied read-side fields (self.chat,
self.ask_engine) which shadows __getattr__-proxied snapshot fields and can mix
generations; fix by avoiding instance attribute assignments: read dependent
fields once into locals (e.g., db = self.db; vectors = self.vectors; embedder =
self.embedder), create chat = get_chat(self.cfg) and ask_engine =
AskEngine(self.cfg, db, vectors, embedder) locally, then publish them back into
the read-side snapshot using the component's atomic update/publish mechanism
(instead of setting self.chat/self.ask_engine directly) so the update is atomic
and does not break proxy semantics.
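The proposed `ensure_chat` fix is a double-checked publish: read the snapshot pointer once, run slow discovery without the lock, then re-check and swap in a `dataclasses.replace` copy under the lock. Here is a self-contained sketch of that pattern. `Snapshot`, `ChatHolder`, `discover_chat`, and the tuple standing in for `AskEngine` are hypothetical. Unlike the diff, the sketch skips publishing when discovery returns `None`.

```python
import threading
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    db: Any
    chat: Optional[Any] = None
    ask_engine: Optional[Any] = None


class ChatHolder:
    """Hypothetical holder mirroring the proposed ensure_chat() pattern."""

    def __init__(self, db: Any, discover_chat: Callable[[], Optional[Any]]) -> None:
        self._state = Snapshot(db=db)
        self._reload_lock = threading.Lock()
        self._discover_chat = discover_chat

    def ensure_chat(self) -> Optional[Any]:
        snap = self._state  # read the pointer once
        if snap.chat is not None:
            return snap.chat
        chat = self._discover_chat()  # may be slow; done without the lock
        if chat is None:
            return None
        with self._reload_lock:
            cur = self._state  # re-check: a reload or another caller may have won
            if cur.chat is None:
                # Derive ask_engine from the SAME snapshot's db, never a mix.
                self._state = replace(cur, chat=chat, ask_engine=("ask", cur.db, chat))
            return self._state.chat
```

No instance attribute named `chat` is ever created, so attribute lookups keep going through the snapshot.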
```python
    timer = threading.Timer(2.0, lambda: _safe_close(old_state.db))
    timer.daemon = True  # don't block interpreter shutdown
    timer.start()
except Exception:
    pass
self._open()
_safe_close(old_state.db)
```
Replace fixed-delay DB close with usage-aware lifecycle.
Line 125 uses a hardcoded 2-second delayed close for old_state.db. Long-running in-flight work can exceed that window and then fail on a closed DB handle. This is a real race under concurrent reload + request execution.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 128-128: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@graphindex/graphindex/api/state.py` around lines 125 - 129, The current logic
uses a fixed 2s delayed close (threading.Timer + _safe_close(old_state.db))
which can close the DB while in-flight requests still use it; instead implement
a usage-aware lifecycle: add a reference count or an "in_use" tracking API on
the state/db (e.g., acquire()/release() or increment/decrement a counter on
old_state/db) and defer calling _safe_close(old_state.db) until the usage count
reaches zero (or wait on an Event that release() sets), removing the fixed
Timer; update callers to call acquire() when they begin using state.db and
release() when done so reload can safely close the old_state only when no users
remain.
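Here is a minimal sketch of the usage-aware lifecycle CodeRabbit suggests in place of the fixed 2-second timer. `RefCountedDB`, `acquire`/`release`/`retire`, and `using` are hypothetical names. The sketch assumes the connection was opened with `check_same_thread=False`, because whichever thread releases last (not necessarily the opener) closes it. A request that reads a snapshot just before `reload()` retires it gets a `RuntimeError` from `acquire()` and should re-read the current snapshot. Closing that gap entirely would require acquiring under the same lock that publishes the snapshot.

```python
import sqlite3
import threading
from contextlib import contextmanager
from typing import Iterator


class RefCountedDB:
    """Hypothetical wrapper: close the connection only after the last user releases it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn  # assumed opened with check_same_thread=False
        self._users = 0
        self._retired = False
        self._closed = False
        self._lock = threading.Lock()

    def acquire(self) -> sqlite3.Connection:
        with self._lock:
            if self._retired:
                raise RuntimeError("database handle has been retired")
            self._users += 1
            return self.conn

    def release(self) -> None:
        with self._lock:
            self._users -= 1
            self._maybe_close()

    def retire(self) -> None:
        """Mark as replaced by reload(); close now or when the last user leaves."""
        with self._lock:
            self._retired = True
            self._maybe_close()

    def _maybe_close(self) -> None:
        # Caller must hold self._lock.
        if self._retired and self._users == 0 and not self._closed:
            self.conn.close()
            self._closed = True

    @contextmanager
    def using(self) -> Iterator[sqlite3.Connection]:
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release()
```

Unlike a fixed delay, this never closes a handle that a long-running request still holds, however long that request takes.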
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference
Fixes a set of real issues found in `graphindex/`, including a live WebSocket failure visible in `server.log`, a SQL-injection surface, race conditions in the vector store and watcher, and orphan model subprocesses.

Fixes

| File | Problem | Fix |
|---|---|---|
| `api/server.py` | WebSocket 500s (`AssertionError: write() before start_response`) on every `/socket.io/?transport=websocket` request, visible in production `server.log`. Werkzeug threading mode does not handle WS upgrades reliably. | `allow_upgrades=False` so engine.io stays on long-polling, which works correctly under threading + Werkzeug. |
| `api/server.py` | `node_add`/`node_update` events were forwarded to SocketIO immediately, flooding the WS channel and the browser during full re-indexes. | Coalescing dispatcher thread that batches events before emitting them. |
| `storage/db.py` | `update_node_fields` interpolated column names with f-strings; values were parameterized, but unvalidated keys execute as SQL. | Column whitelist; unknown keys raise `ValueError`. Verified blocked end-to-end. |
| `api/state.py` | `AppState.reload` closed and reopened the DB while in-flight requests still held the old connection. | Build the new DB/vectors/engines first, swap references, then close the old DB on a delayed timer. |
| `storage/vectors.py` | `VectorStore.add`/`remove`/`search`/`save` mutated `ids`, `_id_to_idx`, and `_matrix` with no synchronization; the watcher and main indexer could race and corrupt the id↔matrix mapping. | `RLock` around all mutation/query/save paths. Existing behavior preserved; `tests/test_review_fixes.py` still passes. |
| `storage/vectors.py` | `_load` on dimension mismatch returned early, potentially leaving the FAISS index and Python state inconsistent. | `_reset_state()` resets ids/matrix/FAISS index together; called on both mismatch and load errors. |
| `watcher/incremental.py` + `cli.py` | The watcher ran an `Indexer` in `_flush` without coordinating with the API's single-flight `run_index`, so two writers could open the same SQLite + FAISS state. | The watcher accepts and respects the API's `index_lock`, re-queues on contention, and triggers `AppState.reload` after each flush. CLI `serve --watch` wires in the API's `state.index_lock` and `state.reload`. |
| `engine/server_mode.py` | `start_server` returned a `Popen` that callers dropped, so orphan `llama_cpp.server` processes survived crashes/restarts, holding GPU/RAM and ports. | Track every `Popen` in a module-level list; an `atexit` handler runs terminate → wait → kill. Added explicit `stop_server()`. |
| `storage/compsrc.py` | Bare `except Exception: pass` swallowed all I/O errors (permission denied, disk full, corrupt cache). | Replaced with narrowly-typed handlers (`OSError`, `zlib.error`, `JSONDecodeError`, `UnicodeDecodeError`) and added `logging.warning`. |

Verification
- `python -m py_compile` on every `.py` under `graphindex/` and `tests/` → clean.
- `VectorStore` portion of `tests/test_review_fixes.py` (add, update, dim-mismatch reopen, search) → all assertions pass.
- `update_node_fields` against a real SQLite DB → legitimate columns work; injection attempts (`'state=? WHERE 1=1; --'`) and unknown columns are blocked with a clear `ValueError`.
- `pytest tests/` should be run in CI (the project requires Python ≥ 3.10 because of `tree-sitter>=0.25`; my local environment only had 3.9).

Stats
8 files changed, +287 / -81.
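The `storage/compsrc.py` row in the table above replaces a blanket `except Exception: pass` with typed handlers. A minimal sketch of that shape for a zlib-compressed JSON cache follows. `load_compressed_json` and the file format are assumptions, not the module's real code. Treating a missing file as a silent miss is also a design choice added here.

```python
import json
import logging
import zlib
from pathlib import Path
from typing import Any, Optional

log = logging.getLogger(__name__)


def load_compressed_json(path: Path) -> Optional[Any]:
    """Read a zlib-compressed JSON cache file; return None (and warn) if unusable."""
    try:
        raw = path.read_bytes()
    except FileNotFoundError:
        return None  # a missing cache is a normal miss, not worth a warning
    except OSError as exc:
        # Permission denied, I/O errors, etc.: visible, but not fatal.
        log.warning("cannot read cache %s: %s", path, exc)
        return None
    try:
        return json.loads(zlib.decompress(raw).decode("utf-8"))
    except (zlib.error, UnicodeDecodeError, json.JSONDecodeError) as exc:
        log.warning("corrupt cache %s: %s", path, exc)
        return None
```

Any other exception type (a genuine bug) still propagates instead of being swallowed.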
Summary by Sourcery
Improve robustness and safety of indexing, storage, and serving by tightening concurrency, lifecycle, and error handling across the API, watcher, vector store, DB, and model server.