Skip to content

Fix concurrency, SQL injection, and resource leaks#3

Open
PierrunoYT wants to merge 3 commits into
Undertaker-afk:mainfrom
PierrunoYT:fix/concurrency-and-sql-injection
Open

Fix concurrency, SQL injection, and resource leaks#3
PierrunoYT wants to merge 3 commits into
Undertaker-afk:mainfrom
PierrunoYT:fix/concurrency-and-sql-injection

Conversation

@PierrunoYT

@PierrunoYT PierrunoYT commented Jun 1, 2026

Copy link
Copy Markdown

Fixes a set of real issues found in graphindex/ — including a live WebSocket failure visible in server.log, a SQL-injection surface, race conditions in the vector store / watcher, and orphan model subprocesses.

Fixes

# File Issue Fix
1 api/server.py WebSocket transport returned HTTP 500 (AssertionError: write() before start_response) on every /socket.io/?transport=websocket request — visible in production server.log. Werkzeug threading mode does not handle WS upgrades reliably. Set allow_upgrades=False so engine.io stays on long-polling, which works correctly under threading + Werkzeug.
2 api/server.py Event firehose: every node_add/node_update was forwarded to SocketIO immediately, flooding the WS channel and the browser during full re-indexes. Added a single coalescing dispatcher thread with batch-size + interval caps and a hard upper bound on queued events.
3 storage/db.py update_node_fields interpolated column names with f-strings — values were parameterized, but unvalidated keys execute as SQL. Whitelist of allowed columns; unknown keys raise ValueError. Verified blocked end-to-end.
4 api/state.py AppState.reload closed and reopened the DB while in-flight requests still held the old connection. Build the new components first, atomically swap public refs, then close the old DB on a delayed timer so concurrent reads finish cleanly.
5 storage/vectors.py VectorStore.add / remove / search / save mutated ids, _id_to_idx, _matrix with no synchronization — the watcher and main indexer could race and corrupt the id↔matrix mapping. Added an RLock around all mutation/query/save paths. Existing behavior preserved; tests/test_review_fixes.py still passes.
6 storage/vectors.py _load on dimension mismatch returned early, potentially leaving FAISS index and Python state inconsistent. New _reset_state() resets ids/matrix/FAISS index together; called on both mismatch and load errors.
7 watcher/incremental.py + cli.py The watcher built its own Indexer in _flush without coordinating with the API's single-flight run_index → two writers on the same SQLite + FAISS state. Watcher accepts an index_lock; re-queues on contention; triggers AppState.reload after each flush. CLI serve --watch wires the API's state.index_lock and state.reload in.
8 engine/server_mode.py start_server returned a Popen that callers dropped → orphan llama_cpp.server processes survived crashes/restarts, holding GPU/RAM and ports. Track every spawned Popen in a module-level list; atexit handler runs terminate → wait → kill. Added explicit stop_server().
9 storage/compsrc.py Bare except Exception: pass swallowed all I/O errors (permission denied, disk full, corrupt cache). Narrowed to specific exceptions (OSError, zlib.error, JSONDecodeError, UnicodeDecodeError) and added logging.warning.

Verification

  • python -m py_compile on every .py under graphindex/ and tests/ → clean.
  • Manually exercised the VectorStore portion of tests/test_review_fixes.py (add, update, dim-mismatch reopen, search) → all assertions pass.
  • Manually exercised update_node_fields against a real SQLite DB → legit columns work, injection attempts ('state=? WHERE 1=1; --') and unknown columns blocked with a clear ValueError.
  • Full pytest tests/ should be run in CI (project requires Python ≥ 3.10 due to tree-sitter>=0.25; my local environment only had 3.9).

Stats

8 files changed, +287 / -81.

Summary by Sourcery

Improve robustness and safety of indexing, storage, and serving by tightening concurrency, lifecycle, and error handling across the API, watcher, vector store, DB, and model server.

Bug Fixes:

  • Prevent WebSocket upgrade failures under threaded Werkzeug by forcing Socket.IO to use long-polling only.
  • Eliminate potential SQL injection in node updates by whitelisting allowed columns in update_node_fields and rejecting unknown keys.
  • Avoid races between the watcher and main indexer on shared SQLite/FAISS state by coordinating through a shared index lock and reloading read-side state after incremental updates.
  • Fix inconsistent vector store state on load or dimension mismatch by resetting all in-memory and FAISS structures together.
  • Prevent data races and corruption in the vector store by serializing all reads, writes, and saves with a reentrant lock.
  • Ensure spawned llama_cpp.server subprocesses are cleaned up on exit or when explicitly stopped to avoid orphaned model servers.
  • Stop silently swallowing component source I/O and decode errors by narrowing caught exceptions and logging warnings instead.
  • Make AppState.reload safe for in-flight requests by rebuilding new components, swapping atomically, and closing old DB connections after a grace period.

Enhancements:

  • Batch and coalesce indexing and extended events before emitting over Socket.IO to avoid flooding clients during large re-indexes.
  • Expose an explicit stop_server helper for model server subprocesses.
  • Return defensive copies from vector lookups to prevent external mutation of internal state.

Summary by CodeRabbit

  • New Features

    • Event batching/coalescing for real-time updates
    • Initial hello payload on client connect and long-polling-only socket behavior
    • Watch mode synchronized with indexing to avoid conflicts
  • Bug Fixes

    • Thread-safe vector storage and safer state reload with atomic swaps
    • Input validation to prevent unsafe updates
    • Automatic cleanup of spawned background servers on exit
  • Improvements

    • More structured error logging for storage/decompression failures

- api/server.py: disable engine.io upgrades (fixes WebSocket 500s on
  Werkzeug threading mode); add coalescing dispatcher so a fast indexer
  cannot flood SocketIO clients or the browser.
- api/state.py: make AppState.reload atomic — build new DB/vectors/
  engines first, swap refs, close the old DB on a delayed timer so
  in-flight requests don't hit a closed connection.
- storage/db.py: whitelist columns in update_node_fields to close a
  SQL-injection surface (column names cannot be parameterized).
- storage/vectors.py: add an RLock around all mutation/query/save so
  the watcher and indexer cannot corrupt the ids<->matrix mapping.
  Also reset FAISS index + state consistently on load failure /
  dimension mismatch.
- watcher/incremental.py: accept and respect the API's single-flight
  index_lock so the watcher and /api/index never open two writers on
  the same DB; re-queue on contention; trigger AppState.reload after
  each incremental flush.
- engine/server_mode.py: track every spawned llama_cpp.server Popen
  and stop them via atexit so a crashed parent doesn't leave orphan
  model servers running.
- storage/compsrc.py: replace bare except: pass with narrowly-typed
  exception handlers and logging.warning so I/O failures are visible.

Amp-Thread-ID: https://ampcode.com/threads/T-019e8387-f758-775f-aaf2-2ed8d1209c8b
Co-authored-by: Amp <amp@ampcode.com>
@sourcery-ai

sourcery-ai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Reviewer's Guide

Addresses production WebSocket failures, SQL-injection risk, vector-store race conditions, watcher/indexer contention, and orphaned model subprocesses by tightening concurrency controls, constraining SQL updates, coalescing SocketIO events, hardening vector-store load/save logic, coordinating the file watcher with API indexing, and tracking/cleaning up model server processes.

Sequence diagram for RepoWatcher flush coordination and AppState reload

sequenceDiagram
    participant RepoWatcher
    participant IndexLock as index_lock
    participant Indexer
    participant GraphDB as indexer.db
    participant VectorStore as indexer.vectors
    participant AppState as state

    RepoWatcher->>RepoWatcher: _flush()
    RepoWatcher->>IndexLock: acquire(blocking=False)
    alt lock_not_acquired
        RepoWatcher->>RepoWatcher: _pending.update(changed)
        RepoWatcher->>RepoWatcher: threading.Timer(..., _flush).start()
        RepoWatcher-->>IndexLock: (no lock held)
    else lock_acquired
        RepoWatcher->>Indexer: __init__(cfg, bus, do_summarize)
        loop each_deleted_path
            Indexer->>GraphDB: delete_file(path)
            GraphDB-->>Indexer: ids
            Indexer->>VectorStore: remove(set(ids))
            loop each_id
                Indexer->>RepoWatcher: bus.emit("node_remove", id)
            end
        end
        RepoWatcher->>Indexer: prune_deleted_files(cfg, db, vectors)
        alt existing_changed_files
            RepoWatcher->>Indexer: index(only_changed=existing)
        end
        RepoWatcher->>GraphDB: close()
        alt on_reload_callback_present
            RepoWatcher->>AppState: reload()
        end
        RepoWatcher->>IndexLock: release()
    end
Loading

Sequence diagram for coalesced SocketIO event dispatch

sequenceDiagram
    participant Indexer
    participant EventBus as state.bus
    participant Forward as _forward
    participant Dispatcher as _dispatcher
    participant SocketIO as socketio
    participant Client as BrowserClient

    Indexer->>EventBus: emit(IndexEvent)
    EventBus->>Forward: _forward(evt)
    Forward->>Forward: evt.to_dict()
    Forward->>Forward: pending_index/pending_ext.append(d)
    Forward->>Dispatcher: flush_event.set()

    loop dispatcher_loop
        Dispatcher->>Dispatcher: flush_event.wait()
        Dispatcher->>Dispatcher: flush_event.clear()
        Dispatcher->>Dispatcher: time.sleep(_BATCH_INTERVAL)
        Dispatcher->>Dispatcher: take up to _BATCH_MAX_EVENTS from pending_*
        loop idx_batch
            Dispatcher->>SocketIO: emit(index_event, d)
        end
        loop ext_batch
            Dispatcher->>SocketIO: emit(ext_event, d)
        end
        alt more_pending
            Dispatcher->>Dispatcher: flush_event.set()
        end
    end

    SocketIO-->>Client: index_event/ext_event messages
Loading

File-Level Changes

Change Details Files
Add locking and consistent reset semantics to the vector store to prevent race conditions and state corruption between watcher and main indexer.
  • Introduce a re-entrant lock on the vector store and wrap add/remove/save/search/get_vector/len in critical sections.
  • Add a _reset_state helper and use it on dimension mismatch or load failures to keep ids, matrix, and FAISS index consistent.
  • Ensure FAISS index is reset and rebuilt appropriately and return copies from get_vector to prevent external mutation.
graphindex/graphindex/storage/vectors.py
Coordinate the incremental file watcher with the main indexer and trigger state reloads after flushes to avoid concurrent writers and stale readers.
  • Extend RepoWatcher to accept a shared index_lock and an on_reload callback and store them on the instance.
  • In _flush, attempt a non-blocking acquire of index_lock, re-queue work when contention is detected, and always release the lock in a finally block.
  • Wrap Indexer lifecycle in try/finally, preserve existing delete/index behavior, and call on_reload after a successful flush while swallowing reload errors.
graphindex/graphindex/watcher/incremental.py
Make AppState reload atomic and safe for in-flight requests by rebuilding components off to the side, swapping references under a lock, and closing old DB connections asynchronously.
  • Factor out a _build helper that constructs fresh DB, vector store, embedder, search engine, chat, and ask engine without mutating self.
  • Guard reload swaps with a dedicated _reload_lock and perform an atomic assignment of db, vectors, embedder, search_engine, chat, and ask_engine.
  • Delay closing the old DB via a timer and add a _safe_close helper so in-flight requests holding the old connection can finish gracefully.
graphindex/graphindex/api/state.py
Stabilize SocketIO under Werkzeug threading and prevent event floods by forcing long-polling and batching events before emitting to clients.
  • Configure SocketIO with async_mode='threading' and allow_upgrades=False to disable unreliable WebSocket transport upgrades.
  • Introduce batch size and interval limits along with shared queues for index and extended events.
  • Add a background dispatcher thread that coalesces events, enforces caps, and drains queues by emitting index_event and ext_event in bounded batches.
graphindex/graphindex/api/server.py
Ensure llama_cpp server subprocesses are tracked and cleaned up on exit, and provide an explicit stop API.
  • Introduce a module-level list and lock to track all started subprocesses and register an atexit hook that terminates, waits, and kills remaining processes.
  • Update start_server to register new processes in the tracking list and return the handle.
  • Add stop_server to explicitly terminate and remove a tracked process, with fallbacks to kill on failure or timeout.
graphindex/graphindex/engine/server_mode.py
Replace broad exception swallowing in component-source persistence with targeted exception handling and logging.
  • Add a module-level logger and log warnings when store and flush_batch encounter OSError instead of silently ignoring failures.
  • Narrow retrieve error handling to specific I/O/decompression/decoding/JSON errors and log details before returning None.
  • Log prune_stale unlink failures rather than suppressing them completely.
graphindex/graphindex/storage/compsrc.py
Eliminate SQL injection surface in node field updates by enforcing a whitelist of allowed columns.
  • Define a frozen set of updatable node columns derived from the node schema.
  • Validate keys passed to update_node_fields against this whitelist and raise ValueError for disallowed columns before constructing the UPDATE clause.
  • Keep values parameterized as before while ensuring column names cannot be attacker-controlled.
graphindex/graphindex/storage/db.py
Wire the API’s index lock and reload callback into the CLI watcher so incremental updates and foreground indexing are serialized and readers see fresh data.
  • Update serve to create RepoWatcher with the shared state.bus, state.index_lock, and state.reload callback when watch mode is enabled.
  • Preserve existing watcher startup and state.watcher tracking behavior.
graphindex/graphindex/cli.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR hardens concurrency safety and enables safe hot-reloading across the system. AppState now atomically swaps components under a dedicated lock while preserving in-flight requests. The watch mode's incremental indexer coordinates with the foreground API via the shared lock. VectorStore, SocketIO dispatch, and server subprocess management all add thread-safety or resource cleanup guarantees. Storage errors are now logged for visibility, and database column updates are whitelisted to prevent injection.

Changes

Concurrency, Hot-Reload, and Safety Hardening

Layer / File(s) Summary
State hot-reload atomicity
graphindex/graphindex/api/state.py
AppState acquires _reload_lock, rebuilds fresh components via new _build() method, atomically swaps read-side singletons (db, vectors, embedder, search_engine, chat, ask_engine), and schedules async closure of the prior DB. In-flight requests continue using the old db until it is safely closed.
Watch mode coordination with index locking
graphindex/graphindex/watcher/incremental.py, graphindex/graphindex/cli.py
RepoWatcher now accepts shared index_lock and on_reload callback. The _flush() method attempts non-blocking lock acquisition; on contention, it re-queues and reschedules. On successful flush, it processes deletions first, prunes stale files, re-indexes changed files, invokes the reload callback (for cache invalidation), and releases the lock safely. The CLI wires the API server's index lock and reload callback to the watcher.
VectorStore thread-safe concurrent access
graphindex/graphindex/storage/vectors.py
VectorStore protects all mutations (add, remove, save) and reads (search, get_vector, len) with re-entrant locking. New _reset_state() helper centralizes clearing to an empty consistent state and resetting the FAISS index. _load() uses this helper on dimension mismatches or load failures. get_vector() returns a copy to prevent external mutation. Search queries execute under lock with an early fast-path for empty stores.
Server subprocess lifecycle management
graphindex/graphindex/engine/server_mode.py
Thread-safe global tracking of llama_cpp.server subprocesses via _PROCS list protected by lock. An atexit hook automatically terminates all tracked processes on interpreter exit. start_server() registers spawned processes in the tracked list; new stop_server() function explicitly terminates a process with graceful timeout and forceful kill fallback.
SocketIO event batching and coalescing
graphindex/graphindex/api/server.py, graphindex/frontend/src/api/client.js
Replaces immediate per-event SocketIO emission with buffered, coalesced batching. Index and "ext_*" events are queued separately under a lock. A background dispatcher thread flushes batches at intervals, respecting _BATCH_INTERVAL and _BATCH_MAX_EVENTS limits. Queue size is capped to prevent unbounded growth if dispatch lags. SocketIO transport upgrades are disabled and frontend client options are centralized.
Storage error visibility via structured logging
graphindex/graphindex/storage/compsrc.py
Replaces silent exception suppression with structured error logging. Module-level logger emits warnings for OSError in store, flush_batch, and prune_stale. The retrieve method additionally logs for zlib.error, json.JSONDecodeError, and UnicodeDecodeError before returning gracefully. Error control flow (continue/return None) remains unchanged.
Database column update whitelisting
graphindex/graphindex/storage/db.py
GraphDB.update_node_fields now validates all provided keyword arguments against a class-level whitelist (_UPDATABLE_NODE_COLS) of allowed node table columns. Unwhitelisted keys raise ValueError before the UPDATE statement is constructed, preventing column-name injection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A rabbit's ode to safer threads and reloads so clean:
Lock the state and swap it safe, let old DBs rest in peace,
Watchers yield, vectors guard, and batches calm the noisy lease,
Subprocesses bowed and logged, each error given a name—
Threads now dance without a crash; the forest sleeps the same.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.96% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: fixing concurrency issues (event batching, locking, atomic reload), SQL injection vulnerability (column whitelist), and resource leaks (atexit handlers, delayed cleanup).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 security issue, and left some high level feedback:

Security issues:

  • Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

  • In AppState.reload, _build() runs under _reload_lock, which could significantly block request threads if building the embedder/chat is slow; consider doing the potentially expensive construction outside the lock and only taking _reload_lock for the final attribute swap and old-db capture.
  • In RepoWatcher._flush, exceptions from the on_reload callback are silently swallowed; capturing and logging these (similar to the new logging in compsrc) would make it much easier to debug reload failures without affecting robustness.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `AppState.reload`, `_build()` runs under `_reload_lock`, which could significantly block request threads if building the embedder/chat is slow; consider doing the potentially expensive construction outside the lock and only taking `_reload_lock` for the final attribute swap and old-db capture.
- In `RepoWatcher._flush`, exceptions from the `on_reload` callback are silently swallowed; capturing and logging these (similar to the new logging in `compsrc`) would make it much easier to debug reload failures without affecting robustness.

## Individual Comments

### Comment 1
<location path="graphindex/graphindex/engine/server_mode.py" line_range="63" />
<code_context>
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'Popen' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread graphindex/graphindex/engine/server_mode.py Outdated
gemini-code-assist[bot]

This comment was marked as resolved.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59d52da79f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +42 to +43
socketio = SocketIO(app, cors_allowed_origins="*", async_mode="threading",
allow_upgrades=False)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Force polling instead of only disabling upgrades

In the inspected WebUI socket paths, frontend/src/api/client.js still connects with transports: ['websocket', 'polling'], so the browser's first request is a direct /socket.io/?transport=websocket. allow_upgrades=False only removes websocket from the polling handshake's upgrade list; python-engineio still handles an initial transport == 'websocket' by entering the websocket request path. As a result, the Werkzeug WebSocket 500s this patch is trying to avoid can still occur for the bundled frontend; force polling with the Engine.IO/Socket.IO transports=['polling'] option on the server or update the client transport list.

Useful? React with 👍 / 👎.

coderabbitai[bot]

This comment was marked as resolved.

Fixes based on Gemini Code Assist, CodeRabbit, Codex, and Sourcery reviews
of PR Undertaker-afk#3 in Undertaker-afk/Incodedex.

Critical:
- storage/db.py: update_node_fields now commits the transaction (was
  leaving SQLite with an open implicit transaction, blocking other
  writers and never persisting the change).
- watcher/incremental.py: when an incremental flush contains only file
  deletions, call vectors.save() so the removals persist across
  restarts (previously only Indexer.index() saved vectors).

High:
- storage/vectors.py: validate dimension BEFORE mutating state in
  add() and search(); a wrong-dim input now raises ValueError without
  corrupting ids / _id_to_idx / _matrix / FAISS index.
- api/server.py: the SocketIO dispatcher thread now catches per-emit
  and loop-level exceptions so a transient socketio.emit failure can
  no longer kill the thread and silently stop all event forwarding.

Medium:
- api/state.py: read-side components are now bundled in a frozen
  _ReadState dataclass and swapped via a single atomic self._state
  pointer. Build runs OUTSIDE _reload_lock (no blocking of request
  threads on slow model loads). The delayed close-old-db Timer is
  daemon=True so it cannot block interpreter shutdown. __getattr__
  proxies db/vectors/embedder/etc. through the snapshot for backwards
  compatibility; build_extended captures a single snapshot so it can
  no longer mix old + new components.
- api/server.py + frontend/src/api/client.js: belt-and-suspenders on
  the websocket fix. Server now also rejects connect attempts that
  arrive directly on the websocket transport (allow_upgrades=False
  only blocks the polling -> ws upgrade). Frontend connects with
  transports: ['polling'], upgrade: false so it never tries WS at
  all.
- engine/server_mode.py: refactored _stop_all and stop_server through
  a shared _terminate_proc helper that always wait()s after kill() and
  only removes a Popen handle from _PROCS once it has been reaped, so
  zombies cannot leak and concurrent cleanup paths cannot double-stop
  a child mid-shutdown. Added a security-audit comment on the Popen
  call (shell=False, fixed argv).

Low:
- watcher/incremental.py: on_reload exceptions are now logged (via
  logging + bus.emit('log', ...)) instead of being silently swallowed.
  Fixed RepoWatcher.__init__ type annotations: replaced the stringy
  'callable | None' with Optional[Callable[[], None]] and a Lock-like
  annotation for index_lock.

Amp-Thread-ID: https://ampcode.com/threads/T-019e8387-f758-775f-aaf2-2ed8d1209c8b
Co-authored-by: Amp <amp@ampcode.com>
@Undertaker-afk

Copy link
Copy Markdown
Owner

Ohhh nice i will have a look

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
graphindex/graphindex/engine/server_mode.py (1)

52-64: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep tracked handles until process exit is confirmed.

_stop_all() and stop_server() remove entries from _PROCS even when _terminate_proc() may fail/time out. That can orphan a live child and prevents later cleanup retries.

🔧 Proposed fix
 def _stop_all() -> None:
@@
     for p in procs:
         _terminate_proc(p)
         with _PROCS_LOCK:
-            try:
-                _PROCS.remove(p)
-            except ValueError:
-                pass
+            if p in _PROCS and p.poll() is not None:
+                _PROCS.remove(p)

@@
 def stop_server(proc: subprocess.Popen) -> None:
@@
     _terminate_proc(proc)
     with _PROCS_LOCK:
-        try:
-            _PROCS.remove(proc)
-        except ValueError:
-            pass
+        if proc in _PROCS and proc.poll() is not None:
+            _PROCS.remove(proc)

Also applies to: 90-101

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphindex/graphindex/engine/server_mode.py` around lines 52 - 64,
_stop_all() and stop_server() currently remove entries from _PROCS immediately
after calling _terminate_proc(), which can orphan children if termination fails
or times out; change the logic so entries in _PROCS are only removed once the
child is confirmed reaped (e.g., after wait()/poll() shows exited or after
_terminate_proc() returns success), by holding the handle in the local procs
list and deferring removal under _PROCS_LOCK until the process exit is verified
inside or after _terminate_proc(), and ensure any failed/timeout paths leave the
handle in _PROCS for later retries; update both _stop_all and stop_server to use
_PROCS_LOCK consistently and only remove from _PROCS when exit is confirmed.
🧹 Nitpick comments (2)
graphindex/graphindex/api/server.py (1)

38-45: 💤 Low value

Consider logging the exception at debug level.

The silent except Exception: pass makes it hard to diagnose if the package resource lookup fails unexpectedly (e.g., broken install, permissions, malformed package). A debug-level log would help troubleshooting without adding noise in normal operation.

🔧 Proposed fix
     try:
         import importlib.resources as ilr
         ref = ilr.files("graphindex").joinpath("frontend_dist", "index.html")
         if ref.is_file():
             # `index.html` is at the root of the bundle; return the parent dir.
             return Path(str(ref.parent))
-    except Exception:
-        pass
+    except Exception as exc:
+        log.debug("importlib.resources lookup failed, using source-tree fallback: %s", exc)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphindex/graphindex/api/server.py` around lines 38 - 45, The silent except
in the importlib.resources block swallows errors; change it to log the exception
at debug level instead of passing. Add or reuse a module logger (e.g., logger =
logging.getLogger(__name__)) and in the except block call logger.debug("failed
to locate packaged frontend index.html", exc_info=True) (or logger.exception at
debug level if configured) so failures from
importlib.resources/ilr.files/ref.is_file() are recorded without changing
control flow; keep returning Path(ref.parent) when found.
graphindex/graphindex/engine/server_mode.py (1)

34-35: ⚡ Quick win

Avoid silent broad catches in subprocess shutdown paths.

except Exception: pass in _terminate_proc() hides teardown failures that are critical for diagnosing leaked/stuck servers. Catch expected subprocess errors explicitly and log unexpected failures.

🔍 Proposed refactor
+import logging
@@
 _PROCS: list[subprocess.Popen] = []
 _PROCS_LOCK = threading.Lock()
+_LOG = logging.getLogger(__name__)
@@
 def _terminate_proc(proc: subprocess.Popen) -> None:
@@
     try:
         proc.terminate()
-    except Exception:
-        pass
+    except OSError:
+        _LOG.warning("terminate() failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
@@
     try:
         proc.wait(timeout=5)
         return
-    except Exception:
-        pass
+    except subprocess.TimeoutExpired:
+        pass
+    except OSError:
+        _LOG.warning("wait() after terminate failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
@@
     try:
         proc.kill()
-    except Exception:
-        pass
+    except OSError:
+        _LOG.warning("kill() failed for pid=%s", getattr(proc, "pid", None), exc_info=True)
@@
     try:
         proc.wait(timeout=5)
-    except Exception:
-        pass
+    except (subprocess.TimeoutExpired, OSError):
+        _LOG.warning("final wait() failed for pid=%s", getattr(proc, "pid", None), exc_info=True)

Also applies to: 39-40, 44-45, 48-49

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphindex/graphindex/engine/server_mode.py` around lines 34 - 35, The
shutdown helper _terminate_proc() currently swallows all exceptions with "except
Exception: pass", which hides teardown failures; update the try/except blocks in
_terminate_proc() (and the other similar shutdown catches) to catch specific
expected errors (e.g., subprocess.SubprocessError, OSError) and log failures
instead of ignoring them—use the module logger or processLogger to call
logger.error or logger.exception with the exception details; for truly
unexpected errors log them with logger.exception and optionally re-raise or
ensure best-effort cleanup (proc.kill()/proc.terminate() and proc.wait with
timeout) in a finally block so leaked/stuck processes are diagnosable and
cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@graphindex/graphindex/api/state.py`:
- Around line 125-129: The current logic uses a fixed 2s delayed close
(threading.Timer + _safe_close(old_state.db)) which can close the DB while
in-flight requests still use it; instead implement a usage-aware lifecycle: add
a reference count or an "in_use" tracking API on the state/db (e.g.,
acquire()/release() or increment/decrement a counter on old_state/db) and defer
calling _safe_close(old_state.db) until the usage count reaches zero (or wait on
an Event that release() sets), removing the fixed Timer; update callers to call
acquire() when they begin using state.db and release() when done so reload can
safely close the old_state only when no users remain.
- Around line 88-94: The ensure_chat method currently writes directly to proxied
read-side fields (self.chat, self.ask_engine) which shadows __getattr__-proxied
snapshot fields and can mix generations; fix by avoiding instance attribute
assignments: read dependent fields once into locals (e.g., db = self.db; vectors
= self.vectors; embedder = self.embedder), create chat = get_chat(self.cfg) and
ask_engine = AskEngine(self.cfg, db, vectors, embedder) locally, then publish
them back into the read-side snapshot using the component's atomic
update/publish mechanism (instead of setting self.chat/self.ask_engine directly)
so the update is atomic and does not break proxy semantics.

---

Duplicate comments:
In `@graphindex/graphindex/engine/server_mode.py`:
- Around line 52-64: _stop_all() and stop_server() currently remove entries from
_PROCS immediately after calling _terminate_proc(), which can orphan children if
termination fails or times out; change the logic so entries in _PROCS are only
removed once the child is confirmed reaped (e.g., after wait()/poll() shows
exited or after _terminate_proc() returns success), by holding the handle in the
local procs list and deferring removal under _PROCS_LOCK until the process exit
is verified inside or after _terminate_proc(), and ensure any failed/timeout
paths leave the handle in _PROCS for later retries; update both _stop_all and
stop_server to use _PROCS_LOCK consistently and only remove from _PROCS when
exit is confirmed.

---

Nitpick comments:
In `@graphindex/graphindex/api/server.py`:
- Around line 38-45: The silent except in the importlib.resources block swallows
errors; change it to log the exception at debug level instead of passing. Add or
reuse a module logger (e.g., logger = logging.getLogger(__name__)) and in the
except block call logger.debug("failed to locate packaged frontend index.html",
exc_info=True) (or logger.exception at debug level if configured) so failures
from importlib.resources/ilr.files/ref.is_file() are recorded without changing
control flow; keep returning Path(ref.parent) when found.

In `@graphindex/graphindex/engine/server_mode.py`:
- Around line 34-35: The shutdown helper _terminate_proc() currently swallows
all exceptions with "except Exception: pass", which hides teardown failures;
update the try/except blocks in _terminate_proc() (and the other similar
shutdown catches) to catch specific expected errors (e.g.,
subprocess.SubprocessError, OSError) and log failures instead of ignoring
them—use the module logger or processLogger to call logger.error or
logger.exception with the exception details; for truly unexpected errors log
them with logger.exception and optionally re-raise or ensure best-effort cleanup
(proc.kill()/proc.terminate() and proc.wait with timeout) in a finally block so
leaked/stuck processes are diagnosable and cleaned up.
🪄 Autofix (Beta)

❌ Autofix failed (check again to retry)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8223fc3d-be3b-46d1-9337-f83313e64454

📥 Commits

Reviewing files that changed from the base of the PR and between 59d52da and 2eb4f69.

📒 Files selected for processing (8)
  • graphindex/frontend/src/api/client.js
  • graphindex/graphindex/api/server.py
  • graphindex/graphindex/api/state.py
  • graphindex/graphindex/cli.py
  • graphindex/graphindex/engine/server_mode.py
  • graphindex/graphindex/storage/db.py
  • graphindex/graphindex/storage/vectors.py
  • graphindex/graphindex/watcher/incremental.py
✅ Files skipped from review due to trivial changes (1)
  • graphindex/frontend/src/api/client.js
🚧 Files skipped from review as they are similar to previous changes (4)
  • graphindex/graphindex/storage/db.py
  • graphindex/graphindex/watcher/incremental.py
  • graphindex/graphindex/storage/vectors.py
  • graphindex/graphindex/cli.py

Comment on lines 88 to 94
@@ -53,20 +94,43 @@ def ensure_chat(self):
return self.chat

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid direct writes to proxied read-side fields in ensure_chat.

Line 91 and Line 92 assign self.chat / self.ask_engine on the instance, which shadows __getattr__-proxied snapshot fields and breaks atomic snapshot semantics after the first call. This path can also mix generations by reading self.db, self.vectors, and self.embedder via separate lookups during a concurrent reload.

🔧 Proposed fix
-from dataclasses import dataclass
+from dataclasses import dataclass, replace
@@
     def ensure_chat(self):
-        """Retry chat-model discovery and keep AskEngine wired to it."""
-        if self.chat is None:
-            self.chat = get_chat(self.cfg)
-            self.ask_engine = AskEngine(self.cfg, self.db, self.vectors,
-                                        self.embedder, chat=self.chat)
-        return self.chat
+        """Retry chat-model discovery and atomically publish updated snapshot."""
+        snap = self._state
+        if snap.chat is not None:
+            return snap.chat
+        chat = get_chat(self.cfg)
+        with self._reload_lock:
+            cur = self._state
+            if cur.chat is None:
+                self._state = replace(
+                    cur,
+                    chat=chat,
+                    ask_engine=AskEngine(
+                        self.cfg, cur.db, cur.vectors, cur.embedder, chat=chat
+                    ),
+                )
+            return self._state.chat
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphindex/graphindex/api/state.py` around lines 88 - 94, The ensure_chat
method currently writes directly to proxied read-side fields (self.chat,
self.ask_engine) which shadows __getattr__-proxied snapshot fields and can mix
generations; fix by avoiding instance attribute assignments: read dependent
fields once into locals (e.g., db = self.db; vectors = self.vectors; embedder =
self.embedder), create chat = get_chat(self.cfg) and ask_engine =
AskEngine(self.cfg, db, vectors, embedder) locally, then publish them back into
the read-side snapshot using the component's atomic update/publish mechanism
(instead of setting self.chat/self.ask_engine directly) so the update is atomic
and does not break proxy semantics.

Comment on lines +125 to +129
timer = threading.Timer(2.0, lambda: _safe_close(old_state.db))
timer.daemon = True # don't block interpreter shutdown
timer.start()
except Exception:
pass
self._open()
_safe_close(old_state.db)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Replace fixed-delay DB close with usage-aware lifecycle.

Line 125 uses a hardcoded 2-second delayed close for old_state.db. Long-running in-flight work can exceed that window and then fail on a closed DB handle. This is a real race under concurrent reload + request execution.

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 128-128: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphindex/graphindex/api/state.py` around lines 125 - 129, The current logic
uses a fixed 2s delayed close (threading.Timer + _safe_close(old_state.db))
which can close the DB while in-flight requests still use it; instead implement
a usage-aware lifecycle: add a reference count or an "in_use" tracking API on
the state/db (e.g., acquire()/release() or increment/decrement a counter on
old_state/db) and defer calling _safe_close(old_state.db) until the usage count
reaches zero (or wait on an Event that release() sets), removing the fixed
Timer; update callers to call acquire() when they begin using state.db and
release() when done so reload can safely close the old_state only when no users
remain.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants